Day 10 part 1

Protein structure

🧬 Big Picture: From Sequence → Structure

Key idea

  • We have far more protein sequences than known structures
  • Experimental structures (NMR spectroscopy, X-ray crystallography) are slow and expensive
  • Therefore, we try to predict structure from sequence

❓ Can we get structure from sequence alone?

👉 In theory: yes
👉 In practice: very difficult

Why?

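Because of how folding works:

  • Proteins fold toward a minimum free energy state
  • The native structure is generally assumed to be the global free energy minimum (Anfinsen’s thermodynamic hypothesis)

But:

  • The number of possible conformations is astronomically large
  • Levinthal’s paradox: a random search through all of them would take longer than the age of the universe, yet real proteins typically fold within milliseconds to seconds

➡️ A protein cannot brute-force all conformations
➡️ Nature uses guided folding pathways
➡️ Computers cannot brute-force the search either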


⚡ Does a folded protein always stay folded?

👉 Your statement: “folded protein has minimum energy so it cannot unfold”

Correction:

  • Folded protein = lowest energy under given conditions
  • BUT:
    • Can unfold with:
      • heat
      • pH change
      • denaturants
  • So it is stable, not permanent

⚠️ Misfolded proteins (important correction)

👉 Your statement: “prime protein?”

❌ Incorrect term
✅ Correct term: prion

What happens:

  • Misfolded protein gets stuck in a local energy minimum
  • Cannot easily escape → energy trap
  • Example: prions → misfolded proteins that convert normal copies into the misfolded form and cause disease

🧠 Why folding is computationally hard

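  • Backbone flexibility comes mainly from two angles per residue: φ (phi) and ψ (psi); the peptide bond angle ω is nearly fixed
  • But the number of combinations grows exponentially with chain length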

➡️ Even a 100 amino acid protein:

  • Would take longer than the age of the universe to brute-force fold
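
Rough sketch of the arithmetic behind this claim. The numbers are illustrative assumptions, not measurements: 3 backbone conformations per residue and 10^13 conformations sampled per second.

```python
# Back-of-the-envelope Levinthal estimate (all parameters are assumptions).
STATES_PER_RESIDUE = 3        # assumed backbone conformations per residue
SAMPLES_PER_SECOND = 1e13     # assumed sampling rate (one conformation per ~0.1 ps)
AGE_OF_UNIVERSE_S = 4.35e17   # roughly 13.8 billion years, in seconds


def brute_force_seconds(n_residues: int) -> float:
    """Time needed to enumerate every conformation of an n-residue chain."""
    conformations = STATES_PER_RESIDUE ** n_residues
    return conformations / SAMPLES_PER_SECOND


seconds = brute_force_seconds(100)
print(f"conformations for 100 residues: {STATES_PER_RESIDUE ** 100:.2e}")
print(f"exhaustive search time:         {seconds:.2e} s")
print(f"multiples of age of universe:   {seconds / AGE_OF_UNIVERSE_S:.2e}")
```

Changing the assumed values shifts the result by a few orders of magnitude, but the exponential term dominates, so the conclusion stays the same.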

🔬 MAIN METHODS FOR STRUCTURE PREDICTION


1. Homology Modeling (Comparative Modeling)

Your understanding:

find similar sequence → align → model

✅ Correct, but refine it:

Steps:

  1. Find a homologous protein with a known structure (the template)
  2. Align the target sequence to the template sequence
  3. Replace template residues with the target’s residues (like point mutations)
  4. Energy minimize → final structure

⚠️ Important limitation

👉 Your example about protease vs structural protein is correct

  • Sequence similarity alone does not guarantee the same structure/function
  • Example:
    • protease → globular enzyme
    • structural protein → fibrous

➡️ Function + environment matters


🔍 Identity vs Homology

Your definitions:

✔️ Mostly correct, refine:

Identity

  • Same amino acid at same position
  • e.g. Leu = Leu

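Similarity (often loosely called “homology”)

  • Different amino acids, similar properties
  • e.g.
    • Tyr, Trp, Phe → aromatic
    • Ile, Leu → hydrophobic
  • Strictly, homology means shared evolutionary ancestry; it is inferred from similarity, not measured as a percentage

❓ “Is identity always homology?”

✔️ Identical positions always count as similar

  • Identity ⊂ Similarity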
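
A minimal sketch of how identity and similarity could be counted for two already-aligned sequences. The residue groups below are a simplified assumption for illustration; real tools score similarity with substitution matrices.

```python
from __future__ import annotations

# Simplified physicochemical groups (an assumption for illustration only).
GROUPS = [
    set("FWY"),    # aromatic
    set("AVLIM"),  # aliphatic / hydrophobic
    set("KRH"),    # positively charged
    set("DE"),     # negatively charged
    set("STNQ"),   # polar, uncharged
]


def is_similar(a: str, b: str) -> bool:
    """Identical residues count as similar, as do residues sharing a group."""
    return a == b or any(a in g and b in g for g in GROUPS)


def identity_and_similarity(seq_a: str, seq_b: str) -> tuple[float, float]:
    """Percent identity and percent similarity over aligned, non-gap columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0, 0.0
    identical = sum(a == b for a, b in columns)
    similar = sum(is_similar(a, b) for a, b in columns)
    return 100 * identical / len(columns), 100 * similar / len(columns)


# Made-up aligned pair: 2 identical columns, 3 more that are only similar.
print(identity_and_similarity("LFKD-A", "IYKEGA"))
```

Because every identical column also passes `is_similar`, the similarity percentage can never be lower than the identity percentage.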

⚠️ BLAST limitation

👉 Your statement: “BLAST is local alignment”

✔️ Correct and important

  • BLAST finds local matches
  • Example:
    • 90% identity over 12 residues ≠ meaningful

➡️ Always check how much of the full-length sequence the alignment covers
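
To make the point concrete, here is a small sketch that reports identity together with query coverage for a local hit. The function name and the example numbers (11 of 12 residues identical, 300-residue query) are hypothetical, and gaps are ignored for simplicity.

```python
def hit_summary(identities: int, aligned_len: int, query_len: int) -> dict:
    """Percent identity within a local hit and percent of the query it covers."""
    if aligned_len <= 0 or query_len <= 0:
        raise ValueError("lengths must be positive")
    return {
        "identity_pct": 100 * identities / aligned_len,
        "query_coverage_pct": 100 * aligned_len / query_len,
    }


# Hypothetical hit: very high identity, but only a tiny slice of the query.
hit = hit_summary(identities=11, aligned_len=12, query_len=300)
print(hit)
```

High identity paired with single-digit coverage is the pattern to be suspicious of.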


🧬 Multiple Sequence Alignment (MSA)

👉 Your understanding: ✔️ Correct

What it does:

  • Align many sequences
  • Identify:
    • conserved residues
    • evolutionary patterns

➡️ Helps infer:

  • structure
  • function
  • domains
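
One simple way to spot conserved residues in an MSA is to ask, per column, what fraction of sequences share the most common residue. This is only a sketch with a made-up alignment; real analyses use more careful scores.

```python
from __future__ import annotations

from collections import Counter


def column_conservation(msa: list[str]) -> list[float]:
    """Fraction of sequences sharing the most common residue in each column."""
    if len({len(s) for s in msa}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*msa):
        residues = [r for r in column if r != "-"]  # gaps never count as a match
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


# Made-up alignment of four sequences.
msa = ["MKLVD", "MKIVE", "MRLVD", "MK-VD"]
for position, score in enumerate(column_conservation(msa), start=1):
    print(position, score)
```

Columns scoring 1.0 are fully conserved and are the first candidates for structurally or functionally important positions.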

🧩 Domain concept (important insight)

  • Different parts of a protein can:
    • match different proteins
  • Meaning: → protein may be multi-domain

🔧 Comparative modeling = point mutations

👉 Your statement: ✔️ Correct

  • You take known structure
  • Mutate residues to match your sequence
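
A sketch of the bookkeeping behind this step: given the template and target rows of one pairwise alignment, list the residue replacements needed. The function name and sequences are made up; gapped columns are skipped because insertions and deletions need loop modeling, not simple replacement.

```python
from __future__ import annotations


def substitutions(template: str, target: str) -> list[str]:
    """List replacements (e.g. 'K2R') that turn template residues into target ones.

    Both strings are rows of the same alignment, with '-' marking a gap.
    Positions are numbered along the template sequence, starting at 1.
    """
    if len(template) != len(target):
        raise ValueError("alignment rows must have the same length")
    changes = []
    position = 0
    for t, q in zip(template, target):
        if t != "-":
            position += 1  # only real template residues advance the numbering
        if t == "-" or q == "-" or t == q:
            continue
        changes.append(f"{t}{position}{q}")
    return changes


print(substitutions("MKLV-DE", "MRLVGDQ"))
```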

🧠 Secondary Structure Prediction


Why multiple programs?

✔️ You were right to question this

Key point:

  • Programs usually agree on:
    • core of helices/sheets
  • They tend to disagree on:
    • boundaries (~15% of positions)

❓ “15% hard to predict?”

✔️ Correct

  • Especially:
    • helix start/end
    • loop transitions
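
A sketch of why running several predictors helps: take a majority vote per position and check where they all agree. The three prediction strings are invented (H = helix, E = strand, C = coil); note how the disagreements fall at element boundaries.

```python
from __future__ import annotations

from collections import Counter


def consensus(predictions: list[str]) -> tuple[str, float]:
    """Majority-vote string and the fraction of positions with full agreement."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    voted = []
    unanimous = 0
    for column in zip(*predictions):
        label, votes = Counter(column).most_common(1)[0]
        voted.append(label)
        if votes == len(column):
            unanimous += 1
    return "".join(voted), unanimous / len(voted)


# Invented outputs from three hypothetical predictors.
predictions = [
    "CHHHHHCCEEEC",
    "CCHHHHHCEEEC",
    "CHHHHHCCCEEC",
]
print(consensus(predictions))
```

With an even number of predictors, ties are possible and are broken here by whichever label appears first, so an odd number is the safer choice.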

👉 Proline:

  • often disrupts helices
  • introduces bends

🧬 Protein types (your statement)

👉 “Globular: hydrophobic inside, membrane: outside”

✔️ Correct but refine:

Globular proteins

  • hydrophobic core
  • hydrophilic surface

Membrane proteins

  • hydrophobic regions face membrane lipids
  • polar parts face the aqueous environment on either side of the membrane
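
A sketch of how hydrophobic stretches can be found in a sequence: average a hydropathy scale over a sliding window. The values are the Kyte-Doolittle scale; the window of 19 and the cutoff of about 1.6 are commonly quoted settings for membrane-spanning segments and should be treated as assumptions here. The test sequence is made up.

```python
from __future__ import annotations

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window along the sequence."""
    if window <= 0 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KD[res] for res in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Made-up sequence: charged flanks around a 20-residue leucine stretch.
sequence = "KKDE" * 3 + "L" * 20 + "DEKK" * 3
profile = hydropathy_profile(sequence)
candidates = [i for i, mean in enumerate(profile) if mean > 1.6]
print("window starts above cutoff:", candidates)
```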

🧵 Threading (Fold Recognition)


👉 Your idea: ✔️ Correct but incomplete

What it does:

  • Uses structure templates even with low sequence similarity
  • Aligns sequence onto known folds

➡️ Useful when:

  • identity < 20%

Key concept:

  • Structure is more conserved than sequence

🏛️ Fold Library

👉 Your statement: ✔️ Correct

  • Database of known protein folds
  • Threading tries to match your sequence to these
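
A toy sketch of the idea, not a real threading program: each fold in a tiny made-up library is reduced to a burial pattern (B = buried, E = exposed), and a sequence scores well when hydrophobic residues land on buried positions. The fold names, patterns and scoring rule are all hypothetical, and no gaps are allowed.

```python
from __future__ import annotations

HYDROPHOBIC = set("AVLIMFWC")  # simplified residue classification (assumption)


def thread_score(sequence: str, burial: str) -> int:
    """+1 when residue type fits the position's environment, -1 otherwise."""
    if len(sequence) != len(burial):
        raise ValueError("this toy version needs equal lengths (no gaps)")
    score = 0
    for residue, environment in zip(sequence, burial):
        fits = (residue in HYDROPHOBIC) == (environment == "B")
        score += 1 if fits else -1
    return score


def best_fold(sequence: str, library: dict[str, str]) -> str:
    """Name of the library fold with the highest threading score."""
    return max(library, key=lambda name: thread_score(sequence, library[name]))


# Hypothetical fold library with two burial patterns.
library = {"fold_A": "BEEBBEEB", "fold_B": "EBBEEBBE"}
print(best_fold("LKDVIESA", library))
```

Nothing in the score depends on sequence identity to the template, which is why this style of matching can still work when identity is low.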

⚡ Contact Potentials (very important concept)


👉 Your understanding: ✔️ Good intuition, refine:

What it is:

  • A scoring/energy function
  • Based on: → which amino acids prefer to be near each other

Examples:

  • hydrophobic + hydrophobic → favorable
  • charged + hydrophobic → unfavorable

Why needed?

  • Computers need numbers, not concepts
  • So interactions are converted into scores
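
A toy sketch of turning these preferences into numbers. The residue classes and pair energies are invented for illustration (negative = favorable); real contact potentials are typically derived statistically from known structures and have a value for every residue pair.

```python
from __future__ import annotations

HYDROPHOBIC = set("AVLIMFWYC")
CHARGED = set("DEKR")

# Invented pair energies; any pair not listed scores 0.
PAIR_ENERGY = {
    frozenset(["hydrophobic"]): -1.0,            # hydrophobic + hydrophobic
    frozenset(["hydrophobic", "charged"]): 1.0,  # charged + hydrophobic
}


def residue_class(residue: str) -> str:
    """Assign a residue to one of three coarse classes."""
    if residue in HYDROPHOBIC:
        return "hydrophobic"
    if residue in CHARGED:
        return "charged"
    return "polar"


def contact_energy(sequence: str, contacts: list[tuple[int, int]]) -> float:
    """Sum pair energies over residue pairs (0-based indices) that are in contact."""
    total = 0.0
    for i, j in contacts:
        pair = frozenset([residue_class(sequence[i]), residue_class(sequence[j])])
        total += PAIR_ENERGY.get(pair, 0.0)
    return total


# Made-up sequence and contact list: L-V and V-A favorable, L-K unfavorable.
print(contact_energy("LKVAE", [(0, 2), (0, 1), (2, 3)]))
```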

👉 Your question:

aromatic and fatty acids like each other?

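✔️ Sometimes yes (assuming “fatty” means aliphatic side chains such as Leu, Ile, Val; fatty acids themselves are lipids, not amino acids)

  • aromatic + aliphatic side chains → often favorable
  • depends on orientation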

🤖 AlphaFold (modern approach)


Your understanding:

✔️ Mostly correct

It combines:

  • MSA (evolutionary info)
  • structural templates (similar in spirit to threading)
  • deep learning

MSA in AlphaFold

✔️ Correct

  • captures evolutionary constraints
  • residues that co-evolve → likely close in structure
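
A sketch of the classical way to measure co-evolution between two MSA columns: mutual information. This is not how AlphaFold works internally (it learns such signals from the MSA); it only illustrates what “co-evolving” means in numbers. The alignment is made up.

```python
from __future__ import annotations

import math
from collections import Counter


def mutual_information(msa: list[str], i: int, j: int) -> float:
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Made-up alignment: columns 0 and 1 change together (A with K, G with E).
msa = ["AKLE", "AKLD", "GELE", "GELD"]
print("columns 0,1:", mutual_information(msa, 0, 1))
print("columns 0,3:", mutual_information(msa, 0, 3))
```

Pairs with high mutual information are candidates for residues in contact; a fully conserved column carries no signal because it never varies.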

Evoformer

👉 Your idea: ✔️ Partially correct

What it does:

  • processes relationships between residues
  • integrates:
    • sequence info
    • pairwise interactions

➡️ Not just validation → core reasoning engine


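Iterative refinement

  • the model’s output is fed back in and improved over several rounds (“recycling”)
  • this is learned refinement, not explicit energy minimization; a physics-based relaxation can be applied to the final structure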

⚠️ Weaknesses of AlphaFold

👉 Your statement: ✔️ Correct observation

Common issues:

  1. Disordered regions
    • appear as long tails
  2. Misplaced helices
    • helix sticking out when it should be inside
  3. Flexible regions
    • poorly predicted

Key takeaway:

➡️ Always critically evaluate predictions


🔐 Important practical warning

  • Public servers (e.g. ColabFold)
  • Your sequence = uploaded data

⚠️ Can affect:

  • patents
  • unpublished work

🔁 Final Conceptual Summary

Hierarchy of methods:

  1. Ab initio
    • physics-based
    • very expensive
  2. Homology modeling
    • needs a homolog with known structure
  3. Threading
    • uses fold library
  4. AlphaFold
    • combines MSA + template information with deep learning

✅ Quick corrections to your list

Your statement | Corrected
prime protein | ❌ → prion
folded = cannot unfold | ❌ → conditionally stable
BLAST enough | ❌ → only local; check alignment coverage
identity = homology | ❌ → identity ⊂ similarity; homology means shared ancestry
AlphaFold = just threading | ❌ → MSA + templates + deep learning
Evoformer = validation | ❌ → main processing block
