Lecture 4 Video 4

Protein structure

🧬 Structure Determination of Proteins by NMR – Structure Calculation & Validation

(From Lecture 4 – Video 4)

This lecture walks through the heart of NMR structure determination:

  • How structures are calculated
  • Why we calculate ensembles (not single models)
  • How to evaluate and validate structures
  • How structures are presented in publications

This is where experimental data becomes a 3D protein model.


1️⃣ Structure Calculation: Turning NMR Data into 3D Models

💻 Software Used

Several dedicated programs exist:

  • CYANA (commonly used)
  • CNS
  • XPLOR

They use different algorithms but follow the same core logic.


📥 Input to Structure Calculation

The primary input is:

🔹 NOESY Peak List

Each cross peak must be:

  • Assigned to the two protons (hydrogen atoms) that give rise to it
  • Given an intensity (peak height or volume)

Modern programs can sometimes:

  • Start from unassigned peak lists
  • Simultaneously assign peaks while calculating structure

📏 Converting NOEs to Distance Constraints

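Each NOE intensity is converted into a distance constraint. The intensity falls off steeply with the distance r between the two protons (approximately as 1/r⁶), so strong peaks indicate short distances.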

The software:

  1. Finds a calibration constant
  2. Converts cross peak intensities to upper distance limits

Example:

A cross peak whose intensity corresponds to an upper limit of 3 Å → the two hydrogens must be ≤ 3 Å apart in the structure.
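The calibration can be sketched in a few lines of Python. This is a minimal sketch, assuming the idealized isolated-proton-pair relation I = C / r⁶ and a calibration constant C fixed by one reference peak of known distance. The helper names, the reference values and the 6 Å cap on weak peaks are hypothetical choices for illustration, not the exact procedure used by CYANA, CNS or XPLOR.

```python
def calibration_constant(ref_intensity: float, ref_distance: float) -> float:
    """Hypothetical helper: C in I = C / r**6 from a peak of known distance (Å)."""
    return ref_intensity * ref_distance ** 6


def upper_limit(intensity: float, c: float, max_limit: float = 6.0) -> float:
    """Convert a NOE intensity into an upper distance limit in Å.

    Uses r = (C / I) ** (1/6); very weak peaks are capped at max_limit
    (6 Å is an assumed cap, chosen only for this illustration).
    """
    r = (c / intensity) ** (1.0 / 6.0)
    return min(r, max_limit)


# Assumption: a reference peak of intensity 1000 corresponds to 3.0 Å.
C = calibration_constant(1000.0, 3.0)

for intensity in (1000.0, 125.0, 15.625, 1.0):
    print(f"I = {intensity:>8.3f} -> upper limit {upper_limit(intensity, C):.2f} Å")
```

Because of the r⁶ dependence, the 64-fold drop in intensity from the first to the third peak only doubles the upper limit (3 Å to 6 Å). Moderate errors in measured peak intensity therefore change the derived distance limits only slightly.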


📚 Additional Constraints

Besides distances, you may include:

  • 🌀 Dihedral angles (predicted from backbone chemical shifts with TALOS)
  • 🧲 Orientational restraints (e.g., residual dipolar couplings)
  • 🧷 Metal binding constraints
  • Any experimentally derived structural information

All constraints are fed into the algorithm.


2️⃣ The Core Problem: Multidimensional Minimization

A protein has enormous conformational freedom.

Each residue has two backbone dihedral angles:

  • ϕ (phi)
  • ψ (psi)

For 100 residues:

  • 200 backbone degrees of freedom
  • Plus side-chain dihedral angles

The algorithm performs:

🎯 Energy Minimization

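Starting from a random conformation, the structure is adjusted to:

Minimize the deviation between the structure and the experimental constraints, while keeping standard covalent geometry and avoiding steric clashes

In practice this is usually done by simulated annealing (CYANA, for example, uses torsion angle dynamics).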

This is a fitting problem in a massive multidimensional energy landscape.


⚠️ Local Minimum Problem

A single minimization run can get trapped in a local minimum instead of reaching the global one.

Solution:

👉 Start from many random structures
👉 Minimize each independently

Typically:

  • Calculate 100 structures
  • Keep the 20 with the lowest target function

This gives you an ensemble.
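The multi-start strategy can be illustrated with a toy problem. The sketch below is an assumption-laden illustration, not how CYANA, CNS or XPLOR work internally. It uses six pseudo-atoms in Cartesian space, invented distance bounds, a simplified target function (sum of squared bound violations), and plain gradient descent in place of simulated annealing. It starts 100 random structures, minimizes each one independently and keeps the 20 with the lowest target function.

```python
import math
import random

# Toy model: 6 pseudo-atoms in 3D (not a real protein, no force field).
N_ATOMS = 6

# Hypothetical restraints (atom_i, atom_j, lower, upper) in Å. The 3 Å lower
# bound stands in for steric repulsion; the upper bounds mimic NOEs.
RESTRAINTS = [
    (0, 1, 3.0, 4.0), (1, 2, 3.0, 4.0), (2, 3, 3.0, 4.0),
    (3, 4, 3.0, 4.0), (4, 5, 3.0, 4.0),
    (0, 3, 3.0, 5.0), (1, 4, 3.0, 5.0), (2, 5, 3.0, 5.0),
    (0, 5, 3.0, 6.0),
]


def target(coords):
    """Sum of squared bound violations (a simplified target function)."""
    total = 0.0
    for i, j, lower, upper in RESTRAINTS:
        d = math.dist(coords[i], coords[j])
        if d > upper:
            total += (d - upper) ** 2
        elif d < lower:
            total += (lower - d) ** 2
    return total


def minimize(coords, steps=300, step_size=0.05):
    """Plain gradient descent on the target function (assumed stand-in for annealing)."""
    coords = [list(p) for p in coords]
    for _ in range(steps):
        grad = [[0.0, 0.0, 0.0] for _ in coords]
        for i, j, lower, upper in RESTRAINTS:
            diff = [a - b for a, b in zip(coords[i], coords[j])]
            d = max(math.sqrt(sum(c * c for c in diff)), 1e-9)
            if d > upper:
                factor = 2.0 * (d - upper) / d
            elif d < lower:
                factor = -2.0 * (lower - d) / d
            else:
                continue  # restraint satisfied: no contribution
            for k in range(3):
                grad[i][k] += factor * diff[k]
                grad[j][k] -= factor * diff[k]
        coords = [[x - step_size * g for x, g in zip(p, gp)]
                  for p, gp in zip(coords, grad)]
    return coords


def calculate_ensemble(n_start=100, n_keep=20, seed=1):
    """Minimize n_start random structures; keep the n_keep with the lowest target."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_start):
        start = [[rng.uniform(-10.0, 10.0) for _ in range(3)]
                 for _ in range(N_ATOMS)]
        final = minimize(start)
        results.append((target(final), final))
    results.sort(key=lambda r: r[0])
    return results[:n_keep]


ensemble = calculate_ensemble()
print(f"kept {len(ensemble)} structures")
print(f"target function range: {ensemble[0][0]:.4f} to {ensemble[-1][0]:.4f}")
```

Different random starts end at different target-function values; sorting the runs and keeping only the lowest-scoring ones is what discards structures trapped in local minima.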


3️⃣ Why an Ensemble? 🧩

You never report one structure.

Instead:

You report the best 20 (or 30) structures.

Why?

Because:

  • Proteins are flexible
  • Data may not fully define all regions
  • One structure may be misleading

The ensemble shows:

  • Well-defined regions
  • Flexible or poorly defined regions

4️⃣ Evaluating Structures

After calculation, you evaluate:


🔴 Violations

A violation occurs when a structure does not satisfy a constraint.

Example:

  • NOE says ≤ 3 Å
  • Structure shows 4 Å → Violation

Possible reasons:

  • Misassignment
  • Overlapping peaks
  • Incorrect calibration
  • Wrong dihedral angle prediction

Violations must be investigated.
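Checking a conformer for violations is a direct comparison of each restraint with the corresponding interatomic distance. A minimal sketch, assuming upper-limit restraints only and a hypothetical dictionary of proton coordinates; the atom names and the `threshold` parameter are illustrative. It reproduces the ≤ 3 Å versus 4 Å example above.

```python
import math

# Hypothetical coordinates (Å) for three protons in one conformer.
coords = {
    "HA_12": (0.0, 0.0, 0.0),
    "HN_13": (4.0, 0.0, 0.0),
    "HB_20": (0.0, 2.5, 0.0),
}

# Hypothetical NOE-derived upper limits: (atom_a, atom_b, upper limit in Å).
restraints = [
    ("HA_12", "HN_13", 3.0),
    ("HA_12", "HB_20", 3.0),
]


def find_violations(coords, restraints, threshold=0.0):
    """Return (atom_a, atom_b, upper, actual, excess) for every restraint whose
    distance exceeds its upper limit by more than threshold, largest first."""
    found = []
    for a, b, upper in restraints:
        d = math.dist(coords[a], coords[b])
        if d - upper > threshold:
            found.append((a, b, upper, d, d - upper))
    return sorted(found, key=lambda v: v[4], reverse=True)


for a, b, upper, d, excess in find_violations(coords, restraints):
    print(f"{a}-{b}: limit {upper:.1f} Å, actual {d:.1f} Å, violation {excess:.1f} Å")
# prints: HA_12-HN_13: limit 3.0 Å, actual 4.0 Å, violation 1.0 Å
```

Running the same check on every conformer of the ensemble is a natural extension: a restraint violated in most conformers often points to an input problem (misassignment, overlap, calibration) rather than to a single poorly converged structure.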


🎯 Target Function

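The target function is the sum of the penalties for all violations, usually weighted and growing with the square of each violation.

But beware:

A low target function ≠ a correct structure. It could also mean:

  • Weak or insufficient data (few constraints are easy to satisfy)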
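The warning about low target functions can be made concrete. This sketch uses a simplified target function, assumed here to be the weighted sum of squared upper-limit violations (real programs add further terms, e.g. for dihedral angles and steric overlap). The coordinates and restraints are hypothetical. The same conformer is scored once against a fuller restraint set and once against a sparse one.

```python
import math

# Hypothetical coordinates (Å) of one conformer.
coords = {
    "HA_12": (0.0, 0.0, 0.0),
    "HN_13": (4.0, 0.0, 0.0),
    "HG_30": (0.0, 5.0, 0.0),
}

# Hypothetical upper-limit restraints: (atom_a, atom_b, upper limit in Å).
full_data = [
    ("HA_12", "HN_13", 3.0),  # actual 4.0 Å: violated by 1.0 Å
    ("HA_12", "HG_30", 4.0),  # actual 5.0 Å: violated by 1.0 Å
    ("HN_13", "HG_30", 7.0),  # actual about 6.4 Å: satisfied
]
sparse_data = [("HN_13", "HG_30", 7.0)]


def target_function(coords, restraints, weight=1.0):
    """Weighted sum of squared upper-limit violations (simplified)."""
    tf = 0.0
    for a, b, upper in restraints:
        excess = math.dist(coords[a], coords[b]) - upper
        if excess > 0.0:
            tf += weight * excess ** 2
    return tf


print(target_function(coords, full_data))    # 2.0
print(target_function(coords, sparse_data))  # 0.0
```

The conformer has not changed, yet with sparse data it scores a perfect 0.0: the target function measures agreement with the data you have, not correctness.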

5️⃣ RMSD – Structural Convergence

Root Mean Square Deviation (RMSD)

Measures:

How closely the conformers of the ensemble overlap after superposition

Low RMSD:

  • Good convergence
  • Well-defined structure

High RMSD:

  • Poorly defined
  • Flexible region
  • Insufficient data
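RMSD itself is a simple average over atoms. A minimal sketch, assuming the models are already superimposed and using one common convention: the average RMSD of each model to the mean structure (pairwise RMSD between models is another). The function names and toy coordinates are hypothetical.

```python
import math


def mean_structure(models):
    """Average coordinates atom by atom over all models."""
    n = len(models)
    return [tuple(sum(m[a][k] for m in models) / n for k in range(3))
            for a in range(len(models[0]))]


def rmsd(a, b):
    """Root mean square deviation between two equal-length coordinate lists."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def ensemble_rmsd(models):
    """Average RMSD of each model to the mean structure (models assumed superimposed)."""
    mean = mean_structure(models)
    return sum(rmsd(m, mean) for m in models) / len(models)


# Two pre-superimposed toy models of two atoms; only the second atom differs.
models = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)],
]
print(round(ensemble_rmsd(models), 4))  # 0.7071
```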

Example: Small 42-residue Protein

Early stage:

  • RMSD ≈ 2.3 Å
  • Poor overlap

After refinement:

  • Much lower RMSD
  • Secondary structure well defined

But loops & termini often remain flexible.


6️⃣ Superposition Matters! 🎭

RMSD depends on which atoms you superimpose.

Same 20 structures:

  • RMSD = 5.9 Å (all atoms)
  • RMSD = 1.3 Å (exclude flexible tail)
  • RMSD = 0.6 Å (only secondary structure)

Same data — different numbers.

Conclusion:

RMSD can be manipulated by choice of alignment region.

Interpret carefully, and always state which residues and atoms were superimposed.
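The dependence on the alignment region can be reproduced with synthetic data. This sketch assumes NumPy is available and uses the Kabsch algorithm (an SVD-based optimal superposition) to fit each model onto the first one using only a chosen subset of atoms. The ensemble is invented, with a well-defined 30-atom core and a floppy 10-atom tail, and RMSD is reported as the mean over models relative to model 0, one of several possible conventions.

```python
import numpy as np


def superimpose(mobile, reference, fit_idx):
    """Kabsch superposition: move `mobile` onto `reference` using only fit_idx atoms."""
    p, q = mobile[fit_idx], reference[fit_idx]
    p_center, q_center = p.mean(axis=0), q.mean(axis=0)
    h = (p - p_center).T @ (q - q_center)  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    sign = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0  # avoid reflections
    rot = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    return (mobile - p_center) @ rot.T + q_center


def rmsd(a, b, idx):
    """RMSD over the atoms in idx (no further fitting)."""
    diff = a[idx] - b[idx]
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def region_rmsd(models, fit_idx, measure_idx=None):
    """Mean RMSD of models[1:] to models[0] after fitting on fit_idx."""
    if measure_idx is None:
        measure_idx = fit_idx
    ref = models[0]
    return float(np.mean([rmsd(superimpose(m, ref, fit_idx), ref, measure_idx)
                          for m in models[1:]]))


def random_rotation(rng):
    """Random proper rotation matrix from a QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q @ np.diag(np.sign(np.diag(r)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


# Synthetic ensemble: 30 well-defined "core" atoms plus a 10-atom floppy tail.
rng = np.random.default_rng(0)
core = rng.normal(scale=5.0, size=(30, 3))
tail = rng.normal(scale=5.0, size=(10, 3))
models = []
for _ in range(20):
    xyz = np.vstack([core + rng.normal(scale=0.2, size=core.shape),
                     tail + rng.normal(scale=5.0, size=tail.shape)])
    # Each model gets an arbitrary orientation and position before fitting.
    models.append(xyz @ random_rotation(rng).T + rng.normal(scale=10.0, size=3))

all_atoms, core_atoms = np.arange(40), np.arange(30)
print(f"fit and measure all atoms: {region_rmsd(models, all_atoms):.2f} Å")
print(f"fit and measure core only: {region_rmsd(models, core_atoms):.2f} Å")
```

Fitting and measuring on the core gives a small value, while including the tail inflates the RMSD for the same 20 models, mirroring the contrast in the example above. Fitting on only one lobe of a two-domain protein such as calmodulin behaves the same way.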


7️⃣ Example: Calmodulin 🧲

Calmodulin has:

  • N-lobe
  • C-lobe

When superimposing:

  • Entire structure → poor overlap
  • Only N-lobe → excellent overlap
  • Only C-lobe → excellent overlap

Meaning:

✔ Each lobe is well defined
✖ Their relative orientation is not

Conclusion:

The lobes are flexible relative to each other in solution.

This is structural information.


8️⃣ Representation in Publications 📄

Two typical visualizations:

🔹 Structure Bundle

Shows all 20 conformers overlaid.
→ Reveals precision & flexibility.

🔹 Cartoon Model

Shows secondary structure arrangement.
→ Easier to interpret
→ Less scientifically informative than the ensemble


9️⃣ Method Dependence

Different methods give different results:

  • Modeling → one structure
  • X-ray crystallography → rigid crystal conformation
  • NMR → ensemble in solution

NMR uniquely captures flexibility.


🔟 Validation

Structural validation is challenging.


🔬 Internal Validation

Check against known protein geometry:

  • Ramachandran plot
  • Bond lengths
  • Bond angles
  • Van der Waals contacts (steric clashes)
  • Hydrogen bond geometry
  • Side chain rotamers

⚠️ But much of this geometry was already enforced during calculation and refinement, so these checks are partly circular.


📊 Ramachandran Plot

Shows distribution of ϕ/ψ angles.

Categories:

  • Most favored
  • Additionally allowed
  • Generously allowed
  • Disallowed

You should have:

  • Most residues in the most favored regions and very few (ideally none) in disallowed regions.
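The ϕ/ψ values in a Ramachandran plot are ordinary torsion angles computed from backbone atom positions: ϕ(i) from C(i-1), N(i), Cα(i), C(i) and ψ(i) from N(i), Cα(i), C(i), N(i+1). A minimal pure-Python sketch of the torsion-angle calculation follows. It should follow the standard IUPAC sign convention (clockwise positive); assigning residues to favored or disallowed regions relies on empirical maps used by validation programs and is not attempted here.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees, in the range -180 to 180, for four atom positions."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# phi(i) = dihedral(C(i-1), N(i), CA(i), C(i))
# psi(i) = dihedral(N(i), CA(i), C(i), N(i+1))
print(round(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)), 6))  # 90.0
```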

📈 Validation Tables in Papers

Typical table contains:

Input

  • Total NOEs
    • Intra-residual
    • Sequential
    • Medium-range
    • Long-range
  • Metal restraints
  • Torsion angle restraints

Output

  • Target function
  • Max distance violation (e.g., 0.14 Å)
  • Max dihedral violation (e.g., 4°)
  • Force field energies
  • Ramachandran statistics
  • RMSD values
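The NOE counts in such a table are usually grouped by the sequence separation |i - j| of the two residues. A minimal sketch, assuming the common convention intra-residual (0), sequential (1), medium-range (2 to 4) and long-range (5 or more); the restraint list of residue-number pairs is hypothetical.

```python
from collections import Counter


def noe_range(res_i, res_j):
    """Classify a restraint by the sequence separation of its two residues."""
    sep = abs(res_i - res_j)
    if sep == 0:
        return "intra-residual"
    if sep == 1:
        return "sequential"
    if sep < 5:
        return "medium-range"
    return "long-range"


def summarize(restraints):
    """Count restraints per range class."""
    return Counter(noe_range(i, j) for i, j in restraints)


# Hypothetical restraints given as (residue_i, residue_j) pairs.
restraints = [(5, 5), (5, 6), (5, 8), (5, 30), (12, 40), (20, 21)]
summary = summarize(restraints)
for category in ("intra-residual", "sequential", "medium-range", "long-range"):
    print(f"{category}: {summary[category]}")
```

Long-range NOEs are the ones that fix the tertiary fold, so their number is usually the most informative entry in the input half of the table.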

🔎 External Validation

If you have data not used in structure calculation:

  • Residual dipolar couplings
  • Unusual chemical shifts

These are strong validation tools.
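For residual dipolar couplings, agreement is often summarized by a quality factor Q = sqrt(Σ(D_obs - D_calc)² / Σ D_obs²); lower Q means better agreement. A minimal sketch, assuming the back-calculated couplings D_calc have already been obtained from the structure (fitting the alignment tensor is not shown) and that these RDCs were not used as restraints. The numbers are invented.

```python
import math


def rdc_q_factor(observed, calculated):
    """Q = sqrt(sum((obs - calc)**2) / sum(obs**2)) for paired RDC values (Hz)."""
    if len(observed) != len(calculated):
        raise ValueError("observed and calculated lists must have equal length")
    num = sum((o - c) ** 2 for o, c in zip(observed, calculated))
    den = sum(o ** 2 for o in observed)
    return math.sqrt(num / den)


# Hypothetical observed and back-calculated couplings in Hz.
d_obs = [10.0, -5.0, 3.0, 8.0]
d_calc = [9.0, -4.0, 3.5, 7.0]
print(round(rdc_q_factor(d_obs, d_calc), 4))
```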


🖥 Validation Servers

Validation servers for NMR structures can check:

  • Chemical shifts
  • Distance restraints
  • Geometry

They identify structural inconsistencies.


🧠 Key Takeaways

1️⃣ Structure calculation is iterative

You repeatedly refine and re-check.

2️⃣ Always calculate multiple structures

Single models are misleading.

3️⃣ RMSD must be interpreted carefully

Depends on alignment region.

4️⃣ Violations must be investigated

They often indicate input errors.

5️⃣ Flexible regions show up as high RMSD

NMR captures solution dynamics, although high RMSD can also simply reflect insufficient data.

6️⃣ Validation is complex

Internal checks can be circular.


🎯 Big Picture

NMR structure determination is not:

“The software gives me the structure.”

It is:

An iterative fitting process in a massive conformational space, constrained by experimental data, evaluated statistically, and interpreted biologically.

The ensemble is not a weakness — it is the strength of NMR.

It shows how the protein behaves in solution:

  • Rigid cores
  • Flexible loops
  • Dynamic domain orientations

That concludes the structure calculation and validation process from Lecture 4 Video 4.
