Lecture 4 Video 4

Protein structure

🧬 Structure Determination of Proteins by NMR – Structure Calculation & Validation

(From Lecture 4 – Video 4)

This lecture walks through the heart of NMR structure determination:

How structures are calculated
Why we calculate ensembles (not single models)
How to evaluate and validate structures
How structures are presented in publications

This is where experimental data becomes a 3D protein model.

1️⃣ Structure Calculation: Turning NMR Data into 3D Models

💻 Software Used

Several dedicated programs exist:

CYANA (commonly used)
CNS
XPLOR

They use different algorithms but follow the same core logic.

📥 Input to Structure Calculation

The primary input is:

🔹 NOESY Peak List

Each cross peak must be:

Assigned to two atoms
Given an intensity

Modern programs can sometimes:

Start from unassigned peak lists
Simultaneously assign peaks while calculating structure

📏 Converting NOEs to Distance Constraints

NOE intensity → distance constraint (the intensity falls off steeply with distance, roughly as 1/r⁶, so stronger peaks mean shorter distances).

The software:

Finds a calibration constant
Converts cross peak intensities to upper distance limits

Example:

NOE intensity corresponds to max distance of 3 Å → The two hydrogens must be ≤ 3 Å apart.
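The calibration step above can be sketched in a few lines. This is a minimal illustration, assuming the simple isolated-spin-pair relation I = k / r⁶ and a single reference peak of known distance; the function names, the reference values and the clamping range (2.4 to 5.5 Å) are hypothetical choices for this example, not what any particular program does.

```python
def calibration_constant(ref_intensity, ref_distance):
    """Solve I = k / r**6 for k from one peak whose distance is known."""
    return ref_intensity * ref_distance ** 6


def upper_limit(intensity, k, min_limit=2.4, max_limit=5.5):
    """Convert a peak intensity to an upper distance limit in Å.

    The clamping range is an assumption for this sketch.
    """
    distance = (k / intensity) ** (1.0 / 6.0)
    return max(min_limit, min(max_limit, distance))


# Hypothetical reference: a peak of intensity 1000 between protons 2.5 Å apart.
k = calibration_constant(ref_intensity=1000.0, ref_distance=2.5)

# Strong, medium and weak peaks give increasingly loose upper limits.
limits = [upper_limit(i, k) for i in (1000.0, 250.0, 20.0)]
```

Because of the sixth root, a 50-fold drop in intensity only roughly doubles the distance limit, which is why NOE-derived limits are usually treated as loose upper bounds rather than exact distances.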

📚 Additional Constraints

Besides distances, you may include:

🌀 Dihedral angles (predicted from chemical shifts, e.g. with TALOS)
🧲 Orientational restraints
🧷 Metal binding constraints
Any experimentally derived structural information

All constraints are fed into the algorithm.

2️⃣ The Core Problem: Multidimensional Minimization

A protein has enormous conformational freedom.

Each residue:

ϕ (phi)
ψ (psi)

For 100 residues:

→ 200 backbone degrees of freedom
→ Plus side chains

The algorithm performs:

🎯 Energy Minimization

Starting from a random conformation, the structure is adjusted to:

Minimize deviation between structure and experimental constraints

This is a fitting problem in a massive multidimensional energy landscape.

⚠️ Local Minimum Problem

You might find a local minimum, not the global one.

Solution:

👉 Start from many random structures
👉 Minimize each independently

Typically:

Calculate 100 structures
Keep best 20

This gives you an ensemble.
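The "100 starts, keep the best 20" idea can be sketched with a toy problem. This is only an illustration: the target function below is an invented two-angle landscape with local minima, and the minimizer is plain gradient descent, whereas real programs typically use more elaborate schemes such as simulated annealing. All names and numbers are assumptions made for the example.

```python
import math
import random

# Hypothetical "true" phi/psi values (radians) for one residue.
TARGET_ANGLES = (-1.0, 2.0)


def target(angles):
    """Toy target function: zero at TARGET_ANGLES, with extra local minima."""
    total = 0.0
    for a, t in zip(angles, TARGET_ANGLES):
        u = a - t
        total += (1 - math.cos(u)) + 0.6 * (1 - math.cos(3 * u))
    return total


def gradient(angles):
    """Analytic derivative of target() with respect to each angle."""
    return [math.sin(a - t) + 1.8 * math.sin(3 * (a - t))
            for a, t in zip(angles, TARGET_ANGLES)]


def minimize(angles, step=0.05, iterations=500):
    """Plain gradient descent; it ends in whichever minimum is downhill."""
    angles = list(angles)
    for _ in range(iterations):
        g = gradient(angles)
        angles = [a - step * gi for a, gi in zip(angles, g)]
    return angles


def calculate_ensemble(n_start=100, n_keep=20, seed=0):
    """Minimize from many random starts and keep the lowest-target results."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_start):
        start = [rng.uniform(-math.pi, math.pi) for _ in TARGET_ANGLES]
        final = minimize(start)
        results.append((target(final), final))
    results.sort(key=lambda r: r[0])
    return results[:n_keep], results[n_keep:]


ensemble, rejected = calculate_ensemble()
```

Many of the 100 runs get stuck in local minima with a clearly higher target value; sorting by target function and keeping the top 20 is what filters them out.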

3️⃣ Why an Ensemble? 🧩

You never report one structure.

Instead:

You report the best 20 (or 30) structures.

Why?

Because:

Proteins are flexible
Data may not fully define all regions
One structure may be misleading

The ensemble shows:

Well-defined regions
Flexible or poorly defined regions

4️⃣ Evaluating Structures

After calculation, you evaluate:

🔴 Violations

A violation = the structure does not satisfy a constraint.

Example:

NOE says ≤ 3 Å
Structure shows 4 Å → Violation

Possible reasons:

Misassignment
Overlapping peaks
Incorrect calibration
Wrong dihedral angle prediction

Violations must be investigated.
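Checking for violations is essentially a distance comparison. A minimal sketch, assuming coordinates are stored in a dictionary keyed by atom name and upper limits in a dictionary keyed by atom pairs (the atom names, coordinates and the `tolerance` parameter are hypothetical):

```python
import math


def find_violations(coords, upper_limits, tolerance=0.0):
    """Return (atom_a, atom_b, limit, distance, excess) for each violated limit."""
    violations = []
    for (atom_a, atom_b), limit in upper_limits.items():
        distance = math.dist(coords[atom_a], coords[atom_b])
        excess = distance - limit
        if excess > tolerance:
            violations.append((atom_a, atom_b, limit, distance, excess))
    return violations


# Toy coordinates in Å (invented for illustration).
coords = {
    "HA_5": (0.0, 0.0, 0.0),
    "HN_6": (2.5, 0.0, 0.0),
    "HB_20": (4.0, 0.0, 0.0),
}
upper_limits = {
    ("HA_5", "HN_6"): 3.0,   # satisfied: 2.5 Å
    ("HA_5", "HB_20"): 3.0,  # violated: 4.0 Å, excess 1.0 Å
}

violations = find_violations(coords, upper_limits)
```

Listing the excess per restraint, rather than just a total, is what makes it possible to trace a violation back to a specific peak and check it for misassignment or overlap.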

🎯 Target Function

Sum of violation penalties = target function.

But beware:

Low target function ≠ correct structure

It could also mean:

Weak or insufficient data
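Why a low target function is not enough can be shown with a toy case. The sketch below assumes a squared penalty on each upper-limit violation, which is one common choice and not necessarily the form used by any given program; the atom names and coordinates are invented.

```python
import math


def target_function(coords, upper_limits):
    """Sum of squared upper-limit violations (in Å squared)."""
    total = 0.0
    for (a, b), limit in upper_limits.items():
        excess = math.dist(coords[a], coords[b]) - limit
        if excess > 0:
            total += excess ** 2
    return total


# Two very different toy "structures": H3 is close to H1 in one, far away in the other.
compact = {"H1": (0.0, 0.0, 0.0), "H2": (2.5, 0.0, 0.0), "H3": (1.0, 2.0, 0.0)}
extended = {"H1": (0.0, 0.0, 0.0), "H2": (2.5, 0.0, 0.0), "H3": (30.0, 0.0, 0.0)}

# Sparse data: nothing restrains H3, so both structures score a perfect zero.
sparse = {("H1", "H2"): 3.0}
tf_sparse = (target_function(compact, sparse), target_function(extended, sparse))

# One additional restraint is enough to tell them apart.
richer = {("H1", "H2"): 3.0, ("H1", "H3"): 4.0}
tf_richer = (target_function(compact, richer), target_function(extended, richer))
```

With the sparse restraint set both structures have a target function of zero, even though at most one of them can be right. The target function only measures agreement with the data you supplied, not how much data there is.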

5️⃣ RMSD – Structural Convergence

Root Mean Square Deviation (RMSD)

Measures:

How well the ensemble structures overlap

Low RMSD:

Good convergence
Well-defined structure

High RMSD:

Poorly defined
Flexible region
Insufficient data
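As a concrete reference, RMSD is the square root of the mean squared distance between equivalent atoms. The sketch below assumes the conformers have already been superimposed (the fitting step is not shown) and reports the mean pairwise RMSD over the ensemble; reporting RMSD to the mean structure is another common convention. Function names and coordinates are made up for the example.

```python
import math
from itertools import combinations


def rmsd(a, b):
    """RMSD in Å between two equally long lists of (x, y, z) coordinates.

    Assumes the two conformers are already superimposed.
    """
    if len(a) != len(b):
        raise ValueError("conformers must contain the same number of atoms")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(a))


def mean_pairwise_rmsd(ensemble):
    """Average RMSD over all pairs of conformers in the ensemble."""
    pairs = list(combinations(ensemble, 2))
    return sum(rmsd(a, b) for a, b in pairs) / len(pairs)


# Toy two-atom conformers (coordinates in Å).
conf_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
conf_b = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]

pair_value = rmsd(conf_a, conf_b)
ensemble_value = mean_pairwise_rmsd([conf_a, conf_b, conf_a])
```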

Example: Small 42-residue Protein

Early stage:

RMSD ≈ 2.3 Å
Poor overlap

After refinement:

Much lower RMSD
Secondary structure well defined

But loops & termini often remain flexible.

6️⃣ Superposition Matters! 🎭

RMSD depends on which atoms you superimpose.

Same 20 structures:

RMSD = 5.9 Å (all atoms)
RMSD = 1.3 Å (exclude flexible tail)
RMSD = 0.6 Å (only secondary structure)

Same data — different numbers.

Conclusion:

RMSD can be manipulated by choice of alignment region.

Interpret carefully.
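The effect of atom selection is easy to reproduce with toy numbers. This sketch skips the superposition step and assumes two conformers that were already fitted on their rigid core; it only varies which atoms the deviation is averaged over. The coordinates are invented, with three "core" atoms that agree closely and two "tail" atoms that do not.

```python
import math


def rmsd_over(a, b, indices):
    """RMSD in Å over the selected atom indices only (pre-superimposed input)."""
    squared = sum(math.dist(a[i], b[i]) ** 2 for i in indices)
    return math.sqrt(squared / len(indices))


# Atoms 0-2: rigid core, atoms 3-4: flexible tail (toy coordinates in Å).
conf_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0),
          (4.5, 0.0, 0.0), (6.0, 0.0, 0.0)]
conf_b = [(0.0, 0.1, 0.0), (1.5, 0.1, 0.0), (3.0, 0.1, 0.0),
          (4.5, 3.0, 0.0), (6.0, 6.0, 0.0)]

all_atoms = range(5)
core_only = range(3)

rmsd_all = rmsd_over(conf_a, conf_b, all_atoms)    # dominated by the tail
rmsd_core = rmsd_over(conf_a, conf_b, core_only)   # core agrees to 0.1 Å
```

The same pair of conformers gives about 3 Å over all atoms and 0.1 Å over the core, so an RMSD value is only meaningful together with the residue range it was computed on.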

7️⃣ Example: Calmodulin 🧲

Calmodulin has:

N-lobe
C-lobe

When superimposing:

Entire structure → poor overlap
Only N-lobe → excellent overlap
Only C-lobe → excellent overlap

Meaning:

✔ Each lobe is well defined
✖ Their relative orientation is not

Conclusion:

The lobes are flexible relative to each other in solution.

This is structural information.

8️⃣ Representation in Publications 📄

Two typical visualizations:

🔹 Structure Bundle

Shows all 20 conformers overlaid.
→ Reveals precision & flexibility.

🔹 Cartoon Model

Shows secondary structure arrangement.
→ Easier to interpret
→ Less scientifically informative than ensemble

9️⃣ Method Dependence

Different methods give different results:

Modeling → one structure
X-ray crystallography → rigid crystal conformation
NMR → ensemble in solution

NMR uniquely captures flexibility.

🔟 Validation

Structural validation is challenging.

🔬 Internal Validation

Check against known protein geometry:

Ramachandran plot
Bond lengths
Bond angles
Van der Waals radii
Hydrogen bond geometry
Side chain rotamers

⚠️ But much of this was already used in refinement.

📊 Ramachandran Plot

Shows distribution of ϕ/ψ angles.

Categories:

Most favored
Allowed
Generously allowed
Disallowed

You should have:

Very few residues in disallowed regions.
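The ϕ/ψ values that go into a Ramachandran plot are ordinary dihedral angles computed from four consecutive backbone atoms: ϕ from C(i−1), N, Cα, C and ψ from N, Cα, C, N(i+1). A minimal sketch of that calculation is shown below, with toy points instead of real backbone coordinates. Classifying an angle pair as favored or disallowed is left out, because the region boundaries depend on the validation program.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees about the p1-p2 bond, in the range -180 to 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy geometry: central bond along z, first atom along +x.
p0, p1, p2 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
cis = dihedral(p0, p1, p2, (1.0, 0.0, 1.0))      # eclipsed arrangement
quarter = dihedral(p0, p1, p2, (0.0, 1.0, 1.0))  # quarter turn
trans = dihedral(p0, p1, p2, (-1.0, 0.0, 1.0))   # opposite side
```

For a real structure you would call `dihedral` once per residue with the backbone atoms listed above and plot the resulting (ϕ, ψ) pairs.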

📈 Validation Tables in Papers

Typical table contains:

Input

Total NOEs
- Intra-residue
- Medium-range
- Long-range
Metal restraints
Torsion angle restraints

Output

Target function
Max distance violation (e.g., 0.14 Å)
Max dihedral violation (e.g., 4°)
Force field energies
Ramachandran statistics
RMSD values
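The NOE counts in the input part of such a table come from sorting each restraint by the sequence separation of the two residues involved. A rough sketch follows, assuming a commonly used convention (separation 0 is intra-residue, 2 to 4 is medium-range, 5 or more is long-range). The thresholds and the separate "sequential" class for neighbouring residues are assumptions here; some tables merge or split the classes differently.

```python
from collections import Counter


def classify_noe(res_i, res_j):
    """Classify a restraint by the sequence separation of its two residues."""
    separation = abs(res_i - res_j)
    if separation == 0:
        return "intra-residue"
    if separation == 1:
        return "sequential"
    if separation < 5:
        return "medium-range"
    return "long-range"


# Hypothetical restraints given as (residue number, residue number).
noes = [(5, 5), (5, 6), (5, 8), (5, 40), (12, 12), (12, 30)]

counts = Counter(classify_noe(i, j) for i, j in noes)
```

Long-range NOEs matter most for the fold, because they tie together residues that are far apart in sequence but close in space.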

🔎 External Validation

If you have data not used in structure calculation:

Residual dipolar couplings
Unusual chemical shifts

These are strong validation tools.
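One common way to express agreement with unused residual dipolar couplings is a Q-factor: the RMS difference between observed and back-calculated couplings divided by the RMS of the observed values. The sketch below only covers that final comparison. Back-calculating couplings from a structure requires fitting an alignment tensor, which is not shown, and all numbers are invented.

```python
import math


def q_factor(observed, calculated):
    """Q = rms(observed - calculated) / rms(observed); lower is better."""
    if len(observed) != len(calculated):
        raise ValueError("need one calculated value per observed coupling")
    numerator = sum((o - c) ** 2 for o, c in zip(observed, calculated))
    denominator = sum(o ** 2 for o in observed)
    return math.sqrt(numerator / denominator)


# Hypothetical couplings in Hz.
observed = [10.0, -5.0, 8.0, -12.0]
back_calculated = [9.0, -6.0, 9.0, -11.0]

q = q_factor(observed, back_calculated)
```

Because these couplings were not used as restraints, a low Q is independent evidence for the structure, unlike geometry checks that reuse refinement parameters.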

🖥 Validation Servers

For NMR structures, validation servers can check:
- Chemical shifts
- Distance restraints
- Geometry

They identify structural inconsistencies.

🧠 Key Takeaways

1️⃣ Structure calculation is iterative

You repeatedly refine and re-check.

2️⃣ Always calculate multiple structures

Single models are misleading.

3️⃣ RMSD must be interpreted carefully

Depends on alignment region.

4️⃣ Violations must be investigated

They often indicate input errors.

5️⃣ Flexible regions show up as high RMSD

High RMSD can reflect real solution dynamics, but also a lack of restraints.

6️⃣ Validation is complex

Internal checks can be circular.

🎯 Big Picture

NMR structure determination is not:

“The software gives me the structure.”

It is:

An iterative fitting process in a massive conformational space, constrained by experimental data, evaluated statistically, and interpreted biologically.

The ensemble is not a weakness — it is the strength of NMR.

It shows what the protein really looks like in solution:

Rigid cores
Flexible loops
Dynamic domain orientations

That concludes the structure calculation and validation process from Lecture 4 Video 4.

Quiz

Q0. What is the primary input for NMR structure calculation software?

Chemical shift list

NOESY peak list

Ramachandran plot

Force field parameters

Q1. Modern structure calculation software can sometimes perform which additional task?

Automatic crystallization

Simultaneous peak assignment during calculation

Direct measurement of RDCs

Automatic protein expression

Q2. NOE cross peak intensities are converted into what type of structural restraint?

Dihedral angle restraints

Hydrogen bond energies

Distance constraints

Electrostatic potentials

Q3. Why are multiple starting conformations used in structure calculation?

To increase computation time

To avoid local minima

To reduce RMSD artificially

To eliminate dihedral angles

Q4. What is typically done after calculating 100 structures?

Publish all 100 structures

Discard all structures

Keep the best-fitting subset (e.g., 20)

Average them into one structure

Q5. A violation occurs when:

Two atoms are too close compared to van der Waals radii

A calculated structure does not satisfy an experimental restraint

The RMSD is below 1 Å

The Ramachandran plot is perfect

Q6. The target function in NMR structure calculation represents:

Total protein energy in solution

Sum of restraint violations

Number of NOEs used

Temperature factor

Q7. A low target function alone guarantees a correct structure.

True

False

Q8. RMSD measures:

Total NOE intensity

Overlap between structures in an ensemble

Number of hydrogen bonds

Energy of the protein

Q9. High RMSD in a specific region often indicates:

Perfect structural definition

Experimental error only

Flexibility or lack of restraints

Incorrect amino acid sequence

Q10. The value of RMSD depends on:

Protein temperature only

Which atoms are chosen for superposition

Number of NOEs only

Protein molecular weight

Q11. In the calmodulin example, each lobe individually superimposed well.

True

False

Q12. Poor superposition of calmodulin as a whole suggests:

Incorrect data

Poor resolution NMR instrument

Flexible orientation between lobes

Wrong amino acid sequence

Q13. Which of the following is NOT typically used as a structural restraint?

NOE distances

Dihedral angles

Metal binding constraints

Ramachandran statistics

Q14. Structures are usually presented in publications as:

Single averaged model only

Energy landscape plot only

Ensemble bundle and cartoon representation

Only Ramachandran plots

Q15. Ramachandran plots are used to validate:

NOE intensities

Backbone dihedral angle distributions

Metal coordination geometry

Protein solubility

Q16. A large number of residues in disallowed regions of the Ramachandran plot suggests structural problems.

True

False

Q17. Why is validation against standard geometry sometimes circular?

Because geometry changes during refinement

Because these parameters were already used in refinement

Because proteins are random coils

Because NOEs override geometry

Q18. Residual dipolar couplings can serve as:

Primary sequence data

Independent validation data

Force field energies

Temperature controls

Q19. Flexible terminal regions often show higher RMSD.

True

False

Q20. Energy minimization in structure calculation aims to:

Maximize RMSD

Increase violation scores

Minimize deviation from experimental restraints

Eliminate backbone angles

Q21. If two hydrogen atoms are constrained to be ≤3 Å apart but are 4 Å in the model, this is:

Acceptable fluctuation

A calibration artifact

A violation

Normal flexibility

Q22. NMR structure determination uniquely captures:

Crystal packing forces

Static atomic coordinates only

Solution-state flexibility

Covalent bond energies only

Q23. Calculating only one structure would adequately represent protein flexibility.

True

False

Q24. The number of NOEs is often categorized into:

Alpha, beta, gamma

Intra-residue, medium-range, long-range

Primary and secondary

Rigid and flexible

Q25. Force field energies reported after refinement are part of:

Input restraints

Validation output statistics

Sequence analysis

Spectrometer calibration

Q26. High convergence in secondary structure elements typically indicates:

Experimental artifact

Good structural definition

Poor NOE coverage

Incorrect alignment

Q27. The orientation between calmodulin lobes is poorly defined because:

Each lobe is unfolded

There are no NOEs at all

Insufficient restraints define their relative orientation

The protein is misfolded

Q28. Validation servers can check chemical shifts and distance restraints.

True

False

Q29. The ensemble representation is considered a weakness of NMR compared to crystallography.

True

False