Lecture 4 Book Chapter 4.3 & 4.4

Protein structure

🧬 Chapter 4.3 – Structure Calculations (Except 4.3.4)

Protein structure determination by NMR is essentially a constrained optimization problem: We want to find 3D structures that satisfy all experimental restraints while remaining chemically realistic.

It happens in two major phases:

Structure generation
Energy refinement

Let’s break it down.

🔹 4.3.1 Traditional Structure Generation

🧩 Step 1: Generate Conformers

The goal is to create a bundle of structures that all satisfy the experimental restraints (NOEs, dihedral angles, RDCs, etc.).

Historically, different computational strategies were developed:

🏗 1️⃣ Distance Geometry (Early Method)

Instead of working in Cartesian coordinates, this method works in distance space.

Uses interatomic distances as primary variables.
Computational cost scales with N³ (very heavy in the 1980s).
Conceptually elegant but computationally demanding.

Programs: early implementations, largely superseded by the methods below.
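One step of distance geometry shows where cubic cost comes from: triangle-inequality bound smoothing. For any third atom k, the upper bound u(i,j) can be no larger than u(i,k) + u(k,j). Enforcing this over all triples is a triple loop over N atoms. The sketch below is a minimal illustration. It smooths upper bounds only and leaves out lower-bound smoothing and the embedding step, which also scales as N³. The function name `smooth_upper_bounds` is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def smooth_upper_bounds(upper: Matrix) -> Matrix:
    """Tighten upper distance bounds with the triangle inequality.

    u[i][j] <= u[i][k] + u[k][j] must hold for every third atom k.
    Checking all (k, i, j) triples is a Floyd-Warshall-style loop: O(N^3).
    """
    n = len(upper)
    u = [row[:] for row in upper]  # work on a copy
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = u[i][k] + u[k][j]
                if via_k < u[i][j]:
                    u[i][j] = via_k
    return u


# Three atoms: the 0-2 bound is a loose 10 Å,
# but 0-1 <= 2 Å and 1-2 <= 3 Å together imply 0-2 <= 5 Å.
bounds = [
    [0.0, 2.0, 10.0],
    [2.0, 0.0, 3.0],
    [10.0, 3.0, 0.0],
]
smoothed = smooth_upper_bounds(bounds)
print(smoothed[0][2])  # 5.0
```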

🔄 2️⃣ Dihedral Angle Space (Huge Improvement)

Key idea: Bond lengths and bond angles barely fluctuate → fix them.

Only allow:

ϕ, ψ backbone angles
Side-chain χ angles

Result:

~10× fewer degrees of freedom.
Much more efficient sampling.
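ϕ, ψ and χ are all dihedral angles. For four consecutive atoms, a dihedral is the rotation about the middle bond that separates the plane of atoms 1-2-3 from the plane of atoms 2-3-4. For example, ϕ is defined by C′(i−1)–N–Cα–C′ and ψ by N–Cα–C′–N(i+1). Below is a minimal sketch that computes a signed dihedral from coordinates. It assumes the usual IUPAC sign convention (clockwise is positive when looking along the middle bond), and the helper names are illustrative.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees, range [-180, 180]."""
    b1 = sub(p1, p0)
    b2 = sub(p2, p1)  # the bond being rotated about
    b3 = sub(p3, p2)
    n1 = cross(b1, b2)  # normal of plane (p0, p1, p2)
    n2 = cross(b2, b3)  # normal of plane (p1, p2, p3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))   # 0.0   (cis)
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))   # 90.0
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # 180.0 (trans)
```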

Programs:

DISMAN (variable target function)
DIANA
Later: DYANA → CYANA
XPLOR → CNS

🔥 3️⃣ Simulated Annealing + Torsion Angle Dynamics

This became the dominant approach.

Why?

Simple minimization gets stuck in local minima.

Simulated annealing:

Add kinetic energy (heat the system)
Escape local minima while the temperature is high
Cool slowly
Settle into a low minimum, ideally the global one
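The idea can be illustrated on a toy one-dimensional "target function" that has a shallow local minimum near x = +1 and a deeper global minimum near x = −1. The sketch below uses Metropolis Monte Carlo in place of the molecular dynamics that CYANA actually uses. The energy function, step size and cooling schedule are all hypothetical choices. It only shows why a downhill-only search stays trapped while a slowly cooled search does not.

```python
import math
import random


def energy(x: float) -> float:
    """Toy tilted double well: local minimum near +1, global minimum near -1."""
    return (x * x - 1.0) ** 2 + 0.5 * x


def downhill_search(x: float, steps: int = 20000, step: float = 0.3, seed: int = 0) -> float:
    """Accept only moves that lower the energy (plain minimization)."""
    rng = random.Random(seed)
    e = energy(x)
    for _ in range(steps):
        x_new = x + rng.uniform(-step, step)
        e_new = energy(x_new)
        if e_new < e:
            x, e = x_new, e_new
    return x


def simulated_annealing(x: float, t_start: float = 2.0, t_end: float = 0.01,
                        steps: int = 20000, step: float = 0.3, seed: int = 0) -> float:
    """Metropolis moves under a geometric cooling schedule from t_start to t_end."""
    rng = random.Random(seed)
    e = energy(x)
    for i in range(steps):
        t = t_start * (t_end / t_start) ** (i / (steps - 1))  # slowly lower the temperature
        x_new = x + rng.uniform(-step, step)
        e_new = energy(x_new)
        # Downhill moves are always accepted. Uphill moves are accepted with
        # probability exp(-dE/T), which lets the search climb out of a local
        # minimum while T is still high.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            x, e = x_new, e_new
    return x


start = 1.0  # begin inside the shallow (wrong) basin
print("downhill search ends at x =", round(downhill_search(start), 2))
print("simulated annealing ends at x =", round(simulated_annealing(start), 2))
```

With these settings the downhill search cannot leave the x > 0 basin. No single step of at most 0.3 reaches a lower-energy point on the far side of the barrier near x ≈ 0.1. The annealed run typically ends near x ≈ −1.06, although the exact value depends on the random seed.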

Modern example: CYANA

⚙️ How CYANA Represents the Protein

As shown in Figure 4.9:

Protein = rigid bodies connected by rotatable bonds
Only dihedral angles are variable
Tree-like structure
Conformation uniquely defined by torsion angles
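Because bond lengths and bond angles are fixed, Cartesian coordinates can be rebuilt from the torsion angles alone, one atom at a time. The sketch below does this for a simple unbranched chain using the NeRF construction. NeRF is a standard geometric technique; CYANA's own internal implementation is not shown. The uniform bond length of 1.5 Å and bond angle of 110° are hypothetical, and the function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a: Vec) -> Vec:
    n = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return (a[0] / n, a[1] / n, a[2] / n)


def place_atom(a: Vec, b: Vec, c: Vec, bond: float,
               angle_deg: float, torsion_deg: float) -> Vec:
    """Place atom d so that |cd| = bond, angle b-c-d = angle_deg and
    dihedral a-b-c-d = torsion_deg."""
    theta, tau = math.radians(angle_deg), math.radians(torsion_deg)
    # Position of d in a local frame attached to the b -> c bond.
    local = (-bond * math.cos(theta),
             bond * math.sin(theta) * math.cos(tau),
             bond * math.sin(theta) * math.sin(tau))
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))  # normal of plane (a, b, c)
    m = _cross(n, bc)                   # completes the right-handed frame
    return tuple(c[k] + bc[k] * local[0] + m[k] * local[1] + n[k] * local[2]
                 for k in range(3))


def build_chain(torsions: Sequence[float], bond: float = 1.5,
                angle_deg: float = 110.0) -> List[Vec]:
    """Rebuild an unbranched chain from its torsion angles only."""
    theta = math.radians(angle_deg)
    # The first three atoms fix the overall position and orientation.
    atoms: List[Vec] = [(0.0, 0.0, 0.0), (bond, 0.0, 0.0),
                        (bond - bond * math.cos(theta), bond * math.sin(theta), 0.0)]
    for tau in torsions:
        atoms.append(place_atom(atoms[-3], atoms[-2], atoms[-1], bond, angle_deg, tau))
    return atoms


trans = build_chain([180.0])
cis = build_chain([0.0])
print(math.dist(trans[0], trans[3]) > math.dist(cis[0], cis[3]))  # True
```

The only input that changes between `trans` and `cis` is one torsion angle. Every bond length and angle stays the same, which is exactly the reduction in degrees of freedom described above.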

Covalent structure:

Derived automatically from sequence
Uses libraries (cyana.lib)
Based on ECEPP/2 (proteins) and AMBER (nucleotides)

🧱 Steric Repulsion Model

Instead of full Lennard–Jones potential:

Each atom assigned a core radius
Lower bounds on distances = sum of core radii
Prevent steric clashes

Example values (Table 4.3):

H (amide): 0.95 Å
Carbon: ~1.4 Å
Nitrogen: 1.3 Å
Oxygen: 1.2 Å
Sulfur: 1.6 Å

Disulfide bonds close a ring that the tree structure cannot represent, so they must be enforced explicitly with distance restraints.
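The lower bounds follow directly from the core radii listed above. Two nonbonded atoms clash if they are closer than the sum of their radii, and the overlap is what gets penalized. This is a minimal sketch. The `excluded` pair set is a simplification (real programs also skip atoms separated by only a few covalent bonds), and only the element-level radii quoted above are used, so the H value is the amide-hydrogen radius.

```python
import math
from itertools import combinations

# Core radii in Å from the values listed above (Table 4.3). Carbon is quoted
# as ~1.4 Å, and the H value is the amide-hydrogen radius.
CORE_RADII = {"H": 0.95, "C": 1.4, "N": 1.3, "O": 1.2, "S": 1.6}


def lower_bound(elem_a: str, elem_b: str) -> float:
    """Lower distance bound for a nonbonded pair: the sum of the core radii."""
    return CORE_RADII[elem_a] + CORE_RADII[elem_b]


def steric_overlaps(atoms, excluded=frozenset()):
    """atoms: list of (element, (x, y, z)); excluded: index pairs (i, j), i < j, to skip.
    Returns (i, j, overlap in Å) for every pair closer than its lower bound."""
    overlaps = []
    for i, j in combinations(range(len(atoms)), 2):
        if (i, j) in excluded:
            continue
        (elem_i, pos_i), (elem_j, pos_j) = atoms[i], atoms[j]
        d = math.dist(pos_i, pos_j)
        bound = lower_bound(elem_i, elem_j)
        if d < bound:
            overlaps.append((i, j, bound - d))
    return overlaps


# O and N are 2.0 Å apart, below their 2.5 Å lower bound; the other pairs are fine.
atoms = [("O", (0.0, 0.0, 0.0)), ("N", (2.0, 0.0, 0.0)), ("C", (0.0, 3.0, 0.0))]
print(steric_overlaps(atoms))  # [(0, 1, 0.5)]
```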

🎯 Target Function (Core Concept!)

The target function acts like the potential energy.

It penalizes:

Distance violations
Dihedral angle violations
Steric clashes

If all restraints satisfied → V = 0

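The distance penalty is usually quadratic in the violation. For an upper distance bound b and an actual distance d:

f(d,b) = (d-b)^2 if d > b, otherwise 0

The penalty therefore applies only when the restraint is violated.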
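Below is a minimal sketch of such a target function that combines the three penalty types listed above. Two things are assumptions made for illustration: the relative weights (`w_dist`, `w_steric`, `w_angle`) and the representation of a dihedral restraint as a center ± half-width. CYANA's exact functional form and weighting are not reproduced here.

```python
from typing import Iterable, Tuple


def upper_penalty(d: float, upper: float) -> float:
    """Quadratic penalty only when a distance exceeds its upper bound."""
    return (d - upper) ** 2 if d > upper else 0.0


def lower_penalty(d: float, lower: float) -> float:
    """Quadratic penalty only when two atoms are closer than their lower bound."""
    return (lower - d) ** 2 if d < lower else 0.0


def dihedral_penalty(angle: float, center: float, half_width: float) -> float:
    """Quadratic penalty (degrees^2) for leaving the interval center ± half_width.
    The modulo handles 360° periodicity (175° and -175° are 10° apart)."""
    diff = abs((angle - center + 180.0) % 360.0 - 180.0)
    excess = diff - half_width
    return excess ** 2 if excess > 0 else 0.0


def target_function(upper_restraints: Iterable[Tuple[float, float]],
                    steric_pairs: Iterable[Tuple[float, float]],
                    dihedral_restraints: Iterable[Tuple[float, float, float]],
                    w_dist: float = 1.0, w_steric: float = 1.0,
                    w_angle: float = 0.01) -> float:
    """V = weighted sum of all violations; V == 0 when every restraint is satisfied."""
    v = sum(w_dist * upper_penalty(d, b) for d, b in upper_restraints)
    v += sum(w_steric * lower_penalty(d, b) for d, b in steric_pairs)
    v += sum(w_angle * dihedral_penalty(a, c, h) for a, c, h in dihedral_restraints)
    return v


# All satisfied: 4.0 Å <= 5.0 Å, 3.0 Å >= 2.6 Å, and -60° lies within -65° ± 20°.
print(target_function([(4.0, 5.0)], [(3.0, 2.6)], [(-60.0, -65.0, 20.0)]))  # 0.0
# A single upper-bound violation of 0.5 Å contributes 0.5**2.
print(target_function([(5.5, 5.0)], [], []))  # 0.25
```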

🔥 Simulated Annealing Protocol (Figure 4.10)


Step-by-step overview:

Generate random torsion angles.
100 minimization steps (local restraints).
100 minimization steps (all restraints).
MD with reduced atom radii (high temperature).
MD with normal radii.
Cool to 0 K.
Final minimization (1000 steps).

The cooling process:

Allows escape from local minima.
Ends in a stable energy minimum.

🎲 Why Multiple Conformers?

Calculation repeated 50–150 times.

Typically:

Keep the ~20–30 structures with the lowest target function values

Rule of thumb: Final bundle ≈ 25% of generated conformers.
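Selecting the final bundle is a simple ranking by target function value. The sketch assumes the target function values have already been computed. The `fraction` parameter mirrors the ~25% rule of thumb above and is not a fixed program setting.

```python
from typing import List, Sequence


def select_bundle(target_values: Sequence[float], fraction: float = 0.25) -> List[int]:
    """Indices of the conformers with the lowest target function values,
    keeping roughly `fraction` of all calculated conformers."""
    keep = max(1, round(fraction * len(target_values)))
    ranked = sorted(range(len(target_values)), key=lambda i: target_values[i])
    return ranked[:keep]


# Eight hypothetical target function values from eight annealing runs.
values = [0.42, 3.10, 0.38, 0.55, 7.80, 0.40, 2.20, 0.61]
print(select_bundle(values))  # [2, 5]
```

With 100 calculated conformers the same call keeps 25 of them.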

If fold is well-defined:

RMSD stable
Target function stable
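The spread of a bundle is usually summarized as an RMSD. The minimal sketch below computes the mean RMSD of each conformer to the average coordinates. It assumes the conformers have already been superimposed (the superposition step, for example with the Kabsch algorithm, is omitted) and that the atom selection, such as backbone or heavy atoms, was made beforehand.

```python
import math
from typing import List, Sequence, Tuple

Coords = Sequence[Tuple[float, float, float]]


def mean_structure(bundle: Sequence[Coords]) -> List[Tuple[float, ...]]:
    """Average position of every atom over all conformers."""
    n = len(bundle)
    return [tuple(sum(conf[a][k] for conf in bundle) / n for k in range(3))
            for a in range(len(bundle[0]))]


def rmsd(x: Coords, y: Coords) -> float:
    """Root-mean-square deviation between two equally ordered atom lists."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(x, y)) / len(x))


def bundle_rmsd(bundle: Sequence[Coords]) -> float:
    """Mean RMSD of the conformers to the mean structure (input pre-superimposed)."""
    mean = mean_structure(bundle)
    return sum(rmsd(conf, mean) for conf in bundle) / len(bundle)


# Two one-atom "conformers" 2 Å apart: each lies 1 Å from the mean.
print(bundle_rmsd([[(0.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)]]))  # 1.0
```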

🔹 4.3.2 Automated NOESY Assignment

Major bottleneck: manual NOESY assignment 😵

Problem:

Overlapping peaks
Ambiguities
Incomplete data

Solution: integrate assignment into structure calculation.

🔁 Iterative Strategy

Use chemical shifts → tentative NOE assignments
Convert to distance restraints
Calculate structure
Remove incompatible assignments
Reduce ambiguity
Repeat

This mimics how experts do it manually — but automated.
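The first steps of this cycle can be sketched as follows. Every proton whose chemical shift matches a peak position within a tolerance becomes a candidate, which is what produces ambiguous assignments. Distances in the current structure then prune candidates whose protons are too far apart. The proton names, shifts, the 0.02 ppm tolerance and the 6 Å cutoff are all hypothetical, and real programs combine this step with further criteria.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]


def candidate_assignments(peak: Tuple[float, float], shifts: Dict[str, float],
                          tol: float = 0.02) -> List[Pair]:
    """All proton pairs whose shifts match the peak in both dimensions (ppm)."""
    w1, w2 = peak
    dim1 = [name for name, s in shifts.items() if abs(s - w1) <= tol]
    dim2 = [name for name, s in shifts.items() if abs(s - w2) <= tol]
    return [(a, b) for a in dim1 for b in dim2 if a != b]


def filter_by_structure(candidates: List[Pair], distance: Callable[[str, str], float],
                        cutoff: float = 6.0) -> List[Pair]:
    """Drop candidates whose protons are too far apart in the current structure."""
    return [(a, b) for a, b in candidates if distance(a, b) <= cutoff]


shifts = {"HN5": 8.21, "HA5": 4.35, "HN12": 8.22, "HB12": 2.10}
peak = (8.215, 4.35)
candidates = candidate_assignments(peak, shifts)
print(candidates)  # [('HN5', 'HA5'), ('HN12', 'HA5')]  (ambiguous)

# Hypothetical distances (Å) taken from the structure of the previous cycle.
dist = {frozenset(("HN5", "HA5")): 2.8, frozenset(("HN12", "HA5")): 9.0}
print(filter_by_structure(candidates, lambda a, b: dist[frozenset((a, b))]))  # [('HN5', 'HA5')]
```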

⚠️ Major Challenges

Many peaks are ambiguous.
Dataset contains:
- Mis-picked peaks
- Inaccurate shifts
- Missing data

Solution:

Ambiguous distance restraints
Statistical filtering
Iterative refinement
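An ambiguous distance restraint keeps all candidate atom pairs for a peak. It is satisfied when those pairs are, collectively, close enough. A common formulation, assumed here, uses an r⁻⁶-summed effective distance. This mirrors the r⁻⁶ dependence of the NOE, and the effective distance is never longer than the shortest individual distance. The function names are illustrative.

```python
from typing import Sequence


def effective_distance(distances: Sequence[float]) -> float:
    """r^-6 sum over all candidate pairs: d_eff = (sum d_i^-6)^(-1/6)."""
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)


def ambiguous_violation(distances: Sequence[float], upper: float) -> float:
    """Amount (Å) by which the effective distance exceeds the upper bound."""
    return max(0.0, effective_distance(distances) - upper)


# Candidates at 3.0 Å and 8.0 Å: the close pair alone satisfies a 5 Å bound.
print(ambiguous_violation([3.0, 8.0], upper=5.0))  # 0.0
print(round(effective_distance([4.0, 4.0]), 2))     # 3.56, shorter than either pair
```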

Automation is improving, especially with RDC integration.

🔹 4.3.3 Energy Refinement (Very Important!)

Structure generation uses simplified force fields.

Result:

Fold correct
Local geometry imperfect

Refinement step:

Use full force field
Run restrained MD (rMD)
In Cartesian space
Include solvent (water + ions!)

A major improvement since the mid-1990s:

Explicit water significantly improves stereochemistry.

🔥 rMD Protocol (Figure 4.11)


Solvate protein
Heat to 300–500 K
Short MD at high temperature
Slowly cool to 0 K
Final minimization

Why heat?

Improve side-chain rotamers
Fix local geometry

Restraints active during entire simulation:

Prevent unfolding
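During rMD, the restraints enter as an extra energy term added to the force field. A typical choice, assumed here, is a flat-bottom potential. It is zero inside the allowed range and harmonic outside it, so satisfied restraints exert no force and violated ones are pulled back. The force constant `k` and its units are hypothetical. GROMACS and AMBER each define their own restraint functional forms and parameters.

```python
def flat_bottom_energy(d: float, lower: float, upper: float, k: float = 10.0) -> float:
    """Restraint energy: 0 inside [lower, upper], 0.5*k*excess^2 outside."""
    if d < lower:
        return 0.5 * k * (lower - d) ** 2
    if d > upper:
        return 0.5 * k * (d - upper) ** 2
    return 0.0


def flat_bottom_force(d: float, lower: float, upper: float, k: float = 10.0) -> float:
    """Force along the interatomic distance, F = -dE/dd (positive pushes atoms apart)."""
    if d < lower:
        return k * (lower - d)
    if d > upper:
        return -k * (d - upper)
    return 0.0


# Too close, inside the allowed range, and too far apart for bounds [1.8, 5.0] Å.
for d in (1.5, 3.0, 6.0):
    print(d, flat_bottom_energy(d, 1.8, 5.0), flat_bottom_force(d, 1.8, 5.0))
```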

Programs:

GROMACS
AMBER

⚠️ Important Warning

If refinement introduces new violations → the restraints probably contain incorrect assignments → re-examine them.

🧪 Chapter 4.4 – Validation of Protein Structures

A structure is a model, not truth.

Validation asks:

Does it fit experimental data?
Is geometry physically realistic?

🔹 4.4.1 Agreement with Experimental Data

📏 1️⃣ Restraint Violations

Commonly reported:

Number of distance-restraint violations > 0.3 Å
Number of violations of 0.1–0.3 Å
Number of violations < 0.1 Å
Number of dihedral-angle violations > 5°

🚨 Consistent violations (present in >75% of conformers) are serious: they usually point to errors in the experimental data or the assignments.
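Both statistics can be computed from a table of per-conformer violations. This is a minimal sketch that assumes violations are given in Å, with 0 for satisfied restraints. The 0.1 Å threshold used to call a restraint "violated" in the consistency check is an assumption, not a value from the text.

```python
from typing import Dict, List, Sequence


def violation_bins(violations: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Average number of violated restraints per conformer in each size bin.
    violations[c][r] is the violation (Å) of restraint r in conformer c; 0 = satisfied."""
    counts = {"<0.1": 0, "0.1-0.3": 0, ">0.3": 0}
    for conformer in violations:
        for v in conformer:
            if v <= 0.0:
                continue  # restraint satisfied
            if v < 0.1:
                counts["<0.1"] += 1
            elif v <= 0.3:
                counts["0.1-0.3"] += 1
            else:
                counts[">0.3"] += 1
    return {label: n / len(violations) for label, n in counts.items()}


def consistent_violations(violations: Sequence[Sequence[float]],
                          threshold: float = 0.1, fraction: float = 0.75) -> List[int]:
    """Restraints violated by more than `threshold` Å in more than `fraction` of conformers."""
    n_conf = len(violations)
    flagged = []
    for r in range(len(violations[0])):
        n_bad = sum(1 for conformer in violations if conformer[r] > threshold)
        if n_bad / n_conf > fraction:
            flagged.append(r)
    return flagged


# Hypothetical bundle: 4 conformers x 3 restraints.
viol = [
    [0.00, 0.50, 0.05],
    [0.00, 0.40, 0.00],
    [0.20, 0.60, 0.00],
    [0.00, 0.35, 0.00],
]
print(violation_bins(viol))         # {'<0.1': 0.25, '0.1-0.3': 0.25, '>0.3': 1.0}
print(consistent_violations(viol))  # [1]
```

Restraint 1 is violated in every conformer, so it is flagged. The single 0.2 Å violation of restraint 0 is not.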

📐 2️⃣ RMSNOE

Measures overall fit to distance restraints.

Lower RMS → better agreement.
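One way to compute such an RMS value, assumed here, is the root-mean-square of the upper-bound violations over all distance restraints. Definitions differ between programs, for example in whether satisfied restraints are counted as zero. The restraint values below are hypothetical.

```python
import math
from typing import Sequence


def rms_noe(distances: Sequence[float], upper_limits: Sequence[float]) -> float:
    """RMS of upper-bound violations over all distance restraints (satisfied ones count as 0)."""
    squares = [max(0.0, d - u) ** 2 for d, u in zip(distances, upper_limits)]
    return math.sqrt(sum(squares) / len(squares))


# Three restraints: one satisfied, two violated by 0.4 Å and 1.0 Å.
print(round(rms_noe([3.0, 5.4, 6.0], [4.0, 5.0, 5.0]), 3))  # 0.622
```

The value depends on how the upper limits were set in the first place. That is one reason RMS values are not always straightforward to compare between structures.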

📊 3️⃣ R-Factor for NOESY

Inspired by crystallography.

Compare:

Experimental NOESY
Back-calculated NOESY

Back-calculation uses the full relaxation matrix, so spin diffusion is included.

Modified R-factor reduces bias from short-distance interactions.

Program: RFAC.
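In crystallographic form, the R-factor compares intensities as R = Σ|I_exp − I_calc| / Σ I_exp. NOE intensities scale roughly as r⁻⁶, so a few strong short-distance peaks dominate this sum. One common modification, assumed here and not necessarily RFAC's exact definition, compares the sixth roots of the intensities instead. The sketch takes back-calculated intensities as given (the relaxation-matrix calculation is not shown) and uses made-up numbers.

```python
from typing import Sequence


def r_factor(i_exp: Sequence[float], i_calc: Sequence[float]) -> float:
    """Crystallography-style R = sum|I_exp - I_calc| / sum I_exp."""
    return sum(abs(e - c) for e, c in zip(i_exp, i_calc)) / sum(i_exp)


def r_factor_sixth_root(i_exp: Sequence[float], i_calc: Sequence[float]) -> float:
    """Same comparison on I^(1/6), which scales like 1/r and damps strong short-range peaks."""
    num = sum(abs(e ** (1 / 6) - c ** (1 / 6)) for e, c in zip(i_exp, i_calc))
    return num / sum(e ** (1 / 6) for e in i_exp)


# Hypothetical intensities: one strong short-distance peak and two weak long-range peaks.
i_exp = [100.0, 1.0, 1.0]
i_calc = [90.0, 2.0, 0.5]
print(round(r_factor(i_exp, i_calc), 3))             # 0.113, mostly from the strong peak
print(round(r_factor_sixth_root(i_exp, i_calc), 3))  # the weak peaks now carry more weight
```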

📦 4️⃣ Completeness / RPF Scores

Instead of intensity:

Ask:

If two protons are closer than 5–6 Å, do we see a peak?
If a peak exists, are the corresponding protons close?

RPF gives:

Recall
Precision
F-measure
DP score (discriminating power)

Very powerful validation metric.
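In simplified form, recall and precision are overlaps between two sets: pairs that are close in the structure, and pairs that explain observed peaks. The F-measure is their harmonic mean. This sketch assumes both sets are already available as proton-pair lists. Two parts of the real method are not shown: RPF matches unassigned peaks to the structure via chemical shifts, and the DP score additionally normalizes against a random-coil reference.

```python
from typing import Iterable, Tuple

Pair = Tuple[str, str]


def rpf(close_pairs: Iterable[Pair], peak_pairs: Iterable[Pair]) -> Tuple[float, float, float]:
    """Recall, precision and F-measure from two sets of (unordered) proton pairs.

    close_pairs: pairs within the distance cutoff in the structure
    peak_pairs:  pairs that account for observed NOESY peaks
    """
    close = {frozenset(p) for p in close_pairs}
    peaks = {frozenset(p) for p in peak_pairs}
    both = close & peaks
    recall = len(both) / len(peaks)     # peaks that the structure explains
    precision = len(both) / len(close)  # close pairs that show up as peaks
    f = 2 * recall * precision / (recall + precision) if recall + precision else 0.0
    return recall, precision, f


close = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
peaks = [("b", "a"), ("a", "c"), ("c", "e")]
r, p, f = rpf(close, peaks)
print(round(r, 3), round(p, 3), round(f, 3))  # 0.667 0.5 0.571
```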

🧲 5️⃣ RDC Validation

Rdip factor:

0 = random
1 = perfect agreement

Cross-validation (Rdip free) possible but computationally heavy.

🔹 4.4.2 Geometric Quality

Because NMR restraints are sparse, geometry depends strongly on force field.

Validation uses reference databases.

📊 Z-Scores

Z = \frac{x - \text{mean}}{\text{std}}

Interpretation:

±1 → normal
±2 → borderline
±4 → outlier
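Below is a minimal sketch of the Z-score and the ±4 outlier rule above. The "database" mean and standard deviation in the example are illustrative numbers, not values from a specific reference set.

```python
def z_score(x: float, mean: float, std: float) -> float:
    """Number of standard deviations by which x deviates from the database mean."""
    return (x - mean) / std


def is_outlier(z: float, cutoff: float = 4.0) -> bool:
    """Flag values beyond ±cutoff standard deviations."""
    return abs(z) > cutoff


# Illustrative bond-length statistics (Å): mean 1.46, standard deviation 0.02.
for length in (1.47, 1.55):
    z = z_score(length, 1.46, 0.02)
    print(length, round(z, 1), is_outlier(z))
```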

🧬 Geometry Checks

Bond lengths
Bond angles
Chirality
Planarity

Usually tightly constrained.

📈 Ramachandran Plot

Most important backbone validation tool.

Categories:

Favored
Allowed
Generously allowed
Disallowed

High-quality structures: → Almost all residues in favored regions.

Programs:

PROCHECK
WHAT-IF

🔄 Side-Chain Rotamers

Side chains prefer:

Staggered conformations
Avoid eclipsed

Compare to PDB statistics.
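Staggered conformations place a side-chain dihedral near +60°, 180° or −60°, and the eclipsed positions (0°, ±120°) lie halfway between them. The minimal sketch below assigns a χ angle to its nearest staggered position and flags angles that sit closer to an eclipsed one. The 30° cutoff is simply that halfway point. It is an assumption, not a threshold from any validation program.

```python
from typing import Tuple

STAGGERED = (60.0, 180.0, -60.0)


def angular_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (0..180)."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def nearest_staggered(chi: float) -> Tuple[float, float]:
    """Nearest staggered position and the deviation from it, in degrees."""
    best = min(STAGGERED, key=lambda s: angular_diff(chi, s))
    return best, angular_diff(chi, best)


def closer_to_eclipsed(chi: float, cutoff: float = 30.0) -> bool:
    """True if chi is nearer an eclipsed position than a staggered one."""
    return nearest_staggered(chi)[1] > cutoff


print(nearest_staggered(-65.0))   # (-60.0, 5.0)
print(nearest_staggered(-170.0))  # (180.0, 10.0)
print(closer_to_eclipsed(120.0))  # True
```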

Programs:

PROCHECK
WHAT-IF
MolProbity

💥 Steric Clashes (Bumps)

Nonbonded atoms too close.

Should not occur.

Detected by:

WHAT-IF
MolProbity

🤝 Hydrogen Bonds

Important but hard in NMR:

Rarely measured directly (only through weak trans-hydrogen-bond J couplings)
Often inferred indirectly

Validation:

Count unsatisfied donors/acceptors
Evaluate hydrogen-bond energy

🎯 Big Picture Takeaways

🧩 Structure calculation = search problem

Find conformations minimizing target function.

🔥 Simulated annealing is central

Escapes local minima.

💧 Refinement improves realism

Explicit solvent is critical.

📊 Validation is multi-layered

Must check:

Restraint fit
NOESY consistency
RDC agreement
Geometric plausibility
Rotamers
Hydrogen bonding
Steric clashes

Quiz


Q0. What is the main reason dihedral angle space is preferred over Cartesian space in modern NMR structure generation?

It increases computational complexity

It reduces the number of degrees of freedom by fixing bond lengths and angles

It eliminates the need for experimental restraints

It automatically ensures correct hydrogen bonding

Q1. Why does simple minimization often fail in protein structure calculation?

Because NMR restraints are too strong

Because bond lengths fluctuate too much

Because the target function contains many local minima

Because dihedral angles cannot be optimized

Q2. What is the primary purpose of simulated annealing in CYANA?

To increase restraint violations

To add kinetic energy allowing escape from local minima

To eliminate torsion angle restraints

To calculate chemical shifts

Q3. In CYANA, what happens if all experimental restraints are perfectly satisfied?

The RMSD becomes zero

The potential energy equals zero

The target function V equals zero

The system temperature becomes zero

Q4. Why are disulfide bonds explicitly restrained in torsion-angle dynamics?

They are not part of the amino acid sequence

They are automatically included by NOE restraints

They are special covalent bonds not defined by standard dihedral sampling

They prevent simulated annealing

Q5. Why are multiple conformers generated in structure calculations?

To average out bond lengths

To identify the global minimum region consistent with restraints

To reduce RMSD artificially

To eliminate steric clashes

Q6. What is a major bottleneck in NMR structure determination addressed by automated methods?

Ramachandran analysis

NOESY peak assignment

Energy refinement

RDC measurement

Q7. What is an ambiguous distance restraint?

A restraint without an upper bound

A restraint applied to multiple possible atom pairs

A violated restraint

A restraint without chemical shift data

Q8. Why is energy refinement performed after structure generation?

To redefine the fold completely

To remove all restraints

To improve stereochemical quality using an accurate force field

To change the amino acid sequence

Q9. What was a major improvement in NMR structure refinement after the mid-1990s?

Removal of solvent

Use of implicit vacuum models

Inclusion of explicit water and ions

Elimination of dihedral restraints

Q10. What does a consistent restraint violation (present in >75% of conformers) most likely indicate?

Good convergence

Overfitting

Errors in the experimental dataset or assignments

Perfect refinement

Q11. What does RMSNOE measure?

Deviation between calculated and experimental RDCs

Agreement of interatomic distances with experimental limits

Deviation from ideal bond angles

Agreement of chemical shifts

Q12. What is the purpose of calculating a NOESY R-factor?

To measure bond length accuracy

To compare experimental and back-calculated NOESY spectra

To compute rotamer populations

To validate Ramachandran regions

Q13. What does the DP (discriminating power) score evaluate?

Energy refinement quality

Ability of data to distinguish a structure from a random coil

Number of hydrogen bonds

Side-chain rotamer distribution

Q14. What does an Rdip value close to 1 indicate?

Random agreement

Poor RDC fit

Perfect agreement between observed and calculated RDCs

No dipolar couplings present

Q15. Distance geometry operates directly in Cartesian coordinate space.

True

False

Q16. In torsion angle dynamics, bond lengths and bond angles are treated as variable degrees of freedom.

True

False

Q17. Simulated annealing allows the conformer to escape local minima by temporarily increasing kinetic energy.

True

False

Q18. Refinement typically changes the global fold of a protein drastically.

True

False

Q19. All conformers in a refined bundle are typically deposited in the PDB, even if their energies differ.

True

False

Q20. RMSNOE values are always straightforward to interpret numerically.

True

False

Q21. Short-distance NOE interactions dominate R-factor calculations due to strong distance dependence.

True

False

Q22. The Ramachandran plot evaluates bond length deviations.

True

False

Q23. High-quality protein structures have most residues in favored Ramachandran regions.

True

False

Q24. Side-chain rotamers prefer eclipsed conformations due to steric stabilization.

True

False

Q25. Z-scores express how many standard deviations a parameter deviates from database averages.

True

False

Q26. Z-scores larger than ±4 are generally considered statistical outliers.

True

False

Q27. Interatomic bumps correspond to nonbonded atoms being too far apart.

True

False

Q28. Hydrogen bonds in NMR structures are fully defined by experimental data alone.

True

False

Q29. The use of explicit solvent during refinement can improve hydrogen bond networks and stereochemistry.

True

False
