Lecture 4 Book Chapter 4.3 & 4.4

Protein structure

🧬 Chapter 4.3 – Structure Calculations (Except 4.3.4)

Protein structure determination by NMR is essentially a constrained optimization problem: We want to find 3D structures that satisfy all experimental restraints while remaining chemically realistic.

It happens in two major phases:

  1. Structure generation
  2. Energy refinement

Let’s break it down.


🔹 4.3.1 Traditional Structure Generation

🧩 Step 1: Generate Conformers

The goal is to create a bundle of structures that all satisfy the experimental restraints (NOEs, dihedral angles, RDCs, etc.).

Historically, different computational strategies were developed:


🏗 1️⃣ Distance Geometry (Early Method)

Instead of working in Cartesian coordinates, this method works in distance space.

  • Uses interatomic distances as primary variables.
  • Computational cost scales with N³ (very heavy in the 1980s).
  • Conceptually elegant but computationally demanding.
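
A minimal sketch of one typical distance-space step, triangle-inequality bound smoothing, which tightens upper bounds using u_ij ≤ u_ik + u_kj. The triple loop shows one place where cubic cost can arise. The function name and the toy bounds are illustrative assumptions, not taken from any specific program.

```python
INF = float("inf")

def smooth_upper_bounds(upper):
    """Tighten upper distance bounds with the triangle inequality (O(N^3))."""
    n = len(upper)
    u = [row[:] for row in upper]  # work on a copy
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # A path i -> k -> j can never be shorter than the direct distance.
                if u[i][k] + u[k][j] < u[i][j]:
                    u[i][j] = u[i][k] + u[k][j]
    return u

# Toy example: atoms 0-1 and 1-2 are restrained, 0-2 is initially unknown.
upper = [
    [0.0, 1.5, INF],
    [1.5, 0.0, 1.5],
    [INF, 1.5, 0.0],
]
smoothed = smooth_upper_bounds(upper)
print(smoothed[0][2])
```

The unknown 0-2 bound is replaced by the sum of the two known bounds, so even sparse restraints limit every pairwise distance.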

Programs: early distance-geometry implementations, since largely superseded by the more modern methods described next.


🔄 2️⃣ Dihedral Angle Space (Huge Improvement)

Key idea: Bond lengths and bond angles barely fluctuate → fix them.

Only allow:

  • ϕ, ψ backbone angles
  • Side-chain χ angles

Result:

  • ~10× fewer degrees of freedom.
  • Much more efficient sampling.

Programs:

  • DISMAN (variable target function)
  • DIANA
  • Later: DYANA → CYANA
  • XPLOR → CNS

🔥 3️⃣ Simulated Annealing + Torsion Angle Dynamics

This became the dominant approach.

Why?

Simple minimization gets stuck in local minima.

Simulated annealing:

  • Add kinetic energy
  • Heat system
  • Cool slowly
  • Escape local minima
  • Approach the global minimum (likely, but not guaranteed)
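
A toy sketch of the heat-then-cool idea on a 1D double-well function. It uses Metropolis Monte Carlo moves rather than torsion angle dynamics (a simpler, swapped-in technique), and the energy function, step size, and temperatures are all hypothetical.

```python
import math
import random

def energy(x):
    # Hypothetical landscape: shallow minimum near x = 1.13, deeper one near x = -1.30.
    return x**4 - 3 * x**2 + x

def anneal(start, t_start=2.0, steps=5000, seed=0):
    """Metropolis annealing with a linear cooling schedule; returns the best state seen."""
    rng = random.Random(seed)
    x, e = start, energy(start)
    best_x, best_e = x, e
    for step in range(steps):
        # Temperature falls linearly from t_start to (almost) zero.
        temperature = t_start * (1.0 - step / steps) + 1e-9
        trial = x + rng.uniform(-0.5, 0.5)
        trial_e = energy(trial)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if trial_e <= e or rng.random() < math.exp(-(trial_e - e) / temperature):
            x, e = trial, trial_e
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

start = 1.13  # begin in the shallow (local) minimum
best_x, best_e = anneal(start)
print(f"start energy {energy(start):.2f}, best energy found {best_e:.2f}")
```

At high temperature uphill moves are accepted often, which lets the search cross the barrier between the wells; as the temperature drops, the search settles into whichever well it is in.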

Modern example: CYANA


⚙️ How CYANA Represents the Protein

As shown in Figure 4.9:

  • Protein = rigid bodies connected by rotatable bonds
  • Only dihedral angles are variable
  • Tree-like structure
  • Conformation uniquely defined by torsion angles

Covalent structure:

  • Derived automatically from sequence
  • Uses libraries (cyana.lib)
  • Based on ECEPP/2 (proteins)
  • AMBER (nucleotides)

🧱 Steric Repulsion Model

Instead of full Lennard–Jones potential:

  • Each atom assigned a core radius
  • Lower bounds on distances = sum of core radii
  • Prevent steric clashes

Example values (Table 4.3):

  • H (amide): 0.95 Å
  • Carbon: ~1.4 Å
  • Nitrogen: 1.3 Å
  • Oxygen: 1.2 Å
  • Sulfur: 1.6 Å
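
A minimal sketch of how these radii turn into lower distance bounds. The radii are the example values listed above (carbon is approximate); the dictionary keys and function names are illustrative assumptions.

```python
# Core radii in Å, taken from the example values above (carbon is approximate).
CORE_RADII = {"H_amide": 0.95, "C": 1.4, "N": 1.3, "O": 1.2, "S": 1.6}

def steric_lower_bound(type_a, type_b):
    """Lower bound on the distance between two atoms = sum of their core radii."""
    return CORE_RADII[type_a] + CORE_RADII[type_b]

def is_clash(type_a, type_b, distance):
    """True if two nonbonded atoms are closer than their steric lower bound."""
    return distance < steric_lower_bound(type_a, type_b)

print(steric_lower_bound("C", "O"))  # lower bound for a C...O pair
print(is_clash("C", "O", 2.4))       # closer than the bound, so a clash
print(is_clash("N", "O", 3.0))       # farther than the bound, so no clash
```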

Disulfide bonds enforced manually via restraints.


🎯 Target Function (Core Concept!)

The target function acts like the potential energy.

It penalizes:

  • Distance violations
  • Dihedral angle violations
  • Steric clashes

If all restraints satisfied → V = 0

Distance penalty usually quadratic:

f(d,b) = (d-b)^2

Only applied when violated.
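
A minimal sketch of this penalty for distance restraints only. The real target function also includes dihedral and steric terms and may weight them; the function names and the restraint tuples (distance, lower bound, upper bound) are hypothetical.

```python
def violation_penalty(d, lower, upper):
    """Quadratic penalty f(d, b) = (d - b)^2, applied only when bound b is violated."""
    if d > upper:
        return (d - upper) ** 2
    if d < lower:
        return (d - lower) ** 2
    return 0.0  # restraint satisfied, no contribution

def target_function(restraints):
    """Sum of penalties over (distance, lower, upper) tuples, all in Å."""
    return sum(violation_penalty(d, lo, up) for d, lo, up in restraints)

satisfied = [(4.0, 1.8, 5.0), (3.2, 1.8, 3.5)]
violated = [(5.5, 1.8, 5.0), (2.0, 2.5, 6.0)]  # one upper and one lower violation

print(target_function(satisfied))
print(target_function(violated))
```

With every restraint satisfied the sum is exactly zero, matching V = 0; each 0.5 Å violation contributes 0.25.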


🔥 Simulated Annealing Protocol (Figure 4.10)

Step-by-step overview:

  1. Generate random torsion angles.
  2. 100 minimization steps (local restraints).
  3. 100 minimization steps (all restraints).
  4. MD with reduced atom radii (high temperature).
  5. MD with normal radii.
  6. Cool to 0 K.
  7. Final minimization (1000 steps).

The cooling process:

  • Allows escape from local minima.
  • Ends in a stable energy minimum.

🎲 Why Multiple Conformers?

Calculation repeated 50–150 times.

Typically:

  • Keep ~20–30 best structures
  • Select lowest target function

Rule of thumb: Final bundle ≈ 25% of generated conformers.
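
A minimal sketch of the selection step: rank conformers by final target function and keep the lowest 25%. The target function values are made up for illustration.

```python
def select_bundle(target_values, fraction=0.25):
    """Return indices of the conformers with the lowest target function values."""
    ranked = sorted(range(len(target_values)), key=lambda i: target_values[i])
    keep = max(1, round(len(ranked) * fraction))  # always keep at least one
    return ranked[:keep]

# Hypothetical final target function values for 8 calculated conformers.
target_values = [1.8, 0.4, 2.9, 0.7, 5.1, 0.9, 3.3, 1.2]
bundle = select_bundle(target_values)
print(bundle)
```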

If fold is well-defined:

  • RMSD stable
  • Target function stable

🔹 4.3.2 Automated NOESY Assignment

Major bottleneck: manual NOESY assignment 😵

Problem:

  • Overlapping peaks
  • Ambiguities
  • Incomplete data

Solution: integrate assignment into structure calculation.


🔁 Iterative Strategy

  1. Use chemical shifts → tentative NOE assignments
  2. Convert to distance restraints
  3. Calculate structure
  4. Remove incompatible assignments
  5. Reduce ambiguity
  6. Repeat

This mimics how experts do it manually — but automated.
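
A minimal sketch of one pass through steps 1 and 4: match a peak to proton pairs by chemical shift, then drop pairs that are too far apart in the current structure. The atom names, shifts, coordinates, tolerance, and cutoff are all hypothetical.

```python
import math

def candidate_assignments(peak, shifts, tolerance=0.02):
    """All proton pairs whose shifts match both peak dimensions within tolerance (ppm)."""
    w1, w2 = peak
    return [
        (a, b)
        for a, shift_a in shifts.items()
        for b, shift_b in shifts.items()
        if a != b and abs(shift_a - w1) <= tolerance and abs(shift_b - w2) <= tolerance
    ]

def prune_by_structure(candidates, coords, cutoff=6.0):
    """Keep only assignments compatible with the current structure (distance in Å)."""
    return [(a, b) for a, b in candidates if math.dist(coords[a], coords[b]) <= cutoff]

shifts = {"HA_10": 4.21, "HN_11": 8.30, "HN_45": 8.31}
coords = {"HA_10": (0.0, 0.0, 0.0), "HN_11": (2.5, 0.0, 0.0), "HN_45": (15.0, 0.0, 0.0)}
peak = (4.21, 8.30)

candidates = candidate_assignments(peak, shifts)  # ambiguous: two possible partners
kept = prune_by_structure(candidates, coords)     # structure removes the distant one
print(candidates)
print(kept)
```

Repeating this with each newly calculated structure is what gradually reduces the ambiguity.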


⚠️ Major Challenges

  1. Many peaks are ambiguous.
  2. Dataset contains:
    • Mis-picked peaks
    • Inaccurate shifts
    • Missing data

Solution:

  • Ambiguous distance restraints
  • Statistical filtering
  • Iterative refinement
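
A minimal sketch of how an ambiguous distance restraint is commonly evaluated: all candidate proton pairs are combined into one effective distance by r⁻⁶ summation, so the closest pair dominates. The function name and example distances are illustrative assumptions.

```python
def effective_distance(distances):
    """Effective distance for an ambiguous restraint: (sum of d^-6)^(-1/6)."""
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)

# One close candidate pair and one distant one (Å): the close pair dominates,
# so a wrong distant candidate barely changes the restraint.
print(effective_distance([3.0, 20.0]))

# Two equally close candidates give a slightly shorter effective distance.
print(effective_distance([3.0, 3.0]))
```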

Automation is improving, especially with RDC integration.


🔹 4.3.3 Energy Refinement (Very Important!)

Structure generation uses simplified force fields.

Result:

  • Fold correct
  • Local geometry imperfect

Refinement step:

  • Use full force field
  • Run restrained MD (rMD)
  • In Cartesian space
  • Include solvent (water + ions!)

Huge improvement since mid-1990s:

  • Explicit water significantly improves stereochemistry.

🔥 rMD Protocol (Figure 4.11)

  1. Solvate protein
  2. Heat to 300–500 K
  3. Short MD at high temperature
  4. Slowly cool to 0 K
  5. Final minimization

Why heat?

  • Improve side-chain rotamers
  • Fix local geometry

Restraints active during entire simulation:

  • Prevent unfolding

Programs:

  • GROMACS
  • AMBER

⚠️ Important Warning

If refinement introduces new violations → assignments are likely incorrect → re-examine the restraints.


🧪 Chapter 4.4 – Validation of Protein Structures

A structure is a model, not truth.

Validation asks:

  1. Does it fit experimental data?
  2. Is geometry physically realistic?

🔹 4.4.1 Agreement with Experimental Data

📏 1️⃣ Restraint Violations

Commonly reported:

  • Distance violations > 0.3 Å
  • Distance violations of 0.1–0.3 Å
  • Distance violations < 0.1 Å
  • Dihedral violations > 5°

🚨 Consistent violations (present in >75% of conformers) are serious.
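
A minimal sketch of both reports: binning distance violations by size, and flagging restraints violated in more than 75% of conformers. The 0.1 Å cutoff for counting a violation as consistent and the violation values are assumptions for illustration.

```python
def bin_distance_violations(violations):
    """Count violations (Å) per size class; values <= 0 mean the restraint is satisfied."""
    bins = {">0.3": 0, "0.1-0.3": 0, "<0.1": 0}
    for v in violations:
        if v <= 0.0:
            continue
        if v > 0.3:
            bins[">0.3"] += 1
        elif v >= 0.1:
            bins["0.1-0.3"] += 1
        else:
            bins["<0.1"] += 1
    return bins

def consistently_violated(violations_by_conformer, cutoff=0.1, min_fraction=0.75):
    """Indices of restraints violated by more than `cutoff` in > min_fraction of conformers."""
    n_conformers = len(violations_by_conformer)
    n_restraints = len(violations_by_conformer[0])
    flagged = []
    for r in range(n_restraints):
        count = sum(1 for conf in violations_by_conformer if conf[r] > cutoff)
        if count / n_conformers > min_fraction:
            flagged.append(r)
    return flagged

# Rows = conformers, columns = restraints (hypothetical violation sizes in Å).
violation_table = [
    [0.00, 0.35, 0.05],
    [0.00, 0.40, 0.00],
    [0.12, 0.32, 0.00],
    [0.00, 0.38, 0.02],
]
first_conformer_bins = bin_distance_violations(violation_table[0])
flagged = consistently_violated(violation_table)
print(first_conformer_bins)
print(flagged)
```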


📐 2️⃣ RMSNOE

Measures overall fit to distance restraints.

Lower RMS → better agreement.
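
A minimal sketch, assuming the RMS is taken over the upper-bound violations of all restraints, with satisfied restraints contributing zero. Exact definitions vary between programs; the distances and bounds are made up.

```python
import math

def rms_violation(distances, upper_bounds):
    """Root-mean-square violation of upper distance bounds (Å)."""
    squared = [max(0.0, d - u) ** 2 for d, u in zip(distances, upper_bounds)]
    return math.sqrt(sum(squared) / len(squared))

distances = [4.0, 5.3, 3.0, 6.4]     # distances in the model
upper_bounds = [5.0, 5.0, 3.5, 6.0]  # NOE-derived upper bounds
rms = rms_violation(distances, upper_bounds)
print(round(rms, 3))
```

Here two of the four restraints are violated, by 0.3 Å and 0.4 Å, giving an RMS of 0.25 Å.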


📊 3️⃣ R-Factor for NOESY

Inspired by crystallography.

Compare:

  • Experimental NOESY
  • Back-calculated NOESY

Full relaxation matrix used.

Modified R-factor reduces bias from short-distance interactions.

Program: RFAC.
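
A minimal sketch of the idea, not RFAC's actual definition. It assumes a crystallography-style R = Σ|I_exp − I_calc| / Σ I_exp and, as one common way to damp the dominance of strong short-distance peaks, the same ratio on sixth-root-scaled intensities. The intensities are hypothetical.

```python
def r_factor(experimental, calculated, power=1.0):
    """R = sum |I_exp^p - I_calc^p| / sum I_exp^p over matched peaks."""
    numerator = sum(abs(e ** power - c ** power) for e, c in zip(experimental, calculated))
    denominator = sum(e ** power for e in experimental)
    return numerator / denominator

# One strong (short-distance) peak is off; the weak peaks are reproduced exactly.
experimental = [100.0, 10.0, 1.0]
calculated = [80.0, 10.0, 1.0]

r_plain = r_factor(experimental, calculated)
r_sixth_root = r_factor(experimental, calculated, power=1.0 / 6.0)
print(round(r_plain, 3), round(r_sixth_root, 3))
```

Because intensity scales roughly with r⁻⁶, the sixth root puts peaks on a distance-like scale, so an error in a strong peak no longer dominates the score.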


📦 4️⃣ Completeness / RPF Scores

Instead of intensity:

Ask:

  • If two protons <5–6 Å → do we see peak?
  • If peak exists → are protons close?

RPF gives:

  • Recall
  • Precision
  • F-measure
  • DP score (discriminating power)
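
A simplified sketch of recall, precision, and F-measure using plain sets of proton pairs. The real RPF analysis also deals with ambiguous peak assignments, and the DP score is not computed here. Atom labels and pair lists are hypothetical.

```python
def recall_precision_f(observed_pairs, predicted_pairs):
    """Compare proton pairs seen as NOESY peaks with pairs that are close in the model."""
    observed = {frozenset(p) for p in observed_pairs}    # peaks in the spectrum
    predicted = {frozenset(p) for p in predicted_pairs}  # pairs < cutoff in the structure
    true_positives = len(observed & predicted)
    recall = true_positives / len(observed)      # peak exists -> are protons close?
    precision = true_positives / len(predicted)  # protons close -> do we see a peak?
    if recall + precision == 0:
        return recall, precision, 0.0
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure

observed_pairs = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e")]
predicted_pairs = [("b", "a"), ("b", "c"), ("c", "d"), ("d", "e"), ("b", "d")]

recall, precision, f_measure = recall_precision_f(observed_pairs, predicted_pairs)
print(recall, precision, round(f_measure, 3))
```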

Very powerful validation metric.


🧲 5️⃣ RDC Validation

Rdip factor:

  • 0 = perfect agreement
  • 1 = random (no agreement)

Cross-validation (Rdip free) possible but computationally heavy.


🔹 4.4.2 Geometric Quality

Because NMR restraints are sparse, geometry depends strongly on force field.

Validation uses reference databases.


📊 Z-Scores

Z = \frac{x - \text{mean}}{\text{std}}

Interpretation:

  • ±1 → normal
  • ±2 → borderline
  • ±4 → outlier
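
A minimal sketch of the calculation. The reference values stand in for a database distribution and are made up; the bin edges (|Z| < 2, < 4, otherwise) are one assumed way to turn the three anchor points above into labels.

```python
import statistics

def z_score(x, reference):
    """Number of standard deviations that x lies from the reference mean."""
    return (x - statistics.mean(reference)) / statistics.pstdev(reference)

def classify(z):
    """Rough label for a Z-score (bin edges are an assumption)."""
    magnitude = abs(z)
    if magnitude < 2.0:
        return "normal"
    if magnitude < 4.0:
        return "borderline"
    return "outlier"

reference = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical database values: mean 5, std 2

for value in (8, 11, 15):
    z = z_score(value, reference)
    print(value, z, classify(z))
```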

🧬 Geometry Checks

  1. Bond lengths
  2. Bond angles
  3. Chirality
  4. Planarity

Usually tightly constrained.


📈 Ramachandran Plot

Most important backbone validation tool.

Categories:

  • Favored
  • Allowed
  • Generously allowed
  • Disallowed

High-quality structures: → Almost all residues in favored regions.
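
Each point in a Ramachandran plot is a (ϕ, ψ) pair, and each angle is a dihedral defined by four consecutive backbone atoms (ϕ: C(i−1)–N–Cα–C, ψ: N–Cα–C–N(i+1)). A minimal sketch of the dihedral calculation; the helper names and test coordinates are illustrative assumptions.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (-180 to 180) defined by four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Simple geometries with known answers.
cis = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
trans = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
perpendicular = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1))
print(cis, trans, perpendicular)
```

Computing ϕ and ψ this way for every residue and checking which region each pair falls into is the core of the analysis.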

Programs:

  • PROCHECK
  • WHAT-IF

🔄 Side-Chain Rotamers

Side chains prefer:

  • Staggered conformations
  • Avoid eclipsed

Compare to PDB statistics.
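
A minimal sketch of a rotamer check: find the nearest ideal staggered angle (−60°, 60°, 180°) for a χ angle and report the deviation. Real validation tools compare against database rotamer distributions instead of ideal values; the function names are illustrative.

```python
STAGGERED = (-60.0, 60.0, 180.0)  # ideal staggered chi angles in degrees

def angular_difference(a, b):
    """Smallest absolute difference between two angles, accounting for wrap-around."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def nearest_staggered(chi):
    """Return (nearest staggered angle, deviation from it) for a chi angle."""
    best = min(STAGGERED, key=lambda ref: angular_difference(chi, ref))
    return best, angular_difference(chi, best)

print(nearest_staggered(-65.0))   # close to the -60 conformation
print(nearest_staggered(-175.0))  # wraps around to the 180 conformation
print(nearest_staggered(0.0))     # eclipsed: 60 degrees from any staggered state
```

A deviation near 60° means the side chain sits in an eclipsed conformation, the case the check is meant to catch.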

Programs:

  • PROCHECK
  • WHAT-IF
  • MolProbity

💥 Steric Clashes (Bumps)

Nonbonded atoms too close.

Should not occur.

Detected by:

  • WHAT-IF
  • MolProbity

🤝 Hydrogen Bonds

Important but hard in NMR:

  • Rarely measured directly (only via through-hydrogen-bond J couplings)
  • Often inferred indirectly

Validation:

  • Count unsatisfied donors/acceptors
  • Evaluate hydrogen-bond energy

🎯 Big Picture Takeaways

🧩 Structure calculation = search problem

Find conformations minimizing target function.

🔥 Simulated annealing is central

Escapes local minima.

💧 Refinement improves realism

Explicit solvent is critical.

📊 Validation is multi-layered

Must check:

  • Restraint fit
  • NOESY consistency
  • RDC agreement
  • Geometric plausibility
  • Rotamers
  • Hydrogen bonding
  • Steric clashes
