Day 9 part 2

Protein structure

🧬 Molecular Dynamics (MD) Simulation — Theoretical Summary

⚙️ 1. Newton’s Equations and Motion of Atoms

Molecular dynamics simulations are based on classical Newtonian mechanics.

Each atom has:
- Position (x)
- Velocity (v) → derivative of position
- Acceleration (a) → derivative of velocity
Motion is described by Force = mass × acceleration

This means MD simulation becomes a huge system of ordinary differential equations because:

For M atoms → 3M position coordinates + 3M velocity coordinates
So the system exists in a 6M-dimensional phase space

👉 Because these equations are coupled (the force on each atom depends on the positions of all the other atoms), no analytical (exact) solution is available for realistic systems. Instead, MD uses numerical integration: atoms are moved step by step in time.
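
Written out, the equations of motion for atom i take the form below. The symbol U for the potential energy (supplied by the force field) and the index i are notation assumed here, not taken from the notes:

$$
m_i \frac{d^2 \mathbf{x}_i}{dt^2} = \mathbf{F}_i = -\nabla_{\mathbf{x}_i} U(\mathbf{x}_1, \dots, \mathbf{x}_M), \qquad i = 1, \dots, M
$$

Splitting each second-order equation into a pair of first-order ones gives

$$
\frac{d\mathbf{x}_i}{dt} = \mathbf{v}_i, \qquad \frac{d\mathbf{v}_i}{dt} = \frac{\mathbf{F}_i}{m_i}
$$

which is 3M equations for positions plus 3M for velocities, matching the 6M-dimensional phase space.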

⏱️ 2. Time Step — The Most Critical Parameter

MD simulations advance in small discrete time steps (Δt).

Typical values:

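🟢 ~2 femtoseconds → standard protein simulations (usually with bonds to hydrogen atoms constrained)
🟡 ~4 femtoseconds → very long or membrane simulations (typically only stable with extra tricks such as hydrogen mass repartitioning or virtual sites)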

Why is time step important?

If the time step is:

❌ Too small

Simulation becomes very slow
You may never reach biologically relevant timescales (µs or ms)

❌ Too large

You may miss important events like collisions
System can become numerically unstable
Atoms may “jump” outside the simulation box → simulation “explodes”

Thus, MD requires a trade-off between accuracy and computational speed.
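
The cost side of this trade-off is simple arithmetic. The sketch below counts how many integration steps are needed to cover a target simulated time; the helper name `steps_needed` is made up for illustration.

```python
FS_PER_US = 1_000_000_000  # 1 microsecond = 1e9 femtoseconds


def steps_needed(target_fs: int, dt_fs: int) -> int:
    """Number of steps of size dt_fs needed to cover target_fs (rounded up)."""
    return -(-target_fs // dt_fs)  # integer ceiling division


steps_1us_at_2fs = steps_needed(FS_PER_US, 2)
steps_1us_at_4fs = steps_needed(FS_PER_US, 4)
steps_1ms_at_2fs = steps_needed(1000 * FS_PER_US, 2)

print(steps_1us_at_2fs)  # 500000000
print(steps_1us_at_4fs)  # 250000000
print(steps_1ms_at_2fs)  # 500000000000
```

Doubling the time step halves the number of force evaluations, which is why even a change from 2 fs to 4 fs matters for long runs.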

🐸 3. Leapfrog Integration Algorithm

To propagate motion efficiently, MD often uses the leapfrog algorithm.

Concept:

Velocities and positions are calculated at offset time points
Values “jump over” each other like a frog hopping

Advantages:

Time-symmetric
Numerically stable
Good long-term energy conservation, which suits long simulations

This makes it one of the most widely used integration methods in MD.
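
A minimal sketch of the idea, assuming a 1D harmonic oscillator (force F = -k x) as a stand-in for a real force field. The function name and the parameter values are illustrative only. The second run uses a deliberately oversized time step to show the kind of "explosion" described in section 2.

```python
def leapfrog_harmonic(dt, n_steps, k=1.0, mass=1.0, x0=1.0, v0=0.0):
    """Integrate a 1D harmonic oscillator (F = -k x) with leapfrog.

    Positions live on whole steps t, velocities on half steps t + dt/2.
    Returns the positions x(0), x(dt), ..., x(n_steps * dt).
    """
    def force(x):
        return -k * x

    x = x0
    # Shift the starting velocity back half a step: v(-dt/2) ~ v(0) - a(0) * dt/2.
    v_half = v0 - 0.5 * dt * force(x) / mass
    xs = [x]
    for _ in range(n_steps):
        v_half += dt * force(x) / mass  # v(t - dt/2) -> v(t + dt/2), force taken at t
        x += dt * v_half                # x(t) -> x(t + dt), using v(t + dt/2)
        xs.append(x)
    return xs


stable = leapfrog_harmonic(dt=0.05, n_steps=10_000)
unstable = leapfrog_harmonic(dt=2.5, n_steps=50)

print(max(abs(x) for x in stable))  # amplitude stays close to the initial 1.0
print(abs(unstable[-1]))            # grows without bound
```

For this oscillator the scheme stays bounded only while ω·Δt < 2, with ω = sqrt(k/m). In a protein the fastest vibrations play the role of ω, which is what limits the usable time step.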

🌡️ 4. Atoms Are Always Moving (Even at Low Temperature)

Important physical concept:

Atoms are never static
Even near 0 K → quantum zero-point motion remains, and at any finite temperature thermal motion adds to it

Thus:

Proteins do not sit permanently in minimum energy
They constantly fluctuate around it

Over long simulation time, atomic configurations follow the:

📊 Boltzmann Distribution

This distribution describes:

The probability of a conformation as a function of its potential energy.

Low-energy conformations → frequent
High-energy conformations → rare
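
As a rough illustration, the relative population of a conformation with energy E is proportional to exp(-E / (k_B T)). The sketch below normalises these weights for three hypothetical conformations; the energies are made-up numbers in kcal/mol.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def boltzmann_probabilities(energies, temperature):
    """Normalised Boltzmann probabilities for a list of energies (kcal/mol)."""
    beta = 1.0 / (K_B * temperature)
    e_min = min(energies)  # subtract the minimum for numerical stability
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)  # partition function over the listed states only
    return [w / z for w in weights]


energies = [0.0, 1.0, 3.0]  # hypothetical conformational energies
p_300 = boltzmann_probabilities(energies, 300.0)
p_600 = boltzmann_probabilities(energies, 600.0)

print(p_300)
print(p_600)
```

Raising the temperature flattens the distribution: the lowest-energy state is still the most likely, but high-energy states are visited more often.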

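If MD samples all relevant conformations with the correct Boltzmann weights, averages over the trajectory can be used to calculate thermodynamic properties that are comparable to experimental measurements (within the accuracy of the force field).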

But reaching full sampling requires very long simulations.

🎰 5. MD vs Monte Carlo Sampling

Monte Carlo (MC)

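Samples conformations through random trial moves that are accepted or rejected based on energy
Yields the equilibrium distribution directly, without any time information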
Good for thermodynamic properties

Molecular Dynamics (MD)

Follows realistic time evolution
Captures:
- Fast motions
- Dynamic pathways
- Mechanistic transitions

Thus:

MC → efficient equilibrium statistics, but no real time axis
MD → better physical realism and motion information
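
To make the contrast concrete, here is a minimal sketch of Metropolis Monte Carlo, one common MC scheme (the notes do not say which variant is meant). It samples a 1D harmonic potential U(x) = x²/2 in units where k_B T = 1; the function name and all parameter values are illustrative.

```python
import math
import random


def metropolis(energy, x0, n_steps, step_size, kT, seed=0):
    """Metropolis sampling of a 1D coordinate; returns the visited values."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step_size, step_size)  # random trial move
        e_new = energy(x_new)
        # Always accept downhill moves; accept uphill ones with Boltzmann probability.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
        samples.append(x)  # a rejected move counts the old state again
    return samples


samples = metropolis(lambda x: 0.5 * x * x, x0=0.0, n_steps=100_000,
                     step_size=2.0, kT=1.0)
mean_sq = sum(s * s for s in samples) / len(samples)
print(mean_sq)  # should be close to kT = 1 for this potential
```

The order of the samples carries no physical time: the list reproduces the Boltzmann distribution, but not the pathway between states.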

🔋 6. Energy Conservation and Thermostats

Total system energy:

$E_{\text{total}} = E_{\text{kinetic}} + E_{\text{potential}}$

However:

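In a simulation, total energy is not perfectly conserved: numerical integration errors and interaction cutoffs make it drift over time
Real systems are also not isolated; they exchange heat with their surroundings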

Therefore MD uses:

🛁 Heat bath / thermostat

Functions:

Adds energy if system cools
Removes energy if system overheats
Maintains constant temperature

Energy curves therefore fluctuate around a mean value, not perfectly flat.
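
One simple way to picture a thermostat is Berendsen-style weak coupling, where all velocities are multiplied by a factor λ at each step. This scheme is chosen here only for illustration; the notes do not name a specific thermostat, and Berendsen coupling does not reproduce exact canonical fluctuations. The toy below tracks temperature alone (no forces), using the fact that kinetic temperature scales with λ². The values of `dt` and `tau` are hypothetical.

```python
import math


def berendsen_scale(t_current, t_target, dt, tau):
    """Velocity scaling factor lambda for Berendsen weak coupling."""
    return math.sqrt(1.0 + (dt / tau) * (t_target / t_current - 1.0))


def relax_temperature(t_start, t_target, dt, tau, n_steps):
    """Apply the scaling repeatedly; temperature scales as lambda squared."""
    t = t_start
    for _ in range(n_steps):
        lam = berendsen_scale(t, t_target, dt, tau)
        t *= lam * lam
    return t


# Times in ps: a 2 fs step and a 0.1 ps coupling constant (illustrative values).
hot = relax_temperature(350.0, 300.0, dt=0.002, tau=0.1, n_steps=500)
cold = relax_temperature(250.0, 300.0, dt=0.002, tau=0.1, n_steps=500)
print(hot, cold)  # both approach 300 K
```

The factor is below 1 when the system is too hot (energy removed) and above 1 when it is too cold (energy added).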

🚀 7. Equilibration Phase

At simulation start:

Velocities are assigned artificially (for example all zero, or identical for all atoms)
This is physically unrealistic

During equilibration:

Velocities redistribute toward the Maxwell-Boltzmann distribution, in which lighter atoms move faster on average
System relaxes to correct temperature distribution
Artifacts from starting conditions disappear

Only after equilibration → production simulation begins.
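
Many setups shorten equilibration by drawing starting velocities from the Maxwell-Boltzmann distribution directly. A minimal sketch, assuming SI units and a hypothetical mixture of hydrogen and oxygen atoms: each velocity component is Gaussian with standard deviation sqrt(k_B T / m).

```python
import math
import random

K_B = 1.380649e-23        # Boltzmann constant, J/K
AMU = 1.66053906660e-27   # atomic mass unit, kg


def sample_velocities(masses_amu, temperature, seed=0):
    """Draw one (vx, vy, vz) tuple in m/s per atom from Maxwell-Boltzmann."""
    rng = random.Random(seed)
    velocities = []
    for m in masses_amu:
        sigma = math.sqrt(K_B * temperature / (m * AMU))  # per-component spread
        velocities.append(tuple(rng.gauss(0.0, sigma) for _ in range(3)))
    return velocities


def kinetic_temperature(masses_amu, velocities):
    """Temperature implied by the kinetic energy: KE = (3/2) N k_B T."""
    ke = sum(0.5 * m * AMU * (vx * vx + vy * vy + vz * vz)
             for m, (vx, vy, vz) in zip(masses_amu, velocities))
    return 2.0 * ke / (3.0 * len(masses_amu) * K_B)


masses = [1.008, 15.999] * 10_000  # hypothetical system: alternating H and O
vels = sample_velocities(masses, 300.0)
t_est = kinetic_temperature(masses, vels)
print(t_est)  # close to 300 K, up to sampling noise
```

The same temperature gives hydrogen a much wider velocity spread than oxygen, which is the mass dependence mentioned above.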

💧 8. Solvent Representation

Explicit Solvent

Real water molecules simulated
Most accurate
Very computationally expensive
Can introduce artifacts (water clustering, ion shells)

Implicit Solvent

No water molecules
Solvent is treated mathematically as a continuous medium with water-like properties (for example its dielectric constant)

Advantages:

Faster
Fewer atoms

Disadvantages:

Loss of specific hydrogen-bond interactions
Reduced realism near protein surface
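
The crudest picture of an implicit solvent is a uniform dielectric that weakens electrostatic interactions. The sketch below compares the Coulomb energy of a hypothetical ion pair in vacuum and in a water-like continuum (relative dielectric constant of roughly 80). Real implicit-solvent models are considerably more elaborate; this only illustrates the screening idea.

```python
COULOMB_CONSTANT = 332.0637  # kcal*Angstrom/(mol*e^2)


def coulomb_energy(q1, q2, r, dielectric=1.0):
    """Coulomb energy (kcal/mol) of two point charges (in e) at distance r (Angstrom)."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)


# Hypothetical +1/-1 ion pair separated by 4 Angstrom.
vacuum = coulomb_energy(+1.0, -1.0, 4.0)
water = coulomb_energy(+1.0, -1.0, 4.0, dielectric=80.0)
print(vacuum, water)
```

A single scaling factor captures the average screening by water but carries no information about individual water molecules, which is why specific hydrogen bonds are lost.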

📦 9. Periodic Boundary Conditions (PBC)

To avoid atoms leaving the system:

Simulation box is surrounded by copies of itself

When an atom exits one side:

It re-enters from the opposite side

Benefits:

Constant particle number
No artificial wall effects
Mimics infinite bulk environment
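
A minimal sketch of the two operations behind PBC, assuming a cubic box with edge length `box`: wrapping a coordinate back into the box, and the minimum-image convention for distances (each atom interacts with the nearest periodic copy of its partner). The function names are illustrative.

```python
def wrap(x, box):
    """Map a coordinate back into [0, box)."""
    return x % box  # Python's % is non-negative for a positive box length


def minimum_image(dx, box):
    """Shortest displacement between two atoms across periodic copies."""
    return dx - box * round(dx / box)


def wrap_position(pos, box):
    """Wrap a 3D position into a cubic box."""
    return tuple(wrap(c, box) for c in pos)


box = 10.0
print(wrap_position((10.3, -0.5, 4.0), box))  # atom re-enters on the opposite side
print(minimum_image(9.0, box))                # nearest copy is 1.0 away, in the -x direction
```

The particle count never changes: an atom that leaves through one face is the same atom that appears at the opposite face.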

⚡ 10. Computational Limits of MD

Challenges:

Very small time steps required
Structural changes may occur on the millisecond (ms) scale
Huge number of steps needed

Solutions:

GPU acceleration
Parallel computing
Algorithm improvements

📚 11. Force Fields — Semi-Empirical Models

Force fields are:

Parameter tables describing atomic interactions
Based on:
- Physics equations
- Experimental data

Thus MD is semi-empirical.
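
To show what "parameter table plus physics equation" can look like, here is a sketch with two widely used functional forms: a harmonic bond term and a Lennard-Jones term. The parameter values are made-up illustrative numbers, not taken from any published force field, and real force fields contain further terms.

```python
# Hypothetical parameter table (energies in kcal/mol, distances in Angstrom).
PARAMS = {
    "bond": {("C", "C"): {"r0": 1.53, "k": 300.0}},
    "lj": {"C": {"epsilon": 0.1, "sigma": 3.4}},
}


def bond_energy(r, r0, k):
    """Harmonic bond stretching: zero at the reference length r0."""
    return 0.5 * k * (r - r0) ** 2


def lennard_jones(r, epsilon, sigma):
    """Lennard-Jones 12-6: steep repulsion at short range, weak attraction beyond."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


cc = PARAMS["bond"][("C", "C")]
lj = PARAMS["lj"]["C"]

print(bond_energy(1.60, cc["r0"], cc["k"]))                  # slightly stretched bond
r_min = 2 ** (1 / 6) * lj["sigma"]                           # location of the LJ minimum
print(lennard_jones(r_min, lj["epsilon"], lj["sigma"]))      # well depth, equal to -epsilon
```

The equations come from physics; the numbers in the table are fitted to experiment or to higher-level calculations, which is where the empirical part enters.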

⚠️ Must use scientific judgement:

Simulations may produce physically unrealistic minima
Example: alkane chain passing through benzene ring (artifact)

Always check: 👉 Does the simulation result make chemical sense?

❌ 12. What MD Cannot Do

Classical MD limitations:

Cannot form or break covalent bonds
Cannot change protonation states dynamically
Cannot simulate enzyme reactions directly

Reason:

Electronic structure is fixed at simulation start
Born-Oppenheimer approximation assumed

To model reactions → need QM/MM methods (quantum mechanics + molecular mechanics).

🖥️ 13. Software and Acceleration

Common MD packages:

GROMACS
AMBER
CHARMM

Visualization:

VMD (Visual Molecular Dynamics)

Speed improvements:

Parallelization
GPU (CUDA cores especially important)

⭐ Key Conceptual Takeaway

Molecular dynamics simulations:

Provide time-resolved molecular motion
Sample conformational landscapes
Require careful parameter choices
Balance accuracy vs computational feasibility
Must always be interpreted with chemical and physical intuition

Quiz

Q0. Why is an analytical solution generally impossible in molecular dynamics simulations?

Because atoms do not obey Newton’s laws

Because the system involves a very high-dimensional phase space

Because potential energy is always zero

Because time is treated as continuous

Q1. What is the primary reason numerical integration is used in MD simulations?

To avoid using force fields

Because experimental data are unavailable

To propagate atomic positions stepwise in time

To eliminate thermal motion

Q2. What is the main risk of choosing an excessively large time step in an MD simulation?

Atoms will stop moving

The system becomes more accurate

Important interactions may be missed and instability can occur

The Boltzmann distribution becomes symmetric

Q3. Why are time steps of around 2 femtoseconds commonly used for protein simulations?

They eliminate the need for thermostats

They balance accuracy and computational efficiency

They allow bond formation

They remove solvent effects

Q4. What is the main conceptual idea behind the leapfrog integration algorithm?

Energy minimization before simulation

Random sampling of conformations

Offset calculation of velocities and positions over time

Simultaneous calculation of all forces

Q5. Why do atoms in MD simulations never remain at a single minimum-energy conformation?

Force fields are inaccurate

Thermal energy causes continuous motion

Velocities are always zero

Periodic boundaries prevent stability

Q6. What does the Boltzmann distribution describe in the context of MD simulations?

The velocity of electrons

The probability of conformations as a function of energy

The force between solvent molecules

The change in system volume

Q7. Why can Monte Carlo simulations be advantageous for purely thermodynamic studies?

They simulate realistic atomic trajectories

They show conformational distributions directly

They eliminate potential energy

They allow bond breaking

Q8. What is the role of equilibration in MD simulations?

To remove all solvent molecules

To allow the system to reach realistic temperature and velocity distributions

To calculate quantum states

To fix periodic boundaries

Q9. Why is a thermostat used during MD simulations?

To calculate electron orbitals

To enforce periodic boundaries

To maintain stable temperature by adding or removing energy

To stop atomic motion

Q10. What is a major advantage of implicit solvent models?

Higher chemical accuracy

Reduced number of atoms and faster simulations

Ability to model hydrogen bonding precisely

Automatic bond formation

Q11. Why are periodic boundary conditions used in MD simulations?

To prevent atoms from vibrating

To mimic an infinite bulk environment

To eliminate temperature effects

To increase solvent clustering

Q12. What is a force field in molecular dynamics?

A quantum mechanical wavefunction

A lookup table of interaction parameters derived from physics and experiments

A method for breaking bonds

A thermostat algorithm

Q13. Why is classical MD referred to as a semi-empirical method?

It ignores physical laws

It uses only experimental data

It combines theoretical physics models with experimental parameterization

It always predicts exact results

Q14. Which hardware feature is particularly important for accelerating MD simulations on GPUs?

Tensor cores

CUDA cores

Cache memory

Display drivers

Q15. In classical MD simulations, covalent bonds can form and break dynamically.

True

False

Q16. Even at very low temperatures, atoms in a molecular system are completely motionless.

True

False

Q17. Using too small a time step can make it difficult to reach biologically relevant timescales in simulations.

True

False

Q18. Monte Carlo simulations provide realistic time-resolved trajectories of proteins.

True

False

Q19. Periodic boundary conditions ensure that the total number of particles remains constant during simulation.

True

False

Q20. A perfectly flat energy curve during production MD indicates a well-controlled temperature.

True

False

Q21. Implicit solvent simulations retain exact hydrogen-bond interactions between water molecules and protein surfaces.

True

False

Q22. Force fields must be interpreted with chemical intuition because simulations can sometimes produce physically unrealistic results.

True

False

Q23. Equilibration allows velocities to redistribute according to atomic mass and temperature.

True

False

Q24. Thermostats can both add and remove energy from a system during MD simulations.

True

False

Q25. GPU parallelization reduces computational load by distributing calculations across many processing cores.

True

False

Q26. The Boltzmann distribution implies that high-energy conformations are sampled more frequently than low-energy ones.

True

False

Q27. MD simulations can directly model protonation state changes at constant pH.

True

False

Q28. Increasing simulation box size always improves computational efficiency.

True

False

Q29. Long biological processes such as protein conformational transitions may require simulations reaching microseconds or longer.

True

False