Lecture 6 Video 5

Protein structure

📊 Indirect Fourier Transform & the P(r) Function (SAXS Analysis)

This lecture section focuses on one of the most powerful tools in small-angle X-ray scattering (SAXS): the indirect Fourier transform (IFT) and the pair distance distribution function, P(r).

The key idea: We measure data in reciprocal space (I(q), intensity vs scattering vector q), but what we really want is real-space structural information about our macromolecule.

The Fourier transform is the mathematical bridge between those two spaces.

🔄 1. From Reciprocal Space to Real Space

In SAXS:

I(q) → measured experimentally
P(r) → real-space distance distribution inside the molecule

These two are mathematically connected through a pair of Fourier transform equations.

You can:

Transform I(q) → P(r)
Or transform P(r) → I(q)

When you transform P(r) back into I(q), you get a smooth fitted curve to your experimental data.

If that smooth line matches your experimental points well → your transform worked properly.
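
For reference, the two directions are usually written as below. This is the standard textbook form for an isotropic solution of particles, not something taken from the lecture slides, and prefactor conventions vary between sources.

```latex
I(q) = 4\pi \int_0^{D_{\max}} P(r)\, \frac{\sin(qr)}{qr}\, dr
\qquad
P(r) = \frac{r^2}{2\pi^2} \int_0^{\infty} q^2 I(q)\, \frac{\sin(qr)}{qr}\, dq
```

The second integral needs I(q) from q = 0 to infinity, which is never measured. That is why the transform is done "indirectly": a P(r) defined on 0 to Dmax is adjusted until the first equation reproduces the measured I(q).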

🧮 2. What Is the P(r) Function?

The P(r) function (pair distance distribution function) is:

A histogram of all pairwise electron–electron distances in the macromolecule

Imagine your protein:

Take every pair of electrons
Measure the distance between them
Count how many pairs occur at each distance
Plot:
- x-axis = distance (r)
- y-axis = number of electron pairs at that distance

That gives you P(r).
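
The recipe above can be sketched in a few lines of Python. This is a minimal illustration, assuming you already have atom coordinates and an electron count per atom; the function name `pair_distance_histogram` and the three-atom test case are made up for the example. Real SAXS P(r) is weighted by excess electron density relative to the solvent, which this sketch ignores.

```python
import numpy as np

def pair_distance_histogram(coords, weights, bin_width=1.0):
    """Weighted histogram of all pairwise distances (an un-normalized P(r))."""
    coords = np.asarray(coords, dtype=float)
    weights = np.asarray(weights, dtype=float)
    i, j = np.triu_indices(len(coords), k=1)         # each pair counted once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    pair_weight = weights[i] * weights[j]            # electrons_i * electrons_j
    n_bins = int(np.floor(dist.max() / bin_width)) + 1
    edges = np.arange(n_bins + 1) * bin_width
    counts, edges = np.histogram(dist, bins=edges, weights=pair_weight)
    return edges, counts, dist.max()

# Hypothetical toy "molecule": three atoms on a line (C, N, O electron counts)
coords = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [4.0, 0.0, 0.0]]
electrons = [6, 7, 8]

edges, counts, d_max = pair_distance_histogram(coords, electrons)
for left, c in zip(edges[:-1], counts):
    print(f"r in [{left:.0f}, {left + 1:.0f}): {c:.0f}")
print("largest pair distance:", d_max)
```

The largest pair distance returned here is what the next section calls Dmax.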

It is literally a structural fingerprint of the molecule.

📏 3. Maximum Dimension (Dmax)

One immediate feature:

P(r) always goes to zero at some maximum distance.
That distance is the maximum dimension of the molecule (Dmax).

Why?

Because beyond that distance, there are no more electron pairs inside the object.

So:

$$D_{\max} = \text{largest internal distance in the molecule}$$

This is extremely important for structural modeling.

🔵 4. How Shape Affects the P(r) Curve

The shape of the P(r) curve depends strongly on molecular geometry.

Let’s go through the major cases.

🟠 (A) Solid Sphere (Globular Protein)

Shape of P(r):

Symmetric
Almost Gaussian
Smooth rise and fall

This is what most globular proteins look like.

Interpretation:

Many electron pairs at intermediate distances
Fewer at very short and very long distances

If your protein is compact and folded → expect this shape.
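
For a uniform solid sphere the curve has a closed form, which makes the "bell shape" concrete. This is the standard result for a homogeneous sphere of diameter D (so Dmax = D), given up to a constant factor; it is not taken from the lecture itself.

```latex
P(r) \propto r^2 \left( 1 - \frac{3r}{2D} + \frac{r^3}{2D^3} \right),
\qquad 0 \le r \le D
```

Setting the derivative to zero puts the peak at r ≈ 0.525 D, so the curve is close to symmetric about Dmax/2, with the peak sitting just past the midpoint.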

🟢 (B) Long Rod

Features:

Sharp peak at low distances
Long tail extending toward Dmax

Why?

Many electron pairs exist across the short width (short r)
Fewer but important distances span the long axis (large r)

This produces:

Early strong peak
Extended tail

Common for:

Fibrous proteins
Elongated complexes

🟣 (C) Disc

Looks somewhat similar to sphere but:

Broader distribution
Peak occurs earlier

Because it’s flatter, distances are distributed differently.

🟡 (D) Hollow Sphere

Opposite behavior to a rod:

Large number of long distances
Peak near the maximum dimension

Why?

Most electrons are arranged in a shell → many distances span the entire diameter.

🔵 (E) Dumbbell (Two Domains)

This is extremely important biologically.

Features:

First peak = distances within each domain
Second peak = distances between domains

This is typical for:

Multi-domain proteins
Proteins with flexible linkers

The second peak corresponds to inter-domain spacing.

If you see two peaks → think domain organization.
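
To see where the two peaks come from, here is a small sketch that builds a hypothetical dumbbell from random points in two spheres and histograms the pair distances. The radius (10), center-to-center separation (50), and point count are arbitrary choices for illustration, and every point is weighted equally.

```python
import numpy as np

rng = np.random.default_rng(0)

def points_in_sphere(n, radius, center):
    """Uniform random points inside a sphere (rejection sampling from a cube)."""
    pts = []
    while len(pts) < n:
        p = rng.uniform(-radius, radius, size=3)
        if np.dot(p, p) <= radius**2:
            pts.append(p + np.asarray(center, dtype=float))
    return np.array(pts)

radius, separation, n_per_domain = 10.0, 50.0, 300
domain_a = points_in_sphere(n_per_domain, radius, [0.0, 0.0, 0.0])
domain_b = points_in_sphere(n_per_domain, radius, [separation, 0.0, 0.0])
pts = np.vstack([domain_a, domain_b])

i, j = np.triu_indices(len(pts), k=1)
dist = np.linalg.norm(pts[i] - pts[j], axis=1)
hist, edges = np.histogram(dist, bins=np.arange(0, 72, 2))
left = edges[:-1]

# Within-domain pairs cannot exceed 2*radius = 20;
# between-domain pairs lie between separation - 20 = 30 and separation + 20 = 70.
intra = hist[left < 20].sum()
gap = hist[(left >= 20) & (left < 30)].sum()
inter = hist[left >= 30].sum()
print("first peak (within domains):", intra)
print("empty region between peaks:", gap)
print("second peak (between domains):", inter)
```

With a rigid, well-separated dumbbell the two peaks are cleanly split. A shorter or flexible linker smears the second peak into a shoulder.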

🧬 5. Real Protein Examples

From actual SAXS data:

Globular proteins

Similar to sphere
Slight tail

Multi-domain proteins

Shoulders or secondary peaks
Inter-domain distances visible

Unfolded proteins

Main peak compressed toward short distances
Very long extended tail

Unfolded systems show much more extended distributions.

This becomes important when studying:

Protein flexibility
Folding
Disorder

📐 6. What Can You Extract from P(r)?

The P(r) function gives:

✅ 1. Maximum dimension (Dmax)

Clear cutoff where curve goes to zero.

✅ 2. Radius of gyration (Rg)

You can calculate Rg directly from P(r).

This can be:

More accurate than Guinier analysis
Especially useful for:
- Large particles
- Noisy data
- Small Guinier range

Because P(r) uses the entire curve, not just low-q points.

✅ 3. I(0) (Forward scattering intensity)

Can also be obtained from P(r).

Good for cross-checking:

If:

Guinier Rg ≈ P(r) Rg
Guinier I(0) ≈ P(r) I(0)

Then your data processing is likely reliable.
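
The cross-check can be sketched numerically. The example below uses the analytic P(r) of a solid sphere (radius 20 Å, a made-up test particle) as a stand-in for a real P(r), computes Rg and I(0) in real space using the standard relations Rg² = ∫r²P(r)dr / (2∫P(r)dr) and I(0) = 4π∫P(r)dr, then back-transforms to I(q) and runs a Guinier fit on the low-q points. The q range (up to q·Rg = 0.8) is an assumption chosen for the illustration.

```python
import numpy as np

def trapezoid(y, dx):
    """Trapezoid rule on a uniform grid, integrating along the last axis."""
    return dx * (y.sum(axis=-1) - 0.5 * (y[..., 0] + y[..., -1]))

R = 20.0                       # sphere radius in angstrom (hypothetical particle)
D = 2 * R                      # Dmax
r = np.linspace(0.0, D, 2001)
dr = r[1] - r[0]
p = r**2 * (1 - 1.5 * r / D + 0.5 * (r / D)**3)    # analytic sphere P(r)

# Real-space estimates from P(r)
i0_pr = 4 * np.pi * trapezoid(p, dr)
rg_pr = np.sqrt(trapezoid(r**2 * p, dr) / (2 * trapezoid(p, dr)))

# Back-transform P(r) -> I(q) at low q
q = np.linspace(0.005, 0.8 / rg_pr, 40)
kernel = np.sinc(np.outer(q, r) / np.pi)           # np.sinc(x) = sin(pi x)/(pi x)
i_q = 4 * np.pi * trapezoid(kernel * p, dr)

# Guinier fit: ln I(q) = ln I(0) - (Rg^2 / 3) q^2
slope, intercept = np.polyfit(q**2, np.log(i_q), 1)
rg_guinier = np.sqrt(-3 * slope)
i0_guinier = np.exp(intercept)

print(f"Rg   P(r): {rg_pr:.2f}   Guinier: {rg_guinier:.2f}")
print(f"I(0) P(r): {i0_pr:.1f}   Guinier: {i0_guinier:.1f}")
```

For a sphere the exact answer is Rg = sqrt(3/5)·R, so this toy case also checks the real-space formula itself. On clean data the two routes agree closely; a large disagreement on real data is the warning sign.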

⚠️ 7. Sensitivity to Problems

The P(r) function is sensitive to:

Aggregation
Interparticle interference

If your P(r):

Doesn’t smoothly go to zero
Shows strange oscillations
Has unexpected long tails

→ something may be wrong with the sample.

This makes P(r) a powerful diagnostic tool.

🧠 8. Why Is Indirect Fourier Transform Necessary?

It is:

Model-independent
Real-space based
Required before advanced modeling

Especially important because:

Dmax is needed for ab initio shape reconstruction
It constrains the search space
It improves reliability of structural modeling

Without a proper P(r), you cannot confidently move forward to 3D reconstructions.
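
To make the role of Dmax concrete, here is a minimal sketch of one way an indirect transform can be set up: P(r) is discretized on a grid from 0 to Dmax, and a smoothness-regularized least-squares fit finds the P(r) whose back-transform matches I(q). This is an illustration under stated assumptions, not the algorithm of any particular program. The function name `indirect_ft`, the fixed regularization weight `alpha`, the grid size, and the noiseless synthetic sphere data are all made up for the example. Production tools also weight by experimental errors and choose the regularization automatically.

```python
import numpy as np

def indirect_ft(q, i_q, d_max, n_r=40, alpha=1e-3):
    """Regularized least-squares estimate of P(r) on (0, d_max).

    Minimizes ||A p - I||^2 + alpha * ||L p||^2, where A is the discretized
    P(r) -> I(q) kernel and L is a second-difference (smoothness) operator.
    P(0) = P(d_max) = 0 is imposed by solving only for interior grid points.
    """
    dr = d_max / n_r
    r = np.arange(1, n_r) * dr                        # interior points only
    A = 4 * np.pi * np.sinc(np.outer(q, r) / np.pi) * dr
    n = n_r - 1
    L = -2 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    lhs = np.vstack([A, np.sqrt(alpha) * L])          # stacked least-squares system
    rhs = np.concatenate([i_q, np.zeros(n)])
    p, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return r, p, A @ p                                # grid, P(r), fitted I(q)

# Synthetic, noiseless "data" from a sphere with Dmax = 40 (hypothetical)
d_max = 40.0
q = np.linspace(0.01, 0.5, 60)
r_true = np.arange(1, 40) * (d_max / 40)
p_true = r_true**2 * (1 - 1.5 * r_true / d_max + 0.5 * (r_true / d_max)**3)
p_true /= p_true.max()
i_obs = 4 * np.pi * np.sinc(np.outer(q, r_true) / np.pi) @ p_true * (d_max / 40)

r, p_fit, i_fit = indirect_ft(q, i_obs, d_max)
rel_misfit = np.linalg.norm(i_fit - i_obs) / np.linalg.norm(i_obs)
print(f"relative misfit of back-transformed curve: {rel_misfit:.2e}")
```

Note that `d_max` is an input, not an output: the fit is only defined once you commit to a maximum dimension. In practice you try a range of Dmax values and keep the one where P(r) returns smoothly to zero and the back-transformed curve still matches the data.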

📌 9. Big Picture Summary

Indirect Fourier Transform allows you to:

🔄 Convert reciprocal space data (I(q)) ➡ Into real-space structural information (P(r))

P(r) tells you:

Molecular shape
Maximum dimension (Dmax)
Radius of gyration (Rg)
I(0)
Presence of multiple domains
Folding state
Flexibility
Aggregation artifacts

It is:

Model-independent
Highly informative
Required for advanced analysis
More robust than Guinier in many cases

🧩 Conceptual Takeaway

Think of I(q) as:

A blurry fingerprint in reciprocal space.

And P(r) as:

The real-space histogram of all internal distances — the molecule describing itself from the inside.

The transform is simply the mathematical bridge between those two worlds.

Quiz

Q0. What is the main purpose of applying an indirect Fourier transform (IFT) in SAXS analysis?

To convert real-space data into reciprocal space

To convert reciprocal-space scattering data into real-space structural information

To remove noise from high-q data

To calculate molecular weight directly from intensity

Q1. The scattering profile I(q) is measured in which type of space?

Real space

Time domain

Reciprocal (one-over-distance) space

Frequency space only

Q2. What does the P(r) function physically represent?

The electron density map of a protein

The histogram of all pairwise electron distances within a macromolecule

The energy distribution of atomic vibrations

The probability of protein folding states

Q3. Why does the P(r) function go to zero at large r values?

Because intensity becomes negative

Because of background subtraction errors

Because there are no electron pairs beyond the maximum dimension

Because of detector limitations

Q4. What structural parameter is directly obtained from where P(r) goes to zero?

Radius of gyration (Rg)

Forward scattering I(0)

Maximum dimension (Dmax)

Molecular symmetry number

Q5. A symmetric, near-Gaussian P(r) curve is most characteristic of which shape?

Long rod

Hollow sphere

Solid sphere

Dumbbell

Q6. What feature in a P(r) function suggests a multi-domain protein?

A sharp peak at r = 0

A second peak or shoulder at higher r values

No tail at long distances

Complete symmetry

Q7. In a rod-like particle, why is there a strong peak at low r values?

Because rods have no long distances

Because many electron pairs exist across the short cross-section

Because intensity is highest at q=0

Because rods are spherical

Q8. Why can P(r)-derived Rg be more accurate than Guinier Rg for large particles?

It ignores noisy regions

It only uses low-q data

It uses the entire scattering curve

It assumes spherical symmetry

Q9. Which of the following is NOT obtained from the P(r) function?

Dmax

I(0)

Atomic-resolution coordinates

Q10. Why is Dmax required for ab initio reconstruction?

It determines molecular weight

It constrains the search volume

It defines electron density contrast

It removes background scattering

Q11. A hollow sphere P(r) curve typically peaks near:

r = 0

Intermediate distances only

The maximum dimension

Negative distances

Q12. Unfolded proteins in P(r) analysis typically show:

Highly symmetric Gaussian curves

Short compact distributions only

Extended tails toward large r

No defined Dmax

Q13. Why is transforming P(r) back to I(q) useful?

To increase resolution

To generate a smooth fitted curve for comparison to experimental data

To remove solvent effects

To calculate binding constants

Q14. Which issue can distort the P(r) function?

Proper buffer subtraction

Aggregation

Low concentration

Symmetric folding

Q15. True or False: The indirect Fourier transform converts data from real space into reciprocal space.

True

False

Q16. True or False: The P(r) function is model-independent.

True

False

Q17. True or False: Dmax corresponds to the smallest internal distance in a protein.

True

False

Q18. True or False: A dumbbell-shaped protein may show two peaks in its P(r) function.

True

False

Q19. True or False: Guinier analysis uses the entire scattering curve.

True

False

Q20. True or False: P(r) can help detect aggregation effects.

True

False

Q21. True or False: A rod-like particle produces a symmetric Gaussian P(r).

True

False

Q22. True or False: The P(r) function can be transformed back into I(q).

True

False

Q23. True or False: The maximum dimension must be estimated before ab initio modeling.

True

False

Q24. True or False: Hollow spheres tend to peak at small r values only.

True

False

Q25. True or False: P(r) provides atomic-resolution structural information.

True

False

Q26. True or False: A smooth P(r) that returns cleanly to zero suggests proper data processing.

True

False

Q27. True or False: The I(0) value can be derived from the P(r) function.

True

False

Q28. True or False: Unfolded proteins generally have shorter Dmax values than globular proteins of the same mass.

True

False

Q29. True or False: The P(r) function is required before moving to advanced SAXS modeling methods.

True

False