Day 2 part 2

Protein chemistry

ChatGPT conversation

🧬 1. How Do We Determine Protein Primary Structure?

First choice: DNA databases

Today, we usually determine protein sequence from:

Genome / cDNA databases
Translate DNA → protein sequence

But this does not reveal:

Post-translational modifications (PTMs)
Processing events (cleavages, signal peptides removed)

When DNA info is missing or incomplete → we must determine sequence experimentally.
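
As a rough illustration of the "translate DNA → protein sequence" step, here is a minimal sketch. It assumes the standard genetic code, reading frame 1, and a coding strand written 5'→3'; the function name `translate` and the example sequence are made up for this note.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a coding DNA sequence (frame 1) up to the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCAAGTAA"))  # hypothetical 4-codon example
```

The output is only the encoded sequence: nothing in it says whether a signal peptide was removed or a residue was modified afterwards.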

✂️ 2. Protein Digestion: Why Do We Cut Proteins?

Large proteins are too big to sequence directly. So we:

Digest into smaller peptides
Separate peptides (HPLC)
Sequence peptides (Edman or MS)
Reconstruct full sequence

🔪 Enzymatic Cleavage (Very Important)

Each protease cuts at specific residues:

Enzyme	Cleaves
Trypsin	After Lys (K) & Arg (R)
Chymotrypsin	After Phe (F), Trp (W), Tyr (Y)
V8 protease	After Asp (D), Glu (E)
Asp-N	Before Asp
Thermolysin	Before large hydrophobic residues

Also chemical cleavage:

CNBr → after Met
Mild acid → Asp–Pro
Hydroxylamine → Asn–Gly

These different cleavage patterns are crucial for determining sequence order.
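
The cleavage rules in the table can be tried out with a small in-silico digest. This is a simplified sketch: it applies only the "after X" / "before X" rules listed above and ignores real-world exceptions (for example, missed cleavages). The function name `digest` and the test sequence are hypothetical.

```python
def digest(sequence, residues, side="after"):
    """Cut `sequence` after (or before) every residue listed in `residues`."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in residues:
            cut = i + 1 if side == "after" else i
            # Skip cuts that would produce an empty peptide at either end.
            if start < cut < len(sequence):
                peptides.append(sequence[start:cut])
                start = cut
    peptides.append(sequence[start:])
    return peptides


protein = "AGKDFLRYEWK"  # made-up sequence for illustration
print("Trypsin:     ", digest(protein, "KR"))
print("Chymotrypsin:", digest(protein, "FWY"))
print("Asp-N:       ", digest(protein, "D", side="before"))
```

The same protein gives a different fragment set with each enzyme, which is what makes overlap mapping possible later on.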

🧪 3. Edman Degradation (Stepwise N-terminal sequencing)

How it works

Label N-terminal amino acid with phenyl isothiocyanate (PITC)
Cleave off labeled amino acid
Identify it by reverse-phase HPLC
Repeat cycle

❗ Why can we only sequence ~10 amino acids?

You asked:

Edman only sequences few AA because labeling is not 100% degraded?

Yes, essentially correct: no step of the cycle (labeling or cleavage) is 100% efficient. Here is the full explanation:

Assume each cycle is ~99% efficient (an illustrative figure).

Cycle 1:

99% correct removal
1% remains uncleaved

Cycle 2:

The released mixture now contains:
- Mostly the second AA
- ~1% of the first AA, from chains that were not cleaved in cycle 1

Cycle 3:

The out-of-step chains accumulate, so the mixture gets worse

After many rounds:

Signal becomes too noisy
Peaks overlap
Sequence becomes ambiguous

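So, per these notes, only ~10 residues can be sequenced reliably; automated sequencers are often quoted at a few tens of residues under good conditions.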

This is cumulative inefficiency.
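
The cumulative effect can be put into numbers with a simple model: if every cycle succeeds with probability p, the fraction of chains still "in step" after n cycles is p^n. The 99% figure is the example from above; the 95% and 90% rows are extra assumptions added only for comparison, and the model ignores other losses (sample wash-out, side reactions).

```python
def in_phase_fraction(efficiency, cycle):
    """Fraction of chains that were cleaved correctly in every cycle so far."""
    return efficiency ** cycle


for efficiency in (0.99, 0.95, 0.90):
    row = ", ".join(
        f"cycle {n}: {in_phase_fraction(efficiency, n):.2f}" for n in (1, 5, 10, 20)
    )
    print(f"{efficiency:.0%} per cycle -> {row}")
```

The lower the per-cycle efficiency, the faster the correct signal shrinks relative to the background from lagging chains.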

🔁 4. Why Use Trypsin AND Chymotrypsin?

You asked:

To know the order, do we use trypsin & chymotrypsin?

Yes — this is essential logic.

If we digest only with trypsin:

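Example fragments (letters are placeholders for residues):

A - B - C - K
D - E - F - R
G - H - I - K

We don't know their order in the full protein.

Now digest the same protein with chymotrypsin:

Different fragments:

C - K - D
F - R - G
I - K - J

Now we find overlaps:

C - K - D spans the end of A - B - C - K and the start of D - E - F - R, so those two tryptic fragments are adjacent
F - R - G links D - E - F - R to G - H - I - K in the same way
Like solving a puzzle

This is called overlapping peptide mapping.

Without a second digestion that cuts at different sites → the full order cannot be reconstructed.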

This applies to BOTH:

Edman sequencing
Mass spectrometry
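
The puzzle logic can be sketched in a few lines. This is a toy brute-force version, not a real assembly algorithm: it tries every ordering of the first digest and keeps the one where each junction is bridged by a peptide from the second digest. The function names are hypothetical, and the placeholder letters from the example above are used as one-character "residues".

```python
from itertools import permutations


def bridges(left, right, overlap_peptides):
    """True if some overlap peptide spans the junction between left and right."""
    for p in overlap_peptides:
        for cut in range(1, len(p)):
            if left.endswith(p[:cut]) and right.startswith(p[cut:]):
                return True
    return False


def order_peptides(peptides, overlap_peptides):
    """Return the first ordering in which every junction is bridged, else None."""
    for perm in permutations(peptides):
        if all(bridges(a, b, overlap_peptides) for a, b in zip(perm, perm[1:])):
            return list(perm)
    return None


tryptic = ["DEFR", "GHIK", "ABCK"]   # deliberately shuffled
chymotryptic = ["CKD", "FRG", "IKJ"]
print(order_peptides(tryptic, chymotryptic))
```

With only the tryptic list, all six orderings would be equally plausible; the second digest is what singles one out.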

⚡ 5. Tandem Mass Spectrometry (MS/MS)

Modern method of choice.

Step 1 – Ionization

Two soft ionization methods:

🟢 MALDI-TOF

Matrix-Assisted Laser Desorption/Ionization

Usually produces:

Mostly singly charged ions (+1)
Sometimes +2

🔵 ESI (Electrospray Ionization)

Sprays protein into charged droplets.

Produces:

Multiple charge states (+5, +10, +20…)
One molecule can carry many charges

❓ Why does ESI produce more charges than MALDI?

Because:

MALDI → desorption event usually creates single protonation
ESI → droplet evaporation leaves many protons on protein surface
Large proteins expose many basic residues
More sites → more protonation

So ESI gives a charge distribution spectrum.
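
To see what a charge distribution looks like numerically: an ion with mass M carrying z extra protons appears at m/z = (M + z·1.00728)/z. The sketch below computes a few peaks and then recovers the charge and mass from two neighbouring peaks. The 16,950 Da mass is a hypothetical value, and positive-mode protonation is assumed.

```python
PROTON = 1.007276  # mass of a proton in Da


def mz(mass, z):
    """m/z of a molecule of neutral mass `mass` carrying z protons."""
    return (mass + z * PROTON) / z


def charge_from_adjacent_peaks(mz_low, mz_high):
    """Given neighbouring peaks (mz_low has charge z+1, mz_high has z), return z."""
    return round((mz_low - PROTON) / (mz_high - mz_low))


mass = 16950.0  # hypothetical protein mass in Da
peaks = {z: mz(mass, z) for z in (10, 11, 12)}
for z, value in peaks.items():
    print(f"+{z}: m/z = {value:.2f}")

z = charge_from_adjacent_peaks(peaks[11], peaks[10])
recovered = z * (peaks[10] - PROTON)
print("charge of the higher-m/z peak:", z, "recovered mass:", round(recovered, 3))
```

This is why one protein gives a whole series of ESI peaks, and why the series still pins down a single mass.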

🧩 6. Tandem MS: How Sequence Is Determined

Fragment Type	Breaks at	Fragment
b-ion	Peptide bond (CO–NH)	N-terminal fragment
y-ion	Peptide bond (CO–NH)	C-terminal fragment
c-ion	N–Cα bond	N-terminal fragment
z+1-ion	N–Cα bond	C-terminal fragment

How sequence is read

If peptide is:

A–B–C–D–E

You get:

b1 (A)
b2 (A–B)
b3 (A–B–C)
etc.

The difference between consecutive peaks corresponds to the mass of one amino acid residue.

So the sequence is deduced from the mass differences between consecutive fragments.
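
A small sketch of reading a b-ion ladder, assuming singly charged b ions, monoisotopic residue masses, and an ideal spectrum in which every b ion (including b1) is observed, which real spectra rarely provide. The peptide "SAMPLE" and the function names are made up for illustration.

```python
PROTON = 1.007276  # Da

# Monoisotopic residue masses (amino acid minus water), in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def b_ions(peptide):
    """m/z of the singly charged b1, b2, ... ions of `peptide`."""
    total, ladder = PROTON, []
    for aa in peptide:
        total += RESIDUE_MASS[aa]
        ladder.append(total)
    return ladder


def read_ladder(ladder, tol=0.01):
    """Turn consecutive mass differences back into residue calls."""
    steps = [ladder[0] - PROTON] + [b - a for a, b in zip(ladder, ladder[1:])]
    calls = []
    for step in steps:
        matches = sorted(aa for aa, m in RESIDUE_MASS.items() if abs(m - step) <= tol)
        calls.append("/".join(matches) if matches else "?")
    return calls


ladder = b_ions("SAMPLE")
print([round(m, 3) for m in ladder])
print(read_ladder(ladder))
```

Any step of 113.084 Da matches two residues at once, so the mass difference alone cannot decide between them.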

❗ 7. Can MS Distinguish Lysine and Isoleucine?

You asked:

MS can't determine lysine and isoleucine because masses are same?

Correction:

It is Leucine (L) and Isoleucine (I) that have identical mass.

Both:

Same elemental composition
Same molecular weight
Structural isomers

Mass spectrometry cannot distinguish them by mass alone.

However:

HPLC separation can sometimes distinguish due to structural differences.
Advanced MS fragmentation methods sometimes help.

But classical MS → cannot distinguish L and I.
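
The point can be checked directly from the elemental formulas. The sketch below sums monoisotopic atomic masses for the free amino acids; Leu and Ile share the formula C6H13NO2, while Lys is C6H14N2O2. The dictionary and function names are just for this illustration.

```python
# Monoisotopic atomic masses in Da.
ATOM_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Elemental composition of the free amino acids.
FORMULA = {
    "Leu": {"C": 6, "H": 13, "N": 1, "O": 2},
    "Ile": {"C": 6, "H": 13, "N": 1, "O": 2},
    "Lys": {"C": 6, "H": 14, "N": 2, "O": 2},
}


def monoisotopic_mass(name):
    """Sum of atomic masses for the amino acid's elemental formula."""
    return sum(ATOM_MASS[element] * count for element, count in FORMULA[name].items())


for name in FORMULA:
    print(f"{name}: {monoisotopic_mass(name):.4f} Da")
```

Leu and Ile come out identical to every decimal place, whereas Lys differs from Ile by about 15 Da and is easy to tell apart.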

📏 8. Protein Size Determination Methods

1️⃣ SDS-PAGE

SDS denatures proteins
Coats protein with negative charge
~1 SDS per 2 amino acids
Mobility depends mainly on size

Smaller proteins → move faster.

Used for 1–200 kDa.

2️⃣ Mass Spectrometry

Most accurate molecular weight method.

Can detect:

Monomers
Dimers
Multiple charge states

3️⃣ Size Exclusion Chromatography (SEC)

You asked:

Smaller molecules have longer travel time? Why?

Yes — and this is correct.

Column contains porous beads.

Large proteins:

Cannot enter pores
Travel straight through
Elute earlier

Small proteins:

Enter pores
Take longer path
Elute later

So:

Size	Elution Time
Large	Short
Small	Long

Separation based on hydrodynamic radius.
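
One common way to express SEC elution on a 0 to 1 scale is the partition coefficient Kav = (Ve − V0)/(Vt − V0), where Ve is the elution volume, V0 the void volume (molecules fully excluded from the pores), and Vt the total column volume. This formula is not in the notes above; it is added as a sketch, and all volumes below are hypothetical.

```python
def k_av(elution_volume, void_volume, total_volume):
    """Fraction of the pore volume a molecule can access (0 = excluded, 1 = fully included)."""
    return (elution_volume - void_volume) / (total_volume - void_volume)


V0, VT = 8.0, 24.0  # hypothetical column: void and total volume in mL
for label, ve in (("large protein", 10.0), ("small protein", 18.0)):
    print(f"{label}: Ve = {ve} mL, Kav = {k_av(ve, V0, VT):.3f}")
```

A large protein that barely enters the pores elutes close to V0 (Kav near 0); a small one that explores most of the pore volume elutes later (Kav closer to 1).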

⚖️ 9. Comparison of Methods

Method	Accuracy	Size Range	Notes
SDS-PAGE	Moderate	1–200 kDa	Very common
SEC	Lower accuracy	Up to 10,000 kDa	Good for complexes
Ultracentrifugation	Good	Large complexes	Rarely used today
Mass Spectrometry	Very high	Depends	Best for exact MW

🔬 10. Post-Translational Modifications (PTMs)

Mass spectrometry can detect:

Phosphorylation
Glycosylation
Other modifications

Because: a modification shifts the peptide and fragment masses by a characteristic amount (for example, about +80 Da for phosphorylation).

DNA sequencing cannot detect PTMs.
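
A minimal sketch of how a mass shift points to a modification: compare the observed mass with the mass expected from the unmodified sequence and look the difference up in a table of known shifts. The three monoisotopic shifts listed are standard values; the peptide masses, tolerance, and function name are hypothetical, and real software also has to handle multiple or combined modifications.

```python
# Monoisotopic mass added by some common modifications, in Da.
KNOWN_SHIFTS = {
    "phosphorylation": 79.96633,
    "acetylation": 42.01057,
    "methylation": 14.01565,
}


def match_modification(unmodified_mass, observed_mass, tol=0.01):
    """Return the name of the modification whose shift matches the mass difference."""
    delta = observed_mass - unmodified_mass
    for name, shift in KNOWN_SHIFTS.items():
        if abs(delta - shift) <= tol:
            return name
    return None


print(match_modification(1000.0, 1079.9663))  # hypothetical peptide masses
print(match_modification(1000.0, 1000.0))
```

In MS/MS the same comparison is made on individual b and y fragments, which is how the modified residue can be located within the peptide.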

🧠 Final Big Picture

To determine protein sequence experimentally:

Digest protein
Separate peptides (HPLC)
Ionize (ESI or MALDI)
MS1 → select peptide
Fragment in collision cell
MS2 → read b/y ion ladder
Repeat with different protease
Align overlapping fragments

🔎 Summary of Your Specific Questions

✔ Edman limited due to cumulative inefficiency
✔ Two proteases needed for overlap mapping
✔ Tandem MS determines sequence via fragment mass differences
✔ MS cannot distinguish Leu/Ile by mass
✔ b, y, c, z+1 ions result from backbone cleavage
✔ ESI produces multiple charge states due to droplet protonation
✔ SEC: small molecules take longer because they enter pores

Quiz

Q0. Why is Edman degradation limited to sequencing only about 10 amino acids from the N-terminus?

Because peptides longer than 10 residues cannot bind the reagent

Because labeling efficiency decreases cumulatively with each cycle

Because reverse-phase HPLC cannot separate more than 10 amino acids

Because internal residues cannot be cleaved

Q1. Why are two different proteases often used when sequencing a protein?

To increase digestion speed

To prevent post-translational modifications

To generate overlapping peptide fragments for sequence reconstruction

To reduce peptide charge states

Q2. Which residues does trypsin cleave after?

Phenylalanine and tyrosine

Lysine and arginine

Aspartate and glutamate

Methionine only

Q3. Which protease primarily cleaves after aromatic amino acids?

Trypsin

V8 protease

Chymotrypsin

Asp-N

Q4. What is the main purpose of the collision cell in tandem mass spectrometry?

To ionize peptides

To fragment selected peptide ions

To remove matrix molecules

To neutralize charges

Q5. Which ion types are most commonly observed in collision-induced dissociation (CID)?

a and x ions

b and y ions

c and z+1 ions

m and n ions

Q6. Why can mass spectrometry not distinguish leucine from isoleucine?

They fragment identically

They have identical molecular masses

They ionize differently

They have identical polarity

Q7. Why does electrospray ionization (ESI) produce multiple charge states?

Because proteins are unstable in vacuum

Because matrix molecules attach repeatedly

Because multiple protonation sites remain after droplet evaporation

Because laser energy causes over-ionization

Q8. In MALDI-TOF, most detected protein ions typically carry:

One charge

Ten charges

Twenty charges

No charge

Q9. In size exclusion chromatography (SEC), which molecules elute first?

Small molecules

Highly charged molecules

Large molecules

Hydrophobic molecules

Q10. Why do small proteins elute later in SEC?

They bind to the column material

They enter pores and take a longer path

They are negatively charged

They aggregate inside the column

Q11. What does SDS do in SDS-PAGE?

Cleaves peptide bonds

Adds positive charge

Denatures proteins and provides uniform negative charge

Removes glycosylation

Q12. Why is DTT added during SDS-PAGE preparation?

To oxidize cysteines

To reduce disulfide bonds

To increase electrophoretic mobility

To stabilize SDS binding

Q13. What information can MS/MS provide that genomic databases cannot?

DNA codon usage

Post-translational modifications

Promoter sequences

mRNA stability

Q14. Why are overlapping peptides necessary when sequencing proteins?

To increase ionization efficiency

To confirm protease specificity

To reconstruct the full protein sequence

To reduce chromatographic time

Q15. True or False: Edman degradation removes amino acids from the C-terminus of a peptide.

True

False

Q16. True or False: In tandem MS, fragment mass differences correspond to individual amino acid masses.

True

False

Q17. True or False: Trypsin cleaves before lysine and arginine residues.

True

False

Q18. True or False: MALDI is considered a soft ionization technique.

True

False

Q19. True or False: SDS-PAGE provides exact molecular weight determination.

True

False

Q20. True or False: Large protein complexes are difficult to analyze by mass spectrometry without prior digestion.

True

False

Q21. True or False: In ESI, one protein molecule can carry multiple positive charges.

True

False

Q22. True or False: Size exclusion chromatography separates proteins based on charge differences.

True

False

Q23. True or False: Leucine and isoleucine can be distinguished solely by their molecular mass in MS.

True

False

Q24. True or False: Post-translational phosphorylation can be detected in MS/MS spectra.

True

False

Q25. True or False: In SEC, proteins that cannot enter the bead pores elute later.

True

False

Q26. True or False: The unit 'Svedberg' originates from ultracentrifugation studies.

True

False

Q27. True or False: The matrix in MALDI assists in energy absorption and ion formation.

True

False

Q28. True or False: Fragmentation in MS/MS occurs only at peptide bonds.

True

False

Q29. True or False: Using two different digestion enzymes increases confidence in sequence assembly.

True

False
