Day 2 part 2

Protein chemistry

🧬 1. How Do We Determine Protein Primary Structure?

First choice: DNA databases

Today, we usually determine protein sequence from:

  1. Genome / cDNA databases
  2. Translate DNA → protein sequence

But this does not reveal:

  • Post-translational modifications (PTMs)
  • Processing events (cleavages, signal peptides removed)

When DNA info is missing or incomplete → we must determine sequence experimentally.
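
The DNA → protein translation step above can be sketched in a few lines. This is a minimal illustration, assuming the standard genetic code, a coding strand already in the correct reading frame, and only unambiguous bases; the function name `translate` and the example sequence are made up for the demonstration.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding DNA sequence (in frame) until the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCAAATGA"))  # MAK
```

Note that the output is only the encoded sequence: PTMs and processing events are invisible at this level, which is the limitation listed above.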


✂️ 2. Protein Digestion: Why Do We Cut Proteins?

Large proteins are too big to sequence directly. So we:

  1. Digest into smaller peptides
  2. Separate peptides (HPLC)
  3. Sequence peptides (Edman or MS)
  4. Reconstruct full sequence

🔪 Enzymatic Cleavage (Very Important)

Each protease cuts at specific residues:

Enzyme        | Cleaves
Trypsin       | After Lys (K) & Arg (R)
Chymotrypsin  | After Phe (F), Trp (W), Tyr (Y)
V8 protease   | After Asp (D), Glu (E)
Asp-N         | Before Asp
Thermolysin   | Before large hydrophobic residues

Also chemical cleavage:

  • CNBr → after Met
  • Mild acid → Asp–Pro
  • Hydroxylamine → Asn–Gly

These different cleavage patterns are crucial for determining sequence order.
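
To make the cleavage rules concrete, here is a minimal in-silico digestion sketch. It assumes the simplified "cut after these residues" rules from the table and ignores real-world exceptions (for example, trypsin usually does not cut when the next residue is Pro). The function name `digest` and the 12-residue sequence are hypothetical.

```python
def digest(sequence, cut_after):
    """Split a protein sequence after every residue listed in cut_after."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in cut_after:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # C-terminal peptide without a cleavage site
        peptides.append(sequence[start:])
    return peptides


protein = "GAFKDLYRSWVK"  # hypothetical sequence, for illustration only

print(digest(protein, "KR"))   # trypsin-like:      ['GAFK', 'DLYR', 'SWVK']
print(digest(protein, "FWY"))  # chymotrypsin-like: ['GAF', 'KDLY', 'RSW', 'VK']
```

The two digests cut the same protein at different positions, which is what later makes it possible to order the peptides.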


🧪 3. Edman Degradation (Stepwise N-terminal sequencing)

How it works

  1. Label N-terminal amino acid with phenyl isothiocyanate (PITC)
  2. Cleave off labeled amino acid
  3. Identify it by reverse-phase HPLC
  4. Repeat cycle

❗ Why can we only sequence ~10 amino acids?

You asked:

Edman only sequences few AA because labeling is not 100% degraded?

Yes, correct. Here is the full explanation:

Suppose each cycle is ~99% efficient (an illustrative figure).

Cycle 1:

  • 99% correct removal
  • 1% remains uncleaved

Cycle 2:

  • Now mixture contains:
    • Mostly second AA
    • 1% leftover from first cycle

Cycle 3:

  • Even more mixture

After many rounds:

  • Signal becomes too noisy
  • Peaks overlap
  • Sequence becomes ambiguous

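So we can only reliably sequence a limited stretch: ~10 residues is the working figure here, and even well-optimized automated sequencers rarely read beyond a few tens of residues.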

This is cumulative inefficiency.
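
The cumulative effect can be put in numbers with a simple model: if every cycle succeeds with the same probability, the fraction of molecules still "in phase" after n cycles is efficiency^n. The sketch below assumes that constant-efficiency model; the efficiencies and cycle counts are illustrative values, not measurements.

```python
def in_phase_fraction(efficiency, cycles):
    """Fraction of molecules that completed every one of the given cycles."""
    return efficiency ** cycles


for efficiency in (0.99, 0.95):
    for cycles in (1, 10, 30, 50):
        fraction = in_phase_fraction(efficiency, cycles)
        print(f"efficiency {efficiency:.2f}, cycle {cycles:2d}: "
              f"{fraction:.3f} still in phase")
```

Everything that is not in phase contributes background peaks from the wrong residue, so the lower the per-cycle efficiency, the sooner the readout becomes ambiguous.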


🔁 4. Why Use Trypsin AND Chymotrypsin?

You asked:

To know the order, do we use trypsin & chymotrypsin?

Yes, this is the essential logic.

If we digest only with trypsin:

Example fragments (schematic):

A - B - C - K
D - E - F - R
G - H - I - K

We don't know their order in the full protein.

Now digest same protein with chymotrypsin:

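Different fragments:

C - K - D
F - R - G
I - K - J

Now we find overlaps:

  • C - K - D spans the end of A - B - C - K and the start of D - E - F - R, so those two fragments are adjacent
  • F - R - G links D - E - F - R to G - H - I - K in the same way
  • I - K - J shows that G - H - I - K is followed by a further residue (J)
  • Like solving a puzzle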

This is called overlapping peptide mapping.

Without two different digestions → cannot reconstruct full order.

This applies to BOTH:

  • Edman sequencing
  • Mass spectrometry
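
The puzzle-solving step can be sketched as a brute-force search: try every order of the tryptic peptides and keep the orders in which all chymotryptic peptides appear as substrings. This is only an illustration under strong assumptions (complete digestion, every peptide recovered and fully sequenced); the function name `assemble` and the peptides are hypothetical, and trying all permutations is only practical for a handful of peptides.

```python
from itertools import permutations


def assemble(primary_peptides, overlap_peptides):
    """Return every ordering of primary_peptides (joined into one sequence)
    that contains all overlap_peptides as substrings."""
    candidates = []
    for order in permutations(primary_peptides):
        sequence = "".join(order)
        if all(peptide in sequence for peptide in overlap_peptides):
            candidates.append(sequence)
    return candidates


# Hypothetical peptides from two digests of the same protein, in random order.
tryptic = ["DLYR", "SWVK", "GAFK"]
chymotryptic = ["RSW", "GAF", "VK", "KDLY"]

print(assemble(tryptic, chymotryptic))  # ['GAFKDLYRSWVK']
```

With the tryptic peptides alone, all six orders would be equally plausible; the chymotryptic peptides that span the junctions leave only one.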

⚑ 5. Tandem Mass Spectrometry (MS/MS)

Modern method of choice.

Step 1 – Ionization

Two soft ionization methods:

🟒 MALDI-TOF

Matrix-Assisted Laser Desorption/Ionization

Usually produces:

  • Mostly singly charged ions (+1)
  • Sometimes +2

🔵 ESI (Electrospray Ionization)

Sprays protein into charged droplets.

Produces:

  • Multiple charge states (+5, +10, +20…)
  • One molecule can carry many charges

❓ Why does ESI produce more charges than MALDI?

Because:

  • MALDI → desorption event usually creates single protonation
  • ESI → droplet evaporation leaves many protons on protein surface
  • Large proteins expose many basic residues
  • More sites → more protonation

So ESI gives a charge distribution spectrum.
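
A short sketch of why one protein shows up as a series of peaks: the instrument measures m/z, and each added proton contributes both mass and charge, so m/z = (M + z × 1.007276) / z for positive-mode protonation. The protein mass and charge states below are hypothetical values chosen for illustration.

```python
PROTON_MASS = 1.007276  # Da, mass added per charge in positive-ion mode


def mz(neutral_mass, charge):
    """m/z of a molecule of the given neutral mass carrying `charge` protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


protein_mass = 16950.0  # hypothetical protein mass in Da

for charge in (1, 10, 15, 20):
    print(f"+{charge:<2d} -> m/z {mz(protein_mass, charge):.2f}")
```

The same molecule therefore appears at many m/z values in ESI (one per charge state), whereas a mostly +1 MALDI spectrum shows it close to its actual mass.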


🧩 6. Tandem MS: How Sequence Is Determined

MS1:

Measure mass of intact peptide.

Collision Cell:

Peptide fragmented.

MS2:

Measure masses of fragments.


💥 Peptide Fragmentation

Fragmentation happens at peptide backbone.

Main ions:

Fragment Type | Breaks at            | Contains
b-ion         | Peptide (C–N) bond   | N-terminal fragment
y-ion         | Peptide (C–N) bond   | C-terminal fragment
c-ion         | N–Cα bond            | N-terminal fragment
z+1-ion       | N–Cα bond            | C-terminal fragment

Most common in CID:

  • b-ions
  • y-ions

How sequence is read

If peptide is:

A–B–C–D–E

You get:

  • b1 (A)
  • b2 (A–B)
  • b3 (A–B–C)
  • etc.

Differences between adjacent peaks in the ladder correspond to the mass of one amino acid residue.

So sequence is deduced by: Mass difference between consecutive fragments.
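
Reading a b-ion ladder can be sketched as follows. The code assumes an idealized, complete series of singly charged b-ions with no noise, and uses monoisotopic residue masses; the function names and the test peptide are hypothetical. Real spectra have missing and mixed b/y peaks, so actual software is considerably more involved.

```python
PROTON = 1.007276  # Da

# Monoisotopic residue masses (amino acid minus water), in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def b_ions(peptide):
    """Singly charged b-ion m/z values for every N-terminal prefix."""
    masses, total = [], PROTON
    for residue in peptide:
        total += RESIDUE_MASS[residue]
        masses.append(total)
    return masses


def read_ladder(b_series, tolerance=0.01):
    """Assign a residue to each step between consecutive b-ion masses."""
    sequence, previous = [], PROTON
    for mass in b_series:
        step = mass - previous
        matches = [aa for aa in sorted(RESIDUE_MASS)
                   if abs(RESIDUE_MASS[aa] - step) <= tolerance]
        sequence.append("/".join(matches) if matches else "?")
        previous = mass
    return sequence


print(read_ladder(b_ions("PEPTIDE")))
# ['P', 'E', 'P', 'T', 'I/L', 'D', 'E']
```

The `I/L` entry appears because isoleucine and leucine have exactly the same residue mass, so a mass difference alone cannot tell them apart.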


❗ 7. Can MS Distinguish Lysine and Isoleucine?

You asked:

MS can't determine lysine and isoleucine because masses are same?

Correction:

It is Leucine (L) and Isoleucine (I) that have identical mass.

Both:

  • Same elemental composition
  • Same molecular weight
  • Structural isomers

Mass spectrometry cannot distinguish them by mass alone.

However:

  • HPLC separation can sometimes distinguish due to structural differences.
  • Advanced MS fragmentation methods sometimes help.

But classical MS → cannot distinguish L and I.


📏 8. Protein Size Determination Methods

1️⃣ SDS-PAGE

  • SDS denatures proteins
  • Coats protein with negative charge
  • ~1 SDS per 2 amino acids
  • Mobility depends mainly on size

Smaller proteins → move faster.

Used for 1–200 kDa.


2️⃣ Mass Spectrometry

Most accurate molecular weight method.

Can detect:

  • Monomers
  • Dimers
  • Multiple charge states

3️⃣ Size Exclusion Chromatography (SEC)

You asked:

Smaller molecules have longer travel time? Why?

Yes, this is correct.

Column contains porous beads.

Large proteins:

  • Cannot enter pores
  • Travel straight through
  • Elute earlier

Small proteins:

  • Enter pores
  • Take longer path
  • Elute later

So:

Size  | Elution Time
Large | Short
Small | Long

Separation based on hydrodynamic radius.


⚖️ 9. Comparison of Methods

Method              | Accuracy       | Size Range       | Notes
SDS-PAGE            | Moderate       | 1–200 kDa        | Very common
SEC                 | Lower accuracy | Up to 10,000 kDa | Good for complexes
Ultracentrifugation | Good           | Large complexes  | Rarely used today
Mass Spectrometry   | Very high      | Depends          | Best for exact MW

🔬 10. Post-Translational Modifications (PTMs)

Mass spectrometry can detect:

  • Phosphorylation
  • Glycosylation
  • Other modifications

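Because: a modification shifts the mass of the peptide, and of every fragment that contains the modified residue, by a characteristic amount (for example, about +80 Da for a phosphate group).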

DNA sequencing cannot detect PTMs.
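
As a rough sketch of how a mass shift points to a modification: compare the observed peptide mass with the mass expected from the unmodified sequence and look up the difference. The shifts below are standard monoisotopic values; the short lookup table, the function name `identify_ptm`, and the example masses are illustrative assumptions, and real searches consider many more modifications and their combinations.

```python
# Monoisotopic mass shifts (Da) for a few common modifications.
PTM_SHIFTS = {
    "phosphorylation": 79.96633,
    "acetylation": 42.01057,
    "methylation": 14.01565,
    "oxidation": 15.99491,
    "hexose (glycosylation)": 162.05282,
}


def identify_ptm(expected_mass, observed_mass, tolerance=0.01):
    """Return the modifications whose mass shift matches observed - expected."""
    delta = observed_mass - expected_mass
    return [name for name, shift in PTM_SHIFTS.items()
            if abs(delta - shift) <= tolerance]


# Hypothetical peptide: expected 1000.0000 Da, observed 1079.9663 Da.
print(identify_ptm(1000.0, 1079.9663))  # ['phosphorylation']
```

Applying the same comparison to individual b/y fragments narrows the modification down to a specific residue, because only fragments containing that residue carry the shift.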


🧠 Final Big Picture

To determine protein sequence experimentally:

  1. Digest protein
  2. Separate peptides (HPLC)
  3. Ionize (ESI or MALDI)
  4. MS1 → select peptide
  5. Fragment in collision cell
  6. MS2 → read b/y ion ladder
  7. Repeat with different protease
  8. Align overlapping fragments

🔎 Summary of Your Specific Questions

✔ Edman limited due to cumulative inefficiency
✔ Two proteases needed for overlap mapping
✔ Tandem MS determines sequence via fragment mass differences
✔ MS cannot distinguish Leu/Ile by mass
✔ b, y, c, z+1 ions result from backbone cleavage
✔ ESI produces multiple charge states due to droplet protonation
✔ SEC: small molecules take longer because they enter pores
