Lecture 6 Video 2

Protein structure

🧬 Lecture 6 – Protein Structure Bioinformatics & Classification

(Fun, detailed, and beginner-friendly walkthrough)

This lecture moves from sequence bioinformatics into the world of structural bioinformatics, and that’s where things get very interesting. Instead of comparing strings of amino acids, we compare the 3D shapes of proteins.

I’ll go through everything step-by-step and explain the logic behind it clearly.


🧩 1. Structure Alignment vs Structure Superposition

You may hear these terms used interchangeably, but they are not the same.

🔹 Structure Alignment

  • Used for different proteins
  • They may:
    • Have different sequences
    • Be homologs from different organisms
    • Be mutants
    • Have different residue numbering
  • Goal: 👉 Find the best overlap in 3D space

Importantly:

  • Not all parts of proteins necessarily overlap
  • You must identify which regions should be compared
  • Alignment is based on atomic coordinates, not sequence similarity

🔹 Structure Superposition

  • Used for identical proteins
  • Example: same protein in two conformations
  • Goal: Compare structural differences

So:

  • Alignment = comparing different proteins
  • Superposition = comparing the same protein in different states

🧪 2. How Alignment Works in Practice (PyMOL Example)

In PyMOL, two commands are commonly used:

  • align
  • super

Example from the lecture: Two proteins (CPZ and CC8) both had:

  • 4-stranded β-sheet
  • 2 α-helices

After alignment:

  • RMSD = 2.2 Å
  • That is considered quite good

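📏 What is RMSD?

RMSD = Root Mean Square Deviation. It is the square root of the mean squared distance between paired (aligned) atoms, so it behaves like an average distance that weights large deviations more heavily.

  • Low RMSD → structures are very similar
  • High RMSD → structures differ, or the alignment is poor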
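To make the definition concrete, here is a minimal sketch of the RMSD calculation in plain Python. It assumes the atoms are already paired (atom i in one structure matches atom i in the other) and that the structures are already fitted onto each other; the function name and the toy coordinates are made up for illustration.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z) points.

    No fitting is done here: the structures are assumed to be superposed
    already, and index i in A is assumed to correspond to index i in B.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = [math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b)]
    return math.sqrt(sum(squared) / len(squared))

# Toy coordinates in angstroms (made up): only the third atom is displaced, by 3 Å
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 3.0, 0.0)]
print(round(rmsd(a, b), 3))  # 1.732
```

The plain average distance in this toy case is 1.0 Å, while the RMSD is about 1.73 Å: a single badly placed atom pulls the RMSD up more than it pulls the mean.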

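Important:

The fit is computed on atomic coordinates, not on sequence similarity. PyMOL's super pairs residues from the structures alone; align runs a sequence alignment first, but only to decide which residues to pair before fitting their coordinates.

This is structural comparison, not sequence comparison.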
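The two PyMOL commands can also be called from a script. The sketch below assumes it runs inside a Python environment where PyMOL is installed and where two objects are already loaded; the function name and the object names in the usage line are hypothetical, not taken from the lecture.

```python
def structural_fit(mobile, target, use_sequence=True):
    """Fit object `mobile` onto `target` in PyMOL and return (rmsd, n_atoms).

    use_sequence=True  -> cmd.align (sequence alignment first, then fitting)
    use_sequence=False -> cmd.super (residue pairing from structure alone)
    """
    # Imported lazily because the module only exists where PyMOL is installed
    from pymol import cmd

    fit = cmd.align if use_sequence else cmd.super
    result = fit(mobile, target)
    # Both commands return a list; the first two entries are the RMSD after
    # refinement and the number of atom pairs that RMSD was computed over.
    return result[0], result[1]
```

A call such as structural_fit("protA", "protB", use_sequence=False) would then report the RMSD together with the number of atoms it covers. The atom count matters: a low RMSD over very few atoms says much less than the same RMSD over most of the structure.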


🧬 3. Sequence → Structure → Function (Direction Matters!)

The lecture emphasizes something crucial:

🧭 Direction of Determination

  1. DNA sequence → determines protein sequence
  2. Protein sequence → determines protein structure

BUT NOT THE OTHER WAY AROUND.

Why?

  • Genetic code is redundant
  • Many DNA sequences → same protein sequence
  • Many protein sequences → same structure
  • Fewer possible structures than sequences

🔽 Variability decreases downward:

DNA variability > Protein sequence variability > Protein structure variability
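The first step of this funnel (many DNA sequences, one protein sequence) can be counted directly. The sketch below uses the codon counts of the standard genetic code; the function name and the example peptide are made up for illustration.

```python
import math

# Number of codons per amino acid in the standard genetic code (61 sense codons)
CODONS_PER_RESIDUE = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "C": 2, "D": 2, "E": 2, "F": 2, "H": 2, "K": 2, "N": 2, "Q": 2, "Y": 2,
    "M": 1, "W": 1,
}

def count_coding_sequences(peptide):
    """How many distinct DNA coding sequences translate to this peptide."""
    return math.prod(CODONS_PER_RESIDUE[residue] for residue in peptide)

print(count_coding_sequences("MKL"))  # 1 * 2 * 6 = 12
```

Even a three-residue peptide has 12 possible coding sequences, and the number multiplies with every added residue. That is why a protein sequence cannot tell you which DNA sequence produced it.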

This is extremely important for understanding why structure classification works.


🧠 4. Structure vs Function – Not Always Simple

Two important evolutionary insights:

🔹 Similar structure, different function

Rare but possible

🔹 Same function, different structures

Example: serine proteases

  • Different folds, same catalytic activity
  • Likely convergent evolution

This tells us: Structure and function are often linked, but the link is not guaranteed.


🗂 5. Why Classify Protein Structures?

Because:

  • There are fewer possible folds than sequences
  • We want to organize structural space
  • Helps predict function
  • Helps understand evolution

Two major databases attempt this: CATH and SCOP.


🏛 6. CATH Database

CATH stands for:

  • Class
  • Architecture
  • Topology
  • Homologous superfamily

🔹 Level 1: Class

Four main classes:

  • Mainly α
  • Mainly β
  • α/β
  • Few secondary structures (often unstructured proteins)

🔹 Level 2: Architecture

Describes overall arrangement of secondary structures.

Examples in α/β class:

  • Super roll
  • β barrel
  • Two-layer sandwich
  • Three-layer sandwich
  • αβα three-layer
  • ββα three-layer

Architecture = overall 3D arrangement of the secondary structure elements (the order along the chain is not considered yet)


🔹 Level 3: Topology

Topology = 👉 The path the polypeptide chain takes through the structure

It depends on:

  • Order of secondary structures
  • Number of elements

Two proteins can:

  • Have same architecture
  • But different topology

This happens because the order in which the chain connects the elements differs.

Topology ≠ spatial arrangement only
Topology = connectivity pattern
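The difference between architecture and topology can be shown with a toy model. The sketch below is only an illustration, not how CATH computes anything: each protein is a list of secondary structure elements in chain order, and each element carries a made-up "slot" number for its position in the sheet.

```python
def architecture(chain):
    """Spatial arrangement only: which kind of element fills each slot."""
    return sorted((slot, kind) for kind, slot in chain)

def topology(chain):
    """Connectivity: the order in which the chain visits the slots."""
    return [slot for _kind, slot in chain]

# Each tuple is (element kind, slot in the sheet), listed from N- to C-terminus.
# Both toy proteins fill slots 1-4 with strands, but thread them differently.
protein_a = [("strand", 1), ("strand", 2), ("strand", 3), ("strand", 4)]
protein_b = [("strand", 2), ("strand", 1), ("strand", 4), ("strand", 3)]

same_architecture = architecture(protein_a) == architecture(protein_b)
same_topology = topology(protein_a) == topology(protein_b)
print(same_architecture, same_topology)  # True False
```

Both toy proteins end up with the same four-stranded sheet, so they share an architecture, yet the chain visits the strands in a different order, so the topology differs.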


🔹 Level 4: Homologous Superfamily

Groups proteins with:

  • Structural similarity
  • Evolutionary relationship
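The four levels map naturally onto a four-part identifier. CATH labels its superfamilies with dotted codes, one number per level (C.A.T.H); that notation is not shown in the notes above, so treat it here as an assumption. The sketch parses such codes and reports the deepest level two domains share. The function names are hypothetical, and the two codes serve only as example inputs.

```python
from typing import NamedTuple, Optional

class CathCode(NamedTuple):
    class_: int
    architecture: int
    topology: int
    homologous_superfamily: int

LEVELS = ("class", "architecture", "topology", "homologous superfamily")

def parse_cath(code: str) -> CathCode:
    """Split a dotted C.A.T.H code into its four numeric levels."""
    parts = code.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated numbers, got {code!r}")
    return CathCode(*(int(p) for p in parts))

def deepest_shared_level(a: CathCode, b: CathCode) -> Optional[str]:
    """Name of the deepest level at which both codes still agree, or None."""
    shared = None
    for name, x, y in zip(LEVELS, a, b):
        if x != y:
            break
        shared = name
    return shared

first = parse_cath("3.40.50.720")
second = parse_cath("3.40.50.300")
print(deepest_shared_level(first, second))  # topology
```

Because the hierarchy is nested, comparison stops at the first level that differs: two domains cannot share a topology without also sharing class and architecture.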

🏛 7. SCOP Database

SCOP = Structural Classification of Proteins

More complex hierarchy than CATH.

Levels include:

  • Class
  • Fold
  • Superfamily
  • Family

Example: In class α and β (α/β):

  • 147 different folds

The difference between α/β and α+β is not always obvious.


🧩 8. Domain Classification – The Complicated Part

Important: Both CATH and SCOP classify domains, not entire proteins

What is a domain?

A structural and functional unit within a protein.

Example: Pyruvate phosphate dikinase

  • SCOP identified 3 domains
  • CATH identified 6 domains

Even more confusing:

  • Some domains overlap
  • Some domains consist of non-contiguous sequence regions

This makes domain definition:

  • Difficult
  • Sometimes manual
  • Not fully reproducible
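The overlapping and non-contiguous cases are easier to see when a domain is written as a list of residue ranges. The sketch below uses hypothetical boundaries, not the real SCOP or CATH definitions for any protein, and the function names are made up for illustration.

```python
def residues(domain):
    """Expand a domain, given as (start, end) segments, into residue numbers."""
    return {r for start, end in domain for r in range(start, end + 1)}

def is_contiguous(domain):
    """True if the domain covers one unbroken stretch of the chain."""
    covered = residues(domain)
    return len(covered) == max(covered) - min(covered) + 1

def overlap(domain_a, domain_b):
    """Number of residues assigned to both domains."""
    return len(residues(domain_a) & residues(domain_b))

# Hypothetical boundaries: one simple domain and one built from two segments
domain_x = [(1, 110)]
domain_y = [(100, 180), (240, 300)]

print(is_contiguous(domain_x), is_contiguous(domain_y), overlap(domain_x, domain_y))
# True False 11
```

Two databases can disagree on any of these numbers for the same chain, which is how one resource can report 3 domains where another reports 6.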

CATH:

  • Automated + manual inspection

SCOP:

  • Manual classification

That introduces operator bias.


🔎 9. Finding Similar Structures – The DALI Server

Suppose you:

  • Solved a new structure
  • Or built a homology model

How do you know if similar structures exist?

👉 Use the DALI server

Process:

  1. Upload PDB file
  2. DALI compares your structure to all known structures
  3. Returns similar hits

Example: Uploaded small copper-binding protein (COPSET)

Returned hits:

  • Copper transporting proteins
  • Mercury transporting proteins
  • Heavy metal binding proteins

Conclusion: Proteins with similar structure often share similar function
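The notes do not describe how DALI compares structures internally. Its name comes from distance-matrix alignment, and the sketch below illustrates only the underlying idea: distances between atoms inside one structure do not change when the structure is rotated or moved, so two structures can be compared without fitting them first. It is not DALI's actual search or scoring; it assumes equal-length fragments with residue i paired to residue i, and the coordinates are made up.

```python
import math

def distance_matrix(coords):
    """All pairwise distances within one structure."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def mean_matrix_difference(coords_a, coords_b):
    """Mean absolute difference between two internal distance matrices."""
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two coordinate lists of equal length (>= 2)")
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    n = len(coords_a)
    total = sum(abs(da[i][j] - db[i][j]) for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) // 2)

# A toy three-atom fragment, and the same fragment rotated 90 degrees and shifted
fragment = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
moved = [(10.0, 5.0, 2.0), (10.0, 8.8, 2.0), (6.2, 8.8, 2.0)]
# A different shape: the third atom continues straight instead of turning
straight = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]

print(mean_matrix_difference(fragment, moved) < 1e-9)     # True: same shape
print(mean_matrix_difference(fragment, straight) > 0.5)   # True: different shape
```

The moved copy scores as identical even though none of its coordinates match the original. That property is what makes a search against every known structure practical without superposing each pair first.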


🧠 Big Conceptual Takeaways

🧬 1. Structure is more conserved than sequence

You can lose sequence similarity and still retain fold.

🧱 2. Alignment is geometric, not sequence-based

Atomic coordinates are compared.

🏛 3. CATH and SCOP organize structural space differently

  • CATH: hierarchical & semi-automated
  • SCOP: more manual & detailed

🧩 4. Domain definition is not trivial

It is partly subjective and complex.

🔍 5. DALI is your structural BLAST

It finds structural neighbors.


📌 Conceptual Flow of the Lecture

  1. Structural alignment basics
  2. RMSD interpretation
  3. Sequence → structure → function direction
  4. Evolutionary implications
  5. Structural classification systems
  6. Domain complications
  7. Structural similarity search
