We start with a fundamental problem in environmental genomics:
We can’t understand what an organism can do if we don’t know what its genes actually do.
When scientists find new DNA sequences, they need to figure out:
But this process is iterative — we must keep checking, predicting, and testing as new genes are discovered.
In microbial communities:
So microdiversity makes it very hard to map out what’s really happening in an ecosystem.
Databases like GenBank and UniProt are open — anyone can upload gene sequences. But here’s the problem:
If one early annotation is wrong, the mistake spreads — because new annotations are based on the old ones. This is called the transitive catastrophe ⚠️:
A false assumption gets copied again and again, creating an expanding chain of error.
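
To make the chain of error concrete, here is a minimal sketch that traces which annotations are tainted once one early annotation turns out to be wrong. The `copied_from` map and the gene names are hypothetical; real databases do not necessarily record provenance this cleanly.

```python
from collections import deque

# Hypothetical provenance map: each gene points to the gene its annotation
# was copied from (None = annotated from direct experimental evidence).
copied_from = {
    "geneA": None,
    "geneB": "geneA",
    "geneC": "geneB",
    "geneD": "geneA",
    "geneE": None,
    "geneF": "geneE",
}

def affected_by(bad_gene, copied_from):
    """Return every gene whose annotation traces back to bad_gene."""
    # Invert the map: source gene -> genes that copied its annotation.
    children = {}
    for gene, source in copied_from.items():
        if source is not None:
            children.setdefault(source, []).append(gene)

    # Breadth-first walk down the copy chain.
    affected = set()
    queue = deque([bad_gene])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return affected

print(sorted(affected_by("geneA", copied_from)))
```

In this toy example a single wrong annotation on `geneA` taints three other genes, while `geneE` and `geneF` stay unaffected because they come from an independent source.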
In the 1940s, Beadle and Tatum proposed the one gene-one enzyme hypothesis (often summarized as one gene → one protein → one function), work that earned them a share of the 1958 Nobel Prize. 🏅 But modern biology shows it’s much more complex:
This flexibility is part of what we now call epigenetics — how the same genome can produce very different outcomes.
🦋 Example: A caterpillar and a butterfly share the same DNA, yet look and act completely different because gene expression changes drastically between life stages.
Even bacteria can do this! They may look different or form filaments under certain conditions but still be the same species. So: morphology ≠ function.
To understand function, we combine multiple “omics” data sets:
| Omics Type | What It Tells Us | Limitation |
|---|---|---|
| Genomics (DNA) | What could happen | No info on activity |
| Transcriptomics (RNA) | Which genes are being expressed | Fluctuates quickly |
| Proteomics (Proteins) | Which proteins (the cell’s workhorses) are active | Difficult to detect all |
| Metabolomics (Metabolites) | What results from activity | Hard to trace back to genes |
💡 Idea: Combine DNA (potential) + proteins (activity) to connect who’s there and what they’re doing.
Scientists test this by:
If a protein’s signal is stronger with the pollutant → it’s upregulated. If weaker → downregulated.
Data visualization tool for proteomics results:
Only proteins that change >2-fold (log₂) and are statistically significant are interesting.
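
The filtering rule can be sketched in a few lines. This is an illustration under stated assumptions, not the actual analysis pipeline: the signal values and p-values are made up, "2-fold" is read as |log₂ fold change| > 1, and the significance cutoff `alpha=0.05` is an assumed convention.

```python
import math

# Hypothetical results: protein -> (control signal, pollutant signal, p-value).
results = {
    "protA": (100.0, 450.0, 0.001),
    "protB": (200.0, 210.0, 0.40),
    "protC": (300.0, 60.0, 0.004),
    "protD": (50.0, 400.0, 0.20),
}

def classify(control, treated, p_value, min_fold=2.0, alpha=0.05):
    """Label one protein and return (label, log2 fold change)."""
    log2_fc = math.log2(treated / control)
    # Must pass BOTH thresholds: size of the change and significance.
    if p_value >= alpha or abs(log2_fc) <= math.log2(min_fold):
        return "not significant", log2_fc
    label = "upregulated" if log2_fc > 0 else "downregulated"
    return label, log2_fc

for name, (control, treated, p_value) in results.items():
    label, log2_fc = classify(control, treated, p_value)
    print(f"{name}: log2FC={log2_fc:+.2f}, p={p_value} -> {label}")
```

Note that `protD` changes 8-fold but is still filtered out, because its p-value misses the cutoff. On a volcano plot it would sit far to the right but low on the vertical axis.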
These proteins are then matched to the genome to figure out:
After identifying candidate proteins, scientists use databases like KEGG (Kyoto Encyclopedia of Genes and Genomes):
🧪 Example: When studying Gemfibrozil (a cholesterol-lowering drug), researchers saw certain amino acid synthesis enzymes upregulated. This hinted that microbes might be degrading the compound or using it to make new proteins.
Databases can simulate degradation routes:
If a specific metabolite is also found experimentally (via metabolomics), that pathway is confirmed ✅.
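
The cross-check between predicted routes and measured metabolites amounts to a set lookup. A minimal sketch, with placeholder route and metabolite names (none of these come from a real database):

```python
# Hypothetical predicted degradation routes: route name -> intermediates
# the route would produce. All names are placeholders.
predicted_routes = {
    "route_1": ["metabolite_X", "metabolite_Y"],
    "route_2": ["metabolite_Z"],
}

# Hypothetical metabolites detected by metabolomics.
observed = {"metabolite_Y", "metabolite_Q"}

def supported_routes(predicted_routes, observed):
    """Map each route to its predicted intermediates that were detected."""
    support = {}
    for route, intermediates in predicted_routes.items():
        hits = [m for m in intermediates if m in observed]
        if hits:
            support[route] = hits
    return support

print(supported_routes(predicted_routes, observed))
```

Here only `route_1` has experimental support. A route with no detected intermediates is not ruled out by this check; its metabolites may simply be below the detection limit.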
Result: We can build a predicted gene pathway for how microbes degrade pollutants!
| Concept | Meaning |
|---|---|
| Microdiversity problem | Rare microbes are hard to analyze |
| Transitive catastrophe | Wrong annotations get copied and amplified |
| One gene–one function is outdated | Genes and proteins are multifunctional |
| Epigenetics | Same DNA, different expression outcomes |
| Omics integration | DNA = potential, Proteins = activity, Metabolites = result |
| Volcano plot | Visualizes up/downregulated proteins |
| KEGG mapping | Links proteins to metabolic functions |
| Pathway prediction | Reveals how pollutants are degraded |