Day 5, Part 2: Microbial community analysis using 16S rRNA

Environmental Biotechnology

🧠 Overall Goal

The goal of this session is to gain hands-on experience in microbiome analysis using R and the ampvis2 package. You’ll explore real microbiome data from wastewater treatment plants (WWTPs) in Denmark, focusing on how microbial communities vary, how to visualize them, and how to interpret basic ecological patterns.


⚙️ Step 1: From Sequences to Data Tables

Concept

In microbiome studies, raw DNA sequences (from 16S rRNA or similar) are processed into:

  • OTU/ASV tables → matrices showing which taxa (OTUs or ASVs) are present in which samples and how many reads each has.
  • Metadata → contextual info (location, time, temperature, treatment type, etc.) describing each sample.

🧩 Without metadata, the data are meaningless — you’d know what was sequenced but not where or when.
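To make the two inputs concrete, here is a minimal base R sketch of a toy OTU/ASV table and its matching metadata. All sample IDs, plant names, dates, and counts are invented for illustration and are not taken from the course files.

```r
# Toy ASV table: rows = taxa, columns = samples, values = read counts.
# All names and numbers are invented for illustration.
asv_table <- data.frame(
  S1 = c(120, 30, 0),
  S2 = c(80, 45, 5),
  S3 = c(10, 200, 60),
  row.names = c("ASV1", "ASV2", "ASV3")
)

# Toy metadata: one row per sample, keyed by the same sample IDs.
metadata <- data.frame(
  SampleID = c("S1", "S2", "S3"),
  Plant    = c("PlantA", "PlantA", "PlantB"),
  Date     = as.Date(c("2020-01-06", "2020-01-13", "2020-01-06"))
)

# Every column of the ASV table should have a matching metadata row.
ids_match <- setequal(colnames(asv_table), metadata$SampleID)
print(ids_match)

# Total reads per sample (sequencing depth).
reads_per_sample <- colSums(asv_table)
print(reads_per_sample)
```

The shared sample IDs are what link a column of counts to its location and date, which is why the table alone cannot be interpreted.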

In this course, the ASV table and metadata are already provided, so students focus on data exploration, not preprocessing.


🧫 The Data Set

  • Real samples from Danish WWTPs (50 plants total; 20 sampled weekly since 2015).
  • Course subset: 4 WWTPs sampled weekly in 2020, about 140 samples.
  • Makes Denmark one of the world’s best-catalogued wastewater microbiome systems 🇩🇰.

🧩 Step 2: Microbiome Analysis Workflow

1️⃣ Quality Control (QC)

Before analysis, the dataset must be filtered:

  • Remove low-quality reads or rare taxa.
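  • Ensure consistency across files: sample IDs must match between the metadata and the abundance table, and taxon IDs must match between the abundance table and the taxonomy.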
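The two QC steps above can be sketched in base R as follows. The counts and both thresholds are hypothetical and chosen only to make the filtering visible. Sensible cut-offs depend on the sequencing depth of the real data.

```r
# Toy counts (invented). matrix() fills column by column,
# so each line of numbers below is one sample.
counts <- matrix(
  c(500, 300, 2,    # sample S1
    450, 350, 0,    # sample S2
     20,  10, 1),   # sample S3 (very few reads)
  nrow = 3,
  dimnames = list(c("ASV1", "ASV2", "ASV3"), c("S1", "S2", "S3"))
)

# Hypothetical thresholds, for illustration only.
min_reads_per_sample <- 100
min_reads_per_taxon  <- 10

# Flag samples with enough reads and taxa that are not too rare.
keep_samples <- colSums(counts) >= min_reads_per_sample
keep_taxa    <- rowSums(counts) >= min_reads_per_taxon
filtered     <- counts[keep_taxa, keep_samples, drop = FALSE]
print(filtered)

# Consistency check: sample IDs in the table that lack a metadata row.
metadata_ids        <- c("S1", "S2", "S3")
missing_in_metadata <- setdiff(colnames(counts), metadata_ids)
print(missing_in_metadata)
```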

2️⃣ Data Exploration

Once data are clean:

  • Examine species composition across samples.
  • Look for patterns in microbial communities.

🔬 Core Analyses

🧯 Heatmaps

Visualize which species are abundant in which samples. Rows = species; columns = samples; color intensity = abundance. ➡️ Turns giant unreadable tables into intuitive color-coded maps 🎨.
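As a rough illustration of what a heatmap summarizes, the base R sketch below converts invented read counts to percent abundance, keeps the most abundant taxa, and draws them with `heatmap()`. In the course itself this is done with ampvis2's `amp_heatmap()`. The code here only shows the underlying idea.

```r
# Toy counts (invented). matrix() fills column by column,
# so each line of numbers below is one sample.
counts <- matrix(
  c(120,  30,  0, 50,   # sample S1
     80,  45,  5, 70,   # sample S2
     10, 200, 60, 30),  # sample S3
  nrow = 4,
  dimnames = list(c("ASV1", "ASV2", "ASV3", "ASV4"), c("S1", "S2", "S3"))
)

# Convert read counts to percent of reads within each sample.
rel_abund <- sweep(counts, 2, colSums(counts), "/") * 100

# Keep the 3 taxa with the highest mean relative abundance.
top_taxa <- names(sort(rowMeans(rel_abund), decreasing = TRUE))[1:3]
print(round(rel_abund[top_taxa, ], 1))

# Rows = taxa, columns = samples, color = percent abundance.
# Clustering and row scaling are switched off so colors map to raw percentages.
heatmap(rel_abund[top_taxa, ], Rowv = NA, Colv = NA, scale = "none",
        xlab = "Sample", ylab = "Taxon")
```

Normalizing to percentages first matters because samples differ in sequencing depth, so raw counts are not directly comparable.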


📦 Boxplots

Used to compare distributions — for example, how the abundance of a certain taxon differs between treatment plants or seasons.
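A minimal base R sketch of such a comparison, using invented abundances of one taxon at two hypothetical plants:

```r
# Invented relative abundances (%) of a single taxon in five samples per plant.
df <- data.frame(
  Plant     = rep(c("PlantA", "PlantB"), each = 5),
  Abundance = c(2.1, 2.5, 1.9, 3.0, 2.4,
                5.2, 4.8, 6.1, 5.5, 4.9)
)

# One box per plant: the box spans the interquartile range,
# and the thick line marks the median.
boxplot(Abundance ~ Plant, data = df,
        ylab = "Relative abundance (%)",
        main = "One taxon across two plants (toy data)")

# The same comparison as numbers.
medians <- tapply(df$Abundance, df$Plant, median)
print(medians)
```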


🌍 Ordination (Beta Diversity)

Explore how similar or different microbial communities are across samples. Common methods:

  • PCA (Principal Component Analysis)
  • PCoA (Principal Coordinates Analysis)
  • NMDS (Non-metric Multidimensional Scaling)

Each point = sample → close points share similar microbial communities. This helps find clusters, gradients, or outliers in microbial composition.
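To show how an ordination arises from the data, the sketch below computes Bray-Curtis dissimilarities by hand and runs a PCoA with base R's `cmdscale()`. The abundances are invented, and Bray-Curtis is used here as one common choice of distance, not necessarily the one used in the course exercise.

```r
# Invented percent abundances: rows = samples, columns = taxa.
abund <- matrix(
  c(50, 30, 20,  0,
    45, 35, 20,  0,
     5, 10, 25, 60,
     0, 15, 25, 60),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A1", "A2", "B1", "B2"), paste0("ASV", 1:4))
)

# Bray-Curtis dissimilarity: 0 = identical composition, 1 = no shared taxa.
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Pairwise dissimilarity matrix between all samples.
n <- nrow(abund)
d <- matrix(0, n, n, dimnames = list(rownames(abund), rownames(abund)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- bray_curtis(abund[i, ], abund[j, ])
  }
}
print(round(d, 2))

# PCoA (classical multidimensional scaling) on the dissimilarities.
pcoa <- cmdscale(as.dist(d), k = 2)

# Each point is a sample; nearby points have similar communities.
plot(pcoa[, 1], pcoa[, 2], xlab = "PCoA 1", ylab = "PCoA 2", pch = 19)
text(pcoa[, 1], pcoa[, 2], labels = rownames(pcoa), pos = 3)
```

In this toy example the two "A" samples and the two "B" samples fall on opposite sides of the first axis, which is the kind of clustering an ordination is meant to reveal.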


🧮 Alpha Diversity

Although not covered in depth here, alpha diversity refers to within-sample diversity (e.g., the Shannon and Simpson indices). It reflects how many taxa are present in a sample (richness) and how evenly their abundances are distributed (evenness).
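The two indices can be computed in a few lines of base R. This is a sketch with invented counts. The Simpson index has several variants, and the Gini-Simpson form (1 minus the sum of squared proportions) is assumed here.

```r
# Shannon index: H = -sum(p * ln(p)) over taxa with p > 0.
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Gini-Simpson index: 1 - sum(p^2). Higher = more diverse.
simpson <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}

# Two invented communities with the same richness (4 taxa).
even_community   <- c(25, 25, 25, 25)  # perfectly even
uneven_community <- c(97, 1, 1, 1)     # dominated by one taxon

results <- data.frame(
  community = c("even", "uneven"),
  richness  = c(sum(even_community > 0), sum(uneven_community > 0)),
  shannon   = c(shannon(even_community), shannon(uneven_community)),
  simpson   = c(simpson(even_community), simpson(uneven_community))
)
print(results)
```

Both communities contain four taxa, but the even one scores higher on both indices, which shows that these measures capture evenness as well as richness.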


⏳ Time Series

Track changes in microbial composition over time. Useful for studying seasonal variation or system stability.
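A simple base R sketch of a time series for one taxon at one plant, using invented dates and abundances:

```r
# Invented relative abundances (%) of one taxon at one plant over time.
ts_data <- data.frame(
  Date      = as.Date(c("2020-01-06", "2020-01-20", "2020-02-03",
                        "2020-02-17", "2020-03-02")),
  Abundance = c(4.0, 6.0, 3.0, 5.0, 2.0)
)

# Make sure samples are in chronological order before plotting.
ts_data <- ts_data[order(ts_data$Date), ]

# Points connected by lines show the trend over time.
plot(ts_data$Date, ts_data$Abundance, type = "b",
     xlab = "Sampling date", ylab = "Relative abundance (%)")

# Monthly means smooth out week-to-week noise.
ts_data$Month <- format(ts_data$Date, "%Y-%m")
monthly_mean  <- tapply(ts_data$Abundance, ts_data$Month, mean)
print(monthly_mean)
```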


⚙️ Functional Information

Once taxonomic patterns are clear, further analysis can identify metabolic functions or gene content of the microbial community — though this part is only briefly mentioned here.


📦 Step 3: Using the ampvis2 R Package

This is the main analysis tool.

🔧 What It Does

  • Handles data import and merging (avoids mismatch errors).
  • Performs QC, filtering, subsetting, and visualization.
  • Works seamlessly with ggplot2 for custom plots.
  • Enables reproducible workflows (important for science).

🧭 Key Features

  • Easy to combine ASV table, metadata, and taxonomy.
  • Pre-built visualization tools (heatmaps, boxplots, ordinations, time series).
  • Excellent documentation (via GitHub or R help).

💡 Designed for both teaching and real-world research.


🗂️ Data Files for the Exercise

The folder data_for_hands-on contains:

  1. ASV table → read counts for each taxon in each sample.
  2. Taxonomy files → two versions (MIDAS and SILVA databases).
  3. Metadata file → describes each sample.
  4. Exercise file → includes instructions, hints, and code scaffolds.
  5. Answer plots → reference visuals (without code to prevent copy-paste learning).

📁 All files should be kept in one folder for smoother R operation (setwd() to that folder).
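A hedged sketch of how the files might be loaded with ampvis2's `amp_load()` and passed to `amp_heatmap()`. The three file names and the metadata column `Plant` are hypothetical placeholders, so substitute the actual names from the exercise folder. Passing file paths directly assumes a recent ampvis2 version. The code only runs the ampvis2 part if the package and the files are actually present.

```r
# Folder name from the course material; file names below are hypothetical.
data_dir <- "data_for_hands-on"
files <- c(
  otutable = file.path(data_dir, "asv_table.csv"),
  metadata = file.path(data_dir, "metadata.csv"),
  taxonomy = file.path(data_dir, "taxonomy_midas.csv")
)

# Only proceed if ampvis2 is installed and all three files exist.
ready <- requireNamespace("ampvis2", quietly = TRUE) && all(file.exists(files))

if (ready) {
  # Combine abundance table, metadata, and taxonomy into one ampvis2 object.
  d <- ampvis2::amp_load(
    otutable = files[["otutable"]],
    metadata = files[["metadata"]],
    taxonomy = files[["taxonomy"]]
  )
  print(d)

  # Heatmap of the 10 most abundant genera, grouped by a metadata column.
  # "Plant" is an assumed column name; use one that exists in your metadata.
  print(ampvis2::amp_heatmap(d, group_by = "Plant",
                             tax_aggregate = "Genus", tax_show = 10))
} else {
  message("ampvis2 or the data files were not found; check paths and installation.")
}
```

Building paths with `file.path()` avoids depending on the current working directory, which is a common source of "file not found" errors for new R users.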


💻 Practical Tips for R Setup

  • Install R and RStudio.
  • Install the ampvis2 package and its dependencies.
  • Work inside one consistent directory.
  • If you’re new to R, load all files in one place to avoid path issues.
  • Experienced users can set working directories manually.
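A small sketch for checking the setup before the exercise. The installation commands are left as comments to run manually. They assume ampvis2 is installed from its GitHub repository (`kasperskytte/ampvis2`) via the `remotes` package, so check the package's own documentation if that location has changed.

```r
# Check which of the required packages are already installed.
needed    <- c("remotes", "ampvis2")
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# For anything reported as FALSE, run the matching line manually
# (requires internet access; repository location is an assumption):
# install.packages("remotes")
# remotes::install_github("kasperskytte/ampvis2")

# Confirm the working directory R will use for relative file paths.
print(getwd())
```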

🎯 Final Aim

The aim is not to produce graded results but to practice, learn, and gain confidence in microbiome data analysis.

By the end, participants understand:

  • How to visualize microbial community data.
  • How to measure diversity.
  • How to explore time-dependent ecological trends.
  • How to use R and ampvis2 efficiently for reproducible science.
