Analysis of Beta-Diversity

Author

Olga Brovkina (o.brovkina@ikmb.uni-kiel.de)

This document provides an analysis of beta-diversity using a distance matrix and metadata.

Load Required Libraries

We will work with two specific microbiome packages phyloseq ans vegan. The tidyverse package is common R package

Code
suppressMessages(suppressWarnings(library(vegan)))
suppressMessages(suppressWarnings(library(ggplot2)))
suppressMessages(suppressWarnings(library(phyloseq)))

Load phyloseq object from previous step

We will use rarefied samples for this analysis

Code
load("~/kmc_workshop/R_objects/subset_rare_phyloseq_object.RData")

Perform PCoA ordination using Bray-Curtis distance

ps.rare argument specifies the phyloseq object that contains your microbiome data. method = “PCoA”: This specifies that Principal Coordinates Analysis (PCoA) is the ordination method to be used. distance = “bray”: This indicates that the Bray-Curtis dissimilarity is the distance metric used for the PCoA.

Code
bray <- ordinate(
  physeq = ps.rare, 
  method = "PCoA", 
  distance = "bray"
)
Code
#What other distances and methods are avialable in vegan package in `ordination` function? Try below other approach
Code
help(ordinate)
Code
# Explore variables in your sample_data
Code
sample_data(ps.rare)

Creating PCoA ordination plot based on Bray-Curtis distance

We colored scatters by country and shaped by lifestyle. You can choose other variables from the sample_data(ps.rare)

Code
plot_ordination(
  physeq = ps.rare,                                 
  ordination = bray) + 
  geom_point(aes(color = country, shape = lifestyle), size = 3) +  # Points are colored by country and shaped by lifestyle
  theme_classic() +                                                 # Applying a classic theme for a clean look
  theme(legend.title = element_blank(),                             # Removing legend titles for simplicity
        legend.position = "right",                                  # Positioning the legend on the right side
        text = element_text(size = 12),                             # Setting the base text size
        axis.title = element_text(size = 14))  
Code
#Here you can try other variables for colors and shape. Which type variables can not be used for shapes? 

continuous

you can also save the plot. Saving plot as pdf will provide you with the best resolution. Sometimes it is woth to save the plot as png/jpg object, e.g. for presentation

Code
p1 <- plot_ordination(
  physeq = ps.rare,                                 
  ordination = bray) + 
  geom_point(aes(color = country, shape = lifestyle), size = 3) +  
  theme_classic() +                                                 
  theme(legend.title = element_blank(),                             
        legend.position = "right",                                  
        text = element_text(size = 12),                             
        axis.title = element_text(size = 14))

# Save the plot as a PDF
ggsave(filename = "~/kmc_workshop/results/pcoa_bray.pdf", plot = p1, device = "pdf", width = 10, height = 6)

Perform PCoA ordination using Weighted UniFrac distance

This approach incorporates the relative abundances of taxa, providing a more nuanced measure of community differences that takes into account both phylogenetic relationships and species abundances.

Code
wuni <- ordinate(
  physeq = ps.rare, 
  method = "PCoA", 
  distance = "wunifrac"
)

Creating PCoA ordination plot based on Weighted UniFrac distance

Code
plot_ordination(
  physeq = ps.rare,                                 
  ordination = wuni) + 
  geom_point(aes(color = country, shape = lifestyle), size = 3) + 
  theme_classic() +                                                 
  theme(legend.title = element_blank(),                             
        legend.position = "right",                                  
        text = element_text(size = 12),                            
        axis.title = element_text(size = 14)) 

Creating Weighted UniFrac distance object

to analyse how environmental variables influence the variation in community data we need distances, not ordination!

Code
uni_distance <- phyloseq::distance(ps.rare, method = "wunifrac")

Prepare metadata

Code
meta_encoded <- sample_data(ps.rare)
Code
meta_encoded <- data.frame(meta_encoded)
for (col in colnames(meta_encoded)) {
  if (is.factor(meta_encoded[[col]]) || is.character(meta_encoded[[col]])) {
    meta_encoded[[col]] <- as.numeric(factor(meta_encoded[[col]]))
  }
}

Check if there are any remaining missing values

Code
sapply(meta_encoded, function(x) sum(is.na(x)))
Code
#Capscale analysis is sensitive to missing values
#In what situtaion will you drop the columns from the table with metadata and in what situations will you drop the rows (samples)?
Code
#situation where you drop the samples: 1.If a particular variable in the metadata is crucial for your research (e.g., age, gender, or a key clinical measurement), and some samples have missing values for that variable, you might choose to drop those samples. 2.If certain samples are outliers that could significantly skew your results 
#situation where you drop variable from meta_data: 1. the you need to preserve as much samples as possible; 2.If a variable is not essential to your research question or if it is highly correlated with another variable; 3. In cases where a variable has very little variation

Perform constrained ordination using capscale

Code
rda_model <- capscale(uni_distance ~ ., data = meta_encoded)

Stepwise Selection with capscale

We want to know which environmental variables are most important for explaining the variation in your data We will start with an intercept-only model as the lower scope

Code
lower_model <- capscale(uni_distance ~ 1, data = meta_encoded)

Perform stepwise model selection

Code
stepwise_model <- ordiR2step(lower_model, scope = formula(rda_model))

Plot the biplot based on rda_model

Code
plot(rda_model, scaling = 2) 

Perform adonis

adonis takes a dist object as an input

Code
adonis_result <- adonis2(
  formula = uni_distance ~ country + lifestyle, 
  data = as(sample_data(ps.rare), "data.frame")
)
Code
# What will happen if we include both lifestyle and urbanism in our formula?
Code
adonis_result <- adonis2(
  formula = uni_distance ~ country + lifestyle +urbanism, 
  data = as(sample_data(ps.rare), "data.frame")
)
adonis_result

Let’s incorporate adonis result into the final ordination plot We extract p-values from ADONIS result and define significance level for asterisks

Code
# Extract p-values from ADONIS result
p_values <- adonis_result$`Pr(>F)`

# Define significance level for asterisks
asterisks <- ifelse(p_values < 0.001, "***", 
                    ifelse(p_values < 0.01, "**", 
                           ifelse(p_values < 0.05, "*", "")))

p2 <- plot_ordination(
  physeq = ps.rare,                                                         
  ordination = wuni) +                                                
  geom_point(aes(color = country, shape = lifestyle), size = 3) +  
  theme_classic() +  
  theme(legend.title = element_blank(),
        legend.position = "right",  
        text = element_text(size = 12),  
        axis.title = element_text(size = 14)) 

# Adding asterisks to the plot manually
p2 + annotate("text", x = 0, y = 0.05, label = paste("Country:", asterisks[1]), size = 5, color = "black") +
  annotate("text", x = 0, y = 0.045, label = paste("Lifestyle:", asterisks[2]), size = 5, color = "black")