Code
suppressMessages(suppressWarnings(library(vegan)))
suppressMessages(suppressWarnings(library(ggplot2)))
suppressMessages(suppressWarnings(library(phyloseq)))
Olga Brovkina (o.brovkina@ikmb.uni-kiel.de)
This document provides an analysis of beta-diversity using a distance matrix and metadata.
We will work with two specific microbiome packages phyloseq
ans vegan
. The tidyverse
package is common R package
We will use rarefied samples for this analysis
ps.rare argument specifies the phyloseq object that contains your microbiome data. method = “PCoA”: This specifies that Principal Coordinates Analysis (PCoA) is the ordination method to be used. distance = “bray”: This indicates that the Bray-Curtis dissimilarity is the distance metric used for the PCoA.
We colored scatters by country and shaped by lifestyle. You can choose other variables from the sample_data(ps.rare)
plot_ordination(
physeq = ps.rare,
ordination = bray) +
geom_point(aes(color = country, shape = lifestyle), size = 3) + # Points are colored by country and shaped by lifestyle
theme_classic() + # Applying a classic theme for a clean look
theme(legend.title = element_blank(), # Removing legend titles for simplicity
legend.position = "right", # Positioning the legend on the right side
text = element_text(size = 12), # Setting the base text size
axis.title = element_text(size = 14))
continuous
you can also save the plot. Saving plot as pdf will provide you with the best resolution. Sometimes it is woth to save the plot as png/jpg object, e.g. for presentation
p1 <- plot_ordination(
physeq = ps.rare,
ordination = bray) +
geom_point(aes(color = country, shape = lifestyle), size = 3) +
theme_classic() +
theme(legend.title = element_blank(),
legend.position = "right",
text = element_text(size = 12),
axis.title = element_text(size = 14))
# Save the plot as a PDF
ggsave(filename = "~/kmc_workshop/results/pcoa_bray.pdf", plot = p1, device = "pdf", width = 10, height = 6)
This approach incorporates the relative abundances of taxa, providing a more nuanced measure of community differences that takes into account both phylogenetic relationships and species abundances.
to analyse how environmental variables influence the variation in community data we need distances, not ordination!
Check if there are any remaining missing values
#situation where you drop the samples: 1.If a particular variable in the metadata is crucial for your research (e.g., age, gender, or a key clinical measurement), and some samples have missing values for that variable, you might choose to drop those samples. 2.If certain samples are outliers that could significantly skew your results
#situation where you drop variable from meta_data: 1. the you need to preserve as much samples as possible; 2.If a variable is not essential to your research question or if it is highly correlated with another variable; 3. In cases where a variable has very little variation
We want to know which environmental variables are most important for explaining the variation in your data We will start with an intercept-only model as the lower scope
Perform stepwise model selection
adonis takes a dist
object as an input
Let’s incorporate adonis result into the final ordination plot We extract p-values from ADONIS result and define significance level for asterisks
# Extract p-values from ADONIS result
p_values <- adonis_result$`Pr(>F)`
# Define significance level for asterisks
asterisks <- ifelse(p_values < 0.001, "***",
ifelse(p_values < 0.01, "**",
ifelse(p_values < 0.05, "*", "")))
p2 <- plot_ordination(
physeq = ps.rare,
ordination = wuni) +
geom_point(aes(color = country, shape = lifestyle), size = 3) +
theme_classic() +
theme(legend.title = element_blank(),
legend.position = "right",
text = element_text(size = 12),
axis.title = element_text(size = 14))
# Adding asterisks to the plot manually
p2 + annotate("text", x = 0, y = 0.05, label = paste("Country:", asterisks[1]), size = 5, color = "black") +
annotate("text", x = 0, y = 0.045, label = paste("Lifestyle:", asterisks[2]), size = 5, color = "black")