Differential abundance analysis

Author

Eike Matthias Wacker (e.wacker@ikmb.uni-kiel.de)

This tutorial can be executed with the kernel ‘kmc_workshop’ directly in the code-cells of this jupyter-notebook.

Objective

The aim of this tutorial is to give you an insight into possible ways how to perform differential abaundance analysis of microbiome amplicon data.

We continue to work with earlier processed GMBC dataset. To remind you, this is a dataset consisting of 16S rRNA V3-V4 amplicon data from human fecal samples.

Code
getwd()
# "~/KMC_workshop/scripts"
#setwd("~/KMC_workshop/scripts")

Load necessary libraries

For this tutorial we will stick with R packages, that you worked with earlier plus some new, that we later use for the differential abundance analysis.

Following packes are added: * DESeq2: DESeq2 is a framework designed for the analysis of differential expression analysis of RNASeq data. RNASeq data and microbiome amplicon data share many properties regarding their general structure, which enables us to repurpose DESeq2 for the analysis of microbiome data. * Maaslin2: Maaslin2 (Microbiome Multivariable Association with Linear Models 2) is a package for the the efficient determination of multivaribale associations between (clinical) metadata and microbial meta-omics features. It does so by applying general linear models with a multitude of filtering, normalization and tranformations methods.

Code
library(tidyverse)
library(phyloseq)
library(vegan)
library(microbiome)
library(DESeq2)
library(Maaslin2)
library(EnhancedVolcano)

Import and format the data

The dataset we are using was pre-processed using the DADA2 workflow.

Let’s import the data so, so we can start with the analysis:

Code
path <- "~/kmc_workshop/"

We load here the full gmbc-phyloseq object, so you can also play with other country combinations if you want, additionally the count data is still “raw”.

Code
load(paste0(path,'R_objects/full_phyloseq_object.RData'))

The phyloseq object

Code
ps

The dataset that we are using is called ‘ps’. It should look the same as in the previous tutorial sessions. If this is not the case, please inform one of the supervisors.

Note: Here we are using %>% (pipe) from the magrittr package which is part of the tidyverse. It is much easier to code with it. “(With it, you) … may pipe a value forward into an expression or function call; something along the lines of x %>% f, rather than f(x).” You will see the pipe alot in this tutorial.

Code
ps %>% 
    sample_data() %>% 
    pull(country) %>% 
    unique()

Tip To understand better what is going on, try to run each line removing the “%>%”

If the loaded phyloseq object should still contain several countries, for example because you have loaded the total-cohort phyloseq object instead of your previously prepared one, then use the lower cell to restrict the object to samples from only two countries of your choice. Please choose a country combination that contains both the lifestyle categories “lifestyle_industrialized” and “lifestyle_non_industrialized”.

Code
ps %>% 
    sample_data() %>% 
    data.frame() %>% 
    select(country, lifestyle) %>% 
    table()

In this case, you can execute the function here to restrict the phylseq object to your selected countries. If the object already contains only the country combination of samples, we copy the phyloseq object “ps” as “subset.ps”.

Code
#subset.ps <- subset_samples(ps, country %in% c("Tanzania", "UnitedStates")) # Execute this, if you need to subset your phyloseq object
subset.ps <- subset_samples(ps, country %in% c("Malaysia", "Finland"))

#subset.ps <- ps # Execute this if your object is already a subset.
Code
subset.ps

Analysis

Now we can start with the analysis part. We should first have a glimpse what the data looks like. For this purpose, we will make some plots of the raw abundance counts.

This is an important step to check for quality issues in the data. Some samples might only contain a single taxa or contain very low counts. Maybe we need to remove some samples due to poor quality.

First, let’s check that we have sufficient counts in every sample:

Code
counts <-
  subset.ps %>% 
  otu_table() %>%
  colSums()
counts %>% data.frame(counts=.) %>% head()

In our case we would like to have more than 8 thousand counts per samples.

Can you identify samples with too few counts? If so, one might need to exclude this sample from further analysis. If many samples show low count numbers, something else might be off and one should investigate these issues before continuing with the analysis.

We can also use ggplot2 to visualise the counts per sample. With a horizontal line as a minimum threshold and a conditional coloration it can be easily used to show outliers:

Code
to.plot <-
  counts %>% 
  data.frame(counts = ., Sample = names(.))
ggplot(to.plot, aes(x = Sample, y = counts, fill = counts > 8000)) +
  geom_col() +
  geom_hline(yintercept = 8000) + # add horizontal line 
  scale_fill_manual(values = c("TRUE" = "darkgrey", "FALSE" = "purple")) + # color low counts purple
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) # flip the axis names

To get the sample names with poor quality directly, we can filter the abdundance table directly and then display the row-names. Should we want to remove samples we can do this with the function commented out below:

Code
bad_quality <- to.plot[counts < 8000,] %>% rownames()
bad_quality

#subset.ps <- subset_samples(subset.ps, !(sample_names(subset.ps) %in% bad_quality))

Recap: Taxonomy

Here, we are going to look at the microbiome at the taxonomical perspective. To remember: we produced amplicon variant sequences (ASVs) using DADA2 and these were classified into different taxonomic levels.

Taoxnomic levels. Author: Annina Breen.

The assignment of taxonomic labels to the final ASV sequences is an important step, as these information can help make sense of the results for example by knowing that specific bacteria can perform specific metabolic functions. Also, taxonomic labels help to bin sequences together into larger phylogenetically related groups (meaning: groups that have a common ancestor at a given level of similarity), for example belonging to the same bacterial Phylum (very broad), Family (kind of similar), or even species or strain (very similar; many functions/genes are shared). As already mentioned, databases are still growing and newly discovered bacteria are constantly added to them, especially now as large-scale sequencing is widely available. This makes it especially hard to keep them up to date. Luckily, today we are working with samples from human stool samples, an environment which is widely studied and thus key members of the community are rather well-known and described.

So, lets have a look at the taxonomic composition of the stool microbiome

Tip To understand better what is going on, try to run each line removing the “%>%”

Set a taxonomic level for your analysis

For all the following Visualizations and Analysis we can set the taxonomic level in a single variable!

We will start with Phylum level but can later change this to other levels:

Code
taxonomic_level = "Family"
paste("We are using the taxonomic level", taxonomic_level, "for our analysis.")

We will use this variable again and again in the further steps so that the outputs are all comparable with each other. Of course, you always have the option of replacing the variable with a string with a valid taxonomic level name e.g. ‘Phylum’, ‘Order’ or ‘Genus’.

Visualization

Phyloseq offers a function to directly plot the taxonomic composition. This function is a so-called wrapper, i.e. it combines many different functions to simplify our lives. The output of this function is a ggplot object and can therefore be handled in exactly the same way as we did in the previous sessions.

Code
options(repr.plot.width = 16, repr.plot.height = 12, repr.plot.res = 100) #Set the plot size

plot_bar(subset.ps, x="Sample", fill=taxonomic_level) +  
    facet_grid(~ country, scales = "free_x",space="free") +# divide the plot into facets 
    theme(legend.position="none") # position of the legend in the plot, to remove legend use 'none'.

We see that samples have different sequence counts. Each ASV is a subdivision of the plot. Can you already spot phyla that differ between the two groups?

How can we improve here?

Due to the different counts in each sample, it is quite difficult to say whether a certain taxa is more common in one group than in others, so the abundance can be relativized here so that all samples have the same total number of counts. This can be achieved by converting our abundance dataframe from absolute to relative counts (the in total 100 counts equal 100 percent), but we can also simply use rarefying, both options lead to the desired result.

In this case, we will use a rarefying procedure to reduce all counts for every sample to the minimum counts number we have in our dataset. This is why it is important to remove samples with low quality prior to the analysis.

Code
options(repr.plot.width = 16, repr.plot.height = 12, repr.plot.res = 100) #Set the plot size

subset.ps %>% 
  aggregate_taxa(level = taxonomic_level) %>% # aggregate all ASVs into the level phyloum
  rarefy_even_depth(rngseed = 123) %>% # make all samples with the same sequencing depth using rarefaction
  plot_bar(x="Sample", fill=taxonomic_level) +  
    facet_grid(~ country, scales = "free_x", space="free") + # divide the plot into facets 
    theme(legend.position="none") # remove legend from plot

As you can see, we have quite a few different taxa in our plot. The colors differ only marginally when plotting dozents of unique taxa and you can hardly distinguish the taxa.

To keep a plot clearly arranged, we can also plot only the most abundant taxa by using the ‘aggregate_top_taxa’ function:

Code
#Original function was deprecated. So we keep a backup here:
aggregate_top_taxa2 <- function(x, top, level) {
  x <- aggregate_taxa(x, level)
  tops <- top_taxa(x, top)
  tax <- tax_table(x)
  inds <- which(!rownames(tax) %in% tops)
  tax[inds, level] <- "Other"
  tax_table(x) <- tax
  tt <- tax_table(x)[, level]
  tax_table(x) <- tax_table(tt)
  aggregate_taxa(x, level)
}
subset.ps %>% 
    aggregate_top_taxa2(top = 10, level = taxonomic_level) %>% # Here we used the function from the package microbiome to reduce the number of taxa to the top 10. The rest is lumped into the category "other"
    rarefy_even_depth(rngseed = 123) %>% 
    plot_bar(x="Sample", fill=taxonomic_level) +
    facet_grid(~ country, scales = "free", space='free') + 
    scale_fill_brewer(palette="Set3") # change colors of bars

It is also recommended to use a different color palette than the standard one. In the plot above we have used a color palette from ColorBrewer with the function scale_fill_brewer(palette=“Set3”)

TASK: Try out other color palettes as well!

There are many more color palettes to choose from, with the following command you can see all the options directly:

Code
RColorBrewer::display.brewer.all()

TASK, if you have spare time: The color palettes shown can only be used with a maximum of 13 values shown. Find out if you can create your own color palette that you can use to visualize results with more values. Remember that your color palette must also be suitable for colorblind people.

Code
# Run your own code here:


TASK: Try with other levels!

If you do know which levels there are: check the columns of tax_table(ps)@.Data using the function colnames() It’s possible that we can actually identify differences between the groups also on genus level “by eye”. Additionally, check the internet, if there is a direct function that will return the taxonomic levels in your phyloseq object.

Code
#The options in our phyloseq objects are:
tax_table(subset.ps)@.Data %>% colnames()
# #Phyloseq function, returning the taxonomic ranks in a phyloseq object:
#rank_names(subset.ps)
Code
# Run your own code here:

But how do we know whether it is this significant?

Statistical differential abundance testing

There are many ways how to perform statistical testing with the abundance table and metadata. We can start with a simple Wilcoxon test. Later, we will also use some more advanced tools to identify significant differential abundant taxa in our dataset.

There are many specialized tools out there which you can use to perform differential abundance testing in microbiome data. For further reading I can recommend this Paper, which compares lots more tools than we can do today.

Data normalization & transformation

First of all, we should make sure that our phyloseq object is clean. If several rows suddenly describe the same species during preprocessing, we combine them in the following step:

Code
glom.ps <- tax_glom(subset.ps, "Species") # Agglomerate taxa of the same type, this might take a moment

We also set new taxa names:

Code
full_tax_names <- apply(tax_table(glom.ps), 1, function(x) paste(x, collapse = ";"))
head(full_tax_names)
taxa_names(glom.ps) <- full_tax_names

Then we will take another look at the newly created phyloseq object:

Code
glom.ps

We also look at how many taxa we have when we aggretgate our phyloseq object to our chosen taxonomic level:

Code
glom.ps %>% aggregate_taxa(level = taxonomic_level) 

Depending on the level selected, we will certainly have a number of taxa that are very rare. These can subsequently be disruptive, as we can only perform meaningful statistics to a limited extent in rare taxa and there is always the possibility that these could also be artifacts. For the sake of simplicity, we will remove these from our dataset using the following functions. In this case, our filter is that we want to have at least 5 counts in at least 20 percent of the available samples. But this is of course highly customizable to the given study design and dataset.

Code
filter <- glom.ps %>% 
                 aggregate_taxa(level = taxonomic_level) %>% 
                 phyloseq::genefilter_sample(., filterfun_sample(function(x) x >= 5), 
                                       A = 0.2*nsamples(glom.ps))
ps_filtered <- prune_taxa(filter, glom.ps %>% aggregate_taxa(level = taxonomic_level))
ps_filtered

TASK Could you think of a more suitable filter, as the individual samples in our dataset might have different sums of abundance counts?

In our case, we are currently using the non-rarefied data set. All samples have a different number of counts. To take this into account, we would have to transform our data so that the sample counts are relative. One function to do this would be the function ‘transform_sample_counts’. The filter can then also be applied to the absolute count data.

Code
better_filter <- glom.ps %>% 
    aggregate_taxa(level = taxonomic_level) %>% 
    transform_sample_counts(., function(x) x / sum(x) ) %>%
    phyloseq::genefilter_sample(., filterfun_sample(function(x) x >= 5), 
                                       A = 0.2*nsamples(glom.ps))

ps_filtered <- prune_taxa(better_filter, glom.ps %>% aggregate_taxa(level = taxonomic_level))

CLR-Transformation

For some steps, in this case simple statistical tests, its useful not to work with the raw counts itself. In these cases we apply a transformation to reduce outlier spreading as well as remove variance introduced in the data collection and measurement process (Batch effects). We apply centered log-ratio transformation (CLR) to our data. With that we adress the fact, that 16S data quantify relative, rather than absolute, abundance of the microbiome.

The CLR transformation has two useful features: First it normalizes compositional data so that samples representing the same relative abundances are equated. Secondly it converts multiplicative relationships between the relative abundances into linear relationships – a feature which allows statistical models to represent fold-differences. This is especially useful when applying maschine learning methods, but this would go beyond the scope of this workshop.

Code
# How our counttable looks currently:
ps_filtered %>% 
    otu_table() %>% 
    head()

# Apply a transformation on the counttable, in this case centered log ratio transformation:
transf.ps <- ps_filtered %>% 
                microbiome::transform(., transform='clr') %>% 
                otu_table()

# How our counttable looks after applied transformation:
transf.ps %>% 
    otu_table() %>%
    head()

If we plot the distributions in the raw state compared to the CLR transformation, we can see how outliers are now less far from the rest of the distribution.

Don’t be afraid of this complicated looking cell below, there are probably easier ways how to quickly inspect the differences, but this one here works.

Code
taxa_to_inspect <- rownames(ps_filtered %>% otu_table())[3] # Select a single taxa

ps_filtered %>% 
  otu_table() %>% # extract otu table
  as.data.frame() %>% # make it a data frame
  rownames_to_column("taxa") %>% # rownames are added as column named taxa
  pivot_longer(cols = c(-taxa), names_to = "sample_id") %>% # make the table to a long three column table, containing the columns taxa, sample_id  and value
  mutate(type="rawreads") %>% # add a column labeling all rows as "rawreads"
  rbind( transf.ps %>%  # bind to that the clr transformed dataset, we repeat all the previous functions on this, too
           otu_table() %>% 
           as.data.frame() %>%
           rownames_to_column("taxa") %>%
           pivot_longer(cols = c(-taxa), names_to = "sample_id") %>%
           mutate(type="clr") ) %>% # add a column labeling all rows of this dataset as "clr"
  filter(taxa == taxa_to_inspect) %>% # we filter the data to only contain our chosen taxa
  ggplot() +
  geom_histogram(aes(x = value), bins = 50) + #plot a histogram
  ylab(paste0("Counts for ",taxa_to_inspect ))+
  facet_grid( ~type, scales="free") + # split the plot by the column type
  theme(text = element_text(size=20)) # increase textsize

TASK Try to plot some other taxa by changing the number in the first line!

Wilcoxon’s Test

The transformed data is now well suited for statistical tests. We assume that our abundance counts are not normally distributed, so we use the non-parametric Wilcoxon test instead of the popular t-test.

We are working outside the phyloseqs object here, so we create a table that contains both counts and our phenotype to be tested.

Code
transf.df <- transf.ps %>% 
    otu_table() %>% 
    t() %>% # transpose the otu_table
    as.data.frame()

Add the metadata column to the table for which to test

Code
tansf.df <- transf.df %>%
                mutate(grouping=sample_data(ps_filtered)$country)

tansf.df %>% head()
Code
# Retrieve countries
chosen_countries <- tansf.df$grouping %>% unique()
chosen_countries

#apply wilcox test to clr transformed table

wilcox_clr <- tansf.df %>%
    rownames_to_column("sample_id")%>%
    pivot_longer(cols = c(-sample_id,-grouping), names_to = "taxa", values_to = "abundance") %>%
    group_by(taxa) %>%
    summarize(
        p_value = wilcox.test(abundance[grouping == chosen_countries[1]], 
                              abundance[grouping == chosen_countries[2]])$p.value,
        countryone_mean=mean(abundance[grouping == chosen_countries[1]]),
        countrytwo_mean=mean(abundance[grouping == chosen_countries[2]])
    ) %>%
  # Adjust p-values with Bonferroni correction
  mutate(p_adj = p.adjust(p_value, method = "bonferroni"), .before = countryone_mean)

wilcox_clr %>% 
    arrange(p_adj)

TASK What would you need to change to test other metadata features? Use lifestyle as phenotype.

Code
tansf.df <- transf.df %>%
                mutate(grouping=sample_data(ps_filtered)$lifestyle)

TASK Why do we need to apply p-value adjustment to our results? Search the internet if you don’t know.

DESeq2

We will use DESeq2 package to apply a generalized linear model with negative binomial distribution to the bacterial abundances - in our default case here, on the Phylum level. Within DESeq2, we will apply Wald test to see whether the abundance of taxa differs between the groups. DESeq2 includes an internal calculation for library size to account for different sequencing depths and also performs P value adjustments for multiple tests. This means that all we have to do is use the raw abundance dataset with the metadata as input. There is no need for normalization and transformation here.

The theory behind DESeq2 is quite complex. If you want to dive deeper, have a look into the DESeq2 Vignette. As DESeq2 is a tool developed for RNA-seq the explanation uses the vocabulary of gene expression: In short, DESeq2 uses a generalized linear model with negative binomial distribution to model the counts. For this the counts are normalized based on a per gene (for our case: taxa) normalization factor and dispersion. The model coefficients are estimated for each sample group along with their standard error and result in log2 foldchanges for each gene for each sample group. Then the Wald test will be used. The null hypthesis for every gene is that the expression foldchanges between two groups is equal to 0. The Wald test uses the log2foldchanges and divides those by their standard errors. This results in z-statistics. Next, the z-statistic is compared to a standard normal distribution, and a p-value is computed reporting the probability that a z-statistic at least as extreme as the observed value would be found at random. If the returned p-value is small we can reject the null hypothesis, which means that we see differential gene expression (differential abundance) in our data.

Multiple testing correction will automatically be performed by DESeq2. We already used Bonferroni correction, where we adjust the p-values with: > p-adj = p-value * m (total number of tests)

It is very conversative, which means we report highly likely false negatives. But on the other hand, it’s very easy to calculate.

DESeq2 uses by default Benjamini-Hochberg or False Discovery Rate. As it is named, it tries to control the rate of type I errors in the null hypothesis. The p-values are collected, sorted in order from smallest du largest, so every p-value has a defined rank. It then calculates:
> p-adj = (m / rank of p-value) * p-value

This means, that with a selected FDR threshold of usually q=p-adj=0.05, our reported results are expected to contain 5% false positives.

Now to the practical part:

First, we need to format the data by combining all ASV counts into the chosen taxonomic level.

Code
ps.to.dseq <-
  ps_filtered %>%
  aggregate_taxa(level = taxonomic_level)

Now, let us do the DESeq2 routine. Luckily the phyloseq package contains functions to create the necessary DESeq object directly from the phyloseq object.

Code
# Create DESeq2 object from 
dseq <-
  ps.to.dseq %>% 
  phyloseq_to_deseq2(design = ~ sex + lifestyle)
# Perform test. There is a lot going under the hood here, including: estimation of size factors, estimation of dispersion, and Negative Binomial GLM fitting and Wald statistics.
res <-
  DESeq(dseq)

With DESeq2, it should be said that the design is only tested for the last variable. All the variables mentioned before in the design formula are used as covariates. The variable to be tested must be categorical. In R this is referred to as a factor. The last level as a factor is compared with the reference level by default. For other comparisons, the contrast must be specified in the “contrast” parameter of the “result” function.

Code
# Extract the result table
res.df <-
  res %>% 
  results(tidy = T, contrast = c("lifestyle","lifestyle_non_industrialized","lifestyle_industrialized"))
#Visualize what we got out of it
res.df %>% head()

That’s it! You can have a look at the res.df table and you will find the results of all the taxa tested. Depending on your question, you can perform similar analysis to all levels, from phylum to ASV (species level).

Code
# Filter and format to plot
res.df.to.plot <-
  res.df %>% 
  filter(padj < 0.05) %>% # keep only results with adjusted P value less than 0.05
  mutate(Taxa = row) %>% # Create Taxa column
  #left_join(tax_table(ps.to.dseq )@.Data %>% data.frame()), by = "Taxa")# %>% # Add taxonomy information from the phyloseq object.
  ## Arrange the data for a prettier plot
  arrange(log2FoldChange) %>% 
  mutate(Taxa = factor(Taxa, levels = Taxa %>% unique()))
head(res.df.to.plot)

We will later format this table and visualize the data using ggplot2. But before we do that, let’s take a look at another tool.

TASK What do you need to change to test for significant abundance changes between nationalities?

MaAsLin2

MaAsLin2 is the second generation of MaAsLin (Microbiome Multivariable Association with Linear Models). It relies on general linear models to handle most modern epidemiological study designs.

A simple linear regression looks like this: > y = βX + ε

Where y is the observed value, X the design matrix, β the regression coefficients and ε the error term. While fitting a regression, β is estimated in a way that ε is minimized.

A linear mixed model is slightly more complicated and looks like this:

y = βX + Zu + ε

β is a vector for fixed effects and X the associated matrix with fixed effects, while Z is a matrix for random effects with u being a vector for the random effects. The sum of the vector u equals 0.

So what are fixed and random effects? As fixed effects we describe our factors that are of our primary interest, as we want to test them, whether they have an effect on the abundance of the microbiome. A random effect is an (unmeasured) effect that introduces statistical variability to our model, but we aknowledge their presence and want to account for this. In longitudinal study designs, where you have multiple samples from the sampe participant you can account for the variability of the microbiome within the same study participant with a random effect.

Unlike DESeq2, there is no wrapper function to use the phyloseq object directly as an input. But it’s not hard to put our abdundance table and metadata into the right format.

To prepare the inputs for MaAsLin2, we just have to prepare the abundance table in a chosen taxonomic level and the metadata dataframe. MaAsLin2 will take of transformation and filtering.

Code
ps.to.maaslin <- ps_filtered %>%
  aggregate_taxa(level = taxonomic_level)

df_input_data <-
    ps.to.maaslin %>% 
    otu_table() %>% 
    as("matrix") %>% 
    round(digits = 0)

df_input_data[1:5, 1:5]

df_input_metadata <- 
    ps.to.maaslin %>% 
    sample_data() %>%
    data.frame()

df_input_metadata[1:5, ]

In the parameter fixed_effects you can pass all variables you want to test. The tool is ordering the categories in alphabetical order, where the first category is treated as the reference, a second option is to make it a factor with specified level order. We are using a third option here and declare the reference directly in the parameter ‘reference’. Covariates can be passed along to the parameter ‘random_effects’ that you want to control for.

Code
fit_data = Maaslin2(
    input_data = df_input_data, 
    input_metadata = df_input_metadata, 
    output = "../maaslin2_output", 
    fixed_effects = c("lifestyle"), # The phenotypes that you want to analyse
    random_effects = c(), # Include random effects that you want to adjust for, e.g. sample IDs when you have longitudinal study
    transform = "NONE", # Possible transformation choices:  LOG, LOGIT, AST, NONE 
    normalization= 'TSS', # Choices: TSS, CLR, CSS, NONE, TMM
    reference = 'lifestyle,lifestyle_non_industrialized',
    standardize = FALSE,
    plot_heatmap = F, 
    plot_scatter = T
    )
Code
fit_data$results  <- fit_data$results %>% mutate(feature = gsub("^X.","",feature)) # Remove some artifacts from the column feature
fit_data$results %>% head()

The column qval is synonymous with p_adj. It is the output column for multiple testing corrected p-values.

Plot Results

Let’s move on to the part where we want to visualize our tabular results.

Wilcoxon

The Wilcoxon test provides us with results comparing two groups with each other. It is therefore easy to plot the abundance of both groups and then use the p-value to make a statement about the significance of the differential abundance.

Code
options(repr.plot.width = 12, repr.plot.height = 12, repr.plot.res = 100) #Set the plot size

mostsign_clr <- wilcox_clr %>%
                    arrange(p_adj) %>%
                    pull(taxa) %>% # Store the name of most significant taxa
                    .[1] # Take the first value
mostsign_clr


ps_filtered %>% 
    otu_table() %>% 
    t() %>% # transpose the otu_table
    as.data.frame() %>%
    mutate(grouping=sample_data(ps_filtered)$country) %>% #add country column to our dataframe
    select(counts=matches(mostsign_clr), "grouping") %>% #replace mostsign_rare with a taxa you would like to plot
    ggplot(.,aes(x=grouping, y=counts, fill=grouping))+
    geom_boxplot()+ #make a simple boxplot
    geom_jitter()+ #add jittered points to the plots
    #stat_compare_means(method = "wilcox.test")+
    theme(text = element_text(size=28))+ # increase text size
    ylab(paste0("Abundance of ", mostsign_clr )) + #Add y axis label
    xlab(element_blank())+ #Keep x axis label blank
    annotate("label", x=Inf, y=Inf, vjust=1, hjust=1, size=7,
              label=paste("Wilcoxon p-adj=",format(wilcox_clr[wilcox_clr$taxa==mostsign_clr,]$p_adj, scientific=T)))

DESeq2

For deseq2, it makes sense to plot the foldchange together with basemean, as we have obtained these values directly in our results table. Significance can then be made visible in the plot using color.

Code
options(repr.plot.width = 12, repr.plot.height = 8, repr.plot.res = 100) #Set the plot size

#Plot
ggplot(res.df.to.plot, aes(x = log2FoldChange, y = Taxa, col = padj<=0.01)) +
  geom_point(aes(size = baseMean))  +
  geom_vline(xintercept = 0) +
  theme(text = element_text(size=18))

However, there are other ways to visualize such a foldchange in the RNAseq field. One method is the volcano plot. When many variables are tested, the shape of a volcano is usually created by this visualisation style, hence the name of the plot. The plot shows the foldchange on one axis and the p-value on the other. DESeq2 results can also be visualized like this quite easily.

Code
EnhancedVolcano::EnhancedVolcano(res.df,
    lab = res.df$row,
    x = 'log2FoldChange',
    y = 'pvalue',
    title = element_blank(),
    FCcutoff = 0.5)

MaAsLin2

Code
sig_res_fit <- fit_data$results %>% 
  filter(qval <= 0.25)  %>%
  mutate(feature=factor(feature, levels=feature))
sig_res_fit

Task All that’s missing now is some visualization for Maaslin2. You should now be capable of making your own ggplot. Can you think of the best way to display such results?

Are the plots really missing? Have a look in the folder ~/kmc_workshop/maaslin2_outputs to see if you can already find plots there.

Of course, you can also use a boxplot analogous to Wilcoxon’s visualization, in which we display the statistical values

Code
mostsign_maaslin <- sig_res_fit %>%
                    arrange(qval) %>%
                    pull(feature) %>% # Store the name of most significant taxa
                    .[1] %>% # Take the first value
                    as.character()
mostsign_maaslin


ps_filtered %>% 
    otu_table() %>% 
    t() %>% # transpose the otu_table
    as.data.frame() %>%
    mutate(grouping=sample_data(ps_filtered)$lifestyle) %>% #add feature column to our dataframe
    select(counts=matches(mostsign_maaslin %>% as.character()), "grouping") %>% #replace mostsign_rare with a taxa you would like to plot
    ggplot(.,aes(x=grouping, y=counts, fill=grouping))+
    geom_boxplot()+ #make a simple boxplot
    geom_jitter()+ #add jittered points to the plots
    #stat_compare_means(method = "wilcox.test")+
    theme(text = element_text(size=28))+ # increase text size
    ylab(paste0("Abundance of ", mostsign_maaslin )) + #Add y axis label
    xlab(element_blank())+ #Keep x axis label blank
    annotate("label", x=Inf, y=Inf, vjust=1, hjust=1, size=7,
              label=paste("qval=",format(sig_res_fit[sig_res_fit$feature==mostsign_maaslin,]$qval, scientific=T)))

Result Comparison between Tools

Finally, let’s compare the p-values of the three methods. To do this, we need to create a table in which we enter the adjusted p-values.

Code
pvalue_df <- wilcox_clr %>% select(taxa, wilcox_padj=p_adj) %>% #Wilcoxon table
    merge(.,
        res.df.to.plot %>% select(taxa=row, deseq_padj=padj), # DESeq2 Results table
          by="taxa", all.x=T, all.y=T) %>% 
    mutate(taxa=str_trim(taxa)) %>%
    merge(.,
        sig_res_fit %>% # Maaslin2 table
        select(taxa=feature, maaslin_padj=qval) %>% 
         mutate(taxa=as.character(taxa)), 
         by="taxa", all.x=T, all.y=T) %>% 
    replace(is.na(.), 1) # Fill NA values with 1
Code
pvalue_df %>% head()

We can now create two plots and simply plot the p-values.

Code
compare_plot_one <- ggplot(pvalue_df, aes(x = deseq_padj, y = wilcox_padj)) +
       labs(x = "DESeq2 adjusted p-value", y = "Wilcoxon test adjusted p-value") +
       geom_count() +
       scale_size_area(max_size = 10) +
       theme(text = element_text(size=18))

compare_plot_one
Code
compare_plot_two <- ggplot(pvalue_df, aes(x = deseq_padj, y = maaslin_padj)) +
       labs(x = "DESeq2 adjusted p-value", y = "MaAsLin2 adjusted p-value") +
       geom_count() +
       scale_size_area(max_size = 10) +
       theme(text = element_text(size=18))

compare_plot_two

TASK Could you improve the above plots by applying logarithmic scaling?

Code
compare_plot_two + scale_y_continuous(trans='log10') + scale_x_continuous(trans='log10')
Code
print(paste0("Wilcoxon test p-values under 0.05: ", sum(pvalue_df$wilcox_padj<0.05, na.rm = TRUE), "/", length(pvalue_df$wilcox_padj)))
print(paste0("DESeq2 p-values under 0.05: ", sum(pvalue_df$deseq_padj<0.05, na.rm = TRUE), "/", length(pvalue_df$deseq_padj)))
print(paste0("MaAsLin2 p-values under 0.05: ", sum(pvalue_df$maaslin_padj<0.05, na.rm = TRUE), "/", length(pvalue_df$maaslin_padj)))

TASK Which method provides the most significant taxa and which tool would you prefer?

TASK for when you have time: Can you perform the Wilcoxon test on the rarefied count data similar as for the CLR transformation?

Code
#rare_members(subset.ps)
wilcox_rare <- ps_filtered %>%
    rarefy_even_depth(rngseed = 123) %>% # Perform rarefication
    otu_table() %>% 
    t() %>% # transpose the otu_table
    as.data.frame() %>%
    mutate(grouping=sample_data(ps_filtered)$country)    %>%
    rownames_to_column("sample_id") %>%
    pivot_longer(cols = c(-sample_id,-grouping), names_to = "taxa", values_to = "abundance") %>%
    group_by(taxa) %>%
    summarize(
        p_value = wilcox.test(abundance[grouping == chosen_countries[1]], abundance[grouping == chosen_countries[2]])$p.value,
        countryone_mean=(mean(abundance[grouping == chosen_countries[1]])),
        countrytwo_mean=(mean(abundance[grouping == chosen_countries[2]]))
    ) %>%
    # Adjust p-values with Bonferroni correction
    mutate(p_adj = p.adjust(p_value, method = "bonferroni"), .before = countryone_mean) %>%
    arrange(p_adj)

wilcox_rare

TASK for when you have time: Can you plot the Wilcoxon test results based on rarefied count data?

Code
options(repr.plot.width = 12, repr.plot.height = 8, repr.plot.res = 100) #Set the plot size

mostsign_rare <- wilcox_clr %>%
                    arrange(p_adj) %>%
                    pull(taxa) %>% # Store the name of most significant taxa
                    .[1] # Take the first value
mostsign_rare

tansf.df %>% 
    select(counts=matches(mostsign_clr), "grouping") %>% #replace mostsign_clr with a taxa you would like to plot
    ggplot(.,aes(x=grouping, y=counts))+
    geom_boxplot()+ #make a simple boxplot
    geom_jitter()+ #add jittered points to the plots
    #stat_compare_means(method = "wilcox.test")+
    theme(text = element_text(size=28))+
    ylab(paste0("Abundance of ", mostsign_clr )) +
    xlab(element_blank())

What difference can you observe between the different statistical results based on the rarefied and CLR transformed data?