Chapter 3 Basic class

The microtable class is the basic class. All the other classes depend on the microtable class.

The objects inside the rectangle with full line represent functions. The red rectangle means it is extremely important function. The dashed line denotes the key objects (input or output of functions) that deserve more attention.

3.1 microtable class

　Many tools can be used for the bioinformatic analysis of amplicon sequencing data, such as QIIME (Caporaso et al. 2010), QIIME2 (Bolyen et al. 2019), usearch (https://www.drive5.com/usearch/), mothur (Schloss et al. 2009), SILVAngs (https://ngs.arb-silva.de/silvangs/), LotuS2 (Ozkurt et al. 2022), and RDP (http://rdp.cme.msu.edu/). Although the formats of result files may vary across tools, the main contents can be generally classified into the following parts: (1) OTU/ASV table, i.e. the feature-sample abundance table; (2) taxonomic assignment table; (3) representative sequences; (4) phylogenetic tree; (5) metadata. It is generally useful to create a detailed sample metadata table to store all the sample information (including the environmental data).

　The microtable class is the basic class and designed to store the basic data for all the downstream analysis in the microeco package. At least, the OTU table (i.e. feature-sample abundance table) should be provided to create microtable object. Thus, the microtable class can determine that the sample information table is missing and create a default sample table according to sample names in otu_table. To make the file input more convenient, we also build another R package file2meco (https://github.com/ChiLiubio/file2meco) to read the output files of some tools into microtable object. Currently, those tools/softwares include not only commonly-used QIIME (Caporaso et al. 2010) and QIIME2(Bolyen et al. 2019), but also several metagenomic tools, such as HUMAnN (Franzosa et al. 2018) and kraken2 (Wood, Lu, and Langmead 2019). In this tutorial, the data inside the package was employed to show some basic operations.

3.1.1 Prepare the example data

　The example data inside the microeco package is used to show the main part of the tutorial. This dataset arose from 16S rRNA gene Miseq sequencing results of wetland soils in China published by An et al. (An et al. 2019), who surveyed soil prokaryotic communities in Chinese inland wetlands (IW), coastal wetland (CW) and Tibet plateau wetlands (TW) using amplicon sequencing. These wetlands include both saline and non-saline samples (classified for the tutorial). The sample information table has 4 columns: “SampleID”, “Group”, “Type” and “Saline”. The column “SampleID” is same with the rownames. The column “Group” represents the IW, CW and TW. The column “Type” means the sampling region: northeastern region (NE), northwest region (NW), North China area (NC), middle-lower reaches of the Yangtze River (YML), southern coastal area (SC), upper reaches of the Yangtze River (YU), Qinghai-Tibet Plateau (QTP). The column “Saline” denotes the saline soils and non-saline soils. In this dataset, the environmental factor table is separated from the sample information table. It is also recommended to put all the environmental data into sample information table.

library(microeco)
# load the example data; 16S rRNA gene amplicon sequencing data
# metadata table; data.frame
data(sample_info_16S)
# feature table; data.frame
data(otu_table_16S)
# taxonomic assignment table; data.frame
data(taxonomy_table_16S)
# phylogenetic tree; not necessary; use for the phylogenetic analysis
# Newick format; use read.tree function of ape package to read a tree
data(phylo_tree_16S)
# load the environmental data table if it is not in sample table
data(env_data_16S)
# use pipe operator in magrittr package
library(magrittr)
# fix the random number generation to make the results repeatable
set.seed(123)
# make the plotting background same with the tutorial
library(ggplot2)
theme_set(theme_bw())

Make sure that the data types of sample_table, otu_table and tax_table are all data.frame format as the following part shows.

class(otu_table_16S)

## [1] "data.frame"

otu_table_16S[1:5, 1:5]

	S1	S2	S3	S4	S5
OTU_4272	1	0	1	1	0
OTU_236	1	4	0	2	35
OTU_399	9	2	2	4	4
OTU_1556	5	18	7	3	2
OTU_32	83	9	19	8	102

class(taxonomy_table_16S)

## [1] "data.frame"

taxonomy_table_16S[1:5, 1:3]

	Kingdom	Phylum	Class
OTU_4272	k__Bacteria	p__Firmicutes	c__Bacilli
OTU_236	k__Bacteria	p__Chloroflexi	c__
OTU_399	k__Bacteria	p__Proteobacteria	c__Betaproteobacteria
OTU_1556	k__Bacteria	p__Acidobacteria	c__Acidobacteria
OTU_32	k__Archaea	p__Miscellaneous Crenarchaeotic Group	c__

Generally, users’ taxonomic table has some messy information, such as NA, unidentified and unknown. These information can potentially influence the following taxonomic abundance calculation and other taxonomy-based analysis. So it is usually necessary to clean this data using the tidy_taxonomy function. Another very important result of this operation is to unify the taxonomic prefix automatically, e.g., converting D_1__ to p__ for Phylum level or adding p__ to Phylum directly if no prefix is found.

# make the taxonomic information unified, very important
taxonomy_table_16S %<>% tidy_taxonomy

The rownames of sample_table in microtable object (i.e. sample names) are used for selecting samples/groups in all the related operations in the package. Using pure number as sample names is not recommended in case of unknown disorder or man-made mistake. Before creating microtable object, make sure that the rownames of sample information table are sample names.

class(sample_info_16S)

## [1] "data.frame"

sample_info_16S[1:5, ]

	SampleID	Group	Type	Saline
S1	S1	IW	NE	Non-saline soil
S2	S2	IW	NE	Non-saline soil
S3	S3	IW	NE	Non-saline soil
S4	S4	IW	NE	Non-saline soil
S5	S5	IW	NE	Non-saline soil

In this example, the environmental data is stored in the env_data_16S alone. The user can also directly integrate those data into the sample information table.

class(env_data_16S)

## [1] "data.frame"

	Latitude	Longitude	Altitude	Temperature	Precipitation
S1	52.96	122.6	432	-4.2	445
S2	52.95	122.6	445	-4.3	449
S3	52.95	122.6	430	-4.3	449
S4	52.95	122.6	430	-4.3	449
S5	52.95	122.6	429	-4.3	449

class(phylo_tree_16S)

## [1] "phylo"

Then, we create an object of microtable class. This operation is very similar with the package phyloseq(Mcmurdie and Holmes 2013), but in microeco it is more brief. The otu_table in the microtable class must be the feature-sample format: rownames - OTU/ASV/pathway/other names; colnames - sample names. The colnames in otu_table must have overlap with rownames of sample_table. Otherwise, the following check can filter all the samples of otu_table because of no same sample names between otu_table and sample_table.

# In R6 class, '$new' is the original method used to create a new object of class
# If you only provide abundance table, the class can help you create a sample info table
mt <- microtable$new(otu_table = otu_table_16S)

## No sample_table provided, automatically use colnames in otu_table to create one ...

class(mt)

## [1] "microtable" "R6"

# generally add the metadata
mt <- microtable$new(otu_table = otu_table_16S, sample_table = sample_info_16S)
mt

## microtable-class object:
## sample_table have 90 rows and 4 columns
## otu_table have 13628 rows and 90 columns

# Let's create a microtable object with more information
mt <- microtable$new(sample_table = sample_info_16S, otu_table = otu_table_16S, tax_table = taxonomy_table_16S, phylo_tree = phylo_tree_16S)
mt

## microtable-class object:
## sample_table have 90 rows and 4 columns
## otu_table have 13628 rows and 90 columns
## tax_table have 13628 rows and 7 columns
## phylo_tree have 14096 tips

To fully back up this original dataset (R6 class), the clone function should be used instead of direct assignment (https://chiliubio.github.io/microeco_tutorial/notes.html#clone-function). To further save the dataset to a local computer file, the save function should be used (https://chiliubio.github.io/microeco_tutorial/notes.html#save-function).

mt_raw <- clone(mt)

3.1.2 How to read your files to microtable object?

The above-mentioned example data are directly loaded from microeco package. So the question is how to read your data to create a microtable object?
There are two ways:

▲ 1. Use file2meco package
R package file2meco (https://chiliubio.github.io/microeco_tutorial/file2meco-package.html) is designed to directly read the output files of some famous tools into microtable object. Currently, it supports QIIME (Caporaso et al. 2010), QIIME2(Bolyen et al. 2019), HUMAnN (Franzosa et al. 2018), MetaPhlAn (Blanco-Míguez et al. 2023), kraken2 (Wood, Lu, and Langmead 2019), phyloseq (Mcmurdie and Holmes 2013), etc. Please read the tutorial of file2meco package for more detailed information (https://chiliubio.github.io/microeco_tutorial/file2meco-package.html).

▲ 2. Other cases
To transform customized files to microtable object, there should be two steps:
I) read files to R
The required format of microtable$new parameters, otu_table, sample_table and tax_table, are all the data.frame, which is the most frequently-used data format in R. So no matter what the format the files are, they should be first read into R with some functions, such as read.table and read.csv. If the user want to perform phylogenetic analysis, please also read your phylogenetic tree using read.tree function of ape package and provide the tree to the phylo_tree parameter of microtable$new function like the above example.
II) create the microtable object
Then the user can create the microtable object like the operation in the last section. Please also see the help document of the microtable class for detailed descriptions using the following help command.

# search the class name, not the function name
?microtable
# then see microtable$new()

3.1.3 Functions in microtable class

Then, we remove OTUs which are not assigned in the Kingdom “k__Archaea” or “k__Bacteria”.

# use R subset function to filter taxa in tax_table
mt$tax_table %<>% base::subset(Kingdom == "k__Archaea" | Kingdom == "k__Bacteria")
# another way with grepl function
mt$tax_table %<>% .[grepl("Bacteria|Archaea", .$Kingdom), ]
mt

## microtable-class object:
## sample_table have 90 rows and 4 columns
## otu_table have 13628 rows and 90 columns
## tax_table have 13330 rows and 7 columns
## phylo_tree have 14096 tips

We also remove OTUs with the taxonomic assignments “mitochondria” or “chloroplast”.

# This will remove the lines containing the taxa word regardless of taxonomic ranks and ignoring word case in the tax_table.
# So if you want to filter some taxa not considerd pollutions, please use subset like the previous operation to filter tax_table.
mt$filter_pollution(taxa = c("mitochondria", "chloroplast"))

## Total 34 features are removed from tax_table ...

mt

## microtable-class object:
## sample_table have 90 rows and 4 columns
## otu_table have 13628 rows and 90 columns
## tax_table have 13296 rows and 7 columns
## phylo_tree have 14096 tips

To make the OTU and sample information consistent across all files in the object, we use function tidy_dataset to trim the data.

mt$tidy_dataset()
mt

## microtable-class object:
## sample_table have 90 rows and 4 columns
## otu_table have 13296 rows and 90 columns
## tax_table have 13296 rows and 7 columns
## phylo_tree have 13296 tips

The function save_table can be performed to save all the basic data in microtable object to local files, including feature abundance, metadata, taxonomic table, phylogenetic tree and representative sequences.

mt$save_table(dirpath = "basic_files", sep = ",")

Then, let’s calculate the taxa abundance at each taxonomic rank using cal_abund(). This function generate a list called taxa_abund stored in the microtable object. This list contains several data frame of the abundance information at each taxonomic rank. It’s worth noting that the cal_abund() function can be used to solve more complicated cases with special parameters, such as supporting both the relative and absolute abundance calculation and selecting the partial ‘taxonomic’ columns. Those have been shown in file2meco package part (https://chiliubio.github.io/microeco_tutorial/file2meco-package.html#humann-metagenomic-results) with complex metagenomic data.

# default parameter (rel = TRUE) denotes relative abundance
mt$cal_abund()

## The result is stored in object$taxa_abund ...

# return taxa_abund list in the object
class(mt$taxa_abund)

## [1] "list"

# show part of the relative abundance at Phylum level
mt$taxa_abund$Phylum[1:5, 1:5]

	S1	S2	S3	S4	S5
k__Bacteria\|p__Proteobacteria	0.2005	0.2055	0.2115	0.2572	0.1683
k__Bacteria\|p__Chloroflexi	0.1208	0.1888	0.1668	0.1494	0.312
k__Bacteria\|p__Bacteroidetes	0.1814	0.03274	0.02704	0.02249	0.02859
k__Bacteria\|p__Acidobacteria	0.1193	0.2483	0.2497	0.2644	0.2462
k__Bacteria\|p__Actinobacteria	0.1222	0.08405	0.08869	0.08852	0.07963

The function save_abund() can be used to save the taxa abundance file to a local place easily.

mt$save_abund(dirpath = "taxa_abund")

All the abundance tables can also be merged into one to be saved. This type of file format can be opened directly by other software, such as STAMP.

# tab-delimited, i.e. mpa format
mt$save_abund(merge_all = TRUE, sep = "\t", quote = FALSE)
# remove those unclassified
mt$save_abund(merge_all = TRUE, sep = "\t", rm_un = TRUE, rm_pattern = "__$|Sedis$", quote = FALSE)

Sometimes, to reduce the impact of sequencing depth on diversity measurements, it is advisable to perform resampling to equalize the sequence numbers for each sample (McKnight et al. 2018). The function rarefy_samples can automatically invoke the function tidy_dataset before and after the rarefying process. The method = 'SRS' is available for performing rarefaction by scaling with ranked subsampling (Beule and Karlovsky 2020). The default choice is method = 'rarefy'. Note that the “rarefy” method in microtable object is a shortcut call from the “rarefy” method in the trans_norm class. For other normalization methods, please refer to the section of the trans_norm class.

# first clone the data
mt_rarefied <- clone(mt)
# use sample_sums to check the sequence numbers in each sample
mt_rarefied$sample_sums() %>% range

## [1] 10316 37087

# As an example, use 10000 sequences in each sample
mt_rarefied$rarefy_samples(sample.size = 10000)

## 530 features are removed because they are no longer present in any sample after random subsampling ...

## 530 taxa with 0 abundance are removed from the otu_table ...

mt_rarefied$sample_sums() %>% range

## [1] 10000 10000

Then, let’s calculate the alpha diversity. The result is also stored in the object microtable automatically.

# If you want to add Faith's phylogenetic diversity, use PD = TRUE, this will be a little slow
mt_rarefied$cal_alphadiv(PD = FALSE)

## The result is stored in object$alpha_diversity ...

# return alpha_diversity in the object
class(mt_rarefied$alpha_diversity)

## [1] "data.frame"

# save alpha_diversity to a directory
mt_rarefied$save_alphadiv(dirpath = "alpha_diversity")

Let’s go on to beta diversity with function cal_betadiv(). If method parameter is not provided, the function automatically calculates Bray-curtis, Jaccard, weighted Unifrac and unweighted unifrac matrixes (Lozupone and Knight 2005).

# unifrac = FALSE means do not calculate unifrac metric
# require GUniFrac package installed
mt_rarefied$cal_betadiv(unifrac = TRUE)
# return beta_diversity list in the object
class(mt_rarefied$beta_diversity)
# save beta_diversity to a directory
mt_rarefied$save_betadiv(dirpath = "beta_diversity")

3.1.4 merge taxa or samples

Merging taxa according to a specific taxonomic rank level of tax_table can generate a new microtable object. In the new microtable object, each feature in otu_table represents one taxon at the output level.

test <- mt$merge_taxa("Genus")
test

## microtable-class object:
## sample_table have 90 rows and 4 columns
## otu_table have 1267 rows and 90 columns
## tax_table have 1267 rows and 6 columns

Similarly, merging samples according to a specific group of sample_table can also generate a new microtable object.

test <- mt$merge_samples("Group")
test

## microtable-class object:
## sample_table have 3 rows and 1 columns
## otu_table have 13296 rows and 3 columns
## tax_table have 13296 rows and 7 columns
## phylo_tree have 13296 tips

3.1.5 subset of samples

We donnot provide a special function to filter samples in microtable object, as we think it is redundant. We recommend manipulating the sample_table in microtable object directly. For example, if the user want to extract samples, please do like this:

# remember first clone the whole dataset
# see https://chiliubio.github.io/microeco_tutorial/notes.html#clone-function
group_CW <- clone(mt)
# select 'CW'
group_CW$sample_table <- subset(group_CW$sample_table, Group == "CW")
# or: group_CW$sample_table <- subset(group_CW$sample_table, grepl("CW", Group))
# trim all the data
group_CW$tidy_dataset()

## 1945 taxa with 0 abundance are removed from the otu_table ...

group_CW

## microtable-class object:
## sample_table have 30 rows and 4 columns
## otu_table have 11351 rows and 30 columns
## tax_table have 11351 rows and 7 columns
## phylo_tree have 11351 tips
## Taxa abundance: calculated for Kingdom,Phylum,Class,Order,Family,Genus,Species

3.1.6 subset of taxa

Similar with above operation, subset of features can be achieved by manipulating the tax_table or otu_table in microtable object directly.

# extracting OTU data by manipulating tax_table
proteo <- clone(mt)
proteo$tax_table <- subset(proteo$tax_table, Phylum == "p__Proteobacteria")
# or: proteo$tax_table <- subset(proteo$tax_table, grepl("Proteobacteria", Phylum))
proteo$tidy_dataset()
proteo

## microtable-class object:
## sample_table have 90 rows and 4 columns
## otu_table have 3166 rows and 90 columns
## tax_table have 3166 rows and 7 columns
## phylo_tree have 3166 tips
## Taxa abundance: calculated for Kingdom,Phylum,Class,Order,Family,Genus,Species

# proteo is a new microtable object with all OTUs coming from phylum Proteobacteria
# beta diversity dissimilarity for Proteobacteria
proteo$cal_betadiv()

## The result is stored in object$beta_diversity ...

# extracting OTU data by manipulating otu_table
test <- clone(mt)
test$otu_table <- test$otu_table[c("OTU_32", "OTU_50", "OTU_1"), ]
test$tidy_dataset()
test

3.1.7 add ASV/OTU to tax_table

The function add_rownames2taxonomy can add the row names of tax_table as the last column of tax_table directly. This operation is very useful in some analysis, e.g., the visualization of the OTUs/ASVs with the relative abundance.

test <- clone(mt)
ncol(test$tax_table)
test$add_rownames2taxonomy(use_name = "OTU")
ncol(test$tax_table)

test <- clone(mt)
test$add_rownames2taxonomy("ASV")
# then we can calculate the relative abundance of ASV
# for all taxonomic levels like commonly operation
test$cal_abund()
View(test$taxa_abund$ASV)
# only ASV level
test$cal_abund(select_cols = "ASV")
View(test$taxa_abund$ASV)

3.1.8 filter the features with low abundance/occurrence frequency

The filter_taxa function can be applied to filter the features with low abundance or occurrence frequency when needed. For other operations on the features, please directly manipulate the otu_table of your microtable object.

# It is better to have a backup before filtering features
test <- clone(mt)
# In this example, the relative abundance threshold is 0.0001
# occurrence frequency 0.1; 10% samples should have the target features
test$filter_taxa(rel_abund = 0.0001, freq = 0.1)
test

3.1.9 Other examples

In microtable$new, if auto_tidy = TRUE, the function can automatically use tidy_dataset to make all files uniform. Then, all other functions in microtable will also do this. But if the user changes the file in microtable object, the class can not recognize this modification, the user should use tidy_dataset function to manually trim the microtable object.

test <- microtable$new(sample_table = sample_info_16S[1:40, ], otu_table = otu_table_16S, auto_tidy = FALSE)
test

## microtable-class object:
## sample_table have 40 rows and 4 columns
## otu_table have 13628 rows and 90 columns

test1 <- microtable$new(sample_table = sample_info_16S[1:40, ], otu_table = otu_table_16S, auto_tidy = TRUE)

## 881 taxa with 0 abundance are removed from the otu_table ...

test1

## microtable-class object:
## sample_table have 40 rows and 4 columns
## otu_table have 12747 rows and 40 columns

test1$sample_table %<>% .[1:10, ]
test1

## microtable-class object:
## sample_table have 10 rows and 4 columns
## otu_table have 12747 rows and 40 columns

test1$tidy_dataset()

## 3883 taxa with 0 abundance are removed from the otu_table ...

test1

## microtable-class object:
## sample_table have 10 rows and 4 columns
## otu_table have 8864 rows and 10 columns

The phylogenetic tree can be read with read.tree function in ape package.

# use the example data rep_phylo.tre in file2meco package https://chiliubio.github.io/microeco_tutorial/file2meco-package.html#qiime
phylo_file_path <- system.file("extdata", "rep_phylo.tre", package="file2meco")
tree <- ape::read.tree(phylo_file_path)

Other functions and examples are listed here.

# clone a complete dataset
test <- clone(mt)

# rename features in all the files of microtable object
test$rename_taxa(newname_prefix = "new_name_")
rownames(test$otu_table)[1:5]
rownames(test$tax_table)[1:5]

# sum the abundance for each taxon
test$taxa_sums()

# output sample names of microtable object
test$sample_names()[1:5]

# output taxa names of microtable object
test$taxa_names()[1:5]

3.1.10 Key points

sample_table: rownames of sample_table must be sample names used
otu_table: rownames must be feature names; colnames must be sample names
microtable class: creating microtable object requires at least one file input (otu_table)
tidy_taxonomy(): necessary to make taxonomic table have unified format
tidy_dataset(): necessary to trim files in microtable object
add_rownames2taxonomy(): add the rownames of tax_table as the last column of tax_table
cal_abund(): powerful and flexible to cope with complex cases in tax_table, see the parameters
taxa_abund: taxa_abund is a list stored in microtable object and have several data frame
beta_diversity: beta_diversity is a list stored in microtable object and have several distance matrix

3.2 trans_norm class

The ASV/OTU data generated by amplicon sequencing have several issues that need to be considered, including sequencing depth, compositional effect, and sparsity.
How to normalize the data usually depends on the following analysis content. The trans_norm class in the microeco package (>= 1.6.0) provides several data normalization approaches for the microtable object. Current available methods include rarefaction, centered log-ratio (CLR) (Aitchison 1986; Greenacre 2021), robust centered log-ratio (RCLR) (Martino et al. 2019), total sum scaling (TSS, also known as Proportion), median ratio of counts relative to geometric mean (DESeq2) (Michael I. Love, Huber, and Anders 2014b), trimmed mean of M-values (TMM) (Robinson and Oshlack 2010), relative log expression (RLE), cumulative sum scaling (CSS) (Paulson et al. 2013), geometric mean of pairwise ratios (GMPR) (Chen et al. 2018), and Wrench (Kumar et al. 2018). The rarefaction method used here is exactly the same as the rarefy_samples function in the previous microtable chapter, in which using resampling was to conveniently connect with the calculation of diversity. TSS/CLR/RCLR are also recommended in some beta diversity analysis by some researchers. CLR and RCLR are specifically designed for compositional effects. The relative abundance calculated using cal_abund as demonstrated in the previous chapter also falls under TSS. The relative abundance values (ranging from 0 to 1) are commonly used for visualization or differential abundance test at high taxonomic levels (e.g., Genus and Phylum). Other normalization methods are mainly developed for differential abundance analysis of ASVs/OTUs, considering compositional effects and sequencing depth issues. (Swift et al. 2022). The format of output is same with the input. For the details and references of the approaches, please refer to the help document of the class with the command ?trans_norm.

tmp <- trans_norm$new(dataset = mt)

# rarefaction
mt_rarefied <- tmp$norm(method = "rarefy", sample.size = 10000)

# Centered log-ratio normalization
mt_clr <- tmp$norm(method = "clr")
# Robust centered log-ratio normalization
mt_rclr <- tmp$norm(method = "rclr")

# Total sum scaling, dividing counts by the sequencing depth
mt_TSS <- tmp$norm(method = "TSS")

# Geometric mean of pairwise ratios
mt_GMPR <- tmp$norm(method = "GMPR")
# Cumulative sum scaling normalization
mt_CSS <- tmp$norm(method = "CSS")

References

Aitchison, J. 1986. The Statistical Analysis of Compositional Data. London: Chapman; Hall.

An, Jiaxing, Chi Liu, Qian Wang, Minjie Yao, Junpeng Rui, Shiheng Zhang, and Xiangzhen Li. 2019. “Soil Bacterial Community Structure in Chinese Wetlands.” Journal Article. Geoderma 337: 290–99.

Beule, Lukas, and Petr Karlovsky. 2020. “Improved normalization of species count data in ecology by scaling with ranked subsampling (SRS): application to microbial communities.” Journal Article. PeerJ 8: e9593. https://doi.org/10.7717/peerj.9593.

Blanco-Míguez, Aitor, Francesco Beghini, Fabio Cumbo, Lauren J. McIver, Kelsey N. Thompson, Moreno Zolfo, Paolo Manghi, et al. 2023. “Extending and Improving Metagenomic Taxonomic Profiling with Uncharacterized Species Using MetaPhlAn 4.” Journal Article. Nature Biotechnology 41 (11): 1633–44. https://doi.org/10.1038/s41587-023-01688-w.

Bolyen, E., J. R. Rideout, M. R. Dillon, N. A. Bokulich, C. C. Abnet, G. A. Al-Ghalith, H. Alexander, et al. 2019. “Reproducible, Interactive, Scalable and Extensible Microbiome Data Science Using QIIME 2.” Journal Article. Nat Biotechnol 37 (8): 852–57. https://doi.org/10.1038/s41587-019-0209-9.

Caporaso, J. Gregory, Justin Kuczynski, Jesse Stombaugh, Kyle Bittinger, Frederic D. Bushman, Elizabeth K. Costello, Noah Fierer, et al. 2010. “QIIME Allows Analysis of High-Throughput Community Sequencing Data.” Journal Article. Nature Methods 7 (5): 335–36. https://doi.org/http://www.nature.com/nmeth/journal/v7/n5/suppinfo/nmeth.f.303_S1.html.

Chen, Li, James Reeve, Lujun Zhang, Shengbing Huang, Xuefeng Wang, and Jun Chen. 2018. “GMPR: A robust normalization method for zero-inflated count data with application to microbiome sequencing data.” Journal Article. PeerJ 6: e4600. https://doi.org/10.7717/peerj.4600.

Franzosa, Eric A., Lauren J. McIver, Gholamali Rahnavard, Luke R. Thompson, Melanie Schirmer, George Weingart, Karen Schwarzberg Lipson, et al. 2018. “Species-Level Functional Profiling of Metagenomes and Metatranscriptomes.” Journal Article. Nature Methods 15 (11): 962–68. https://doi.org/10.1038/s41592-018-0176-y.

Greenacre, Michael. 2021. “Compositional Data Analysis.” Journal Article. Annual Review of Statistics and Its Application 8: 271–99. https://doi.org/10.1146/annurev-statistics-042720-124436.

Kumar, M. Senthil, Eric V. Slud, Kwame Okrah, Stephanie C. Hicks, Sridhar Hannenhalli, and Héctor Corrada Bravo. 2018. “Analysis and Correction of Compositional Bias in Sparse Sequencing Count Data.” Journal Article. BMC Genomics 19 (1): 799. https://doi.org/10.1186/s12864-018-5160-5.

Love, Michael I, Wolfgang Huber, and Simon Anders. 2014b. “Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.” Journal Article. Genome Biology 15 (12): 550. https://doi.org/10.1186/s13059-014-0550-8.

Lozupone, C., and R. Knight. 2005. “UniFrac: a new phylogenetic method for comparing microbial communities.” Journal Article. Applied and Environmental Microbiology 71 (12): 8228–35. https://doi.org/10.1128/AEM.71.12.8228-8235.2005.

Martino, Cameron, James T. Morton, Clarisse A. Marotz, Luke R. Thompson, Anupriya Tripathi, Rob Knight, and Karsten Zengler. 2019. “A Novel Sparse Compositional Technique Reveals Microbial Perturbations.” Journal Article. mSystems 4 (1): e00016–19. https://doi.org/10.1128/msystems.00016-19.

McKnight, Donald T., Roger Huerlimann, Deborah S. Bower, Lin Schwarzkopf, Ross A. Alford, Kyall R. Zenger, and Simon Jarman. 2018. “Methods for Normalizing Microbiome Data: An Ecological Perspective.” Journal Article. Methods in Ecology and Evolution 10 (3): 389–400. https://doi.org/10.1111/2041-210x.13115.

Mcmurdie, Paul J, and Susan Holmes. 2013. “Phyloseq: An r Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data.” Journal Article. Plos One 8 (4): e61217.

Ozkurt, E., J. Fritscher, N. Soranzo, D. Y. K. Ng, R. P. Davey, M. Bahram, and F. Hildebrand. 2022. “LotuS2: An Ultrafast and Highly Accurate Tool for Amplicon Sequencing Analysis.” Journal Article. Microbiome 10 (1): 176. https://doi.org/10.1186/s40168-022-01365-1.

Paulson, Joseph N, O Colin Stine, Héctor Corrada Bravo, and Mihai Pop. 2013. “Differential Abundance Analysis for Microbial Marker-Gene Surveys.” Journal Article. Nature Methods 10 (12): 1200–1202.

Robinson, M, and A Oshlack. 2010. “A scaling normalization method for differential expression analysis of RNA-seq data.” Journal Article. Genome Biology 11 (3): R25. https://doi.org/10.1186/gb-2010-11-3-r25.

Schloss, P. D., S. L. Westcott, T. Ryabin, J. R. Hall, M. Hartmann, E. B. Hollister, R. A. Lesniewski, et al. 2009. “Introducing Mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities.” Journal Article. Applied and Environmental Microbiology 75 (23): 7537–41. https://doi.org/10.1128/aem.01541-09.

Swift, Dionne, Kellen Cresswell, Robert Johnson, Spiro Stilianoudakis, and Xingtao Wei. 2022. “A Review of Normalization and Differential Abundance Methods for Microbiome Counts Data.” Journal Article. WIREs Computational Statistics 15 (1): e1586. https://doi.org/10.1002/wics.1586.

Wood, D. E., J. Lu, and B. Langmead. 2019. “Improved Metagenomic Analysis with Kraken 2.” Journal Article. Genome Biology 20 (1): 257. https://doi.org/10.1186/s13059-019-1891-0.