Chapter 8 file2meco package
In the microtable class part, we showed the basic way about how to create microtable object with the example data.
Actually, constructing the microtable object from other tools/platforms (e.g., QIIME, QIIME2, HUMAnN, Kraken2 and phyloseq)
can be easily achieved with the package file2meco (https://github.com/ChiLiubio/file2meco).
The idea of creating file2meco package comes from a study involved in complex metagenomic analysis (C. Liu et al. 2022).
Note that the sample_table
parameter in each function of file2meco package supports various metadata input format, including
1) comma seperated file with the suffix .csv or tab seperated file with the suffix .tsv or .txt;
2) Excel file with the suffix .xlsx or .xls;
3) data.frame object in R session.
# install file2meco package (>= 0.9.0)
if(!require("BiocManager", quietly = TRUE)) install.packages("BiocManager")
if(!require("file2meco")) install.packages("file2meco", repos = BiocManager::repositories())
▲ Trouble shooting:
• The files in the following examples all come from the package file2meco and are found by the function system.file
automatically irrespective of Operating System.
When the user imports a file, please donot use this function.
The first thing should be to make sure that R can find your input file.
The user should either provide a full path to the import function or only the file name after putting the file into the R working directory.
For the latter one, if the user does not know where the working directory is,
please use the function getwd
to find the working directory or directly create a new project in the target directory with RStudio [File –> New Project].
8.1 QIIME
The qiime1meco() function can be used to construct the microtable object using the raw OTU file from QIIME 1 (Caporaso et al. 2010).
library(file2meco)
# see the help document
?qiime1meco# Let's run the examples
# use the raw data files stored inside the package
<- system.file("extdata", "otu_table_raw.txt", package="file2meco")
otu_file_path # csv file of metadata
<- system.file("extdata", "sample_info.csv", package="file2meco")
sample_file_path <- system.file("extdata", "rep_phylo.tre", package="file2meco")
phylo_file_path # if you want to use Tax4Fun2 approach, you need read the representative sequences and add it to the microtable object.
<- system.file("extdata", "rep.fna", package="file2meco")
rep_fasta_path # contruct microtable object
qiime1meco(otu_file_path)
qiime1meco(otu_file_path, sample_table = sample_file_path)
qiime1meco(otu_file_path, sample_table = sample_file_path, phylo_tree = phylo_file_path)
qiime1meco(otu_file_path, sample_table = sample_file_path, phylo_tree = phylo_file_path, rep_fasta = rep_fasta_path)
8.2 QIIME2
The qiime2meco() function is designed to create the microtable object using files from QIIME2 (Bolyen et al. 2019). The example data is the ASV (amplicon sequence variant) abundance table based on DADA2 (Callahan et al. 2016).
library(file2meco)
?qiime2meco# use data files inside the package which were downloaded from (https://docs.qiime2.org/2022.2/tutorials/pd-mice/).
<- system.file("extdata", "dada2_table.qza", package="file2meco")
abund_file_path # tsv file of metadata
<- system.file("extdata", "sample-metadata.tsv", package="file2meco")
sample_file_path <- system.file("extdata", "taxonomy.qza", package="file2meco")
taxonomy_file_path # construct microtable object
qiime2meco(abund_file_path)
qiime2meco(abund_file_path, sample_table = sample_file_path, taxonomy_table = taxonomy_file_path)
# add phylogenetic tree and fasta for more demonstrations
# please download tree from https://docs.qiime2.org/2022.2/data/tutorials/pd-mice/tree.qza
# the file name is 'tree.qza'; put it into the R working directory
<- "tree.qza"
tree_data # please download fasta from https://docs.qiime2.org/2022.2/data/tutorials/pd-mice/dada2_rep_set.qza
# the file name is 'dada2_rep_set.qza'; put it into the R working directory
<- "dada2_rep_set.qza"
rep_data <- qiime2meco(abund_file_path, sample_table = sample_file_path, taxonomy_table = taxonomy_file_path, phylo_tree = tree_data, rep_fasta = rep_data, auto_tidy = TRUE)
test1 test1
8.3 HUMAnN
Many methods in microeco package can be used not only for the traditional species abundance data, i.e. species-sample table, but also for other data, such as metagenomic data. HUMAnN (Franzosa et al. 2018) is an excellent tool for functional profiling analysis of metagenomes and metatranscriptomes at species-level. The humann2meco() function can be used to create the microtable object using metagenomic analysis files from HUMAnN3 (https://huttenhower.sph.harvard.edu/humann). Certainly, it can also be used for the whole community profile of metabolic pathways when needed. Currently, it supports both the MetaCyc (https://metacyc.org/) and KEGG pathway abundance file input directly.
library(file2meco)
library(microeco)
library(magrittr)
?humann2meco<- system.file("extdata", "example_metagenome_sample_info.tsv", package="file2meco")
sample_file_path <- system.file("extdata", "example_metagenome_match_table.tsv", package="file2meco")
match_file_path
# MetaCyc pathway database based analysis
# use the raw data files stored inside the package for MetaCyc pathway database based analysis
<- system.file("extdata", "example_HUMAnN_MetaCyc_abund.tsv", package="file2meco")
abund_file_path # the default db is "MetaCyc"
humann2meco(abund_file_path, db = "MetaCyc")
humann2meco(abund_file_path, db = "MetaCyc", sample_table = sample_file_path, match_table = match_file_path)
# Let's try more interesting usages with microeco
<- humann2meco(abund_file_path, db = "MetaCyc", sample_table = sample_file_path, match_table = match_file_path)
test $tidy_dataset()
test# rel = FALSE sum original abundance instead of relative abundance
$cal_abund(select_cols = 1:3, rel = FALSE)
test$taxa_abund$Superclass1 %<>% .[!grepl("unclass", rownames(.)), ]
test# use_percentage = FALSE disable percentage for relative abundance
<- trans_abund$new(test, taxrank = "Superclass1", ntaxa = 10, use_percentage = FALSE)
test1 # reassign ylab title instead of default 'Relative Abundance'
$ylabname <- "Abundance (RPK)"
test1# bar_full = FALSE show original abundance instead of normalized 0-1
$plot_bar(facet = "Group", bar_full = FALSE)
test1# select both function and taxa
$cal_abund(select_cols = c("Superclass1", "Phylum", "Genus"), rel = TRUE)
test<- trans_abund$new(test, taxrank = "Phylum", ntaxa = 10, delete_taxonomy_lineage = FALSE)
test1 $plot_bar(facet = "Group")
test1# functional biomarker
$cal_abund(select_cols = 1:3, rel = TRUE)
test$taxa_abund$Superclass1 %<>% .[!grepl("unclass", rownames(.)), ]
test<- trans_diff$new(test, method = "lefse", group = "Group")
test1 $plot_diff_bar(use_number = 1:20)
test1# taxonomic biomarker
$cal_abund(select_cols = 4:9, rel = TRUE)
test$taxa_abund$Phylum %<>% .[!grepl("unclass", rownames(.)), ]
test# p_adjust_method = "none" shut down the p value adjustment
<- trans_diff$new(test, method = "lefse", group = "Group", p_adjust_method = "none")
test1 $plot_diff_bar(threshold = 2) test1
# use KEGG pathway based HUMAnN result
<- system.file("extdata", "example_HUMAnN_KEGG_abund.tsv", package="file2meco")
abund_file_path <- humann2meco(abund_file_path, db = "KEGG", sample_table = sample_file_path, match_table = match_file_path)
test $tax_table %<>% subset(Level.1 != "unclassified")
test$tidy_dataset() test
8.4 MetaPhlAn
MetaPhlAn is an software used for metagenomic taxonomic profiling (Blanco-Míguez et al. 2023). The format of MetaPhlAn classification results is usually called ‘mpa’ format. The mpa2meco function is developed for this format conversion to microtable object. See the following example of Kraken2 part.
8.5 Kraken2/Bracken
Kraken is a taxonomic sequence classifier that assigns taxonomic labels to DNA sequences. Kraken examines the k-mers within a query sequence and uses the information within those k-mers to query a database. That database maps k-mers to the lowest common ancestor (LCA) of all genomes known to contain a given k-mer. Kraken2 (Wood, Lu, and Langmead 2019) is the newest version. Bracken (Lu et al. 2017) can be applied to estimate species abundance following the Kraken analysis. The merged Kraken2/Bracken results can be obtained by merge_metaphlan_tables.py from MetaPhlAn or combine_mpa.py from KrakenTools (https://ccb.jhu.edu/software/krakentools/).
# use the raw data files inside the package
<- system.file("extdata", "example_kraken2_merge.txt", package="file2meco")
abund_file_path <- system.file("extdata", "example_metagenome_sample_info.tsv", package="file2meco")
sample_file_path <- system.file("extdata", "example_metagenome_match_table.tsv", package="file2meco")
match_file_path mpa2meco(abund_file_path)
# 'rel = FALSE' means raw abundance in taxa_abund
<- mpa2meco(abund_file_path, sample_table = sample_file_path, match_table = match_file_path, rel = FALSE, use_level = "s__")
test1 $taxa_abund$Kingdom[, 1:3]
test1# 'rel = TRUE' means relative abundance in taxa_abund
<- mpa2meco(abund_file_path, sample_table = sample_file_path, match_table = match_file_path, rel = TRUE)
test2 $taxa_abund$Kingdom[, 1:3]
test2# The relative abundance in test2 is different with that in test3
# The taxonomic abundances in taxa_abund of test3 is calculated based on the otu_table (species abundance) and tax_table
# So when the user need to use raw or relative taxonomic abundance coming from original file, please donot run cal_abund function.
library(microeco)
<- clone(test2)
test3 $cal_abund()
test3$taxa_abund$Kingdom[, 1:3] test3
8.6 NCycDB/PCycDB
NCycDB database (Tu et al. 2018) is a curated integrative database for fast and accurate metagenomic profiling of nitrogen cycling genes.
The ncyc2meco
function is designed for construct the microtable object using gene abundance files from NCycDB.
This function can also be used to parse the output of PCycDB (Zeng et al. 2022) database benefiting from implemented mapping database from v0.7.0.
The ncyc2meco
function can identify the database and invoke the internal mapping data automatically according to the gene names of input features.
library(file2meco)
library(microeco)
library(magrittr)
?ncyc2meco# use the raw data files stored inside the package
<- system.file("extdata", "example_Ncyc_table.tsv", package="file2meco")
abund_file_path <- system.file("extdata", "example_metagenome_sample_info.tsv", package="file2meco")
sample_file_path <- system.file("extdata", "example_metagenome_match_table.tsv", package="file2meco")
match_file_path ncyc2meco(abund_file_path)
ncyc2meco(abund_file_path, sample_table = sample_file_path, match_table = match_file_path)
# Let's try more interesting usages with microeco
<- ncyc2meco(abund_file_path, sample_table = sample_file_path, match_table = match_file_path)
test $tidy_dataset()
test# use split_group = TRUE to calculate the pathway abundance with multipe map correspondance
$cal_abund(select_cols = 1:2, rel = TRUE, split_group = TRUE, split_column = "Pathway")
test$taxa_abund$Pathway %<>% .[!grepl("unclass", rownames(.)), ]
test<- trans_abund$new(test, taxrank = "Pathway")
test1 $plot_bar(bar_full = FALSE)
test1# for gene abundance, no splitting on the pathways
$cal_abund(select_cols = 1:2, rel = TRUE, split_group = FALSE)
test$taxa_abund$Gene %<>% .[!grepl("unclass", rownames(.)), ]
test<- trans_abund$new(test, taxrank = "Gene")
test1 $plot_bar(bar_full = FALSE) test1
8.7 phyloseq
Two functions meco2phyloseq() and phyloseq2meco() were provided for the conversion between microtable object and phyloseq object of phyloseq package (Mcmurdie and Holmes 2013).
# Please first install phyloseq
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
::install("phyloseq")
BiocManagerlibrary(phyloseq)
# from microtable to phyloseq object
data("dataset")
<- meco2phyloseq(dataset)
physeq physeq
# from phyloseq to microtable object
data("GlobalPatterns")
<- phyloseq2meco(GlobalPatterns)
meco_dataset meco_dataset
8.8 PICRUSt2
PICRUSt2 (Douglas et al. 2020) contains an updated and larger database of gene families and reference genomes compared to the original version of PICRUSt. We do not create a special file conversion function for PICRUSt2 as it is very easy to convert the output pathway files of PICRUSt2 to microtable object. Two example files of PICRUSt2 output in file2meco package were used to show the operation.
# MetaCyc pathway output
<- system.file("extdata", "example_PICRUSt2_MetaCyc_path_abun_unstrat.tsv", package="file2meco")
tmp_file_path <- read.delim(tmp_file_path, row.names = 1)
pathway_table data("MetaCyc_pathway_map")
<- microtable$new(otu_table = pathway_table, tax_table = MetaCyc_pathway_map)
tmp $tidy_dataset()
tmp tmp
# KEGG pathway output
<- system.file("extdata", "example_PICRUSt2_KEGG_path_abun_unstrat.tsv", package="file2meco")
tmp_file_path <- read.delim(tmp_file_path, row.names = 1)
pathway_table data("Tax4Fun2_KEGG")
<- microtable$new(otu_table = pathway_table, tax_table = Tax4Fun2_KEGG$ptw_desc)
tmp $tidy_dataset()
tmp tmp
8.9 ViromeScan
ViromeScan (Rampelli et al. 2016) is a tool for metagenomic viral community profiling.
The input of vs2meco
function must be a folder containing all the directories named by sample names.
Each sample directory should have the original output files generated by ViromeScan software.
library(microeco)
library(file2meco)
# use viromescan directory inside the package
<- system.file("extdata", "viromescan", package="file2meco")
dir_path <- vs2meco(dir_path)
d1 $cal_abund(rel = TRUE)
d1# d1$taxa_abund$Family is same with the percentage output of viromescan at
# Family level, i.e. Family_level_results-%.txt file
$cal_abund(rel = FALSE)
d1# d1$taxa_abund$Family is same with the count output of viromescan at
# Family level, i.e. Family_level_results-Counts.txt file