bugphyzz: a harmonized data resource and software for enrichment analysis of microbial physiologies

Author(s): Samuel David Gamboa-Tuz, Kelly Eckenrode, Jonathan Xi-Yao Ye, Jennifer Wokaty, Ben Nachod, Eric Franzosa, Nicola Segata, Curtis Huttenhower, Levi Waldron

Affiliation(s): CUNY ISPH

Social media: https://twitter.com/samueldgamboa

Microbiome sequencing enables the study of the abundance and composition of uncultured microbial communities. Tools such as PICRUSt and HUMAnN support the analysis of microbial molecular functions and metabolic pathways using gene and genome information from highly curated databases such as NCBI, KEGG, and the Gene Ontology. In contrast, data on other microbial traits, such as physiology, lifestyle, and habitat, are scattered across sources and formats, are often not machine-readable, and annotate taxa incompletely, which makes them impractical to use in enrichment analysis workflows. Here, we describe the development of a manually curated database and the bugphyzz R client (https://github.com/waldronlab/bugphyzz, planned for release in Bioconductor 3.18), which harmonize physiologies and other microbial traits from 17 different sources, including Bergey’s abstracts, literature reports, and popular databases such as BacDive and ProTraits. We used the NCBI taxonomy to standardize microbial taxonomic names and ranks, and ontologies to standardize microbial traits. We applied an ancestral state reconstruction and inheritance algorithm to propagate standardized trait annotations to unannotated taxa, yielding over 2 million harmonized annotations of 40 microbial traits in more than 60,000 species and genera. These data will be stored on Zenodo following the FAIR (Findability, Accessibility, Interoperability, and Reusability) guiding principles for cross-platform utility, with the R package providing streamlined access and additional features. We demonstrate the application of bugphyzz to a microbe set enrichment workflow, including enrichment analysis of the BugSigDB database of published microbial signatures. We expect that bugphyzz will aid researchers in interpreting high-throughput microbiome data in the context of microbial physiology and other microbial traits.
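
To make the idea of microbe set enrichment concrete, the following is a minimal sketch of a generic over-representation test (an upper-tail hypergeometric test). It is not the bugphyzz implementation or API, and the abstract does not state which statistical test the workflow uses; the function name, the taxon labels, and the example sets are all hypothetical and chosen only for illustration.

```python
from math import comb


def trait_enrichment(universe, signature, trait_set):
    """Upper-tail hypergeometric test for over-representation.

    universe  : all taxa considered (background)
    signature : taxa reported as differentially abundant (hypothetical input)
    trait_set : taxa annotated with the trait of interest (hypothetical input)

    Returns (overlap, p_value), where p_value is the probability of seeing
    at least `overlap` trait-annotated taxa in a random draw of
    len(signature) taxa from the universe.
    """
    universe = set(universe)
    # Restrict both sets to the background so the counts are consistent.
    signature = set(signature) & universe
    trait_set = set(trait_set) & universe

    n_universe = len(universe)
    n_trait = len(trait_set)
    n_signature = len(signature)
    overlap = len(signature & trait_set)

    total = comb(n_universe, n_signature)
    tail = sum(
        comb(n_trait, i) * comb(n_universe - n_trait, n_signature - i)
        for i in range(overlap, min(n_trait, n_signature) + 1)
    )
    return overlap, tail / total


# Hypothetical example: 10 background taxa, 4 of which carry the trait.
universe = [f"taxon_{i}" for i in range(10)]
trait_set = ["taxon_0", "taxon_1", "taxon_2", "taxon_3"]
signature = ["taxon_0", "taxon_1", "taxon_2"]

overlap, p_value = trait_enrichment(universe, signature, trait_set)
print(overlap, p_value)
```

In practice, the trait sets would come from the harmonized annotations and the signatures from a resource such as BugSigDB, and p-values across many trait sets would need multiple-testing correction.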