Fecal microbiome profiling of pediatric and adult irritable bowel syndrome (IBS) and inflammatory bowel disease (IBD)
The predominant diarrheal diseases, namely inflammatory bowel disease (IBD), which includes ulcerative colitis (UC) and Crohn’s disease (CD), and irritable bowel syndrome (IBS), including the diarrhea-predominant (IBS-D) and mixed (IBS-M) subtypes, represent major gastrointestinal clinical manifestations that affect up to 20% of the population. These diseases share several symptoms, including diarrhea and abdominal pain, so clinical diagnosis usually requires professional evaluation, comprehensive laboratory testing and a Rome questionnaire, which are time-consuming procedures that usually need multiple patient visits. Since there are no disease-specific stool biomarkers for IBD and IBS1,2, endoscopy remains the gold standard approach for accurately diagnosing IBD and IBS, with the assistance of laboratory testing and questionnaires. Moreover, several co-morbid conditions, such as Clostridioides difficile infection (CDI) in IBD and post-infectious IBS patients3,4, can affect diagnostic accuracy and treatment efficacy. Thus, there is a need for alternative diagnostic approaches for patients with these diarrheal diseases. Over the past decade, gut microbiome surveys of these diseases have provided promising insights into disease associations5–7. However, interesting observations reported from single-center microbiome surveys lack statistical power, and validation in independent cohorts is rarely performed. Meta-analysis based on the results of multiple survey reports does not provide confident integration, since methods of data generation and analysis vary significantly.
Gut microbiome surveys are now frequently conducted with targeted metagenomic sequencing, often referred to as 16S rDNA amplicon and internal transcribed spacer (ITS) sequencing, which has been extensively applied to profile the complex consortia of microbes, including bacteria and fungi, that populate the human gastrointestinal (GI) tract8–10. Pioneering studies, including the American Gut Project8, Human Microbiome Project9 and Earth Microbiome Project10, have established standard operating procedures (SOPs) for microbiome profiling that include DNA extraction methods, 16S primer design, and bioinformatics pipelines11. Establishing robust microbiome associations that adequately account for different demographics in human disease requires integrated and sufficiently powered multi-center investigations, including time-consuming ethics approvals for conducting clinical trials. Thus, re-analyzing publicly available cohort data is increasingly needed to mine host-microbiome interactions that may facilitate precision or personalized diagnosis and microbiome-based therapeutics. However, performing meta-analysis on individual microbiome projects that sequence distinct 16S variable regions on different technology platforms remains a significant challenge, since appropriate pipelines for accurate taxonomic profiling are lacking12,13.
The UCLUST algorithm in the QIIME1 package, VSEARCH and Deblur in QIIME2, USEARCH, mothur and DADA2 are currently the most popular tools for clustering or denoising 16S amplicons. These pipelines have been applied to many individual microbiome surveys, but there has been limited effort to use the resulting data for microbiome meta-analysis. Closed-reference operational taxonomic unit (OTU) picking was the first strategy used to analyze multiple data sets together and has been adopted by several large projects, including the MiBioGen consortium initiative14. A recent report on OTUX also adopted closed-reference OTU picking, using its pre-built region-specific OTU reference database for meta-analysis15. However, concerns have been reported about inaccurate taxonomic profiles generated by closed-reference analysis, because it fails to assign amplicon reads from different 16S variable regions to the same reference sequence16,17. Notably, OTU tables produced by de novo sequence clustering cannot readily be merged across studies, whereas sequence denoising makes merging possible because it produces amplicon sequence variants (ASVs)18. However, this is a simple combination of ASV tables that does not resolve the phylogeny of ASVs originating from the same parent 16S sequence. Thus, meta-analysis interpretation at the OTU/ASV level still requires novel tools. Taxonomic binning of OTU/ASV tables becomes a more feasible approach if the taxonomic annotation of OTUs/ASVs is accurate across multiple 16S regions19–21. Achieving this accuracy is the major goal of this study. We designed a bioinformatics pipeline, Taxa4Meta, that achieves high accuracy of clustering and taxonomic annotation across different amplicon sequences. We evaluated its performance using both mock and experimental microbiome datasets. The Taxa4Meta pipeline was then used for microbiome meta-analysis of the predominant diarrheal diseases, including IBD and IBS.
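To make the idea of taxonomic binning concrete, the following is a minimal Python sketch of how OTU/ASV tables from studies that sequenced different 16S regions might be collapsed to a shared taxonomic level and then merged. It is an illustration under stated assumptions, not the Taxa4Meta implementation: the function names (`bin_by_taxon`, `merge_taxon_tables`), the sample and feature identifiers, the counts and the genus labels are all hypothetical, and the taxonomic annotation of each feature is assumed to have been produced upstream.

```python
from collections import defaultdict


def bin_by_taxon(feature_counts, feature_taxonomy, unassigned="Unassigned"):
    """Collapse per-sample OTU/ASV counts into per-sample taxon counts.

    feature_counts:   {sample_id: {feature_id: count}}
    feature_taxonomy: {feature_id: taxon label, e.g. a genus name}
    Features without an annotation are pooled under `unassigned`.
    """
    binned = {}
    for sample, counts in feature_counts.items():
        taxa = defaultdict(int)
        for feature, count in counts.items():
            taxa[feature_taxonomy.get(feature, unassigned)] += count
        binned[sample] = dict(taxa)
    return binned


def merge_taxon_tables(*tables):
    """Merge taxon-level tables from several studies into one table.

    Every sample receives the same set of taxa; taxa not observed in a
    sample are filled with 0. Sample identifiers must be unique.
    """
    all_taxa = sorted({t for table in tables
                       for counts in table.values() for t in counts})
    merged = {}
    for table in tables:
        for sample, counts in table.items():
            if sample in merged:
                raise ValueError(f"duplicate sample id: {sample}")
            merged[sample] = {t: counts.get(t, 0) for t in all_taxa}
    return merged


# Hypothetical toy data: two studies that sequenced different 16S regions,
# so their ASV identifiers do not overlap.
study_v4_counts = {"A1": {"asv_a1": 10, "asv_a2": 5}}
study_v4_taxonomy = {"asv_a1": "Bacteroides", "asv_a2": "Faecalibacterium"}

study_v1v3_counts = {"B1": {"asv_b1": 3, "asv_b2": 4, "asv_b3": 2}}
study_v1v3_taxonomy = {"asv_b1": "Bacteroides", "asv_b2": "Bacteroides",
                       "asv_b3": "Escherichia"}

merged = merge_taxon_tables(
    bin_by_taxon(study_v4_counts, study_v4_taxonomy),
    bin_by_taxon(study_v1v3_counts, study_v1v3_taxonomy),
)
for sample, counts in merged.items():
    print(sample, counts)
```

In this sketch the two studies share no ASV identifiers, yet their samples become comparable once counts are summed per genus. The merged table is only as reliable as the annotation supplied in the taxonomy mappings, which is why accurate annotation across 16S regions is the prerequisite named above.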
We plan to apply the same bioinformatics approach to the LifeLines cohort data and use shotgun metagenomic data for further validation of taxonomic and functional features.