Investigation of a novel bacterial enzyme in the adult and infant gut microbiome
Our aim is to characterize the presence, in the human gut microbiome, of a novel enzyme that we have already identified and experimentally characterized. We have already processed a large collection of metagenomes from multiple studies of the infant gut and of adults with IBD. We are now adding healthy adult populations to our dataset to provide a background population for comparison.
Briefly, our methodology consists of the following steps: 1) performing a standardized quality control process that filters out low-quality reads and removes potential host contamination; 2) performing taxonomic classification of the reads to assess the presence of different bacterial taxa within each sample; 3) mapping reads to custom databases of sequences of our gene of interest to quantify the relative abundance of our gene and other reference genes within each sample; 4) performing statistical analyses of the gene mapping data to assess whether gene presence and absence differ among populations of interest.
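Steps 3 and 4 above can be illustrated with a minimal sketch. It assumes that read mapping has already produced per-sample read counts for the gene of interest and for one reference gene. The function names, the length-normalized ratio, the presence threshold of 5 reads, and the choice of a two-sided Fisher's exact test are all hypothetical choices made for illustration, not a description of the actual pipeline.

```python
from math import comb


def relative_abundance(gene_reads, gene_length, ref_reads, ref_length):
    """Length-normalized coverage of the target gene divided by that of a reference gene.

    Assumption: a single reference gene is used for normalization.
    """
    if ref_reads == 0:
        raise ValueError("reference gene has no mapped reads; cannot normalize")
    gene_cov = gene_reads / gene_length
    ref_cov = ref_reads / ref_length
    return gene_cov / ref_cov


def presence_counts(read_counts, min_reads=5):
    """Return (present, absent) sample counts; min_reads is a hypothetical threshold."""
    present = sum(1 for n in read_counts if n >= min_reads)
    return present, len(read_counts) - present


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows are populations; columns are gene present / gene absent.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x "present" samples in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Toy read counts for two hypothetical populations (not real data).
infant_counts = [12, 0, 30, 7]
adult_counts = [0, 1, 9, 0]
a, b = presence_counts(infant_counts)
c, d = presence_counts(adult_counts)
p_value = fisher_exact_two_sided(a, b, c, d)
```

With thousands of samples, an established statistics library would be the more practical choice; the standard-library version here only keeps the sketch self-contained.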
Our timeline for use of the data would be short. Our read processing and analysis pipeline is already in use on other datasets, and we would likely begin processing data from the Lifelines DEEP dataset within days of being granted access.
I have been performing bioinformatics analysis of metagenomic, metatranscriptomic, and genomic data for my entire scientific career. I have experience working with these data at every step, from raw sequencing reads through assembled metagenomes. At the NIH we have access to a large HPC cluster and would not have any practical limitations in storing or processing these data. Furthermore, my postdoctoral advisor has extensive experience working with these types of data, as do my other colleagues.
We have already used our in-house analysis pipeline to process datasets containing thousands of metagenomes and would not have any issue processing the 1473 Lifelines DEEP metagenome samples.