Clustering of a large gene catalogue of the human microbiota into MetaGenomic Species
Access to EGA study: EGAS00001001704 / Lifelines-DEEP
A) We have built a large non-redundant gene catalogue of the human microbiota by collecting isolate genomes and genomes reconstructed from metagenomic data (MAGs). We would now like to structure this catalogue by grouping co-abundant genes into MetaGenomic Species (doi: 10.1038/nbt.2939 & 10.1093/bioinformatics/bty830). To do so, we need to measure gene abundances (read counts) across thousands of samples.
B) We will align reads from each sample of the LLD cohort against our gene catalogue to build a large gene abundance table. We will NOT perform de novo assembly, nor will we add genes from LLD samples to our catalogue. This table will be merged with tables built from other large cohorts (predict1, metacardis, SCAPIS, etc.). We will then run MSPminer or canopy (popular gene binning tools) on the merged table to identify MetaGenomic Species in the catalogue.
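The merge step described in B) can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the actual pipeline: the function name merge_count_tables, the in-memory dictionary layout, the gene and sample IDs, and the choice to assign a count of 0 to genes missing from a cohort table are all assumptions made for the example.

```python
def merge_count_tables(tables):
    """Merge per-cohort gene count tables into a single table.

    Each input table maps gene_id -> {sample_id: read_count}.
    Returns (samples, merged), where merged maps gene_id -> list of
    counts ordered like `samples`. A gene absent from a cohort table
    gets a count of 0 for that cohort's samples (assumption).
    """
    samples = []
    seen = set()
    for table in tables:
        # Samples of one cohort, in sorted order for reproducibility
        cohort_samples = sorted({s for counts in table.values() for s in counts})
        for s in cohort_samples:
            if s in seen:
                raise ValueError(f"duplicate sample ID: {s}")
            seen.add(s)
            samples.append(s)

    index = {s: i for i, s in enumerate(samples)}
    genes = sorted({g for table in tables for g in table})
    merged = {g: [0] * len(samples) for g in genes}

    for table in tables:
        for gene, counts in table.items():
            for sample, count in counts.items():
                merged[gene][index[sample]] = count
    return samples, merged


# Hypothetical toy tables: one for LLD, one for another cohort
lld = {
    "gene_1": {"LLD_s1": 12, "LLD_s2": 0},
    "gene_2": {"LLD_s1": 3, "LLD_s2": 7},
}
other = {
    "gene_1": {"X_s1": 5},
    "gene_3": {"X_s1": 9},
}
samples, merged = merge_count_tables([lld, other])
```

Because every cohort is aligned against the same catalogue, gene IDs are shared across tables and the merge reduces to concatenating sample columns. A real table with millions of genes and thousands of samples would need a sparse or on-disk representation rather than nested dictionaries.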
C) We will finish this work before the end of 2023 and publish it in mid-2024. Please note that, given how the data will be used, we agree to cite the LLD cohort in the acknowledgments section of our publication but NOT to list it as an author.