Inference of Sparse Microbial Association Networks using Multi-Graphical Lasso
Access to EGA study: EGAS00001001704 / Lifelines-DEEP
A. outline of the study design and methodology
We intend to use these data to test a novel algorithm for estimating association networks, following on from the work by Prost et al. ("A zero inflated log-normal model for inference of sparse microbial association networks", which also used this dataset).
Our algorithm yields multiple association graphs: those between microbial communities (as in Prost et al.), those between metabolites, and those between samples. The novel methodological contribution is the simultaneous estimation of all these graphs, in a manner similar to bi-Graphical Lasso algorithms such as EiGLasso. We hope that simultaneous estimation will be more accurate, because methods that estimate each graph separately are forced to make stronger independence assumptions.
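For orientation, the two-axis objective that bi-Graphical Lasso methods optimise can be sketched as follows. This is the standard Kronecker-sum formulation, not a statement of our algorithm, which extends the idea to more axes and may differ in detail. The symbols are illustrative assumptions: Y is a single n-by-p data matrix (samples by features), \Psi is the n-by-n sample-wise precision matrix, \Theta is the p-by-p feature-wise precision matrix, and \lambda_{\Psi}, \lambda_{\Theta} are sparsity penalties on the off-diagonal entries. Constant factors are omitted.

```latex
\[
(\hat{\Psi}, \hat{\Theta})
  = \arg\min_{\Psi \succ 0,\; \Theta \succ 0}\;
    -\log\det(\Psi \oplus \Theta)
    + \operatorname{tr}\!\left(\Psi\, Y Y^{\top}\right)
    + \operatorname{tr}\!\left(\Theta\, Y^{\top} Y\right)
    + \lambda_{\Psi} \lVert \Psi \rVert_{1,\mathrm{off}}
    + \lambda_{\Theta} \lVert \Theta \rVert_{1,\mathrm{off}},
\qquad
\Psi \oplus \Theta = \Psi \otimes I_p + I_n \otimes \Theta .
\]
```

The non-zero off-diagonal entries of \hat{\Psi} and \hat{\Theta} give the edges of the sample graph and the feature graph respectively; both are estimated from the same likelihood rather than in separate passes.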
We will validate our resulting microbial graph in the same manner as Prost et al., i.e. by measuring homophily/assortativity.
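As an illustration of this validation step, the following minimal sketch computes Newman's assortativity coefficient for a categorical node attribute on an undirected graph. It assumes the attribute is a taxonomic label such as family; the function name, the toy taxa, and the labels are hypothetical and are not taken from Prost et al. or from the requested data.

```python
from collections import defaultdict


def attribute_assortativity(edges, labels):
    """Newman's assortativity coefficient for a categorical node attribute.

    edges  : iterable of (u, v) pairs describing an undirected graph
    labels : dict mapping each node to its category (e.g. taxonomic family)
    Returns a value in [-1, 1]; 1 means edges only join same-category nodes.
    Undefined (division by zero) if every edge end has the same category.
    """
    # Count every undirected edge in both directions so the mixing
    # matrix is symmetric.
    mixing = defaultdict(float)
    for u, v in edges:
        mixing[(labels[u], labels[v])] += 1.0
        mixing[(labels[v], labels[u])] += 1.0
    total = sum(mixing.values())

    categories = sorted(set(labels.values()))
    # Fraction of edge ends attached to nodes of each category.
    ends = {c: sum(mixing[(c, d)] for d in categories) / total for c in categories}
    # Observed fraction of within-category edges.
    observed = sum(mixing[(c, c)] for c in categories) / total
    # Fraction expected if edge ends were paired at random.
    expected = sum(ends[c] ** 2 for c in categories)
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy example: four taxa in two families.
toy_labels = {"t1": "FamA", "t2": "FamA", "t3": "FamB", "t4": "FamB"}
within_family = [("t1", "t2"), ("t3", "t4")]
between_family = [("t1", "t3"), ("t2", "t4")]

r_within = attribute_assortativity(within_family, toy_labels)
r_between = attribute_assortativity(between_family, toy_labels)
r_mixed = attribute_assortativity(within_family + between_family, toy_labels)
```

A coefficient well above zero on the estimated microbial graph would indicate that associated taxa tend to share the chosen label, which is the property this validation looks for.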
B. proposed use of the requested data
The 16S rRNA dataset will be used to estimate the microbial association graph. As our methodology was designed to take into account multiomics data during the estimation process, we are also requesting the MGS and the plasma-untargeted metabolomics datasets.
While we are most interested in the microbial association graph, our methodology will also yield graphs for the other axes of the data. We are requesting the age and gender data as a source of validation for the graph of sample associations.
C. timeline
The methodology has already been tested on simulated data and on some real-world single-omic datasets, so it should not take long to apply it to the LifeLines-DEEP dataset. Once this is complete and the results have been validated using the same steps as Prost et al., we will attempt to validate the metabolite-wise and sample-wise graphs as well.
We intend to have completed analysis by May.