Applications

Using Machine Learning to uncover patterns in Lifelines data

To date, cohort studies have mostly been hypothesis-driven, typically involving a limited number of pre-selected variables (1,2). For example, Lifelines data were used to show that a pre-defined multi-variable diet score predicts type-2 diabetes incidence (3).
This situation may be compared to candidate-gene studies in the pre-GWAS era, which were largely replaced by hypothesis-free scans for genetic associations once whole-genome genotyping became feasible. In a similar way, the deeply phenotyped Lifelines cohort now opens the possibility of identifying, in a hypothesis-free manner, correlations between a range of environmental, phenotypic and genetic explanatory variables on the one hand and phenotypic outcome variables on the other. However, in contrast to genotypes, the variables collected in the Lifelines cohort exhibit complex nonlinear and causal relationships, are of diverse types (binary, categorical, ordinal and continuous), and show complex patterns of missingness. There is therefore a need to develop appropriate statistical methodology.

year of approval

2021

institute

  • University Medical Center Groningen

primary applicant

  • Lunter, G.