BIObanks Netherlands Internet Collaboration (BIONIC): harmonization of depression measurement for genome-wide association meta-analyses
Major depressive disorder (MDD) poses a large burden on individuals and society. Over the past decade, MDD has consistently placed among the top ten global burdens of disease (Institute for Health Metrics and Evaluation, 2017). In addition, with a lifetime prevalence of 19% in the Netherlands, MDD is one of the most common psychiatric disorders in the population (Fedko et al., 2020). The disorder is mainly characterized by a continuously low mood and anhedonia, accompanied by a large variety of peripheral symptoms (e.g. feelings of worthlessness, sleeping problems, and suicidal ideation). The burden associated with MDD is not limited to well-being. Owing to medical costs and productivity loss, with a mean annual cost of more than €2000 per patient, the disorder also places a large economic strain on both patients and society as a whole (Smit et al., 2006; Gustavsson et al., 2011).
Contemporary treatment strategies for MDD tend to implement specific combinations of psychotherapy and antidepressant medication. While these treatments are effective for some, at least 50% of individuals who have experienced MDD will suffer from recurrence, and experience an average of five to nine separate episodes across their lifespan (Kessler & Walters, 1998; Burcusa & Iacono, 2007; Hardeveld et al., 2013). Together with the 5-10% of patients who never reach the recovery phase, these patients form a large group of individuals who have to cope with the burden of MDD for the majority of their lifetime (Rao et al., 2010). The severity of MDD and the limited efficacy of its treatment stress the importance of additional research into its underlying etiological architecture. With contemporary advances in DNA sequencing and gene mapping, MDD research has turned its gaze towards genetics.
Twin studies and other heritability approaches have repeatedly shown MDD to be a complex trait that originates from both environmental and polygenic factors. The heritability of MDD has been estimated to lie in the range of 31-42% (e.g. Sullivan et al., 2000; Rice et al., 2002). Early genome-wide association (GWA) efforts for MDD were hampered by its modest heritability, high prevalence, and inherent heterogeneity, in addition to the power issues that plagued the majority of association studies in the early gene-finding era. Despite these early hardships, recent GWA studies (GWASs) have proven increasingly successful in uncovering genetic variants that contribute to the liability for depression (e.g. Wray et al., 2018; Howard et al., 2019). However, two major issues have emerged from research on MDD genetics: 1) the phenotype adopted in these studies, and 2) the consideration of population-specific variants.
1) Questions have recently been raised about the validity of the phenotypes adopted in the successful MDD GWA studies (Abbasi, 2017; Cai et al., 2018). These questions generally address two issues. First, the adopted phenotypes are often largely based on brief self-report assessments, and sceptics argue that such self-reported depression may not fully align with a clinical DSM-5-based diagnosis. Second, assessment approaches have often differed markedly across cohorts, a known detriment to GWA research (Manchia et al., 2013). Being prone to power issues, GWAS relies on a stark distinction between cases and controls. The possibility of large misclassification introduced by the use of brief self-report assessments, together with the use of a heterogeneous phenotype, can substantially weaken the strength of association signals found in these studies, hampering the identification of genetic variants. Furthermore, it has been proposed that the resulting GWA signals may not reflect genetic liability to MDD, but rather a more general inclination towards internalizing psychopathology or poor mental health (Schwabe et al., 2019). The GWAS authors themselves were aware of these limitations at the time of research, appropriately referring to their phenotype as major depression rather than MDD. A GWAS conducted with an actual DSM-5-based MDD phenotype would have large incremental value for our understanding of the etiology of depression.
2) Estimates of SNP-based heritability are still far removed from family-based heritability estimates (8.9% vs. 37%; Howard et al., 2019; Sullivan et al., 2000). Recent research suggests that this disparity may partly be accounted for by population-specific and low-frequency or rare variants, many of which are yet to be identified (Manolio et al., 2009; Gibson, 2011; Bomba et al., 2017). The difficulty of rare variant identification is threefold. The first reason lies in the very definition of rare variants. A variant is considered rare when its frequency in a population lies between 0.05% and 0.5%. As rare variants are uncommon, their detection in GWAS would require either a study sample that is largely genetically homogeneous, or rare variants that have large effect sizes (McClellan & King, 2010). Typically, neither of these conditions applies to MDD GWA samples and undiscovered rare variants. Second, rare variants tend to be population-specific (Tennessen et al., 2012). Genotypic and phenotypic heterogeneity introduced by mixing different sample populations can result in decreased effect sizes and statistical power in genetic association analyses (Manchia et al., 2013). The use of mixed populations in GWAS thus hampers genetic variant identification. The third reason is related to imputation, a common GWAS practice that uses a densely genotyped reference panel to estimate ungenotyped variants in the discovery data. When a reference panel is based on a mix of diverse populations, the probability of phasing a rare variant is low (Howie, Marchini & Stephens, 2011). Consequently, the ungenotyped variants in MDD GWASs that require imputation are unlikely to be assigned a rare variant, hampering the establishment of their association with the phenotype. Therefore, a genetically homogeneous sample and a population-specific reference panel would facilitate the identification of MDD’s population-specific and possibly rare genetic variants.
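Both issues can be made concrete with a small numerical sketch. The Python below is illustrative only and is not part of the BIONIC analysis pipeline; all function names and input values are hypothetical. The first function treats each labelled group as a mixture of true cases and true controls, showing how misclassification pulls the observed allelic odds ratio towards 1. The second and third assume a purely additive variant acting on a standardized liability, where the variance explained is 2p(1-p)β², showing why a rarer variant needs a larger per-allele effect to explain the same share of variance.

```python
import math


def observed_odds_ratio(p_case, p_control, case_error=0.0, control_error=0.0):
    """Allelic odds ratio observed when some case/control labels are wrong.

    p_case, p_control: true risk-allele frequencies in true cases/controls.
    case_error: fraction of labelled cases who are in truth controls.
    control_error: fraction of labelled controls who are in truth cases.
    (All inputs are hypothetical illustration values.)
    """
    # Each labelled group is a mixture of true cases and true controls.
    obs_case = (1 - case_error) * p_case + case_error * p_control
    obs_control = (1 - control_error) * p_control + control_error * p_case
    odds_case = obs_case / (1 - obs_case)
    odds_control = obs_control / (1 - obs_control)
    return odds_case / odds_control


def variance_explained(maf, beta):
    """Liability variance explained by one additive variant: 2p(1-p)*beta^2.

    Assumes a standardized liability and Hardy-Weinberg proportions.
    """
    return 2 * maf * (1 - maf) * beta ** 2


def beta_for_variance(maf, target):
    """Per-allele effect needed for a variant to explain `target` variance."""
    return math.sqrt(target / (2 * maf * (1 - maf)))


if __name__ == "__main__":
    clean = observed_odds_ratio(0.32, 0.30)
    noisy = observed_odds_ratio(0.32, 0.30, case_error=0.3)
    print(f"odds ratio without misclassification: {clean:.3f}")
    print(f"odds ratio with 30% false cases:      {noisy:.3f}")

    # Same variance explained, common (30%) versus rare (0.1%) variant.
    for maf in (0.30, 0.001):
        print(f"MAF {maf}: beta needed = {beta_for_variance(maf, 1e-4):.3f}")
```

The mixture model ignores differential error by genotype and the additive formula ignores dominance and interaction, so the numbers indicate direction and rough magnitude rather than a formal power calculation.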
We aim to tackle both issues outlined above using data obtained in the BIONIC project. An initiative funded by the BBMRI-consortium, BIONIC is a large-scale collaboration of scientific institutions and biobanks in the Netherlands that use a standardized online tool to assess MDD in individuals whose genetic information has already been obtained in previous collection efforts. This strategy greatly reduces the required sampling resources per individual, facilitating larger GWAS sample sizes and associated statistical power.
The BIONIC project can overcome the aforementioned issues in two ways. First, phenotypic accuracy is ensured through the standardized incorporation of DSM-5 criteria, while the issue of phenotypic heterogeneity is avoided through harmonized assessment of MDD across cohorts. MDD’s inherent heterogeneity can be investigated through BIONIC’s measurements of individual MDD symptoms. Second, both the availability of a Dutch imputation reference panel and the sample’s genetic homogeneity aid in the identification of population-specific and possibly rare genetic variants associated with MDD in the Netherlands.
In the first phase of the project, our team developed and validated a rapid online DSM-5-based MDD assessment tool that would serve as BIONIC’s main tool for data collection (Bot et al., 2017). In the second phase, Fedko et al. (2020) used a subset of the newly acquired MDD data to introduce the cohorts and to demonstrate the close alignment between the estimates of prevalence and heritability found in BIONIC and those of previous MDD efforts. Now, in the third phase, we plan to identify the specific genetic variants associated with MDD and its individual symptoms in the Dutch population. To this end, we will perform a genome-wide association meta-analysis on the MDD and genetic data from all participating cohorts, after imputing the data to a combined reference panel of the population-specific Genome of the Netherlands (GoNL; Boomsma et al., 2015) and the 1000 Genomes Project (1000G3; 1000 Genomes Project Consortium, 2010).
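As a minimal illustration of how per-cohort results are combined in a meta-analysis, the sketch below pools effect estimates for a single variant with inverse-variance weights. The fixed-effect model is an assumption made for illustration; the text does not specify which meta-analysis model is used, and the function name and input values are hypothetical.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis for one variant.

    betas: per-cohort effect estimates (e.g. log odds ratios).
    ses: the matching per-cohort standard errors.
    Returns the pooled estimate, its standard error, the z-score, and a
    two-sided p-value under a normal approximation.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and equally long")
    # Cohorts with smaller standard errors receive larger weights.
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, z, p


if __name__ == "__main__":
    # Hypothetical estimates for one variant in three cohorts.
    beta, se, z, p = ivw_meta([0.05, 0.08, 0.03], [0.03, 0.04, 0.02])
    print(f"pooled beta={beta:.4f}, se={se:.4f}, z={z:.2f}, p={p:.3g}")
```

Pooling in this way is only valid when the cohorts measure the same phenotype on the same scale, which is what harmonized assessment is meant to provide.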
The main aim of this paper is to demonstrate the utility of nation-wide cohort collaboration and rapid online DSM-5-based assessment in the search for genetic variants of MDD, showcasing the newly established Dutch MDD dataset BIONIC. In addition, the power gained from the homogeneous MDD assessment, together with the availability of a population-specific imputation reference panel, provides us with a unique opportunity to identify population-specific and possibly rare genetic variants associated with MDD in the Dutch population. Finally, to investigate MDD’s inherent heterogeneity, genome-wide association meta-analyses (GWAMAs) will be conducted on the nine individual MDD symptoms.