Life’s two lotteries: disentangling the effects of genes and environments in human capital outcomes
In health, factors other than medical care, such as 1) behavior, 2) genetics (“nature”), and 3) socio-economic environments (“nurture”), account for at least 85% of preventable deaths [1]. Yet around 80% of total health expenditure is devoted to medical care, and most of this is spent at the end of life [2], when the scope for preventive efforts has passed. Economic research suggests that investment in children, pregnant women, and their environments is effective in both improving the health and economic well-being of individuals and reducing inequalities between socio-economic status (SES) groups [3, 4]. Differences in cognition and health outcomes between SES groups can be detected early, before the first year of life, grow rapidly in childhood, and then level off [5]. Further, children from disadvantaged SES backgrounds are more vulnerable to early-life shocks and experience negative shocks more frequently [6, 7]. Yet we have a limited understanding of how early-life shocks and environments interact with genetic endowments in reaching one’s potential [8].
The main objective of this proposal is to combine socio-economic, health, and genetic data to understand the origins of differences in human capital outcomes, defined as any measure of knowledge, skills, health and well-being of individuals [9]. To this end, we will use the rich data of the Lifelines cohort, which contains measures of environment, health, health behaviors, educational attainment, labor market outcomes, and genetic data for Lifelines respondents and their family members. Estimating the effect of genes on human capital-related outcomes has only recently become feasible thanks to rapidly increasing sample sizes of genome-wide association studies (GWAS). Such genetic “discovery” studies have now robustly identified highly significant (genome-wide) relationships between genetic variants (single-nucleotide polymorphisms, SNPs) and various outcomes. In this project, we will aggregate the effects of such SNPs into polygenic scores (PGSs) to study the effects of gene-environment interplay in early childhood on subsequent stages of life. We aim to identify modifiable aspects of the socio-economic and policy environments, and the optimal timing of interventions, to guide policymakers on how to best improve health and socio-economic outcomes and reduce inequalities in these outcomes between SES groups. Our research objectives are:
1. Estimate the effects of genes and environments on human capital in adulthood
Children inherit their genetic makeup from their parents, but also inherit the environment their parents create for them. Hence, a PGS (e.g., for education) is generally correlated with measures of the environment in which an individual grows up. The unique structure of Lifelines allows us to overcome this issue. In Lifelines, we can construct a respondent’s PGS, and also the PGSs of their direct family members. Using within-family models, we will interact PGSs of Lifelines respondents that predict various human-capital outcomes (e.g., educational attainment, income, risk preferences) with measures of parental SES, and condition on their parental PGSs. This makes it possible to estimate the causal genetic effect (using the fact that the genetic makeup of the child is a random combination of that of their parents). This genetic effect is then no longer biased by environmental characteristics. Further, we will use sources of variation in the parental SES of Lifelines respondents that are as good as random (so-called natural experiments), such that causal statements regarding the effects of variation in components of parental SES can also be made. Examples of such sources of quasi-random (exogenous) variation that we could use are (a) job layoffs that are unrelated to the performance of an individual (but occur, for example, because of issues that relate to a company’s performance), (b) changes to the minimum required years of schooling imposed in the past by the Dutch government, which affected different birth cohorts to different extents, and (c) neighborhood-specific shocks to the income distribution generated by the postcode lottery. Because our research will be able to identify and separate the causal effects of genetic endowments and the causal effects of environmental characteristics, it will be at the very frontier of current social science genetics research.
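The logic of conditioning on parental PGSs can be illustrated with a small simulation. The sketch below is not the proposed estimation pipeline: all effect sizes, SNP weights, sample sizes, and variable names are hypothetical, parental SES is treated as exogenous for simplicity, and the data are simulated rather than drawn from Lifelines. It builds each PGS as a weighted sum of allele counts, generates children by random Mendelian transmission, and compares a naive regression with one that controls for the parental PGS.

```python
import numpy as np

rng = np.random.default_rng(0)
n_fam, n_snp = 20_000, 50

# Hypothetical GWAS effect sizes and allele frequencies (assumed values).
weights = rng.normal(0.0, 1.0, n_snp)
maf = rng.uniform(0.1, 0.5, n_snp)

# Parental genotypes: 0, 1 or 2 copies of the effect allele per SNP.
mother = rng.binomial(2, maf, size=(n_fam, n_snp))
father = rng.binomial(2, maf, size=(n_fam, n_snp))

def transmit(parent):
    """Each parent passes on one allele per SNP, chosen at random."""
    return rng.binomial(1, parent / 2.0)

child = transmit(mother) + transmit(father)

# A PGS is the weighted sum of allele counts; all scores share one scale.
raw_child = child @ weights
raw_par = (mother + father) @ weights
scale = raw_child.std()
pgs_child = (raw_child - raw_child.mean()) / scale
pgs_par = (raw_par - raw_par.mean()) / scale

# Parental SES correlates with parental genes (gene-environment correlation).
ses = 0.5 * pgs_par + rng.normal(0.0, 1.0, n_fam)

# Assumed data-generating process: direct genetic effect, SES effect,
# GxE interaction, and an unobserved family effect tracked by parental PGS.
b_direct, b_ses, b_gxe, b_nurture = 0.3, 0.4, 0.1, 0.2
y = (b_direct * pgs_child + b_ses * ses + b_gxe * pgs_child * ses
     + b_nurture * pgs_par + rng.normal(0.0, 1.0, n_fam))

def ols(columns, outcome):
    """Ordinary least squares with an intercept; returns the slopes."""
    X = np.column_stack([np.ones(len(outcome))] + columns)
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return beta[1:]

naive = ols([pgs_child], y)[0]
within = ols([pgs_child, ses, pgs_child * ses, pgs_par], y)
print(f"naive PGS coefficient:         {naive:.2f}")
print(f"within-family PGS coefficient: {within[0]:.2f}")
print(f"GxE interaction coefficient:   {within[2]:.2f}")
```

In this simulated setting the naive coefficient absorbs both the SES channel and the parental channel, whereas the coefficient conditional on the parental PGS stays close to the assumed direct effect, because the child's deviation from the parental score is random.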
2. Evaluate the role of gene by environment interplay in early life
We will study how genes and environments interact in producing early-life outcomes such as birth weight, gestational length, and childhood body-mass index. As for objective 1, we will use cutting-edge methods to exploit both random components of genetic endowments and quasi-experimental variation in early-life environments. Examples of natural experiments that provide quasi-experimental variation in the early-life environment are (a) exposure to the flu season during gestation, and (b) medical guidelines for assigning care to patients, such as classifications of very low birth weight (below 1.5 kg at birth) or of very preterm birth (<32 weeks at birth), that are used to sort newborns into receiving specialized care (e.g., in an incubator care unit). We will combine natural experiments that induce quasi-random changes in the environment with measures of genetic endowments that are independent of parental genetic makeup, using a family design. By combining causal genetic and causal environmental methods, this research too will be at the frontier. To ensure that the data on early-life variables are as extensive and reliable as possible, we will link Lifelines to the Dutch perinatal registries using the Perined dataset. This is especially important because we will use a regression discontinuity design, which requires precise and granular data on variables recorded during the perinatal period.
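To make the regression discontinuity idea concrete, the following is a minimal sketch on simulated data, assuming a sharp design in which every newborn below the 1,500 g threshold receives specialized care. The effect size, bandwidth, outcome, and function name are hypothetical choices for illustration, not features of the Perined data or of the proposed analysis.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
cutoff = 1500.0  # very low birth weight threshold in grams

# Simulated birth weights and a sharp treatment rule (assumption).
birth_weight = rng.uniform(1000.0, 2000.0, n)
treated = birth_weight < cutoff

# Hypothetical outcome: smooth in birth weight, plus a jump for treated infants.
true_effect = 0.5
outcome = (0.002 * (birth_weight - cutoff) + true_effect * treated
           + rng.normal(0.0, 1.0, n))

def rdd_estimate(running, y, cutoff, bandwidth):
    """Local linear RDD: separate slopes on each side of the cutoff,
    using only observations within the bandwidth (uniform kernel)."""
    x = running - cutoff
    keep = np.abs(x) <= bandwidth
    x, y = x[keep], y[keep]
    d = (x < 0).astype(float)  # 1 if below the cutoff (treated)
    X = np.column_stack([np.ones_like(x), d, x, d * x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]  # estimated jump in the outcome at the cutoff

estimate = rdd_estimate(birth_weight, outcome, cutoff, bandwidth=150.0)
print(f"estimated effect of specialized care at the cutoff: {estimate:.2f}")
```

The estimate is only as good as the running variable: heaping or rounding of recorded birth weights near the threshold would blur the comparison, which is why granular registry data matter for this design.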
3. Estimate a dynamic model of childhood development to understand mechanisms behind GxE interplay, make predictions and evaluate policy alternatives
We will combine a model of dynamic skills formation [10] with genetic data to elucidate genetic and environmental pathways in the development of childhood skills and health. Current models of child development rely on measures of initial child skills (proxied by variables measured at birth) that are partially determined by parental investments in utero. We will instead use the natural experiment of genetic recombination at conception to construct genetic measures of initial child endowments that are independent of parental characteristics and investments, by controlling once more for parental genotype. This is our first contribution. Our theoretical model will also differ from traditional skill-formation models by modeling how genetic endowments continuously interact with environments in shaping child skills and health, allowing the genetic component to influence outcomes in every time period (not only at birth or time zero). We will estimate this model on Lifelines data and conduct simulations. This is our second contribution. The literature on GxE interactions in childhood is still in its infancy. Only one article [11] combines a structural approach with genetic data, as we propose. No study has yet used randomized genetic variation, which is made possible by Lifelines’ family structure. Our proposed study will therefore be at the frontier of this research area too.
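One way to write down such a model, purely as an illustrative sketch, is a log-linear skill technology in which the genetic endowment enters the initial condition and every later period. The functional form and all symbols below are assumptions made for exposition, not the specification of [10] or the final model: $\theta_{i,t}$ denotes the skill or health stock of child $i$ in period $t$, $I_{i,t}$ parental investment, $G_i$ the child's PGS conditional on parental PGSs, and $\eta_{i,t}$ a shock.

```latex
% Illustrative specification only; functional form and symbols are assumptions.
\begin{align}
\ln \theta_{i,0}   &= \mu_0 + \delta\, G_i + \varepsilon_{i,0}, \\
\ln \theta_{i,t+1} &= \alpha_t + \gamma_{1,t} \ln \theta_{i,t}
                      + \gamma_{2,t} \ln I_{i,t}
                      + \gamma_{3,t}\, G_i
                      + \gamma_{4,t}\, G_i \ln I_{i,t}
                      + \eta_{i,t}.
\end{align}
```

In this notation, a model in which genes matter only at time zero corresponds to $\gamma_{3,t} = \gamma_{4,t} = 0$ for all $t$, while $\gamma_{4,t} \neq 0$ captures gene-by-environment interaction in period $t$.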
4. Investigate the genetic architecture using within-family GWAS
Traditional population-based GWASs focus on identifying genetic variants associated with a phenotype of interest. However, GWAS results can also be used to investigate the extent to which two or more phenotypes are similar on a genetic level. The metric for this is the genetic correlation, which can be interpreted as a proxy for the extent to which underlying biological processes are shared between phenotypes. The literature shows genetic correlations among most complex traits, including health outcomes. Increasingly, this type of information is leveraged by initiatives that aim to classify diseases more accurately and to improve diagnosis, such as the Research Domain Criteria (RDoC) [12]. However, this method comes with limitations. First, the genetic correlation between two health outcomes does not merely reflect medically relevant similarities. Many phenotypes are known to be genetically correlated with SES, which complicates interpretation. Our previous work showed significant changes in patterns of genetic correlations among mental health traits after removing SES-associated genetic variation [13]. This indicates that, at least among mental health traits, a significant proportion of the genetic correlation between two traits might not reflect medically relevant similarity. Second, it has been shown that results obtained from population-based GWASs are affected by demography and indirect genetic effects [14]. Therefore, genetic correlations are biased. Thus, classification systems based on such genomic information, such as the RDoC, are in part misinformed.
We aim to improve on conventional genetic correlations by (i) conducting within-family GWAS (rather than conventional population-based GWAS), thereby reducing bias introduced by demography and indirect genetic effects; and (ii) removing medically irrelevant genetic variation from genetic correlations using the method of genomic SEM [15], thereby removing the genetic variance shared with SES from the genetic correlation between each pair of phenotypes.
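The intuition behind step (ii) can be conveyed with the textbook partial-correlation formula. This is a simplified analogy rather than genomic SEM itself, which fits a structural model to the full genetic covariance matrix and accounts for its sampling error; the function name and the input values are hypothetical.

```python
import math

def partial_genetic_correlation(r_xy, r_xs, r_ys):
    """Correlation between traits X and Y after removing the part each
    shares with a third trait S (here: SES), via the standard formula."""
    denominator = math.sqrt((1.0 - r_xs ** 2) * (1.0 - r_ys ** 2))
    return (r_xy - r_xs * r_ys) / denominator

# Hypothetical genetic correlations: two health traits and SES.
r_adjusted = partial_genetic_correlation(r_xy=0.5, r_xs=0.5, r_ys=0.5)
print(f"genetic correlation net of SES: {r_adjusted:.3f}")
```

When neither trait is genetically correlated with SES, the adjusted value equals the original one; the larger the overlap with SES, the more the adjusted correlation can differ from the conventional estimate.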