Genome-wide association study on income
The purpose of this study is to generate a set of publicly available genome-wide association study (GWAS) results on socio-economic position (SEP), broadly defined to include, for example, income, occupation, and educational attainment. These results will provide researchers from various disciplines with new and better ways to study the causes and consequences of inequality and social mobility, two matters of fundamental importance for science and policy1. Differences in SEP are not only robust predictors of subjective well-being2,3; low SEP is also a major risk factor for mental and physical diseases4 as well as for lower life expectancy5. For example, an epidemiological study5 of 26 million person-years from various cohorts found that participants with low SEP had greater mortality than those with high SEP (hazard ratio [HR] 1.42, 95% CI 1.38-1.45 for men; HR 1.34, 95% CI 1.28-1.39 for women). Low SEP was associated with a 25-month reduction in life expectancy between the ages of 40 and 85 years. The health effects of low SEP were partially mediated by a higher prevalence of smoking, diabetes, physical inactivity, alcohol intake, and hypertension among individuals with low SEP. Yet low SEP remained a substantial risk factor for all-cause mortality even after controlling for these and other well-known risk factors (HR 1.26, 95% CI 1.21-1.32)5.
Similar results were recently reported by Chetty et al.6, based on a US sample of 1.4 billion person-year observations. The gap in life expectancy between the richest 1% and the poorest 1% of individuals in the US was 14.6 years (95% CI, 14.4 to 14.8 years) for men and 10.1 years (95% CI, 9.9 to 10.3 years) for women6. On average, the more advantaged individuals are, the better their health. Furthermore, higher income is associated with greater longevity throughout the entire income distribution, not only at the extremes6,7.
Low SEP is a proxy for material hardship, which manifests itself in various forms, including food insufficiency, eviction, utility disconnection, phone disconnection, inability to see a doctor, and dilapidated or crowded housing8, all of which affect quality of life and have negative health implications. Low SEP is also associated with social isolation, an important component of European measures of poverty and a determinant of all-cause mortality9. Attending to these robust health consequences of SEP is particularly important and timely because the income and wealth gap between the richest and poorest people has risen steadily over the past few decades in the US and many other countries10,11. Understanding the structural causes of inequality and social mobility, and their links with health, is therefore of fundamental importance both as a matter of science and for interventions aiming to improve health outcomes, well-being, and longevity1.
It has long been recognized that parental SEP is a major determinant of a child’s expected trajectory in terms of cognitive and non-cognitive skill development, behaviors12, educational attainment13, career prospects, and adult income11. In other words, differences in SEP are partially transmitted across generations. At the same time, education, income, personality, cognitive abilities, and occupational choices are all heritable to some extent, and parents pass on both their environments and their genes to their offspring14–19.
Thus, a major challenge in understanding the individual causes of SEP and social mobility is to disentangle behavioral and environmental effects from possible genetic confounds and to study their interactions. Yet the current scientific means of doing so are very limited. Until now, the primary tools for disentangling the effects of parents’ genes from those of the parental environment have been adoption studies20 and children-of-twins studies21. However, few samples of this type exist, those that do are typically small, and these naturally occurring experiments are rarely representative of the entire range of environments. Furthermore, these datasets do not allow the investigation of interactions between environments and specific biological pathways. As a result, scientific insight remains very limited into why and how social inequalities tend to persist within families across generations, why and how these inequalities translate into differences in health and mortality, and what the most effective ways of helping disadvantaged individuals are.
Furthermore, understanding the causal effects of social inequality on health, well-being, and other outcomes remains a major scientific challenge. Many research questions in this realm can only be studied with field data, in which it is difficult or impossible to distinguish causal effects from (spurious) correlations due to unobserved factors (e.g. genetic and environmental endowments). Convincing natural experiments that generate exogenous variation in SEP are rare, which limits the possibilities of using instrumental-variable or difference-in-differences approaches to identify causality. Panel studies, which allow researchers to control for individual fixed effects, can eliminate omitted-variable bias arising from unobserved genetic heterogeneity among individuals, but they do not allow researchers to investigate intergenerational transmission or the interplay between genetic and environmental effects.
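The fixed-effects logic can be illustrated with a toy panel. In the sketch below, each person's outcome is y = 2x + a, where the person-specific constant a stands in for any time-invariant unobserved endowment (genetic or otherwise) and happens to be larger for the person with higher x. The data are invented for illustration only. Pooled least squares mixes the slope with a, whereas demeaning x and y within each person (the within, or fixed-effects, estimator) removes a and recovers the true slope of 2.

```python
def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs (with intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def within_slope(panel):
    """Fixed-effects (within) estimator.

    `panel` maps a person id to that person's list of (x, y) observations.
    Subtracting each person's own means removes any time-invariant
    person-specific term before the observations are pooled."""
    dx, dy = [], []
    for obs in panel.values():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        dx.extend(x - mx for x, _ in obs)
        dy.extend(y - my for _, y in obs)
    # Demeaned data have mean zero, so a regression through the origin suffices.
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)


# Toy panel (hypothetical): y = 2*x + a, with a = 0 for "A" and a = 10 for "B".
panel = {
    "A": [(1, 2), (2, 4)],
    "B": [(3, 16), (4, 18)],
}
pooled_x = [x for obs in panel.values() for x, _ in obs]
pooled_y = [y for obs in panel.values() for _, y in obs]
print(ols_slope(pooled_x, pooled_y))  # 6.0, biased by the person effect
print(within_slope(panel))            # 2.0, the true slope
```

The same demeaning that removes the bias also removes everything that is constant within a person, which is why this design cannot by itself say anything about the effects of endowments or about their transmission across generations.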
With the advent of well-powered GWAS results on a growing number of outcomes, new tools and study designs have become possible that allow researchers to zoom in on causal mechanisms, to disentangle environmental and genetic pathways, and to investigate their interactions. For example, GWAS results can be used to shed light on the genetic correlations among traits22 and to model their multivariate genetic relationships23. Under specific assumptions, GWAS results can also be used to identify causal effects between outcomes (e.g. SEP and health), even in the presence of direct pleiotropic effects of genes24,25.
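As one concrete example of such a method, the sketch below implements MR-Egger regression, a Mendelian randomization estimator that needs only GWAS summary statistics. Each SNP is oriented so that its effect on the exposure (say, SEP) is positive, and the SNP effects on the outcome (say, a health measure) are then regressed on the SNP effects on the exposure, weighting each SNP by the inverse variance of its outcome estimate. The slope estimates the causal effect and the intercept absorbs average directional pleiotropy, which is valid only under the assumption that pleiotropic effects are independent of instrument strength (InSIDE). We do not claim that MR-Egger is the specific method of refs 24,25, and the effect sizes in the usage example are hypothetical numbers chosen to lie exactly on a line.

```python
from dataclasses import dataclass


@dataclass
class SnpEffects:
    beta_x: float  # SNP effect on the exposure (e.g. SEP)
    beta_y: float  # SNP effect on the outcome (e.g. a health measure)
    se_y: float    # standard error of beta_y


def mr_egger(snps):
    """Return (intercept, slope) from an inverse-variance-weighted regression
    of beta_y on beta_x with an intercept (MR-Egger).

    slope     -> causal effect estimate under the InSIDE assumption
    intercept -> average directional pleiotropy"""
    # Orient SNPs so every effect on the exposure is positive; flip beta_y to match.
    pts = [(abs(s.beta_x), s.beta_y if s.beta_x >= 0 else -s.beta_y, 1.0 / s.se_y ** 2)
           for s in snps]
    sw = sum(w for _, _, w in pts)
    mx = sum(w * bx for bx, _, w in pts) / sw
    my = sum(w * by for _, by, w in pts) / sw
    sxy = sum(w * (bx - mx) * (by - my) for bx, by, w in pts)
    sxx = sum(w * (bx - mx) ** 2 for bx, _, w in pts)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical summary statistics lying exactly on beta_y = 0.01 + 0.5 * beta_x
# after orientation, so the fit should return intercept ~0.01 and slope ~0.5.
snps = [
    SnpEffects(0.02, 0.020, 0.010),
    SnpEffects(0.05, 0.035, 0.020),
    SnpEffects(-0.08, -0.050, 0.015),
    SnpEffects(0.10, 0.060, 0.010),
]
intercept, slope = mr_egger(snps)
print(round(intercept, 6), round(slope, 6))
```

Forcing the intercept to zero gives the standard inverse-variance-weighted (IVW) estimator, which assumes no directional pleiotropy. For real analyses, established implementations (e.g. the MendelianRandomization or TwoSampleMR R packages) also provide standard errors and diagnostics that this sketch omits.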
Furthermore, polygenic scores (linear indexes that summarize the genetic signal for a particular phenotype across the entire genome) offer a powerful tool for investigating the interplay between genetic and environmental endowments26. Using samples of genotyped trios (i.e. mother, father, and child), these polygenic scores can also be used to tease apart intergenerational transmission via direct genetic channels from transmission via environmental factors that are correlated with the parents’ non-transmitted alleles27,28. Polygenic scores can also be used to control for genetic confounds in study designs based on field data29, or to increase statistical power to identify the effects of other variables of interest in a multivariate regression30. In particular, combining polygenic scores based on large-scale, well-powered GWAS such as the one proposed here with the growing availability of genetic data in samples that contain high-quality, longitudinal measures of socio-economic environments, outcomes, and behaviors (e.g. AddHealth, Understanding Society, Wisconsin Longitudinal Study, the Texas Twin Project, BASE-2) will inspire a new generation of empirical studies that advance our understanding of the sources and consequences of social inequality and help identify effective environmental interventions.
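A minimal sketch of both ideas, assuming hard-called genotypes coded as effect-allele counts (0, 1, or 2) and a vector of per-SNP GWAS weights. The polygenic score is the weighted sum of allele counts. In a trio, each parent passes exactly one allele per SNP to the child, so the number of parental effect alleles that were not transmitted is mother + father - child at every SNP, and the non-transmitted score follows by applying the same weights. The weights and genotypes below are hypothetical; real scores use many SNPs and usually account for linkage disequilibrium between them, which this sketch omits.

```python
def polygenic_score(dosages, weights):
    """Linear index: sum over SNPs of effect-allele count times GWAS weight."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same SNPs")
    return sum(d * w for d, w in zip(dosages, weights))


# Effect-allele counts a parent can pass on at one SNP, given the parent's count.
_TRANSMITTABLE = {0: (0,), 1: (0, 1), 2: (1,)}


def nontransmitted_score(mother, father, child, weights):
    """Score built from the parental alleles that the child did NOT inherit."""
    if not len(mother) == len(father) == len(child):
        raise ValueError("trio genotype vectors must have equal length")
    nt = []
    for m, f, c in zip(mother, father, child):
        # Mendelian check: the child's count must be one allele from each parent.
        if c not in {a + b for a in _TRANSMITTABLE[m] for b in _TRANSMITTABLE[f]}:
            raise ValueError("trio genotypes are not Mendelian-consistent")
        nt.append(m + f - c)  # alleles left over after each parent passed one on
    return polygenic_score(nt, weights)


weights = [0.02, -0.01, 0.03]  # hypothetical per-SNP GWAS weights
mother, father, child = [1, 2, 0], [1, 0, 1], [2, 1, 0]
child_pgs = polygenic_score(child, weights)
nt_pgs = nontransmitted_score(mother, father, child, weights)
print(round(child_pgs, 6), round(nt_pgs, 6))  # 0.03 0.02
```

In a trio regression, the child's outcome would then be regressed on both scores: the coefficient on the child's own score reflects direct genetic effects plus any indirect effects, while the coefficient on the non-transmitted score can only operate through environments correlated with parental genotypes.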
Thus, making well-powered GWAS results on various aspects of SEP publicly available opens up new possibilities for a wide array of research addressing questions of fundamental importance to health and well-being.
One of the main purposes of our project is to conduct the first large-scale GWAS on individual income and occupation, two of the major pillars of SEP. Related traits such as educational attainment15,31, social deprivation measured by the Townsend index, and household income32 have previously been studied in large-scale GWAS. We will complement this line of research by covering the full scope of SEP measures in large-scale GWAS, by increasing previously available sample sizes, and by combining the results in a multivariate study design using genomic structural equation modeling (genomic SEM).