Enhancing discovery of genetic risk factors for disease using online pathway databases
Researchers are discovering new genetic determinants of disease by correlating variations in our DNA with disease outcomes. In studies of three chronic autoimmune disorders—rheumatoid arthritis, type 1 diabetes and Crohn’s disease—we demonstrate how incorporating knowledge of co-ordinated networks of genes, encoded in online databases, can inform statistical tests for correlation, and generate stronger support for links between genes and predisposition to disease.
Cross-Disciplinary Fellow Peter Carbonetto and colleagues, authored on Thu, 31 October 2013
Genome-wide association studies are now a major driver for discovery of genetic factors underlying common diseases. Made possible by new technologies that survey genetic variation throughout the genome, these epidemiological studies have drawn connections to genes that were not previously suspected of playing a role in disease.
Underlying these discoveries is a deceptively straightforward strategy for screening risk factors: correlate exposures (the alterations in our DNA) with disease outcomes. However, one important lesson emerging from these studies is that many common chronic disorders are highly “polygenic”; that is, they are determined by a large number of genetic factors, each with a small effect on disease risk. As a result, the individual contributions to disease risk are often so subtle that they cannot be distinguished from correlations that occur by chance. Standard data analysis procedures, which assess each variant separately for correlation with disease status, are therefore poorly equipped to identify most of the DNA alterations conferring risk of disease.
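To make the difficulty concrete, the following is a minimal sketch of a one-variant-at-a-time screen. It is not the analysis used in the study. It assumes a simple chi-square test on allele counts and a Bonferroni-corrected threshold, and all counts, the number of variants tested, and the function name are hypothetical values chosen for illustration.

```python
import math

def allelic_chi2_pvalue(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Hypothetical helper for illustration; rows are cases/controls,
    columns are alternative/reference allele counts.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(r) for r in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))

# Assumed number of variants tested; the threshold is a Bonferroni correction.
n_variants = 500_000
threshold = 0.05 / n_variants

# Made-up counts: 2,000 cases (4,000 alleles) and 3,000 controls (6,000 alleles).
# Small effect: allele frequency 0.32 in cases versus 0.30 in controls.
p_small = allelic_chi2_pvalue(1280, 2720, 1800, 4200)
# Large effect: allele frequency 0.40 in cases versus 0.30 in controls.
p_large = allelic_chi2_pvalue(1600, 2400, 1800, 4200)

print(f"threshold = {threshold:.1e}")
print(f"small effect: p = {p_small:.3g}, passes = {p_small < threshold}")
print(f"large effect: p = {p_large:.3g}, passes = {p_large < threshold}")
```

In this toy example the small-effect variant is nominally associated with disease but falls far short of the genome-wide threshold, which is the situation described above for polygenic disorders.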
In this work, we show that information about biological pathways retrieved from online databases (e.g. http://www.reactome.org) can be incorporated into disease models, and these models can reveal additional links between genes and disease using the same genetic polymorphism data. To map disease associations within biological pathways, we relax the thresholds for reporting “significant” associations, and the extent to which we relax these thresholds is estimated from the genetic data. In doing so, we provide a framework that integrates two important problems: (1) the problem of identifying sets of genes that are “enriched” for disease associations; and (2) the problem of promoting genetic polymorphisms near genes in these pathways. Among our findings, the most exciting is that we obtain support for genetic factors contributing to three chronic autoimmune disorders—Crohn’s disease, rheumatoid arthritis and type 1 diabetes—that are not supported by the genetic data alone. These associations are corroborated by other published studies, thereby providing validation for this new approach. In addition, our enrichment analysis points to a major role for IL-2 signaling genes in type 1 diabetes, which is an emerging topic in diabetes research.
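The idea of relaxing the reporting threshold for variants in a pathway, by an amount estimated from the data, can be sketched as follows. This is a simplified illustration and not the authors' implementation. It assumes each variant has already been summarized by a Bayes factor, treats variants as independent, and picks the enrichment parameter from a coarse grid; the names `theta0` and `theta`, the grid, and all numbers are hypothetical.

```python
import math

def prior_inclusion_prob(theta0, theta, in_pathway):
    """Prior probability that a variant affects disease risk.

    Assumed form: log10 prior odds = theta0 + theta * in_pathway, so a
    positive theta relaxes the bar for variants inside the pathway.
    """
    log10_odds = theta0 + (theta if in_pathway else 0.0)
    odds = 10.0 ** log10_odds
    return odds / (1.0 + odds)

def posterior_inclusion_prob(bayes_factor, prior_prob):
    """Combine the evidence in the data (Bayes factor) with the prior."""
    post_odds = bayes_factor * prior_prob / (1.0 - prior_prob)
    return post_odds / (1.0 + post_odds)

def log_likelihood(theta, theta0, bayes_factors, in_pathway):
    """Log marginal likelihood of theta (up to a constant), assuming independent variants."""
    total = 0.0
    for bf, a in zip(bayes_factors, in_pathway):
        pi = prior_inclusion_prob(theta0, theta, a)
        total += math.log(1.0 - pi + pi * bf)
    return total

def estimate_enrichment(theta0, bayes_factors, in_pathway, grid):
    """Pick the grid value of theta that the data support best."""
    return max(grid, key=lambda t: log_likelihood(t, theta0, bayes_factors, in_pathway))

# Made-up data: 50 pathway variants (two with strong support), 200 others.
bayes_factors = [500.0, 800.0] + [0.5] * 48 + [0.5] * 200
in_pathway = [True] * 50 + [False] * 200
theta0 = -4.0                         # assumed genome-wide prior log10 odds
grid = [0.5 * i for i in range(9)]    # candidate enrichments 0.0 .. 4.0

theta_hat = estimate_enrichment(theta0, bayes_factors, in_pathway, grid)
pip_without = posterior_inclusion_prob(500.0, prior_inclusion_prob(theta0, 0.0, True))
pip_with = posterior_inclusion_prob(500.0, prior_inclusion_prob(theta0, theta_hat, True))
print(f"estimated enrichment: {theta_hat}")
print(f"posterior inclusion probability: {pip_without:.3f} -> {pip_with:.3f}")
```

In this toy setting, a variant whose support is modest under the genome-wide prior receives a much higher posterior probability once the data indicate that its pathway is enriched. Variants outside the pathway are unaffected by the enrichment parameter.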
Figure: Regions of the human genome that likely contain risk factors for Crohn’s disease (red labels), rheumatoid arthritis (green) and type 1 diabetes (yellow) based on our analysis of the WTCCC data (Ref 1). Labels with thicker edges indicate associations that show little support based on the data alone, and are only identified after incorporating knowledge about pathways into the analysis. Most of these associations are corroborated by other genome-wide association studies (these are marked by asterisks).
Integrated enrichment analysis of variants and pathways in genome-wide association studies indicates central role for IL-2 signaling genes in type 1 diabetes, and cytokine signaling genes in Crohn's disease. P Carbonetto, M Stephens. PLoS Genetics 9: e1003770 (2013).
Ref 1. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Wellcome Trust Case Control Consortium. Nature 447: 661–678 (2007).