Linking genomic changes to phenotypic differences

A comparative genomics approach predicts which genomic regions are linked to natural phenotypic differences between species, leveraging specificity resulting from independent phenotypic losses.

HFSP Long-Term Fellow Michael Hiller and HFSP Young Investigator Grant holder Gill Bejerano and colleagues
authored on Mon, 01 October 2012

Evolution has produced an amazing diversity of phenotypes, exemplified by mammals that can fly and mammals that have returned to the aquatic environment. Because DNA, the blueprint of life, encodes an organism's phenotypes, heritable phenotypic differences between species must ultimately be due to differences in their DNA. Advances in sequencing technology now make it possible to read the entire DNA sequence (the genome) of many species. Nevertheless, we know very little about which genomic differences underlie the phenotypic diversity observed in nature. The main reason is that any two species, even ones as closely related as human and chimpanzee, differ by millions of genomic changes and numerous phenotypic changes. This many-to-many relationship makes it hard to associate a particular genomic difference with a particular phenotypic difference.

We introduced a computational approach that gains specificity by focusing on phenotypes that have been lost repeatedly in independent lineages. Take vitamin C synthesis as an example. Most mammalian species can synthesize their own vitamin C. However, humans and a few other primates, the guinea pig, and bats have lost this ability and require dietary vitamin C to prevent scurvy. In these species, the genomic regions that encode the ability to synthesize vitamin C are no longer maintained by natural selection, so once the phenotype is lost, these regions will eventually be lost as well. In all other mammals, which retain this ability, the genomic information for this phenotype is also retained. Consequently, repeated phenotypic losses leave a specific evolutionary signature in these genomes. Our approach searches for exactly this signature: it detects genomic regions that are absent in the species lacking a given phenotype (here, vitamin C synthesis) and present in the species that possess it. Conceptually, this approach resembles forward genetics, in which researchers start from a phenotypic difference and map the underlying mutation through crossing and mapping experiments; we therefore term it “forward genomics”.
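The search for this presence/absence signature can be illustrated with a toy screen. The sketch below is not the authors' pipeline: the table format, the `perfect_match_regions` function, and the decision to discard regions with uncertain calls are hypothetical, assumed only for illustration. A region is reported when it is absent in every species lacking the phenotype and present in every species that has it.

```python
from typing import Dict, List, Optional, Set

# Hypothetical input format (not from the original study):
# presence[region_id][species] is True if the region was found in that
# species' genome, False if it is confidently absent, and None if the call
# is uncertain (for example, the region falls into an assembly gap).
PresenceTable = Dict[str, Dict[str, Optional[bool]]]


def perfect_match_regions(presence: PresenceTable, species_lacking: Set[str]) -> List[str]:
    """Return regions absent in every phenotype-lacking species and present in all others."""
    matches = []
    for region, calls in presence.items():
        # Conservative choice: skip regions with any uncertain call, so that
        # sequencing gaps are never mistaken for genuine losses.
        if any(call is None for call in calls.values()):
            continue
        # Every species lacking the phenotype must have a call for this region.
        if not species_lacking.issubset(calls):
            continue
        # Expected pattern: present exactly when the species has the phenotype.
        if all(calls[sp] == (sp not in species_lacking) for sp in calls):
            matches.append(region)
    return sorted(matches)
```

For example, with a made-up table in which `regionA` is missing only in human and guinea pig, `perfect_match_regions(table, {"human", "guinea_pig"})` would return `["regionA"]`. Dropping regions with uncertain calls trades sensitivity for specificity; a real analysis would need a more careful way of separating assembly artifacts from true losses.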

To apply forward genomics to the vitamin C phenotype, we had to overcome two issues. First, genomic data are incomplete: some regions of the genome are not yet fully sequenced or have an elevated sequencing error rate. Because these artifacts perfectly mimic the real loss of genomic regions that we are searching for, we had to rigorously distinguish them from real losses. Second, a once-functional genomic region disappears completely only after a long evolutionary time. After more recent phenotypic losses, such regions are still detectable but have accumulated many more mutations. Genomic regions related to the loss of a phenotype should therefore have accumulated more mutations in the species lacking that phenotype. To detect such regions, we computationally reconstructed the sequence of the mammalian ancestor, which allowed us to quantify, for every genomic region, the number of mutations that occurred in each species.
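The divergence-based idea can be sketched in a similar way. Assuming each region is given as an alignment of the reconstructed ancestral sequence with each species' sequence, one simple measure is percent identity to the ancestor. A region is then flagged when every phenotype-lacking species is less similar to the ancestor than every species that retains the phenotype. The function names, the input format (including the `"ancestor"` key), and this exact separation criterion are illustrative assumptions, not necessarily the scoring used in the study.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical input format: alignments[region_id] maps "ancestor" to the
# reconstructed ancestral sequence and each species name to its aligned
# sequence of the same length ('-' marks a gap).
Alignments = Dict[str, Dict[str, str]]


def percent_identity(ancestor: str, descendant: str) -> float:
    """Percent of ancestral bases that are unchanged in the descendant.

    Substitutions and deletions ('-' in the descendant) both count as changes;
    insertions (columns where the ancestor has '-') are ignored in this sketch.
    """
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(a, d) for a, d in zip(ancestor, descendant) if a != "-"]
    if not sites:
        raise ValueError("ancestor has no aligned bases")
    unchanged = sum(1 for a, d in sites if a == d)
    return 100.0 * unchanged / len(sites)


def separation_margin(identity: Dict[str, float], species_lacking: Set[str]) -> Optional[float]:
    """Gap between the least-conserved retaining species and the most-conserved lacking one.

    Returns None unless every lacking species is strictly less similar to the
    ancestor than every retaining species.
    """
    lost = [v for sp, v in identity.items() if sp in species_lacking]
    kept = [v for sp, v in identity.items() if sp not in species_lacking]
    if not lost or not kept:
        return None
    margin = min(kept) - max(lost)
    return margin if margin > 0 else None


def rank_regions(alignments: Alignments, species_lacking: Set[str]) -> List[Tuple[str, float]]:
    """Rank regions whose ancestral identity perfectly separates the two species groups."""
    ranked = []
    for region, rows in alignments.items():
        ancestor = rows["ancestor"]
        identity = {sp: percent_identity(ancestor, seq)
                    for sp, seq in rows.items() if sp != "ancestor"}
        margin = separation_margin(identity, species_lacking)
        if margin is not None:
            ranked.append((region, margin))
    # Larger margins mean a clearer excess of mutations in the phenotype-lacking species.
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

Requiring perfect separation mirrors the "all and only" pattern described for Gulo below. With noisier data, a graded statistic that tolerates a few exceptions would likely be more robust than this strict cutoff.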

We used forward genomics to screen the genomes of 27 mammals and found a single genomic locus that perfectly matches the vitamin C phenotype. This locus overlaps the Gulo (gulonolactone (L-) oxidase) gene, which encodes a key enzyme in vitamin C synthesis. Interestingly, we found that Gulo is inactivated in all, and only, the mammals that cannot synthesize vitamin C, which likely explains this phenotypic loss.

Figure: The Gulo gene is inactivated in all, and only, the mammals that cannot synthesize vitamin C. The figure shows the phylogenetic tree of mammals (left) and a visualization of a sequence alignment of the entire Gulo gene (right). Although mutations (red lines or blocks) occurred in all species, the species that cannot synthesize vitamin C (red font) have accumulated many more mutations, some of which inactivate the gene, leading to the loss of this phenotype.

Can we apply forward genomics to phenotypic differences other than vitamin C synthesis? First, we demonstrated its power on another repeatedly changed phenotype: the bile of guinea pigs and horses (two independent lineages) contains almost no phospholipids, in contrast to that of other mammals. Forward genomics identified loss of the Abcb4 gene, which encodes a transporter that secretes phospholipids into bile, in guinea pigs and horses as the most likely genomic change underlying this difference. Interestingly, Abcb4 mutations in humans lead to the same phenotype as in guinea pigs and horses, but, unlike in these “healthy” species, loss of this gene in humans causes severe disease. This suggests that guinea pigs and horses have acquired other changes that compensate for the deleterious consequences of losing Abcb4, and that studying these compensatory mechanisms may lead to new strategies for ameliorating the consequences of Abcb4 mutations in humans.

Second, we simulated genome evolution to obtain many test cases for further exploring the performance of forward genomics. These simulations showed that our approach can detect some of the genomic regions underlying the simulated vitamin C and biliary phospholipid losses, as well as regions in a variety of other phenotypic loss scenarios. Third, by inspecting large data sets of phenotypic information, we found that phenotypes that changed in independent lineages, a prerequisite for forward genomics to achieve specificity, are quite frequent and make up about 40% of all measured phenotypes. Therefore, as hundreds to thousands of new genomes become available in the near future (Genome 10K, i5k initiative), forward genomics can be applied to many more phenotypic differences and will help to explain how nature’s great phenotypic diversity is encoded in DNA.


A “forward genomics” approach links genotype to phenotype using independent phenotypic losses among related species. Hiller M, Schaar BT, Indjeian VB, Kingsley DM, Hagey LR, and Bejerano G. Cell Reports (September 2012), doi:10.1016/j.celrep.2012.08.032.

Other References

Genome 10K: a proposal to obtain whole-genome sequence for 10,000 vertebrate species. Haussler, D., O'Brien, S., Ryder, O., Barker, F., Clamp, M., Crawford, A., Hanner, R., Hanotte, O., Johnson, W., McGuire, J., et al. (2009). J Hered 100, 659-674.

i5k initiative: Creating a buzz about insect genomes. Robinson, G.E., Hackett, K.J., Purcell-Miramontes, M., Brown, S.J., Evans, J.D., Goldsmith, M.R., Lawson, D., Okamuro, J., Robertson, H.M., and Schneider, D.J. (2011). Science 331, 1386.
