Decoding human cytomegalovirus

A closer look at the human cytomegalovirus genome uncovers many novel viral open reading frames.

HFSP Long-Term Fellow Noam Stern-Ginossar and colleagues
authored on Mon, 17 December 2012

The HCMV genome (~240 kb, the largest of any known human virus) was sequenced more than 20 years ago. However, as with other complex viruses, our understanding of its protein-coding potential, which has relied on sequence-based informatics searches, was far from complete, and this remained a principal challenge in the field. To experimentally identify the range of translated HCMV open reading frames (ORFs) and to monitor their temporal expression, we used a series of approaches based on a method termed ribosome profiling. Ribosome profiling uses deep sequencing to determine the spectrum of mRNA fragments that are protected by ribosomes. The method is robust and accurate: it maps the locations (at codon resolution) and the density of ribosomes on each mRNA in the cell, and therefore allows systematic analysis of protein translation.
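
As a rough illustration of what codon-resolution mapping means, the sketch below counts ribosome footprints per codon of a single ORF. It is a minimal toy example, not the analysis pipeline used in the study: the function name `codon_occupancy`, the input format (0-based 5' end positions of footprints on one transcript) and the fixed 12-nt offset to the ribosomal P site are all assumptions made for illustration.

```python
from collections import Counter

# Assumed distance (nt) from a footprint's 5' end to the ribosomal P site.
# Real analyses calibrate this value; 12 is only a placeholder.
P_SITE_OFFSET = 12


def codon_occupancy(footprint_5p_ends, orf_start, orf_end, offset=P_SITE_OFFSET):
    """Count footprints per codon of an ORF spanning [orf_start, orf_end).

    footprint_5p_ends: 0-based transcript coordinates of footprint 5' ends.
    Returns a list with one count per codon of the ORF.
    """
    n_codons = (orf_end - orf_start) // 3
    counts = Counter()
    for five_prime in footprint_5p_ends:
        p_site = five_prime + offset
        # Keep only footprints whose inferred P site lies inside the ORF.
        if orf_start <= p_site < orf_start + 3 * n_codons:
            counts[(p_site - orf_start) // 3] += 1
    return [counts[i] for i in range(n_codons)]


# Toy data: a 4-codon ORF starting at transcript position 20.
toy_ends = [8, 8, 9, 11, 14, 19, 40]
occupancy = codon_occupancy(toy_ends, orf_start=20, orf_end=32)
print(occupancy)
```

In this toy input, three footprints fall on the first codon, one on each of the other three, and the last footprint lies outside the ORF and is ignored.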

These experiments allowed us, for the first time, to experimentally define all the viral ORFs that are translated during the course of infection without relying on untested assumptions. Amazingly, we identified 751 distinct viral ORFs, 641 of which were not previously suspected of being coding; a fraction of these were confirmed by mass spectrometry measurements. These new ORFs originate from a variety of sources, including novel loci, antisense ORFs, short upstream ORFs (uORFs), messages thought to be non-coding, and a range of alternative protein products that overlap with annotated ORFs.

Our data also provided an opportunity to monitor viral protein translation throughout infection and revealed that most of the viral genes (canonical and newly identified ORFs) showed tight temporal regulation of protein synthesis levels. We then asked how such tight temporal regulation of so many different ORFs, including many encoded by overlapping genomic regions, is achieved. Strikingly, examination of the viral transcripts during infection revealed the pervasive use of a distinct mode of gene regulation in which dynamic changes in the 5’ ends of transcripts play a critical role in enabling tight temporal control of protein expression and in creating protein diversity. Like alternative splicing, this mechanism can expand protein diversity and contribute to organismal complexity by allowing multiple distinct polypeptides to be generated from a single genomic locus.
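
The idea that a change in a transcript's 5’ end can change which polypeptide is made from the same locus can be sketched with a toy model. The example below assumes a deliberately simplified rule (translation starts at the first AUG on the transcript and runs to the first in-frame stop codon) and uses a made-up 25-nt sequence; neither the sequence nor the function `first_orf` comes from the study, and real initiation is more varied than this.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def first_orf(transcript):
    """Translate from the first ATG to the first in-frame stop codon.

    Simplifying assumption: the ribosome always initiates at the first ATG.
    Returns the peptide as a one-letter string ("" if there is no ATG).
    """
    start = transcript.find("ATG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(transcript) - 2, 3):
        residue = CODON_TABLE[transcript[i:i + 3]]
        if residue == "*":  # stop codon ends translation
            break
        peptide.append(residue)
    return "".join(peptide)


# Hypothetical locus (DNA sense strand) with two overlapping reading frames.
locus = "CCATGGCATGCGTAAAGTGACTTAG"

# Two transcripts from the same locus that differ only in their 5' end.
long_transcript = locus[0:]   # includes the upstream ATG
short_transcript = locus[5:]  # 5' end lies downstream of that ATG

products = {
    "long": first_orf(long_transcript),
    "short": first_orf(short_transcript),
}
print(products)
```

Because the shorter transcript lacks the upstream start codon, translation begins at a downstream ATG in a different reading frame, so the two transcripts yield two unrelated peptides from overlapping sequence.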

The genomic era began with the sequencing of the bacterial DNA virus phi X174 in 1977 and the mammalian DNA virus SV40 the following year. Since then, extraordinary advances in sequencing technology have enabled the determination of a vast array of viral genomes. However, because these genomes are so densely packed with information, deciphering their protein-coding potential remains a great challenge. In our work we present the first experimentally based analysis of the expressed proteome of a complex DNA virus, HCMV. Our work provides a framework for studying HCMV by establishing the viral proteome and its temporal regulation, providing a context for mutational studies and revealing the full range of HCMV antigenic potential. More broadly, our work establishes a paradigm for mapping and deciphering complex genomes.

Figure: Cells were infected with HCMV and harvested at different times after infection for ribosome footprint analysis (deep sequencing of ribosome-protected mRNA fragments), using either cycloheximide pretreatment to map ribosome densities along messages or harringtonine (a drug that causes ribosomes to accumulate only at translation initiation sites) to map translation initiation. On the right, the ribosome occupancies and mRNA measurements are shown for one viral ORF.

Reference

Decoding human cytomegalovirus. Stern-Ginossar N, Weisburd B, Michalski A, Le VT, Hein MY, Huang SX, Ma M, Shen B, Qian SB, Hengel H, Mann M, Ingolia NT, Weissman JS. Science, 23 November 2012: 1088-1093.

PubMed link