How many protein molecules do we have in our cells?

Proteins are crucial to the structure and function of living cells. Great effort is now being applied to understanding cells at a systemic level, and knowledge of proteins and their interactions is critical to understanding the integrative functions of biological systems. Former HFSP Long-Term Fellow Christine Vogel has developed important experimental and theoretical approaches to investigating the myriad proteins in the cell. Following a productive and successful period in the laboratory of her HFSP host, Edward Marcotte, she has been appointed Assistant Professor at New York University, where she is applying a key, highly cited method that she co-developed to biological questions that revolve around the regulation of cellular protein production.

Christine Vogel obtained her PhD in Computational and Structural Biology at the MRC Laboratory of Molecular Biology in Cambridge, UK. She was an HFSP Long-Term postdoctoral Fellow with Dr. Edward Marcotte at the University of Texas at Austin (USA). Since January 2011, she has been an assistant professor in the Center for Genomics and Systems Biology, New York University.

Proteins are molecules with key roles in our cells: they receive, amplify, and transduce signals, regulate enzymatic reactions and entire biological pathways, produce and transport molecules, and form cellular structures, to name just a few. Thus the functions of cells depend crucially on the correct production of protein molecules, a process that is highly regulated in all organisms, from bacteria to humans. The production of proteins consists of two major steps: ‘transcription’ of messenger RNA molecules from DNA, and the subsequent ‘translation’ of these RNAs into proteins. If our cells were little kitchens, then the DNA would be the cookbook with all the recipes, RNAs the transcripts made from the book, and proteins the final meals that are made. But how many proteins are produced, on average, per messenger RNA? How many molecules of a specific protein do we observe in a human cell? How do protein concentrations change in response to an extracellular signal? And how fast do these changes occur?

To answer questions like these, we need methods that enable us to estimate the concentrations of many different proteins, to follow a molecular response over time, and to understand cellular behavior at a global level. When I joined Dr. Edward Marcotte’s laboratory at the University of Texas at Austin, we realized the need for such a method. Together with Drs. Peng Lu and Rong Wang, we developed a mass spectrometry-based method called APEX (Absolute Protein EXpression index) that allows relatively simple but still reliable measurement of protein concentrations in complex samples from any organism whose protein sequences are known. The method was published in Nature Biotechnology (Lu et al, 2007) and was highlighted by the journal as one of the top ten most cited publications of the past five years (Baker and DeFrancesco, 2010).

In brief, the method counts how many of the observed mass spectra are associated with a protein of interest. Prior to a mass spectrometry experiment, all proteins in the sample are enzymatically cut into smaller, well-defined pieces called peptides. Bombarding the peptides with inert gas fragments them into even smaller pieces and eventually into their amino acid building blocks. Mass spectrometry is then used to measure the masses of the ionized peptides and their fragments, and computer software helps us to identify the corresponding proteins from the measured masses. During each experiment, we collect many thousands of mass spectra; assembling these into proteins is a non-trivial task.
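The counting step can be illustrated with a minimal Python sketch. The peptide sequences, protein names and list of identified spectra are invented for illustration. The sketch also assumes that every peptide belongs to exactly one protein, whereas real analysis software has to deal with shared peptides and uncertain identifications.

```python
from collections import Counter

# Hypothetical lookup table: which protein each peptide comes from.
# Assumption: every peptide maps to exactly one protein.
PEPTIDE_TO_PROTEIN = {
    "LVNELTEFAK": "protein_A",
    "AEFVEVTK": "protein_A",
    "YLYEIAR": "protein_B",
}

def count_spectra(identified_peptides):
    """Count how many identified spectra point to each protein."""
    counts = Counter()
    for peptide in identified_peptides:
        protein = PEPTIDE_TO_PROTEIN.get(peptide)
        if protein is not None:  # skip peptides that cannot be assigned
            counts[protein] += 1
    return counts

# One entry per spectrum; the same peptide can be observed many times.
spectra = ["LVNELTEFAK", "AEFVEVTK", "LVNELTEFAK", "YLYEIAR", "XXXXXX"]
spectral_counts = count_spectra(spectra)
print(dict(spectral_counts))
```

Here three spectra are assigned to protein_A and one to protein_B, while the unrecognized peptide is ignored.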


Figure: The APEX method allows sophisticated analysis of thousands of proteins in cell samples.

However, not all peptides ionize readily, and peptides that fail to ionize are difficult to detect in the mass spectrometer. Furthermore, long proteins split into more peptides than short proteins and therefore have higher chances of detection. A crucial component of our APEX method is a correction for these confounding factors, both the differential ionizability of the peptides and the different protein lengths. After applying this correction, we were able to measure thousands of protein concentrations with an error of less than two- to three-fold.
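The idea behind the correction can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the published implementation: it assumes that each peptide already has a predicted probability of being detected, and the function name, protein names, probabilities, counts and total number of molecules per cell are all invented.

```python
def apex_like_abundance(spectral_counts, peptide_detectability, total_molecules):
    """Turn raw spectral counts into estimated molecules per cell.

    spectral_counts: {protein: observed number of spectra}
    peptide_detectability: {protein: [detection probability per peptide]}
    total_molecules: assumed total number of protein molecules per cell
    """
    corrected = {}
    for protein, observed in spectral_counts.items():
        # Expected number of detectable peptides per protein molecule.
        # Summing over peptides accounts for length (long proteins have
        # more peptides) and for ionizability (poorly ionizing peptides
        # contribute little to the sum).
        expected = sum(peptide_detectability[protein])
        corrected[protein] = observed / expected
    # Rescale so that the corrected values add up to the assumed total.
    norm = sum(corrected.values())
    return {p: total_molecules * v / norm for p, v in corrected.items()}

# Hypothetical input: the long protein yields four times as many spectra.
counts = {"short_protein": 10, "long_protein": 40}
detectability = {
    "short_protein": [0.5, 0.5],   # 2 peptides, expected value 1.0
    "long_protein": [0.5] * 8,     # 8 peptides, expected value 4.0
}
estimates = apex_like_abundance(counts, detectability, total_molecules=1_000_000)
print(estimates)
```

In this toy example the long protein produces four times as many spectra as the short one, yet both receive the same estimated abundance, because the long protein was expected to produce four times as many detectable peptides.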

APEX has additional advantages. In contrast to other methods, APEX does not require isotopic labeling of the proteins in the sample, which makes the method simple, easy to use at large scale, and amenable to application to human tissue samples. With the help of our paper, label-free mass spectrometry methods are now becoming a widely accepted alternative to methods employing isotopes, and researchers are beginning to routinely incorporate the measurement of protein concentrations into their experimental designs and models.

We did not stop at method development, however: we applied APEX to biological questions that interest us and that could now be answered. Already in the original APEX paper, we showed that Nature is extremely efficient in recycling its messenger molecules: on average, ~5,000 protein molecules are produced per RNA in baker’s yeast, and about an order of magnitude fewer in the bacterium Escherichia coli. Such efficiency is amazing to us, but for the cell it opens up possibilities to fine-tune protein production and to regulate concentrations according to its precise needs. Recently, we and other researchers confirmed this enormous potential for regulation through another observation (Laurent et al, 2010; Schrimpf et al, 2009): protein concentrations are conserved across organisms, which means that when comparing bacteria, yeast, worm and fly, their evolutionarily related proteins have similar concentrations. Surprisingly, we find that these protein concentrations are even more conserved than the concentrations of the corresponding messenger RNAs. In other words, the final products, the proteins, are observed at seemingly more finely regulated levels than the intermediate RNA molecules. What is happening in our cells on the way from RNA to protein that causes this effect?

The APEX method works not only in smaller model organisms but also in human cells. In another study, we combined the method with statistical analysis to compare the concentrations of >1,000 proteins with those of the matching RNAs collected from a brain cancer cell line (Vogel et al, 2010). Somewhat unexpectedly, we found that protein translation seemed to be at least as important as transcription, emphasizing the need to examine this side of the processes that produce proteins in our cells. Given that many methods focus on transcription and mRNA concentrations rather than on proteins, this is an important finding. While transcription explained about 30% of the variation in observed protein concentrations, translation and protein degradation explained another 30 to 40%. Very recently, an entirely independent study confirmed these conclusions (Schwanhausser et al, 2011): ‘transcription regulation is only half the story’ (Plotkin, 2010), and translation must not be forgotten.
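One common way to quantify how much of the variation in protein concentrations is explained by mRNA concentrations is the squared correlation coefficient (R²). The following Python sketch is only an illustration of that statistic and not necessarily the analysis used in the study. It assumes the comparison is made on log-transformed concentrations, and all numbers are invented.

```python
import math

def r_squared(xs, ys):
    """Squared Pearson correlation: the fraction of variance in ys
    explained by a straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov ** 2 / (var_x * var_y)

# Invented concentrations (molecules per cell) for six hypothetical genes.
mrna = [2, 5, 10, 20, 50, 100]
protein = [30000, 4000, 90000, 15000, 800000, 60000]

# Concentrations span orders of magnitude, so compare them on a log scale.
log_mrna = [math.log10(v) for v in mrna]
log_protein = [math.log10(v) for v in protein]
explained = r_squared(log_mrna, log_protein)
print(f"mRNA explains {explained:.0%} of the variation in protein levels")
```

Whatever variation such a fit leaves unexplained must come from other sources, such as translation, protein degradation and measurement error.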

In the future, I will continue my research at this exciting interface of high-resolution technology, computational and mathematical analysis, and biological experimentation. In my newly formed lab at New York University (USA), we continuously refine the mass spectrometry methods used to analyze protein concentrations, and we use this technology to study how cells respond to external stress and how this response is reflected in the regulation of protein expression. In human cells, the stress response relates directly to a number of diseases, e.g. cancer or neurodegeneration in Alzheimer’s or Parkinson’s disease. These stresses will be reproduced experimentally in the lab, and changes in protein concentrations monitored over time, with the goal of integrating these measurements into dynamical models in order to understand, and eventually to manipulate, the inner workings of our cells.



Baker M, DeFrancesco L (2010) Five more years of Nature Biotechnology research. Nat Biotechnol 29: 221-2

Laurent J, Vogel C, Kwon T, Craig S, Boutz DR, Huse H, Nozue K, Walia H, Whiteley M, Ronald P, Marcotte EM (2010) Protein abundances are more conserved than mRNA abundances across diverse taxa. Proteomics 10: 4209-4212.

Lu P, Vogel C, Wang R, Yao X, Marcotte EM (2007) Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation. Nat Biotechnol 25: 117-124.

Plotkin JB (2010) Transcriptional regulation is only half the story. Mol Syst Biol 6: 406.

Schrimpf SP, Weiss M, Reiter L, Ahrens CH, Jovanovic M, Malmstrom J, Brunner E, Mohanty S, Lercher MJ, Hunziker PE, Aebersold R, von Mering C, Hengartner MO (2009) Comparative functional analysis of the Caenorhabditis elegans and Drosophila melanogaster proteomes. PLoS Biol 7: e48.

Schwanhausser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342.

Vogel C, Abreu Rde S, Ko D, Le SY, Shapiro BA, Burns SC, Sandhu D, Boutz DR, Marcotte EM, Penalva LO (2010) Sequence signatures and mRNA concentration can explain two-thirds of protein abundance variation in a human cell line. Mol Syst Biol 6: 400.