Understanding cell populations as multitasking systems

Newly developed technologies enable the expression of many genes in single cells to be measured, thus posing the challenge of analyzing and understanding this high-dimensional data. A recent theory on multi-objective task optimization suggests that this data should be arranged in simple geometrical shapes. Applying this theory allows the tasks and the trade-offs that the tissue faces to be revealed.

HFSP Program Grant holder Uri Alon and colleagues
authored on Fri, 13 November 2015

In the past, biological experiments usually pooled together millions of cells, masking the differences between individual cells. Current technology takes a big step forward by measuring gene expression from individual cells. Interpreting this data is challenging because we need to understand how cells are arranged in a high-dimensional gene expression space, whose coordinates are the expression of different genes.

Here, we test the recent Pareto theory, which suggests that cells facing multiple tasks should be arranged in simple low-dimensional shapes such as lines, triangles, tetrahedrons and so on, generally called polytopes [1]. The vertices of the polytopes are gene programs optimal for each of the tasks. We find evidence for such simplicity in a variety of tissues (spleen, bone marrow and intestine) analyzed by different single-cell technologies. We find that cells are distributed inside polytopes, such as tetrahedrons or four-dimensional polytopes, with the cells closest to each vertex responsible for a different key task.
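One way to read the claim that cells lie inside a polytope is that each cell's expression profile is a weighted average of the vertex profiles, with non-negative weights that sum to one. The Python sketch below illustrates this for a tetrahedron. It is not the analysis used in the study: the function names `barycentric_weights` and `inside_simplex` and the toy coordinates in `ARCHETYPES` are assumptions made for illustration.

```python
import numpy as np


def barycentric_weights(point, vertices):
    """Return weights w with sum(w) == 1 and w @ vertices == point.

    vertices is a (k, d) array holding k vertices in d dimensions,
    where k == d + 1 (a triangle in 2-D, a tetrahedron in 3-D).
    """
    vertices = np.asarray(vertices, dtype=float)
    point = np.asarray(point, dtype=float)
    k, d = vertices.shape
    if k != d + 1:
        raise ValueError("expected d + 1 vertices in d dimensions")
    # Stack the d coordinate equations with the sum-to-one constraint.
    lhs = np.vstack([vertices.T, np.ones(k)])
    rhs = np.append(point, 1.0)
    return np.linalg.solve(lhs, rhs)


def inside_simplex(point, vertices, tol=1e-9):
    """A point is inside the simplex when no weight is negative."""
    return bool(np.all(barycentric_weights(point, vertices) >= -tol))


# Hypothetical tetrahedron: one row per vertex (task-optimal profile).
ARCHETYPES = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# A "generalist" cell is the equal-weight average of all four vertices.
generalist = ARCHETYPES.mean(axis=0)
weights = barycentric_weights(generalist, ARCHETYPES)
```

In this toy setting a cell near a vertex has one weight close to one, while a cell in the middle has roughly equal weights on all tasks.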

Figure: Single cells fall on low-dimensional polytopes, revealing their tasks and the trade-offs between them. Data points are single cells, shown in the first three principal components of their gene expression coordinates. Each vertex corresponds to a cellular task. From left to right: human colon intestinal cells and a subset of intestinal progenitor cells measured by single-cell qPCR [2]; mouse spleen dendritic cells measured by single-cell RNA-Seq [3].
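The projection onto the first three principal components mentioned in the caption can be sketched in a few lines of Python. This is a generic PCA via singular value decomposition, not necessarily the preprocessing used for the figure. The function name `project_to_pcs` and the random `toy_expression` matrix are assumptions standing in for a real cells-by-genes table.

```python
import numpy as np


def project_to_pcs(expression, n_components=3):
    """Project a (cells x genes) matrix onto its leading principal components."""
    data = np.asarray(expression, dtype=float)
    # Center each gene so the components describe variation around the mean.
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Toy stand-in for real data: 200 cells, 20 genes, random values.
rng = np.random.default_rng(0)
toy_expression = rng.normal(size=(200, 20))
coords = project_to_pcs(toy_expression)  # one 3-D point per cell
```

Each row of `coords` is one cell, ready to be drawn as a point in a three-dimensional scatter plot.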

For example, intestinal crypts are composed of intestinal stem cells, which differentiate into absorptive cells (enterocytes) and secretory cells, mainly goblet cells. We analyzed single-cell qPCR data of intestinal cells [2], and found that they fall within a tetrahedron in gene expression space. Three of the vertices of this tetrahedron, called archetypes, have gene expression programs that correspond to the known tasks of the intestinal tissue: absorbing nutrients, secreting the mucus layer, and renewing the tissue. Enterocytes, goblet cells and intestinal stem cells are found near these archetypes, respectively. The fourth archetype does not match any known cell type, despite having a clear characteristic gene expression profile enriched with genes related to development and embryonic patterning. We hypothesize that this archetype may indicate a step in differentiation between stem cells and enterocytes.

When inspecting only the intestinal progenitor cells that give rise to the other cell types in the crypt, we see that they also fill a tetrahedron whose vertices correspond to several key sub-tasks. One archetype is enriched with pluripotency and division markers. As cells mature, they move towards the plane created by the three other archetypes, which correspond to three differentiation tasks: stopping divisions, activating cell-type-specific genes, and reducing global gene expression. This tetrahedron is nearly uniformly filled, suggesting that progenitor cells span a continuum of gene expression states and are not divided into distinct subtypes. The shape of this continuum allows us to reveal the tasks that the cell population performs and to describe each cell in terms of its distance from the four archetypes.
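Describing each cell by its distance from the archetypes can be illustrated with a short Python sketch. It assumes plain Euclidean distance in a reduced three-dimensional space, which may differ from the metric used in the study. The function names `archetype_distances` and `nearest_archetype` and all coordinates are assumptions chosen for illustration.

```python
import numpy as np


def archetype_distances(cells, archetypes):
    """Euclidean distance from every cell to every archetype.

    cells is (n_cells, d), archetypes is (n_archetypes, d);
    the result is (n_cells, n_archetypes).
    """
    cells = np.asarray(cells, dtype=float)
    archetypes = np.asarray(archetypes, dtype=float)
    # Broadcast to all cell/archetype pairs, then take the norm over d.
    diff = cells[:, None, :] - archetypes[None, :, :]
    return np.linalg.norm(diff, axis=2)


def nearest_archetype(cells, archetypes):
    """Index of the closest archetype for each cell."""
    return archetype_distances(cells, archetypes).argmin(axis=1)


# Hypothetical archetype positions (four vertices of a tetrahedron).
toy_archetypes = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# Two hypothetical cells: one close to the second vertex, one interior.
toy_cells = np.array([
    [0.90, 0.05, 0.00],
    [0.25, 0.25, 0.25],
])

distances = archetype_distances(toy_cells, toy_archetypes)
closest = nearest_archetype(toy_cells, toy_archetypes)
```

A cell with one small distance and three large ones behaves like a specialist at that task, while a cell with more balanced distances sits in the interior of the tetrahedron.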

Another interesting example is dendritic cells in the spleen, which are known to carry out several immune functions. We studied a dataset of such cells acquired by single-cell RNA-Seq [3]. Using the Pareto perspective, we show that this cell population, which is hard to divide into distinguishable subtypes, faces a trade-off between four immune tasks: response to viruses through interferon pathways, formation of cytoskeletal features for maturation and phagocytosis, stimulation of lymphocytes by cytokine secretion and antigen presentation, and LPS-mediated apoptosis.

The Pareto perspective can be used to understand the geometry of single-cell data and to infer the tasks of individual cells in a tissue. More generally, this study indicates that the concept of cell type may be expanded. In addition to separated clusters in gene-expression space, we suggest a new possibility: a continuum of states within a polytope, in which the vertices represent specialists at key tasks, with generalist cells lying in the middle.

Text by Yael Korem

Reference

Geometry of the Gene Expression Space of Individual Cells. Korem Y, Szekely P, Hart Y, Sheftel H, Hausser J, Mayo A, Rothenberg ME, Kalisky T, Alon U. PLoS Comput Biol. 2015;11: e1004224. doi:10.1371/journal.pcbi.1004224

Other references

[1] Evolutionary Trade-Offs, Pareto Optimality, and the Geometry of Phenotype Space. Shoval O, Sheftel H, Shinar G, Hart Y, Ramote O, Mayo A, et al. Science. 2012;336: 1157–1160. doi:10.1126/science.1217405.

[2] Single-cell dissection of transcriptional heterogeneity in human colon tumors. Dalerba P, Kalisky T, Sahoo D, Rajendran PS, Rothenberg ME, Leyrat AA, et al. Nat Biotechnol. 2011;29: 1120–1127. doi:10.1038/nbt.2038.

[3] Massively Parallel Single-Cell RNA-Seq for Marker-Free Decomposition of Tissues into Cell Types. Jaitin DA, Kenigsberg E, Keren-Shaul H, Elefant N, Paul F, Zaretsky I, et al. Science. 2014;343: 776–779. doi:10.1126/science.1247651.
