Transcription factors recognize and bind to short DNA motifs. Such motifs, however, are insufficient to accurately predict where in the genome a transcription factor is bound, as most transcription factors bind to fewer than 10% of their motifs in any given cell type. Moreover, transcription factors exhibit cell type-specific binding patterns even though neither the DNA binding motif nor the genomic sequence changes. Transcription factors are known to influence each other’s binding through direct interactions, through competition for binding sites, or indirectly by altering the organisation and accessibility of chromatin, the DNA-protein complex into which the genome is packaged. These interactions should be reflected in a ‘grammar’: a logic by which the arrangement of individual transcription factor binding motifs shapes how they are interpreted in a given cellular context.
This grammar has traditionally been hard to decipher because it has been difficult to determine how changes in DNA sequence affect transcription factor binding in a native genomic context. Existing approaches have either studied transcription factors in non-native environments, such as test tubes and plasmids, which lack important interactions with chromatin, or inferred relationships from genome-wide surveys of transcription factor binding, in which each binding site is surrounded by numerous DNA sequence differences that could be influencing its binding. In recent work, Szczesnik et al. have developed an approach that enables sensitive measurement of transcription factor binding at thousands of custom-designed, variable DNA sequences by using CRISPR-Cas9 technology to transplant these sequences into a fixed, defined site in the genome. By surveying the changes in transcription factor binding induced by minimal sequence alterations while controlling for the effects of the surrounding DNA sequence and chromatin composition, they have revealed new insights into the grammar governing the binding of a key transcription factor, Tcf7l2, in mouse embryonic stem cells.
The authors find that binding of Tcf7l2 to transplanted 99-base pair DNA sequences faithfully recapitulates its binding pattern at the native versions of those sequences. They then dissect what contributes to Tcf7l2 binding by systematically shifting the Tcf7l2 motif in three-base pair increments across dozens of such 99-base pair Tcf7l2-bound sequences. They find that Tcf7l2 binding is dramatically curtailed when nearby binding sites for two master regulators of embryonic stem cell fate, Oct4 and Klf4, are disrupted. Moreover, Tcf7l2 binding strength is heavily influenced by the distance between its motif and those of its Oct4 and Klf4 cofactors. Tcf7l2 binding is strongest when its motif lies 20 to 50 base pairs away from these cofactors and, most strikingly, oscillates between strong and weak as the motif is shifted within this window, with an 11-base pair periodicity that approximately matches one turn of the DNA helix (roughly 10.5 base pairs). The most likely explanation is that Tcf7l2 binding is maximised when it occupies the same side of the DNA as its cofactors, at a distance that allows protein-protein interaction. These findings highlight the surprising importance of motif spacing for transcription factor binding in the genome and pave the way for a more nuanced understanding of what governs the sequence architecture of gene regulatory elements.
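The same-side explanation can be illustrated with a toy calculation. The sketch below is not the authors' analysis; it assumes an idealised helix with the 11-base pair period reported above and scores each motif spacing by the cosine of the rotational offset between the two sites. The function names and the scoring rule are hypothetical, chosen only for illustration.

```python
import math

# Assumption: use the 11-bp periodicity reported in the article as the helical period.
PERIOD_BP = 11.0


def helical_phase_degrees(spacing_bp, period_bp=PERIOD_BP):
    """Rotational offset (0 to <360 degrees) between two sites spacing_bp apart."""
    return (spacing_bp % period_bp) / period_bp * 360.0


def same_face_score(spacing_bp, period_bp=PERIOD_BP):
    """Toy score: +1 when both sites face the same side of the helix, -1 when opposite."""
    return math.cos(2 * math.pi * spacing_bp / period_bp)


def shifted_spacings(start_bp, stop_bp, step_bp=3):
    """Spacings sampled when a motif is moved in fixed steps (3 bp in the study)."""
    return list(range(start_bp, stop_bp + 1, step_bp))


if __name__ == "__main__":
    # Scan the 20-50 bp window described in the text, in 3-bp steps.
    for spacing in shifted_spacings(20, 50):
        print(f"{spacing:>3} bp  phase={helical_phase_degrees(spacing):6.1f} deg  "
              f"score={same_face_score(spacing):+.2f}")
```

In this idealised picture, spacings that are whole multiples of the period (22, 33 or 44 base pairs) place both factors on the same face of the helix, while spacings offset by half a period place them on opposite faces, which would produce the kind of strong-weak oscillation described above.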
This work was made possible by HFSP funding, which paired Dr. Sherwood’s expertise in high-throughput CRISPR-Cas9 screening with Dr. Ho’s expertise in applying machine learning to genomic datasets.