
Orphan CpG islands amplify poised enhancer regulatory activity and determine target gene responsiveness

ABSTRACT CpG islands (CGIs) represent a widespread feature of vertebrate genomes, being associated with ~70% of all gene promoters. CGIs control transcription initiation by conferring nearby
promoters with unique chromatin properties. In addition, there are thousands of distal or orphan CGIs (oCGIs) whose functional relevance is barely known. Here we show that oCGIs are an
essential component of poised enhancers that augment their long-range regulatory activity and control the responsiveness of their target genes. Using a knock-in strategy in mouse embryonic
stem cells, we introduced poised enhancers with or without oCGIs within topologically associating domains harboring genes with different types of promoters. Analysis of the resulting cell
lines revealed that oCGIs act as tethering elements that promote the physical and functional communication between poised enhancers and distally located genes, particularly those with large
CGI clusters in their promoters. Therefore, by acting as genetic determinants of gene–enhancer compatibility, CGIs can contribute to gene expression control under both physiological and
potentially pathological conditions.

DATA AVAILABILITY
All the 4C–seq data generated in this study are available through the GEO (GSE156465). All the generated transgenic ESC
lines are available upon request.

REFERENCES
1. Spitz, F. & Furlong, E. E. M. Transcription factors: from enhancer binding to developmental control. _Nat. Rev. Genet._ 13, 613–626 (2012).
2. Kvon, E. Z. Using transgenic reporter assays to functionally characterize enhancers in animals. _Genomics_ 106, 185–192 (2015).
3. Furlong, E. E. M. & Levine, M. Developmental enhancers and chromosome topology. _Science_ 361, 1341–1345 (2018).
4. Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. _Nature_ 485, 376–380 (2012).
5. Laugsch, M. et al. Modeling the pathological long-range regulatory effects of human structural variation with patient-specific hiPSCs. _Cell Stem Cell_ 24, 736–752.e12 (2019).
6. Rao, S. S. P. et al. Cohesin loss eliminates all loop domains. _Cell_ 171, 305–320.e24 (2017).
7. Nora, E. P. et al. Targeted degradation of CTCF decouples local insulation of chromosome domains from genomic compartmentalization. _Cell_ 169, 930–944 (2017).
8. Ghavi-Helm, Y. et al. Highly rearranged chromosomes reveal uncoupling between genome topology and gene expression. _Nat. Genet._ 51, 1272–1282 (2019).
9. Kraft, K. et al. Serial genomic inversions induce tissue-specific architectural stripes, gene misexpression and congenital malformations. _Nat. Cell Biol._ 21, 305–310 (2019).
10. Kikuta, H. et al. Genomic regulatory blocks encompass multiple neighboring genes and maintain conserved synteny in vertebrates. _Genome Res._ 17, 545–555 (2007).
11. Arnold, C. D. et al. Genome-wide assessment of sequence-intrinsic enhancer responsiveness at single-base-pair resolution. _Nat. Biotechnol._ 35, 136–144 (2016).
12. Haberle, V. et al. Transcriptional cofactors display specificity for distinct types of core promoters. _Nature_ 570, 122–126 (2019).
13. Spielmann, M., Lupiáñez, D. G. & Mundlos, S. Structural variation in the 3D genome. _Nat. Rev. Genet._ 19, 453–467 (2018).
14. Cruz-Molina, S. et al. PRC2 facilitates the regulatory topology required for poised enhancer function during pluripotent stem cell differentiation. _Cell Stem Cell_ 20, 689–705.e9 (2017).
15. Rada-Iglesias, A. et al. A unique chromatin signature uncovers early developmental enhancers in humans. _Nature_ 470, 279–283 (2011).
16. Deaton, A. M. & Bird, A. CpG islands and the regulation of transcription. _Genes Dev._ 25, 1010–1022 (2011).
17. Bell, J. S. K. & Vertino, P. M. Orphan CpG islands define a novel class of highly active enhancers. _Epigenetics_ 12, 449–464 (2017).
18. Illingworth, R. S. et al. Orphan CpG islands identify numerous conserved promoters in the mammalian genome. _PLoS Genet._ 6, e1001134 (2010).
19. Steinhaus, R., Gonzalez, T., Seelow, D. & Robinson, P. N. Pervasive and CpG-dependent promoter-like characteristics of transcribed enhancers. _Nucleic Acids Res._ 48, 5306–5317 (2020).
20. Bogdanović, O. et al. Active DNA demethylation at enhancers during the vertebrate phylotypic period. _Nat. Genet._ 48, 417–426 (2016).
21. Long, H. K. et al. Epigenetic conservation at gene regulatory elements revealed by non-methylated DNA profiling in seven vertebrates. _eLife_ 2, e00348 (2013).
22. Lenhard, B., Sandelin, A. & Carninci, P. Metazoan promoters: emerging characteristics and insights into transcriptional regulation. _Nat. Rev. Genet._ 13, 233–245 (2012).
23. Williams, K. et al. TET1 and hydroxymethylcytosine in transcription and DNA methylation fidelity. _Nature_ 473, 343–349 (2011).
24. Blackledge, N. P. et al. Variant PRC1 complex-dependent H2A ubiquitylation drives PRC2 recruitment and polycomb domain formation. _Cell_ 157, 1445–1459 (2014).
25. Aljazi, M. B., Gao, Y., Wu, Y., Mias, G. I. & He, J. Cell signaling coordinates global PRC2 recruitment and developmental gene expression in murine embryonic stem cells. _iScience_ 23, 101646 (2020).
26. Habibi, E. et al. Whole-genome bisulfite sequencing of two distinct interconvertible DNA methylomes of mouse embryonic stem cells. _Cell Stem Cell_ 13, 360–369 (2013).
27. Zylicz, J. J. et al. Chromatin dynamics and the role of G9a in gene regulation and enhancer silencing during early mouse development. _eLife_ 4, e09571 (2015).
28. Lee, S. M. et al. Intragenic CpG islands play important roles in bivalent chromatin assembly of developmental genes. _Proc. Natl Acad. Sci. USA_ 114, E1885–E1894 (2017).
29. Bolt, C. C. & Duboule, D. The regulatory landscapes of developmental genes. _Development_ 147, dev171736 (2020).
30. Blackledge, N. P. & Klose, R. CpG island chromatin. _Epigenetics_ 6, 147–152 (2011).
31. Turberfield, A. H. et al. KDM2 proteins constrain transcription from CpG island gene promoters independently of their histone demethylase activity. _Nucleic Acids Res._ 47, 9005–9023 (2019).
32. Arab, K. et al. GADD45A binds R-loops and recruits TET1 to CpG island promoters. _Nat. Genet._ 51, 217–223 (2019).
33. Diez del Corral, R. & Storey, K. G. Markers in vertebrate neurogenesis. _Nat. Rev. Neurosci._ 2, 835–839 (2001).
34. Bentovim, L., Harden, T. T. & DePace, A. H. Transcriptional precision and accuracy in development: from measurements to models and mechanisms. _Development_ 144, 3855–3866 (2017).
35. Boyes, J. & Bird, A. DNA methylation inhibits transcription indirectly via a methyl-CpG binding protein. _Cell_ 64, 1123–1134 (1991).
36. Klemm, S. L., Shipony, Z. & Greenleaf, W. J. Chromatin accessibility and the regulatory epigenome. _Nat. Rev. Genet._ 20, 207–220 (2019).
37. You, J. S. et al. OCT4 establishes and maintains nucleosome-depleted regions that provide additional layers of epigenetic regulation of its target genes. _Proc. Natl Acad. Sci. USA_ 108, 14497–14502 (2011).
38. Stadler, M. B. et al. DNA-binding factors shape the mouse methylome at distal regulatory regions. _Nature_ 480, 490–495 (2011).
39. Kim, T.-K. et al. Widespread transcription at neuronal activity-regulated enhancers. _Nature_ 465, 182–187 (2010).
40. Mas, G. & Di Croce, L. The role of Polycomb in stem cell genome architecture. _Curr. Opin. Cell Biol._ 43, 87–95 (2016).
41. Yan, J. et al. Histone H3 lysine 4 monomethylation modulates long-range chromatin interactions at enhancers. _Cell Res._ 28, 204–220 (2018).
42. Denholtz, M. et al. Long-range chromatin contacts in embryonic stem cells reveal a role for pluripotency factors and polycomb proteins in genome organization. _Cell Stem Cell_ 13, 602–616 (2013).
43. Wang, J. et al. A protein interaction network for pluripotency of embryonic stem cells. _Nature_ 444, 364–368 (2006).
44. Pachano, T., Crispatzu, G. & Rada-Iglesias, A. Polycomb proteins as organizers of 3D genome architecture in embryonic stem cells. _Brief. Funct. Genomics_ 18, 358–366 (2019).
45. Bantignies, F. et al. Polycomb-dependent regulatory contacts between distant Hox loci in _Drosophila_. _Cell_ 144, 214–226 (2011).
46. Isono, K. et al. SAM domain polymerization links subnuclear clustering of PRC1 to gene silencing. _Dev. Cell_ 26, 565–577 (2013).
47. Loubiere, V., Papadopoulos, G. L., Szabo, Q., Martinez, A. M. & Cavalli, G. Widespread activation of developmental gene expression characterized by PRC1-dependent chromatin looping. _Sci. Adv._ 6, eaax4001 (2020).
48. Benabdallah, N. S. et al. Decreased enhancer-promoter proximity accompanying enhancer activation. _Mol. Cell_ 76, 473–484 (2019).
49. Lim, B., Heist, T., Levine, M. & Fukaya, T. Visualization of transvection in living _Drosophila_ embryos. _Mol. Cell_ 70, 287–296.e6 (2018).
50. Beck, S. et al. Implications of CpG islands on chromosomal architectures and modes of global gene regulation. _Nucleic Acids Res._ 46, 4382–4391 (2018).
51. Liu, S. et al. From 1D sequence to 3D chromatin dynamics and cellular functions: a phase separation perspective. _Nucleic Acids Res._ 46, 9367–9383 (2018).
52. Kurup, J. T., Han, Z., Jin, W. & Kidder, B. L. H4K20me3 methyltransferase SUV420H2 shapes the chromatin landscape of pluripotent embryonic stem cells. _Development_ 147, dev188516 (2020).
53. Andersson, R., Sandelin, A. & Danko, C. G. A unified architecture of transcriptional regulatory elements. _Trends Genet._ 31, 426–433 (2015).
54. Lloret-Llinares, M. et al. The RNA exosome contributes to gene expression regulation during stem cell differentiation. _Nucleic Acids Res._ 46, 11502–11513 (2018).
55. Local, A. et al. Identification of H3K4me1-associated proteins at mammalian enhancers. _Nat. Genet._ 50, 73–82 (2018).
56. Etchegaray, J. P. et al. The histone deacetylase SIRT6 restrains transcription elongation via promoter-proximal pausing. _Mol. Cell_ 75, 683–699 (2019).
57. Hirabayashi, S. et al. NET-CAGE characterizes the dynamics and topology of human transcribed _cis_-regulatory elements. _Nat. Genet._ 51, 1369–1379 (2019).
58. Schoenfelder, S. et al. Polycomb repressive complex PRC1 spatially constrains the mouse embryonic stem cell genome. _Nat. Genet._ 47, 1179–1186 (2015).
59. Butler, J. E. F. & Kadonaga, J. T. Enhancer–promoter specificity mediated by DPE or TATA core promoter motifs. _Genes Dev._ 15, 2515–2519 (2001).
60. Gómez-Marín, C. et al. Evolutionary comparison reveals that diverging CTCF sites are signatures of ancestral topological associating domains borders. _Proc. Natl Acad. Sci. USA_ 112, 7542–7547 (2015).
61. O’Brien, L. L. et al. Transcriptional regulatory control of mammalian nephron progenitors revealed by multi-factor cistromic analysis and genetic studies. _PLoS Genet._ 14, e1007181 (2018).
62. Catarino, R. R. & Stark, A. Assessing sufficiency and necessity of enhancer activities for gene expression and the mechanisms of transcription activation. _Genes Dev._ 32, 202–223 (2018).
63. Lupiáñez, D. G. et al. Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions. _Cell_ 161, 1012–1025 (2015).
64. Kragesteen, B. K. et al. Dynamic 3D chromatin architecture contributes to enhancer specificity and limb morphogenesis. _Nat. Genet._ 50, 1463–1473 (2018).
65. Li, X. & Noll, M. Compatibility between enhancers and promoters determines the transcriptional specificity of gooseberry and gooseberry neuro in the _Drosophila_ embryo. _EMBO J._ 13, 400–406 (1994).
66. Zabidi, M. A. et al. Enhancer-core-promoter specificity separates developmental and housekeeping gene regulation. _Nature_ 518, 556–559 (2015).
67. Mahmoudi, T., Katsani, K. R. & Verrijzer, C. P. GAGA can mediate enhancer function in _trans_ by linking two separate DNA molecules. _EMBO J._ 21, 1775–1781 (2002).
68. Calhoun, V. C. & Levine, M. Long-range enhancer-promoter interactions in the Scr-Antp interval of the _Drosophila_ Antennapedia complex. _Proc. Natl Acad. Sci. USA_ 100, 9878–9883 (2003).
69. Calhoun, V. C., Stathopoulos, A. & Levine, M. Promoter-proximal tethering elements regulate enhancer-promoter specificity in the _Drosophila_ Antennapedia complex. _Proc. Natl Acad. Sci. USA_ 99, 9243–9247 (2002).
70. Boyle, S. et al. A central role for canonical PRC1 in shaping the 3D nuclear landscape. _Genes Dev._ 34, 931–949 (2020).
71. Perino, M. et al. MTF2 recruits Polycomb Repressive Complex 2 by helical-shape-selective DNA binding. _Nat. Genet._ 50, 1002–1010 (2018).
72. Beltran, M. et al. The interaction of PRC2 with RNA or chromatin is mutually antagonistic. _Genome Res._ 26, 896–907 (2016).
73. Crispatzu, G. et al. The chromatin, topological and regulatory properties of pluripotency-associated poised enhancers are conserved in vivo. Preprint at _bioRxiv_ https://doi.org/10.1101/2021.01.18.427085 (2021).
74. Shrinivas, K. et al. Enhancer features that drive formation of transcriptional condensates. _Mol. Cell_ 75, 549–561.e7 (2019).
75. Dimitrova, E. et al. FBXL19 recruits CDK-Mediator to CpG islands of developmental genes priming them for activation during lineage commitment. _eLife_ 7, e37084 (2018).
76. Long, H. K., Blackledge, N. P. & Klose, R. J. ZF-CxxC domain-containing proteins, CpG islands and the chromatin connection. _Biochem. Soc. Trans._ 41, 727–740 (2013).
77. Mastrangelo, I. A., Courey, A. J., Wall, J. S., Jackson, S. P. & Hough, P. V. C. DNA looping and Sp1 multimer links: a mechanism for transcriptional synergism and enhancement. _Proc. Natl Acad. Sci. USA_ 88, 5670–5674 (1991).
78. Su, W., Jackson, S., Tjian, R. & Echols, H. DNA looping between sites for transcriptional activation: self-association of DNA-bound Sp1. _Genes Dev._ 5, 820–826 (1991).
79. Hartl, D. et al. CG dinucleotides enhance promoter activity independent of DNA methylation. _Genome Res._ 29, 554–563 (2019).
80. Wang, Y. et al. The 3D Genome Browser: a web-based browser for visualizing 3D genome organization and long-range chromatin interactions. _Genome Biol._ 19, 151 (2018).
81. Bonev, B. et al. Multiscale 3D genome rewiring during mouse neural development. _Cell_ 171, 557–572.e24 (2017).
82. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. _Mol. Cell_ 38, 576–589 (2010).
83. Liu, T. et al. Cistrome: an integrative platform for transcriptional regulation studies. _Genome Biol._ 12, R83 (2011).
84. Gouti, M. et al. In vitro generation of neuromesodermal progenitors reveals distinct roles for wnt signalling in the specification of spinal cord and paraxial mesoderm identity. _PLoS Biol._ 12, e1001937 (2014).
85. Matsuda, K. & Kondoh, H. Dkk1-dependent inhibition of Wnt signaling activates Hesx1 expression through its 5′ enhancer and directs forebrain precursor development. _Genes Cells_ 19, 374–385 (2014).
86. Yao, X. et al. Tild-CRISPR allows for efficient and precise gene knockin in mouse and human cells. _Dev. Cell_ 45, 526–536.e5 (2018).
87. Giresi, P. G., Kim, J., McDaniell, R. M., Iyer, V. R. & Lieb, J. D. FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin. _Genome Res._ 17, 877–885 (2007).
88. Requena, F. et al. NOMePlot: analysis of DNA methylation and nucleosome occupancy at the single molecule. _Sci. Rep._ 9, 8140 (2019).
89. Gardiner-Garden, M. & Frommer, M. CpG islands in vertebrate genomes. _J. Mol. Biol._ 196, 261–282 (1987).
90. Madeira, F. et al. The EMBL-EBI search and sequence analysis tools APIs in 2019. _Nucleic Acids Res._ 47, W636–W641 (2019).
91. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. _Bioinformatics_ 26, 841–842 (2010).
92. Karolchik, D. et al. The UCSC Table Browser data retrieval tool. _Nucleic Acids Res._ 32, D493–D496 (2004).
93. Ewels, P., Magnusson, M., Lundin, S. & Käller, M. MultiQC: summarize analysis results for multiple tools and samples in a single report. _Bioinformatics_ 32, 3047–3048 (2016).
94. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. _Bioinformatics_ 30, 2114–2120 (2014).
95. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. _EMBnet J._ 17, 10–12 (2011).
96. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. _Nat. Methods_ 9, 357–359 (2012).
97. Li, H. et al. The Sequence Alignment/Map format and SAMtools. _Bioinformatics_ 25, 2078–2079 (2009).
98. Feng, J., Liu, T., Qin, B., Zhang, Y. & Liu, X. S. Identifying ChIP-seq enrichment using MACS. _Nat. Protoc._ 7, 1728–1740 (2012).
99. Pagès, H. BSgenome: software infrastructure for efficient representation of full genomes and their SNPs. R package version 1.56.0 (2020).
100. Wang, J. et al. Nascent RNA sequencing analysis provides insights into enhancer-mediated gene regulation. _BMC Genomics_ 19, 633 (2018).
101. Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. _Nucleic Acids Res._ 44, W160–W165 (2016).
102. Cliff, N. Dominance statistics: ordinal analyses to answer ordinal questions. _Psychol. Bull._ 114, 494–509 (1993).
103. Macbeth, G., Razumiejczyk, E. & Ledesma, R. D. Cliff’s Delta Calculator: a non-parametric effect size program for two groups of observations. _Univ. Psychol._ 10, 545–555 (2011).
104. Bush, S. J., McCulloch, M. E. B., Summers, K. M., Hume, D. A. & Clark, E. L. Integration of quantitated expression estimates from polyA-selected and rRNA-depleted RNA-seq libraries. _BMC Bioinformatics_ 18, 301 (2017).
105. Abdennur, N. & Mirny, L. A. Cooler: scalable storage for Hi-C data and other genomically labeled arrays. _Bioinformatics_ 36, 311–316 (2020).
106. Flyamer, I. M., Illingworth, R. S. & Bickmore, W. A. Coolpup.py: versatile pile-up analysis of Hi-C data. _Bioinformatics_ 36, 2980–2985 (2020).
107. Bailey, T. L. et al. MEME Suite: tools for motif discovery and searching. _Nucleic Acids Res._ 37, W202–W208 (2009).
108. Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for bisulfite-seq applications. _Bioinformatics_ 27, 1571–1572 (2011).
109. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. _Nat. Methods_ 9, 357–359 (2012).
110. Kent, W. J., Zweig, A. S., Barber, G., Hinrichs, A. S. & Karolchik, D. BigWig and BigBed: enabling browsing of large distributed datasets. _Bioinformatics_ 26, 2204–2207 (2010).
111. Zhao, H. et al. CrossMap: a versatile tool for coordinate conversion between genome assemblies. _Bioinformatics_ 30, 1006–1007 (2014).
112. Pope, B. D. et al. Topologically associating domains are stable units of replication-timing regulation. _Nature_ 515, 402–405 (2014).

ACKNOWLEDGEMENTS
We thank the Rada-Iglesias laboratory members for insightful comments and critical reading of the manuscript. T.P. is supported
by a doctoral fellowship from the DAAD (Germany). V.S.-G. is supported by a doctoral fellowship from the University of Cantabria (Spain). Work in the Rada-Iglesias laboratory was supported
by the EMBO Young Investigator Programme; CMMC intramural funding (Germany); the German Research Foundation (DFG) (Research Grant no. RA 2547/2-1); ‘Programa STAR-Santander Universidades,
Campus Cantabria Internacional de la convocatoria CEI 2015 de Campus de Excelencia Internacional’ (Spain); the Spanish Ministry of Science, Innovation and Universities (Research Grant nos.
PGC2018-095301-B-I00 and RED2018-102553-T REDEVNEURAL 3.0); and the European Research Council (ERC CoG ‘_PoisedLogic_’; grant no. 862022). The Landeira laboratory is funded by grants from
the Spanish Ministry of Science and Innovation (grant nos. BFU2016-75233-P and PID2019-108108GB-I00) and the Andalusian Regional Government (grant no. PC-0246-2017). AUTHOR INFORMATION
AUTHORS AND AFFILIATIONS
* Center for Molecular Medicine Cologne (CMMC), University of Cologne, Cologne, Germany: Tomas Pachano, Tore Bleckwehl & Alvaro Rada-Iglesias
* Institute of Biomedicine and Biotechnology of Cantabria (IBBTEC), CSIC/Universidad de Cantabria/SODERCAN, Santander, Spain: Tomas Pachano, Víctor Sánchez-Gaya, Thais Ealo, Maria Mariner-Faulí, Patricia Respuela, María Muñoz-San Martín, Endika Haro & Alvaro Rada-Iglesias
* Centre for Genomics and Oncological Research (GENYO), Granada, Spain: Helena G. Asenjo & David Landeira
* Department of Biochemistry and Molecular Biology II, Faculty of Pharmacy, University of Granada, Granada, Spain: Helena G. Asenjo & David Landeira
* Instituto de Investigación Biosanitaria ibs.GRANADA, Hospital Virgen de las Nieves, Granada, Spain: Helena G. Asenjo & David Landeira
* Max Planck Institute for Molecular Biomedicine, Muenster, Germany: Sara Cruz-Molina
* Center for Biomics, Erasmus University Medical Center, Rotterdam, the Netherlands: Wilfred F. J. van IJcken
* Cologne Excellence Cluster for Cellular Stress Responses in Aging-Associated Diseases (CECAD), University of Cologne, Cologne, Germany: Alvaro Rada-Iglesias

Authors: Tomas Pachano, Víctor Sánchez-Gaya, Thais Ealo, Maria Mariner-Faulí, Tore Bleckwehl, Helena G. Asenjo, Patricia Respuela, Sara Cruz-Molina, María Muñoz-San Martín, Endika Haro, Wilfred F. J. van IJcken, David Landeira & Alvaro Rada-Iglesias

CONTRIBUTIONS
T.P. and A.R.-I. conceptualized the project. Experimental investigations were performed by T.P., T.E., M.M.-F., H.G.A., P.R., M.M.-S. and E.H. T.P., V.S.-G. and T.B. performed data analyses. T.P. and A.R.-I. wrote, reviewed and edited the manuscript. S.C.-M., W.F.J.v.I., D.L. and A.R.-I. were responsible for obtaining resources. A.R.-I. was responsible for supervision and funding acquisition.
CORRESPONDING AUTHOR Correspondence to Alvaro Rada-Iglesias. ETHICS DECLARATIONS COMPETING INTERESTS The authors declare no competing interests. ADDITIONAL INFORMATION PEER REVIEW
INFORMATION _Nature Genetics_ thanks Darío Lupiáñez, Robin Andersson and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are
available. PUBLISHER’S NOTE Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. EXTENDED DATA EXTENDED DATA FIG. 1 GENETIC
AND EPIGENETIC FEATURES OF THE OCGIS ASSOCIATED WITH PES. A, Comparison of CpG%, observed/expected CpG ratio, GC% and sequence length between random regions (n = 436000), NMIs associated to
_PE-distal_ (_PE-NMIs_; n = 345) and NMIs associated to the _devTSS_ (_devTSS-NMIs_; n = 1476) (Methods). The p-values were calculated using two-sided unpaired Wilcoxon tests with Bonferroni
correction for multiple testing; black numbers indicate median fold-changes; green numbers indicate non-negligible Cliff Delta effect sizes. The coloured area of the violin plot represents
the expression values distribution and the center line represents the median. B, H3K27me3 ChIP-seq levels14,24 around: _PE-distal_ with overlapping TFBS/p300 peaks and CAP-CGIs (n = 135),
_PE-distal_ with TFBS/p300 peaks separated by 1bp-1kb from CAP-CGIs (n = 65), _PE-distal_ with TFBS/p300 peaks separated by 1-3kb from CAP-CGIs (n = 53), _PE-distal_ without CAP-CGIs within
3kb (n = 254) and AEs without CAP-CGI within 3kb (n = 8115). C, % of CpG methylation at CAP-CGI associated with _PE-distal_ (PE-CAP-CGI; n = 276) and CAP-CGI associated with the TSS of
developmental genes (devTSS-CAP-CGI; n = 1926) in the indicated cell types (Methods). D, For the identification of the _PE Sox1(+35)CGI_ deletion, primer pairs flanking each of the deletion
breakpoints (1 + 3 and 4 + 2), located within the deleted region (5 + 6) or amplifying a large or small fragment depending on the absence or presence of the deletion (1 + 2) were used. E,
H3K27me3 levels at _PE Sox1(+35)_ were measured by ChIP-qPCR in WT ESCs and in n = 2 independent _PE Sox1(+35)CGI_−_/_− ESCs clones using primers adjacent to the deleted region. The bars
display the mean of n = 3 technical replicates (black dots). F, Independent biological replicate for the data presented in Fig. 1d. _Sox1_ expression was investigated by RT-qPCR in ESCs and
AntNPC with the indicated genotypes. n = 2 independent _PE Sox1 CGI_−_/_− ESC clones (circles and diamonds) and n = 1 _PE Sox1_−_/_− clone were studied. For each cell line, n = 2 replicates
of the AntNPC differentiation were performed. Expression values were normalized to two housekeeping genes (_Eef1a_ and _Hprt_) and are presented as fold-changes with respect to WT ESCs. The
coloured area of the violin plot represents the expression values distribution and the center line represents the median. EXTENDED DATA FIG. 2 ENGINEERING OF PES MODULES WITHIN THE GATA6-TAD
AND FOXA2-TAD. A, Epigenomic and genomic features of two previously characterized PEs14 (_PE Six3(_−_133)_; _PE Lmx1b(+59)_) in which the oCGIs overlap with conserved sequences bound by
p300 and, thus, likely to contain relevant TFBS. B, The different _PE Sox1(+35)_ insertions were identified using primer pairs flanking the insertion borders (1 + 3 and 4 + 2; 1 + 5 and 6 +
2; 1 + 3 and 6 + 2), amplifying potential duplications (4 + 3, 3 + 2 and 4 + 1; 6 + 5, 5 + 2 and 6 + 1) and amplifying a large or small fragment depending on the absence or presence of the
insertion (1 + 2), respectively. The PCR results obtained for WT ESCs and for two ESC clonal lines with homozygous insertions of the _PE Sox1(+35)_ modules in the _Gata6_-TAD are shown. C,
Independent biological replicate for the data presented in Fig. 2b. D-E, Strategy used to insert the _PE Wnt8b(+21)_ (d) or the _PE Sox1(+35)_ (e) components into the _Gata6_-TAD (d) or the _Foxa2_-TAD (e). The right panels show the TADs that include _Gata6_ (d) or _Foxa2_ (e) according to publicly available Hi-C data (refs. 80,81); the red triangle indicates the integration site of the PE modules, approximately 100 kb downstream of _Gata6_ (d) or _Foxa2_ (e). F-G, To identify the successful insertion of the different _PE Sox1(+35)_ (f) or _PE Wnt8b(+21)_ (g) modules, primer pairs flanking the insertion borders (1 + 3 and 4 + 2; 1 + 5 and 6 + 2; 1 + 3 and 6 + 2), amplifying potential duplications (4 + 3, 3
+ 2 and 4 + 1; 6 + 5, 5 + 2 and 6 + 1) and amplifying a large or small fragment depending on the absence or presence of the insertion (1 + 2), respectively, were used. The PCR results
obtained for two ESC clonal lines with homozygous insertions of the indicated PE modules in the _Foxa2_-TAD (f) or _Gata6_-TAD (g), respectively, are shown. H-I, Independent biological
replicates for the data shown in Fig. 2c (h) and Fig. 2d (i). In (c), (h) and (i), the expression differences between AntNPCs with the TFBS + CGI module and AntNPCs with the other PE modules
were calculated using two-sided non-paired t-tests (**: fold-change > 2 and p < 0.001; *: fold-change > 2 and p < 0.05; ns: not significant, fold-change < 2 or p > 0.05).

Extended Data Fig. 3 PEs are enriched in CpG-rich motifs and are bound by CXXC-domain-containing proteins. A, Comparison of the TF motifs enriched in PEs with a CAP-CGI within 3 kb and in active enhancers without CAP-CGIs within 3 kb. Motif enrichment analyses were performed with _Homer_ (ref. 82; left) and _AME_ (ref. 107; right). B, ChIP-seq signals for KDM2B (ref. 31; upper panel) and TET1 (ref. 32; lower
panel) are shown around: _PE-distal_ with overlapping TFBS/p300 peaks and CAP-CGIs (n = 135), _PE-distal_ with TFBS/p300 peaks separated by 1 bp–1 kb from CAP-CGIs (n = 65), _PE-distal_ with TFBS/p300 peaks separated by 1–3 kb from CAP-CGIs (n = 53) and _PE-distal_ without CAP-CGIs within 3 kb (n = 254). ChIP-seq profile plots were generated using either the p300 peaks (left) or the CAP-CGIs (right) associated with the PEs as midpoints.

Extended Data Fig. 4 Engineering of ESC lines containing the _PE Sox1(+35)_ TFBS and an artificial CGI within the _Gata6_-TAD. A,
Strategy used to insert the _PE Sox1(+35)TFBS_ alone or together with an aCGI into the _Gata6_-TAD. The upper left panel shows the epigenomic and genetic features of the _PE Sox1(+35)_. The
lower left panel shows the _PE Sox1(+35)_ modules inserted into the _Gata6_-TAD. The right panel shows the _Gata6_-TAD according to publicly available Hi-C data (refs. 80,81). The red triangle indicates the integration site of the _PE Sox1(+35)_ modules, approximately 100 kb downstream of _Gata6_. B, For the identification of the _PE Sox1(+35)TFBS+aCGI_ insertion, primer pairs flanking the insertion borders (1 + 3 and 4 + 2), amplifying potential duplications (4 + 3 and 4 + 4) and amplifying a large or small fragment depending on the absence or presence of the
insertion (1 + 2), respectively, were used. The PCR results obtained for two ESC clonal lines with homozygous insertions of _PE Sox1(+35)TFBS+aCGI_ in the _Gata6_-TAD are shown. C,
Independent biological replicate for the data presented in Fig. 2f. The expression differences between AntNPCs with the TFBS+CGI module and AntNPCs with the other PE modules were calculated
using two-sided non-paired t-tests (*: fold-change > 2 and p < 0.05; ns: not significant, fold-change < 2 or p > 0.05). D, For the identification of the aCGI insertion alone, primer
pairs flanking the insertion borders (1 + 3 and 4 + 2), amplifying potential duplications (4 + 3 and 4 + 4) and amplifying a large or small fragment depending on the absence or presence of
the insertion (1 + 2), respectively, were used. The PCR results obtained from two ESC clonal lines with heterozygous insertions of the aCGI in the _Gata6_-TAD are shown. E, The expression of
_Gata6_ and _Sox1_ was measured by RT-qPCR in cells that were either WT or heterozygous for the aCGI insertion in the _Gata6_-TAD (two different clones; circles and diamonds). For each cell
line, n = 2 replicates of the AntNPC differentiation were performed. The results obtained in n = 2 independent biological replicates are presented in each panel (Rep1 and Rep2).

Extended Data Fig. 5 _Gata6_ expression patterns in cell lines with the _PE Sox1(+35)_ modules inserted within the _Gata6_-TAD. A, _Gata6_ and _Sox1_ expression was measured by RT-qPCR in ESCs and at
intermediate stages of AntNPC differentiation (Day 3 and Day 4). The analysed cells were either WT or homozygous for the insertions of the different _PE Sox1(+35)_ modules within the
_Gata6_-TAD. For the cells with the PE module insertions, n = 1 clonal cell line was studied. For each cell line, n = 2 replicates of the AntNPC differentiation were performed. Expression
values were normalized to two housekeeping genes (_Eef1a_ and _Hprt_) and are presented as fold-changes with respect to WT ESCs. B, Quantification of cells expressing GATA6 or SOX1 in immunofluorescence assays such as those shown in Fig. 2g. The analysed cells were either WT or homozygous for the insertions of the different _PE Sox1(+35)_ modules within the
_Gata6_-TAD. C, The expression patterns of GATA6 (upper panel) and SOX1 (lower panel) were investigated by immunofluorescence in WT ESCs or AntNPCs that were either WT, homozygous for the
insertion of the _PE Sox1(+35)TFBS_ + _aCGI_ in the _Gata6_-TAD or heterozygous for the insertion of the aCGI alone in the _Gata6_-TAD. Nuclei were stained with DAPI. Scale bar = 100 µm. D,
Quantification of cells expressing GATA6 or SOX1 according to the immunofluorescence assays described in (c). In (b) and (d), the bars display the mean of n = 3 technical replicates (black
dots).

Extended Data Fig. 6 Epigenetic and topological characterization of the _Gata6_-TAD cell lines. A, Bisulfite sequencing data presented in Fig. 3a for the indicated _Gata6_-TAD cell
lines. The circles correspond to individual CpG dinucleotides located within the TFBS module. Unmethylated CpGs are shown in white, methylated CpGs in black and not-covered CpGs in gray. B,
Chromatin accessibility at the endogenous _PE Sox1(+35)_ and the _Gata6_-TAD insertion site (P1 and P2) was measured by FAIRE-qPCR in cells with the indicated genotypes. C, DNA methylation
and nucleosome occupancy at the TFBS were simultaneously analyzed by NOMe-PCR in the indicated _Gata6_-TAD ESC lines. In the upper panels, the black and white circles represent methylated or
unmethylated CpG sites, respectively. In the lower panels, the blue or white circles represent accessible or inaccessible GpC sites for the GpC methyltransferase, respectively. Red bars
represent inaccessible regions large enough to accommodate a nucleosome. The dotted line indicates where the TFBS starts. The grey shaded area represents a nucleosome-depleted region. D,
Scatter plots showing population-averaged nucleosome occupancy (red) and DNA methylation (black) levels within the TFBS in the indicated _Gata6_-TAD ESC lines. The grey shaded area
represents a nucleosome-depleted region. E-F, H3K4me1, H3K4me3, H2AK119ub, CBX7 and PHC1 levels at the endogenous _PE Sox1(+35)_ and the _Gata6_-TAD insertion site (P1 and P2) were measured
by ChIP-qPCR in cells with the indicated genotypes. ChIP-qPCR signals were calculated as described in Fig. 3. G, 4C-seq experiments were performed using the _Gata6_ promoter as a viewpoint
in AntNPCs with the indicated genotypes. H, Pile-up plots showing average Hi-C signals (refs. 7,52) in ESCs for two groups of PE-gene pairs: PEs and developmental genes with CGI-rich promoters, and PEs and genes with CGI-poor promoters. For each PE-gene pair, the PE and the gene were located within the same TAD. Left panels include all the considered PE-gene pairs (n = 401 pairs for developmental genes; n = 900 pairs for CGI-poor promoters); middle panels include PE-gene pairs with the same genomic size in the two groups (n = 401 pairs); right panels include PE-gene pairs with the same genomic size and genes with expression levels < 1 FPKM (ref. 9; n = 290 pairs) (Methods).

Extended Data Fig. 7 Generation of cell lines with engineered _PE Sox1(+35)_ modules within the _Gria1_-TAD and global characterization of H3K27ac and eRNA levels at active enhancers. A, ESC clonal lines with insertions of the different _PE Sox1(+35)_ modules were identified using
primer pairs flanking the insertion borders (1 + 3 and 4 + 2; 1 + 5 and 6 + 2; 1 + 3 and 6 + 2), amplifying potential duplications (4 + 3, 3 + 2 and 4 + 1; 6 + 5, 5 + 2 and 6 + 1) and
amplifying a large or small fragment depending on the absence or presence of the insertion (1 + 2), respectively. The PCR results obtained for WT ESCs or two ESC clonal lines with homozygous
insertions of the different _PE Sox1(+35)_ modules in the _Gria1_-TAD are shown. B, Independent biological replicate for the data presented in Fig. 4b. The expression differences between
AntNPCs with the TFBS + CGI module and AntNPCs with the other PE modules were calculated using two-sided non-paired t-tests (ns: not significant; fold-change<2 or p>0.05). C, Bisulfite
sequencing analyses of ESC lines with the indicated _PE Sox1(+35)_ modules inserted in the _Gria1_-TAD. The circles correspond to individual CpG dinucleotides located within the TFBS:
unmethylated CpGs (white), methylated CpGs (black) and not-covered CpGs (gray) are shown. The plot on the right summarizes the DNA methylation levels measured within the TFBS in the
indicated ESC lines. D, Active enhancers (AEs) identified in ESCs based on the presence of distal H3K27ac peaks were classified into three categories (Methods): Class I (AEs in TADs
containing only poorly expressed genes; n = 271 (left), n = 340 (middle, right)); Class II (AEs in TADs with at least one highly expressed gene; n = 271 (left), n = 2353 (middle), n = 340 (right)); Class III (AEs whose closest gene in the same TAD is highly expressed; n = 271 (left), n = 1262 (middle), n = 340 (right)). The violin plots show the H3K27ac and eRNA levels in
ESC for each AE category. P-values were calculated using unpaired Wilcoxon tests with Bonferroni correction for multiple testing; the numbers in black indicate the median fold-changes
between the indicated groups; the coloured numbers correspond to Cliff's delta effect sizes: negligible (red) and non-negligible (green). In the left and right panels, eRNA levels for the three enhancer classes are compared after correcting for H3K27ac differences (Methods).

Extended Data Fig. 8 Generation and characterization of cell lines with PE insertions at the _Gria1_ and _Sox7/Rp1l1_ TADs. A, H2AK119ub and SUZ12 levels at the endogenous _PE Sox1(+35)_, the _Gria1_ promoter and the _Gria1_-TAD insertion site (P1 and P2; Fig. 4d) were measured by ChIP-qPCR
in ESCs with the indicated genotypes. ChIP-qPCR signals were calculated as in Fig. 3. B, ESC clonal lines in which a pCGI was inserted 380 bp upstream of the _Gria1_-TSS in cells with the
indicated _PE Sox1(+35)_ modules 100 kb upstream of _Gria1_ were identified using the indicated primer pairs. PCR results for clonal ESC lines with the indicated double homozygous
insertions are shown. C, eRNA levels at the endogenous _PE Sox1(+35)_ and the _Gria1_-TAD insertion site (P1 and P2) were measured by RT-qPCR in cells with the indicated genotypes.
Expression values were calculated as in Fig. 3. D, Strategy to insert the indicated _PE Sox1(+35)_ modules 380 bp upstream (red triangle) of the _Gria1_-TSS. E, ESC clonal lines with the _PE
Sox1(+35)_ modules 380 bp upstream of the _Gria1_-TSS were identified using the indicated primer pairs. PCR results for ESC clonal lines with homozygous insertions of the indicated _PE Sox1(+35)_
modules are shown. F, Independent biological replicate for the data presented in Fig. 5e. G, ESC clonal lines with the _PE Sox1(+35)_ modules within the _Sox7/Rp1l1_-TAD were identified
using primers flanking the insertion borders (1 + 3 and 4 + 2; 1 + 3 and 6 + 2), amplifying potential duplications (4 + 3, 3 + 2 and 4 + 1) and amplifying a large or small fragment depending
on the absence or presence of the insertion (1 + 2), respectively. PCR results for ESC clonal lines with homozygous insertions of the indicated _PE Sox1(+35)_ modules are shown. H,
Independent biological replicate for the data presented in Fig. 5g. In (a) and (c), the bars display the mean of n = 3 technical replicates (black dots). In (f) and (h), the expression
differences between AntNPCs with the TFBS + CGI module and AntNPCs with the other PE modules were calculated using two-sided non-paired t-tests (***: fold-change > 2 and p < 0.0001; ns: not significant, fold-change < 2 or p > 0.05).

Extended Data Fig. 9 Generation of ESC lines with structural variants. A, ESC lines with the _Six3/Six2_ TAD boundary deletion were identified
using primers flanking the deleted region (1 + 3 and 4 + 2), amplifying the deleted fragment (5 + 6) and amplifying a large or small fragment depending on the absence or presence of the
deletion (1 + 2), respectively. The PCR results for two ESC clonal lines with 36 kb homozygous deletions (_del36_) are shown. B, ESC lines with the _Six3/Six2_ inversion were identified using
primer pairs flanking the inverted region (1 + 3, 4 + 2, 1 + 4 and 3 + 2) and amplifying potential duplications (4 + 3, 3 + 3 and 4 + 4). The PCR results for two ESC clonal lines with 110 kb
homozygous inversions (_inv110_) are shown. C, Epigenomic and genetic features of a CTCF binding site (CBS; ref. 112; highlighted in grey) located upstream of the _PE Six1(−133)_ (highlighted in
yellow). D, ESC lines with the CBS deletion were identified using primers flanking the deleted region (1 + 2) or located in the CBS (3 + 4). The PCR results for two ESC clonal lines with
homozygous CBS deletions are shown. E, The expression of _Six3_ and _Six2_ was measured by RT-qPCR in cells with the indicated genotypes. For each of the engineered structural variants, n =
2 independent clonal cell lines were generated (circles and diamonds). In each plot, the number of circles and/or diamonds corresponds to the number of AntNPC differentiations performed. The
results obtained in n = 2 independent biological replicates are presented in each panel (Rep1 and Rep2). Expression values are presented as fold-changes with respect to WT ESCs. F, ESC
lines with the _Lmx1a_-TAD boundary inversion were identified using primers flanking the inverted region (1 + 3, 4 + 2, 1 + 4 and 3 + 2) and amplifying potential deletions (1 + 4) or
duplications (4 + 3, 3 + 3 and 4 + 4). The PCR results for three ESC clonal lines with 260 kb homozygous inversions (_inv260_) are shown.

Extended Data Fig. 10 Examples of human congenital diseases caused by structural variants that disrupt developmental loci with PE-associated oCGIs. A, Upper panel: heterozygous inversion in a patient with Branchio-oculo-facial syndrome (BOFS; ref. 5). Lower panel: epigenomic and genetic features of _TFAP2A_ neural crest (NC) cognate enhancers (left), 6q16.2 genes (middle) and _TFAP2A_ (right). In the lower left panel, enhancer reporter assays in chicken embryos are shown for two representative _TFAP2A_ enhancers (ref. 5). Computational CGIs and NMIs are represented as green rectangles. The inversion places one _TFAP2A_ allele into a novel TAD and impairs its normal expression in NC cells by physically disconnecting it from its enhancers. _TFAP2A_ has a promoter that contains a large CGI cluster and is marked with a broad H3K27me3 domain in ESCs. Some _TFAP2A_ NC enhancers are associated with oCGIs and marked with H3K27me3 in ESCs. Moreover, this inversion places genes originally found within the 6q16.2 locus in proximity to the _TFAP2A_ NC enhancers within a shuffled domain. The promoters of these 6q16.2 genes (i.e., _GPR63_ and _NDUFAF4_) contain a short CGI centered on their TSSs.
In agreement with our findings, none of the 6q16.2 genes is responsive to the _TFAP2A_ NC enhancers (ref. 5). B, Upper panel: deletion found in families with brachydactyly involving a TAD boundary located between the _EPHA4_ and the _PAX3_ loci (ref. 63). Lower panel: epigenomic and genetic features of the _Epha4_ cognate enhancers in the mouse E11.5 limb (left) and in human ESCs (right). A representative reporter assay in E11.5 mouse embryos for the hs1507 element is shown in the middle (ref. 63). The deletion includes _EPHA4_, a gene highly expressed in the developing limb, and the TAD boundary separating the _EPHA4_ and _PAX3_ TADs. As a result, enhancers that control _EPHA4_ expression in the limb establish ectopic interactions with _PAX3_ (that is, enhancer adoption)
and strongly induce its expression in the limb. The _PAX3_ promoter contains a large CGI cluster and is marked with H3K27me3 in ESCs, while one of the major _EPHA4_ enhancers (hs1507) is
associated with an oCGI and is marked with H3K27me3 in ESCs. The high responsiveness of _PAX3_ to the _EPHA4_ enhancers is in agreement with our findings.

SUPPLEMENTARY INFORMATION
* Reporting Summary
* Peer Review Information
* Supplementary Data 1: List of oligonucleotides and antibodies.
* Supplementary Data 2: List of knock-in donor sequences.

ABOUT THIS ARTICLE

Cite this article: Pachano, T., Sánchez-Gaya, V., Ealo, T. _et al._ Orphan CpG islands amplify poised enhancer regulatory activity and determine target gene responsiveness. _Nat Genet_ 53, 1036–1049 (2021). https://doi.org/10.1038/s41588-021-00888-x
* Received: 05 August 2020
* Accepted: 17 May 2021
* Published: 28 June 2021
* Issue Date: July 2021