Orphan CpG islands amplify poised enhancer regulatory activity and determine target gene responsiveness





ABSTRACT
CpG islands (CGIs) represent a widespread feature of vertebrate genomes, being associated with ~70% of all gene promoters. CGIs control transcription initiation by conferring nearby promoters with unique chromatin properties. In addition, there are thousands of distal or orphan CGIs (oCGIs) whose functional relevance is barely known. Here we show that oCGIs are an essential component of poised enhancers that augment their long-range regulatory activity and control the responsiveness of their target genes. Using a knock-in strategy in mouse embryonic stem cells, we introduced poised enhancers with or without oCGIs within topologically associating domains harboring genes with different types of promoters. Analysis of the resulting cell lines revealed that oCGIs act as tethering elements that promote the physical and functional communication between poised enhancers and distally located genes, particularly those with large CGI clusters in their promoters. Therefore, by acting as genetic determinants of gene–enhancer compatibility, CGIs can contribute to gene expression control under both physiological and potentially pathological conditions.


DATA AVAILABILITY
All the 4C-seq data generated in this study are available through the GEO (GSE156465). All the generated transgenic ESC lines are available upon request.

REFERENCES
1. Spitz, F. & Furlong, E. E. M. Transcription factors: from enhancer binding to developmental control. _Nat. Rev. Genet._ 13, 613–626 (2012).
2. Kvon, E. Z. Using transgenic reporter assays to functionally characterize enhancers in animals. _Genomics_ 106, 185–192 (2015).
3. Furlong, E. E. M. & Levine, M. Developmental enhancers and chromosome topology. _Science_ 361, 1341–1345 (2018).
4. Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. _Nature_ 485, 376–380 (2012).
5. Laugsch, M. et al. Modeling the pathological long-range regulatory effects of human structural variation with patient-specific hiPSCs. _Cell Stem Cell_ 24, 736–752.e12 (2019).
6. Rao, S. S. P. et al. Cohesin loss eliminates all loop domains. _Cell_ 171, 305–320.e24 (2017).
7. Nora, E. P. et al. Targeted degradation of CTCF decouples local insulation of chromosome domains from genomic compartmentalization. _Cell_ 169, 930–944 (2017).
8. Ghavi-Helm, Y. et al. Highly rearranged chromosomes reveal uncoupling between genome topology and gene expression. _Nat. Genet._ 51, 1272–1282 (2019).
9. Kraft, K. et al. Serial genomic inversions induce tissue-specific architectural stripes, gene misexpression and congenital malformations. _Nat. Cell Biol._ 21, 305–310 (2019).
10. Kikuta, H. et al. Genomic regulatory blocks encompass multiple neighboring genes and maintain conserved synteny in vertebrates. _Genome Res._ 17, 545–555 (2007).
11. Arnold, C. D. et al. Genome-wide assessment of sequence-intrinsic enhancer responsiveness at single-base-pair resolution. _Nat. Biotechnol._ 35, 136–144 (2016).
12. Haberle, V. et al. Transcriptional cofactors display specificity for distinct types of core promoters. _Nature_ 570, 122–126 (2019).
13. Spielmann, M., Lupiáñez, D. G. & Mundlos, S. Structural variation in the 3D genome. _Nat. Rev. Genet._ 19, 453–467 (2018).
14. Cruz-Molina, S. et al. PRC2 facilitates the regulatory topology required for poised enhancer function during pluripotent stem cell differentiation. _Cell Stem Cell_ 20, 689–705.e9 (2017).
15. Rada-Iglesias, A. et al. A unique chromatin signature uncovers early developmental enhancers in humans. _Nature_ 470, 279–283 (2011).
16. Deaton, A. M. & Bird, A. CpG islands and the regulation of transcription. _Genes Dev._ 25, 1010–1022 (2011).
17. Bell, J. S. K. & Vertino, P. M. Orphan CpG islands define a novel class of highly active enhancers. _Epigenetics_ 12, 449–464 (2017).
18. Illingworth, R. S. et al. Orphan CpG islands identify numerous conserved promoters in the mammalian genome. _PLoS Genet._ 6, e1001134 (2010).
19. Steinhaus, R., Gonzalez, T., Seelow, D. & Robinson, P. N. Pervasive and CpG-dependent promoter-like characteristics of transcribed enhancers. _Nucleic Acids Res._ 48, 5306–5317 (2020).
20. Bogdanović, O. et al. Active DNA demethylation at enhancers during the vertebrate phylotypic period. _Nat. Genet._ 48, 417–426 (2016).
21. Long, H. K. et al. Epigenetic conservation at gene regulatory elements revealed by non-methylated DNA profiling in seven vertebrates. _eLife_ 2, e00348 (2013).
22. Lenhard, B., Sandelin, A. & Carninci, P. Metazoan promoters: emerging characteristics and insights into transcriptional regulation. _Nat. Rev. Genet._ 13, 233–245 (2012).
23. Williams, K. et al. TET1 and hydroxymethylcytosine in transcription and DNA methylation fidelity. _Nature_ 473, 343–349 (2011).
24. Blackledge, N. P. et al. Variant PRC1 complex-dependent H2A ubiquitylation drives PRC2 recruitment and polycomb domain formation. _Cell_ 157, 1445–1459 (2014).
25. Aljazi, M. B., Gao, Y., Wu, Y., Mias, G. I. & He, J. Cell signaling coordinates global PRC2 recruitment and developmental gene expression in murine embryonic stem cells. _iScience_ 23, 101646 (2020).
26. Habibi, E. et al. Whole-genome bisulfite sequencing of two distinct interconvertible DNA methylomes of mouse embryonic stem cells. _Cell Stem Cell_ 13, 360–369 (2013).
27. Zylicz, J. J. et al. Chromatin dynamics and the role of G9a in gene regulation and enhancer silencing during early mouse development. _eLife_ 4, e09571 (2015).
28. Lee, S. M. et al. Intragenic CpG islands play important roles in bivalent chromatin assembly of developmental genes. _Proc. Natl Acad. Sci. USA_ 114, E1885–E1894 (2017).
29. Bolt, C. C. & Duboule, D. The regulatory landscapes of developmental genes. _Development_ 147, dev171736 (2020).
30. Blackledge, N. P. & Klose, R. CpG island chromatin. _Epigenetics_ 6, 147–152 (2011).
31. Turberfield, A. H. et al. KDM2 proteins constrain transcription from CpG island gene promoters independently of their histone demethylase activity. _Nucleic Acids Res._ 47, 9005–9023 (2019).
32. Arab, K. et al. GADD45A binds R-loops and recruits TET1 to CpG island promoters. _Nat. Genet._ 51, 217–223 (2019).
33. Diez, R. & Storey, K. G. Markers in vertebrate neurogenesis. _Nat. Rev. Neurosci._ 2, 835–839 (2001).
34. Bentovim, L., Harden, T. T. & DePace, A. H. Transcriptional precision and accuracy in development: from measurements to models and mechanisms. _Development_ 144, 3855–3866 (2017).
35. Boyes, J. & Bird, A. DNA methylation inhibits transcription indirectly via a methyl-CpG binding protein. _Cell_ 64, 1123–1134 (1991).
36. Klemm, S. L., Shipony, Z. & Greenleaf, W. J. Chromatin accessibility and the regulatory epigenome. _Nat. Rev. Genet._ 20, 207–220 (2019).
37. You, J. S. et al. OCT4 establishes and maintains nucleosome-depleted regions that provide additional layers of epigenetic regulation of its target genes. _Proc. Natl Acad. Sci. USA_ 108, 14497–14502 (2011).
38. Stadler, M. B. et al. DNA-binding factors shape the mouse methylome at distal regulatory regions. _Nature_ 480, 490–495 (2011).
39. Kim, T.-K. et al. Widespread transcription at neuronal activity-regulated enhancers. _Nature_ 465, 182–187 (2010).
40. Mas, G. & Di Croce, L. The role of Polycomb in stem cell genome architecture. _Curr. Opin. Cell Biol._ 43, 87–95 (2016).
41. Yan, J. et al. Histone H3 lysine 4 monomethylation modulates long-range chromatin interactions at enhancers. _Cell Res._ 28, 204–220 (2018).
42. Denholtz, M. et al. Long-range chromatin contacts in embryonic stem cells reveal a role for pluripotency factors and polycomb proteins in genome organization. _Cell Stem Cell_ 13, 602–616 (2013).
43. Wang, J. et al. A protein interaction network for pluripotency of embryonic stem cells. _Nature_ 444, 364–368 (2006).
44. Pachano, T., Crispatzu, G. & Rada-Iglesias, A. Polycomb proteins as organizers of 3D genome architecture in embryonic stem cells. _Brief. Funct. Genomics_ 18, 358–366 (2019).
45. Bantignies, F. et al. Polycomb-dependent regulatory contacts between distant Hox loci in _Drosophila_. _Cell_ 144, 214–226 (2011).
46. Isono, K. et al. SAM domain polymerization links subnuclear clustering of PRC1 to gene silencing. _Dev. Cell_ 26, 565–577 (2013).
47. Loubiere, V., Papadopoulos, G. L., Szabo, Q., Martinez, A. M. & Cavalli, G. Widespread activation of developmental gene expression characterized by PRC1-dependent chromatin looping. _Sci. Adv._ 6, eaax4001 (2020).
48. Benabdallah, N. S. et al. Decreased enhancer-promoter proximity accompanying enhancer activation. _Mol. Cell_ 76, 473–484 (2019).
49. Lim, B., Heist, T., Levine, M. & Fukaya, T. Visualization of transvection in living _Drosophila_ embryos. _Mol. Cell_ 70, 287–296.e6 (2018).
50. Beck, S. et al. Implications of CpG islands on chromosomal architectures and modes of global gene regulation. _Nucleic Acids Res._ 46, 4382–4391 (2018).
51. Liu, S. et al. From 1D sequence to 3D chromatin dynamics and cellular functions: a phase separation perspective. _Nucleic Acids Res._ 46, 9367–9383 (2018).
52. Kurup, J. T., Han, Z., Jin, W. & Kidder, B. L. H4K20me3 methyltransferase SUV420H2 shapes the chromatin landscape of pluripotent embryonic stem cells. _Development_ 147, dev188516 (2020).
53. Andersson, R., Sandelin, A. & Danko, C. G. A unified architecture of transcriptional regulatory elements. _Trends Genet._ 31, 426–433 (2015).
54. Lloret-Llinares, M. et al. The RNA exosome contributes to gene expression regulation during stem cell differentiation. _Nucleic Acids Res._ 46, 11502–11513 (2018).
55. Local, A. et al. Identification of H3K4me1-associated proteins at mammalian enhancers. _Nat. Genet._ 50, 73–82 (2018).
56. Etchegaray, J. P. et al. The histone deacetylase SIRT6 restrains transcription elongation via promoter-proximal pausing. _Mol. Cell_ 75, 683–699 (2019).
57. Hirabayashi, S. et al. NET-CAGE characterizes the dynamics and topology of human transcribed _cis_-regulatory elements. _Nat. Genet._ 51, 1369–1379 (2019).
58. Schoenfelder, S. et al. Polycomb repressive complex PRC1 spatially constrains the mouse embryonic stem cell genome. _Nat. Genet._ 47, 1179–1186 (2015).
59. Butler, J. E. F. & Kadonaga, J. T. Enhancer–promoter specificity mediated by DPE or TATA core promoter motifs. _Genes Dev._ 15, 2515–2519 (2001).
60. Gómez-Marín, C. et al. Evolutionary comparison reveals that diverging CTCF sites are signatures of ancestral topological associating domains borders. _Proc. Natl Acad. Sci. USA_ 112, 7542–7547 (2015).
61. O’Brien, L. L. et al. Transcriptional regulatory control of mammalian nephron progenitors revealed by multi-factor cistromic analysis and genetic studies. _PLoS Genet._ 14, e1007181 (2018).
62. Catarino, R. R. & Stark, A. Assessing sufficiency and necessity of enhancer activities for gene expression and the mechanisms of transcription activation. _Genes Dev._ 32, 202–223 (2018).
63. Lupiáñez, D. G. et al. Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions. _Cell_ 161, 1012–1025 (2015).
64. Kragesteen, B. K. et al. Dynamic 3D chromatin architecture contributes to enhancer specificity and limb morphogenesis. _Nat. Genet._ 50, 1463–1473 (2018).
65. Li, X. & Noll, M. Compatibility between enhancers and promoters determines the transcriptional specificity of gooseberry and gooseberry neuro in the _Drosophila_ embryo. _EMBO J._ 13, 400–406 (1994).
66. Zabidi, M. A. et al. Enhancer-core-promoter specificity separates developmental and housekeeping gene regulation. _Nature_ 518, 556–559 (2015).
67. Mahmoudi, T., Katsani, K. R. & Verrijzer, C. P. GAGA can mediate enhancer function in _trans_ by linking two separate DNA molecules. _EMBO J._ 21, 1775–1781 (2002).
68. Calhoun, V. C. & Levine, M. Long-range enhancer-promoter interactions in the Scr-Antp interval of the _Drosophila_ Antennapedia complex. _Proc. Natl Acad. Sci. USA_ 100, 9878–9883 (2003).
69. Calhoun, V. C., Stathopoulos, A. & Levine, M. Promoter-proximal tethering elements regulate enhancer-promoter specificity in the _Drosophila_ Antennapedia complex. _Proc. Natl Acad. Sci. USA_ 99, 9243–9247 (2002).
70. Boyle, S. et al. A central role for canonical PRC1 in shaping the 3D nuclear landscape. _Genes Dev._ 34, 931–949 (2020).
71. Perino, M. et al. MTF2 recruits Polycomb Repressive Complex 2 by helical-shape-selective DNA binding. _Nat. Genet._ 50, 1002–1010 (2018).
72. Beltran, M. et al. The interaction of PRC2 with RNA or chromatin is mutually antagonistic. _Genome Res._ 26, 896–907 (2016).
73. Crispatzu, G. et al. The chromatin, topological and regulatory properties of pluripotency-associated poised enhancers are conserved in vivo. Preprint at _bioRxiv_ https://doi.org/10.1101/2021.01.18.427085 (2021).
74. Shrinivas, K. et al. Enhancer features that drive formation of transcriptional condensates. _Mol. Cell_ 75, 549–561.e7 (2019).
75. Dimitrova, E. et al. FBXL19 recruits CDK-Mediator to CpG islands of developmental genes priming them for activation during lineage commitment. _eLife_ 7, e37084 (2018).
76. Long, H. K., Blackledge, N. P. & Klose, R. J. ZF-CxxC domain-containing proteins, CpG islands and the chromatin connection. _Biochem. Soc. Trans._ 41, 727–740 (2013).
77. Mastrangelo, I. A., Courey, A. J., Wall, J. S., Jackson, S. P. & Hough, P. V. C. DNA looping and Sp1 multimer links: a mechanism for transcriptional synergism and enhancement. _Proc. Natl Acad. Sci. USA_ 88, 5670–5674 (1991).
78. Su, W., Jackson, S., Tjian, R. & Echols, H. DNA looping between sites for transcriptional activation: self-association of DNA-bound Sp1. _Genes Dev._ 5, 820–826 (1991).
79. Hartl, D. et al. CG dinucleotides enhance promoter activity independent of DNA methylation. _Genome Res._ 29, 554–563 (2019).
80. Wang, Y. et al. The 3D Genome Browser: a web-based browser for visualizing 3D genome organization and long-range chromatin interactions. _Genome Biol._ 19, 151 (2018).
81. Bonev, B. et al. Multiscale 3D genome rewiring during mouse neural development. _Cell_ 171, 557–572.e24 (2017).
82. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. _Mol. Cell_ 38, 576–589 (2010).
83. Liu, T. et al. Cistrome: an integrative platform for transcriptional regulation studies. _Genome Biol._ 12, R83 (2011).
84. Gouti, M. et al. In vitro generation of neuromesodermal progenitors reveals distinct roles for wnt signalling in the specification of spinal cord and paraxial mesoderm identity. _PLoS Biol._ 12, e1001937 (2014).
85. Matsuda, K. & Kondoh, H. Dkk1-dependent inhibition of Wnt signaling activates Hesx1 expression through its 5′ enhancer and directs forebrain precursor development. _Genes Cells_ 19, 374–385 (2014).
86. Yao, X. et al. Tild-CRISPR allows for efficient and precise gene knockin in mouse and human cells. _Dev. Cell_ 45, 526–536.e5 (2018).
87. Giresi, P. G., Kim, J., McDaniell, R. M., Iyer, V. R. & Lieb, J. D. FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin. _Genome Res._ 17, 877–885 (2007).
88. Requena, F. et al. NOMePlot: analysis of DNA methylation and nucleosome occupancy at the single molecule. _Sci. Rep._ 9, 8140 (2019).
89. Gardiner-Garden, M. & Frommer, M. CpG islands in vertebrate genomes. _J. Mol. Biol._ 196, 261–282 (1987).
90. Madeira, F. et al. The EMBL-EBI search and sequence analysis tools APIs in 2019. _Nucleic Acids Res._ 47, W636–W641 (2019).
91. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. _Bioinformatics_ 26, 841–842 (2010).
92. Karolchik, D. et al. The UCSC Table Browser data retrieval tool. _Nucleic Acids Res._ 32, D493–D496 (2004).
93. Ewels, P., Magnusson, M., Lundin, S. & Käller, M. MultiQC: summarize analysis results for multiple tools and samples in a single report. _Bioinformatics_ 32, 3047–3048 (2016).
94. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. _Bioinformatics_ 30, 2114–2120 (2014).
95. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. _EMBnet J._ 17, 10–12 (2011).
96. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. _Nat. Methods_ 9, 357–359 (2012).
97. Li, H. et al. The Sequence Alignment/Map format and SAMtools. _Bioinformatics_ 25, 2078–2079 (2009).
98. Feng, J., Liu, T., Qin, B., Zhang, Y. & Liu, X. S. Identifying ChIP-seq enrichment using MACS. _Nat. Protoc._ 7, 1728–1740 (2012).
99. Pagès, H. BSgenome: software infrastructure for efficient representation of full genomes and their SNPs. R package version 1.56.0 (2020).
100. Wang, J. et al. Nascent RNA sequencing analysis provides insights into enhancer-mediated gene regulation. _BMC Genomics_ 19, 633 (2018).
101. Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. _Nucleic Acids Res._ 44, W160–W165 (2016).
102. Cliff, N. Dominance statistics: ordinal analyses to answer ordinal questions. _Psychol. Bull._ 114, 494–509 (1993).
103. Macbeth, G., Razumiejczyk, E. & Ledesma, R. D. Cliff's Delta Calculator: a non-parametric effect size program for two groups of observations. _Univ. Psychol._ 10, 545–555 (2011).
104. Bush, S. J., McCulloch, M. E. B., Summers, K. M., Hume, D. A. & Clark, E. L. Integration of quantitated expression estimates from polyA-selected and rRNA-depleted RNA-seq libraries. _BMC Bioinformatics_ 18, 301 (2017).
105. Abdennur, N. & Mirny, L. A. Cooler: scalable storage for Hi-C data and other genomically labeled arrays. _Bioinformatics_ 36, 311–316 (2020).
106. Flyamer, I. M., Illingworth, R. S. & Bickmore, W. A. Coolpup.py: versatile pile-up analysis of Hi-C data. _Bioinformatics_ 36, 2980–2985 (2020).
107. Bailey, T. L. et al. MEME Suite: tools for motif discovery and searching. _Nucleic Acids Res._ 37, W202–W208 (2009).
108. Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for bisulfite-seq applications. _Bioinformatics_ 27, 1571–1572 (2011).
109. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. _Nat. Methods_ 9, 357–359 (2012).
110. Kent, W. J., Zweig, A. S., Barber, G., Hinrichs, A. S. & Karolchik, D. BigWig and BigBed: enabling browsing of large distributed datasets. _Bioinformatics_ 26, 2204–2207 (2010).
111. Zhao, H. et al. CrossMap: a versatile tool for coordinate conversion between genome assemblies. _Bioinformatics_ 30, 1006–1007 (2014).
112. Pope, B. D. et al. Topologically associating domains are stable units of replication-timing regulation. _Nature_ 515, 402–405 (2014).

ACKNOWLEDGEMENTS
We thank the Rada-Iglesias laboratory members for insightful comments and critical reading of the manuscript. T.P. is supported by a doctoral fellowship from the DAAD (Germany). V.S.-G. is supported by a doctoral fellowship from the University of Cantabria (Spain). Work in the Rada-Iglesias laboratory was supported by the EMBO Young Investigator Programme; CMMC intramural funding (Germany); the German Research Foundation (DFG) (Research Grant no. RA 2547/2-1); ‘Programa STAR-Santander Universidades, Campus Cantabria Internacional de la convocatoria CEI 2015 de Campus de Excelencia Internacional’ (Spain); the Spanish Ministry of Science, Innovation and Universities (Research Grant nos. PGC2018-095301-B-I00 and RED2018-102553-T REDEVNEURAL 3.0); and the European Research Council (ERC CoG ‘_PoisedLogic_’; grant no. 862022). The Landeira laboratory is funded by grants from the Spanish Ministry of Science and Innovation (grant nos. BFU2016-75233-P and PID2019-108108GB-I00) and the Andalusian Regional Government (grant no. PC-0246-2017).

AUTHOR INFORMATION


AUTHORS AND AFFILIATIONS
* Center for Molecular Medicine Cologne (CMMC), University of Cologne, Cologne, Germany: Tomas Pachano, Tore Bleckwehl & Alvaro Rada-Iglesias
* Institute of Biomedicine and Biotechnology of Cantabria (IBBTEC), CSIC/Universidad de Cantabria/SODERCAN, Santander, Spain: Tomas Pachano, Víctor Sánchez-Gaya, Thais Ealo, Maria Mariner-Faulí, Patricia Respuela, María Muñoz-San Martín, Endika Haro & Alvaro Rada-Iglesias
* Centre for Genomics and Oncological Research (GENYO), Granada, Spain: Helena G. Asenjo & David Landeira
* Department of Biochemistry and Molecular Biology II, Faculty of Pharmacy, University of Granada, Granada, Spain: Helena G. Asenjo & David Landeira
* Instituto de Investigación Biosanitaria ibs.GRANADA, Hospital Virgen de las Nieves, Granada, Spain: Helena G. Asenjo & David Landeira
* Max Planck Institute for Molecular Biomedicine, Muenster, Germany: Sara Cruz-Molina
* Center for Biomics, Erasmus University Medical Center, Rotterdam, the Netherlands: Wilfred F. J. van IJcken
* Cologne Excellence Cluster for Cellular Stress Responses in Aging-Associated Diseases (CECAD), University of Cologne, Cologne, Germany: Alvaro Rada-Iglesias

Authors
Tomas Pachano, Víctor Sánchez-Gaya, Thais Ealo, Maria Mariner-Faulí, Tore Bleckwehl, Helena G. Asenjo, Patricia Respuela, Sara Cruz-Molina, María Muñoz-San Martín, Endika Haro, Wilfred F. J. van IJcken, David Landeira & Alvaro Rada-Iglesias

CONTRIBUTIONS
T.P. and A.R.-I. conceptualized the project. Experimental investigations were performed by T.P., T.E., M.M.-F., H.G.A., P.R., M.M.-S. and E.H. T.P., V.S.-G. and T.B. performed data analyses. T.P. and A.R.-I. wrote, reviewed and edited the manuscript. S.C.-M., W.F.J.v.I., D.L. and A.R.-I. were responsible for obtaining resources. A.R.-I. was responsible for supervision and funding acquisition.


CORRESPONDING AUTHOR Correspondence to Alvaro Rada-Iglesias. ETHICS DECLARATIONS COMPETING INTERESTS The authors declare no competing interests. ADDITIONAL INFORMATION PEER REVIEW


INFORMATION _Nature Genetics_ thanks Darío Lupiáñez, Robin Andersson and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are


available. PUBLISHER’S NOTE Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. EXTENDED DATA EXTENDED DATA FIG. 1 GENETIC


AND EPIGENETIC FEATURES OF THE OCGIS ASSOCIATED WITH PES. A, Comparison of CpG%, observed/expected CpG ratio, GC% and sequence length between random regions (n = 436000), NMIs associated with _PE-distal_ (_PE-NMIs_; n = 345) and NMIs associated with the _devTSS_ (_devTSS-NMIs_; n = 1476) (Methods). P values were calculated using two-sided unpaired Wilcoxon tests with Bonferroni correction for multiple testing; black numbers indicate median fold-changes; green numbers indicate non-negligible Cliff's delta effect sizes. The coloured area of each violin plot represents the distribution of values and the center line represents the median. B, H3K27me3 ChIP-seq levels (refs. 14,24) around: _PE-distal_ with overlapping TFBS/p300 peaks and CAP-CGIs (n = 135), _PE-distal_ with TFBS/p300 peaks separated by 1 bp to 1 kb from CAP-CGIs (n = 65), _PE-distal_ with TFBS/p300 peaks separated by 1 to 3 kb from CAP-CGIs (n = 53), _PE-distal_ without CAP-CGIs within 3 kb (n = 254) and AEs without CAP-CGIs within 3 kb (n = 8115). C, Percentage of CpG methylation at CAP-CGIs associated with _PE-distal_ (PE-CAP-CGI; n = 276) and at CAP-CGIs associated with the TSSs of developmental genes (devTSS-CAP-CGI; n = 1926) in the indicated cell types (Methods). D, To identify the _PE Sox1(+35)CGI_ deletion, primer pairs flanking each of the deletion breakpoints (1 + 3 and 4 + 2), located within the deleted region (5 + 6) or amplifying a large or small fragment depending on the absence or presence of the deletion (1 + 2) were used. E, H3K27me3 levels at _PE Sox1(+35)_ were measured by ChIP-qPCR in WT ESCs and in n = 2 independent _PE Sox1(+35)CGI_−/− ESC clones using primers adjacent to the deleted region. The bars display the mean of n = 3 technical replicates (black dots). F, Independent biological replicate for the data presented in Fig. 1d. _Sox1_ expression was measured by RT-qPCR in ESCs and AntNPCs with the indicated genotypes. n = 2 independent _PE Sox1 CGI_−/− ESC clones (circles and diamonds) and n = 1 _PE Sox1_−/− clone were studied. For each cell line, n = 2 replicates of the AntNPC differentiation were performed. Expression values were normalized to two housekeeping genes (_Eef1a_ and _Hprt_) and are presented as fold-changes with respect to WT ESCs. The coloured area of the violin plot represents the distribution of expression values and the center line represents the median.

EXTENDED DATA FIG. 2 ENGINEERING OF PES MODULES WITHIN THE _GATA6_-TAD AND _FOXA2_-TAD. A, Epigenomic and genomic features of two previously characterized PEs (ref. 14) (_PE Six3(−133)_; _PE Lmx1b(+59)_) in which the oCGIs overlap with conserved sequences bound by
p300 and are thus likely to contain relevant TFBS. B, The different _PE Sox1(+35)_ insertions were identified using primer pairs flanking the insertion borders (1 + 3 and 4 + 2; 1 + 5 and 6 + 2; 1 + 3 and 6 + 2), amplifying potential duplications (4 + 3, 3 + 2 and 4 + 1; 6 + 5, 5 + 2 and 6 + 1) and amplifying a large or small fragment depending on the absence or presence of the insertion (1 + 2), respectively. The PCR results obtained for WT ESCs and for two ESC clonal lines with homozygous insertions of the _PE Sox1(+35)_ modules in the _Gata6_-TAD are shown. C, Independent biological replicate for the data presented in Fig. 2b. D-E, Strategy used to insert the _PE Wnt8b(+21)_ (d) or the _PE Sox1(+35)_ (e) components into the _Gata6_-TAD (d) or _Foxa2_-TAD (e), respectively. The right panels show the TADs that include _Gata6_ (d) or _Foxa2_ (e) according to publicly available Hi-C data (refs. 80,81). The red triangle indicates the integration site of the PE modules, approximately 100 kb downstream of _Gata6_ (d) or _Foxa2_ (e). F-G, To identify successful insertion of the different _PE Sox1(+35)_ (f) or _PE Wnt8b(+21)_ (g) modules, primer pairs flanking the insertion borders (1 + 3 and 4 + 2; 1 + 5 and 6 + 2; 1 + 3 and 6 + 2), amplifying potential duplications (4 + 3, 3 + 2 and 4 + 1; 6 + 5, 5 + 2 and 6 + 1) and amplifying a large or small fragment depending on the absence or presence of the insertion (1 + 2), respectively, were used. The PCR results obtained for two ESC clonal lines with homozygous insertions of the indicated PE modules in the _Foxa2_-TAD (f) or _Gata6_-TAD (g), respectively, are shown. H-I, Independent biological replicates for the data shown in Fig. 2c (h) and Fig. 2d (i). In (c), (h) and (i), the expression differences between AntNPCs with the TFBS + CGI module and AntNPCs with the other PE modules were calculated using two-sided unpaired t-tests (**: fold-change > 2 and P < 0.001; *: fold-change > 2 and P < 0.05; ns: not significant, fold-change < 2 or P > 0.05).

EXTENDED DATA FIG. 3 PES ARE ENRICHED IN CPG-RICH MOTIFS AND ARE BOUND BY CXXC-DOMAIN-CONTAINING PROTEINS. A, Comparison of the TF motifs enriched in PEs with a CAP-CGI within 3 kb and in active
enhancers without CAP-CGIs within 3 kb. Motif enrichment analyses were performed with _Homer_ (ref. 82; left) and _AME_ (ref. 107; right). B, ChIP-seq signals for KDM2B (ref. 31; upper panel) and TET1 (ref. 32; lower panel) are shown around: _PE-distal_ with overlapping TFBS/p300 peaks and CAP-CGIs (n = 135), _PE-distal_ with TFBS/p300 peaks separated by 1 bp to 1 kb from CAP-CGIs (n = 65), _PE-distal_ with TFBS/p300 peaks separated by 1 to 3 kb from CAP-CGIs (n = 53) and _PE-distal_ without CAP-CGIs within 3 kb (n = 254). ChIP-seq profile plots were generated using either the p300 peaks (left) or the CAP-CGIs (right) associated with the PEs as midpoints.

EXTENDED DATA FIG. 4 ENGINEERING OF ESC LINES CONTAINING THE _PE SOX1(+35)_ TFBS AND AN ARTIFICIAL CGI WITHIN THE _GATA6_-TAD. A,
Strategy used to insert the _PE Sox1(+35)TFBS_ alone or together with an aCGI into the _Gata6_-TAD. The upper left panel shows the epigenomic and genetic features of the _PE Sox1(+35)_. The lower left panel shows the _PE Sox1(+35)_ modules inserted into the _Gata6_-TAD. The right panel shows the _Gata6_-TAD according to publicly available Hi-C data (refs. 80,81). The red triangle indicates the integration site of the _PE Sox1(+35)_ modules, approximately 100 kb downstream of _Gata6_. B, To identify the _PE Sox1(+35)TFBS+aCGI_ insertion, primer pairs flanking the insertion borders (1 + 3 and 4 + 2), amplifying potential duplications (4 + 3 and 4 + 4) and amplifying a large or small fragment depending on the absence or presence of the insertion (1 + 2), respectively, were used. The PCR results obtained for two ESC clonal lines with homozygous insertions of _PE Sox1(+35)TFBS+aCGI_ in the _Gata6_-TAD are shown. C, Independent biological replicate for the data presented in Fig. 2f. The expression differences between AntNPCs with the TFBS + CGI module and AntNPCs with the other PE modules were calculated using two-sided unpaired t-tests (*: fold-change > 2 and P < 0.05; ns: not significant, fold-change < 2 or P > 0.05). D, To identify the insertion of the aCGI alone, primer pairs flanking the insertion borders (1 + 3 and 4 + 2), amplifying potential duplications (4 + 3 and 4 + 4) and amplifying a large or small fragment depending on the absence or presence of the insertion (1 + 2), respectively, were used. The PCR results obtained from two ESC clonal lines with heterozygous insertions of the aCGI in the _Gata6_-TAD are shown. E, The expression of _Gata6_ and _Sox1_ was measured by RT-qPCR in cells that were either WT or heterozygous for the aCGI insertion in the _Gata6_-TAD (two different clones; circles and diamonds). For each cell line, n = 2 replicates of the AntNPC differentiation were performed. The results obtained in n = 2 independent biological replicates are presented in each panel (Rep1 and Rep2).

EXTENDED DATA FIG. 5 _GATA6_ EXPRESSION PATTERNS IN CELL LINES WITH THE _PE SOX1(+35)_ MODULES INSERTED WITHIN THE _GATA6_-TAD. A, _Gata6_ and _Sox1_ expression was measured by RT-qPCR in ESCs and at
intermediate stages of AntNPC differentiation (day 3 and day 4). The analysed cells were either WT or homozygous for the insertions of the different _PE Sox1(+35)_ modules within the _Gata6_-TAD. For the cells with the PE module insertions, n = 1 clonal cell line was studied. For each cell line, n = 2 replicates of the AntNPC differentiation were performed. Expression values were normalized to two housekeeping genes (_Eef1a_ and _Hprt_) and are presented as fold-changes with respect to WT ESCs. B, Quantification of cells expressing GATA6 or SOX1 according to immunofluorescence assays such as those shown in Fig. 2g. The analysed cells were either WT or homozygous for the insertions of the different _PE Sox1(+35)_ modules within the _Gata6_-TAD. C, The expression patterns of GATA6 (upper panel) and SOX1 (lower panel) were investigated by immunofluorescence in WT ESCs or in AntNPCs that were either WT, homozygous for the insertion of the _PE Sox1(+35)TFBS+aCGI_ in the _Gata6_-TAD or heterozygous for the insertion of the aCGI alone in the _Gata6_-TAD. Nuclei were stained with DAPI. Scale bar, 100 µm. D, Quantification of cells expressing GATA6 or SOX1 according to the immunofluorescence assays described in (c). In (b) and (d), the bars display the mean of n = 3 technical replicates (black dots).

EXTENDED DATA FIG. 6 EPIGENETIC AND TOPOLOGICAL CHARACTERIZATION OF THE _GATA6_-TAD CELL LINES. A, Bisulfite sequencing data presented in Fig. 3a for the indicated _Gata6_-TAD cell
lines. The circles correspond to individual CpG dinucleotides located within the TFBS module. Unmethylated CpGs are shown in white, methylated CpGs in black and not-covered CpGs in gray. B, Chromatin accessibility at the endogenous _PE Sox1(+35)_ and at the _Gata6_-TAD insertion site (P1 and P2) was measured by FAIRE-qPCR in cells with the indicated genotypes. C, DNA methylation and nucleosome occupancy at the TFBS were simultaneously analysed by NOMe-PCR in the indicated _Gata6_-TAD ESC lines. In the upper panels, the black and white circles represent methylated and unmethylated CpG sites, respectively. In the lower panels, the blue and white circles represent GpC sites that are accessible or inaccessible to the GpC methyltransferase, respectively. Red bars represent inaccessible regions large enough to accommodate a nucleosome. The dotted line indicates where the TFBS starts. The grey shaded area represents a nucleosome-depleted region. D, Scatter plots showing population-averaged nucleosome occupancy (red) and DNA methylation (black) levels within the TFBS in the indicated _Gata6_-TAD ESC lines. The grey shaded area represents a nucleosome-depleted region. E-F, H3K4me1, H3K4me3, H2AK119ub, CBX7 and PHC1 levels at the endogenous _PE Sox1(+35)_ and at the _Gata6_-TAD insertion site (P1 and P2) were measured by ChIP-qPCR in cells with the indicated genotypes. ChIP-qPCR signals were calculated as described in Fig. 3. G, 4C-seq experiments were performed using the _Gata6_ promoter as a viewpoint in AntNPCs with the indicated genotypes. H, Pile-up plots showing average Hi-C signals (refs. 7,52) in ESCs for two groups of PE-gene pairs: PEs and developmental genes with CGI-rich promoters; PEs and genes with CGI-poor promoters. For each PE-gene pair, both the PE and the gene were located within the same TAD. Left panels include all the considered PE-gene pairs (n = 401 pairs for developmental genes; n = 900 pairs for CGI-poor promoters); middle panels include PE-gene pairs with the same genomic size in the two groups (n = 401 pairs); right panels consist of PE-gene pairs with the same genomic size and genes with expression levels <1 FPKM (ref. 9) (n = 290 pairs) (Methods).

EXTENDED DATA FIG. 7 GENERATION OF CELL LINES WITH ENGINEERED _PE SOX1(+35)_ MODULES WITHIN THE _GRIA1_-TAD AND GLOBAL CHARACTERIZATION OF H3K27AC AND ERNA LEVELS AT ACTIVE ENHANCERS. A, ESC clonal lines with insertions of the different _PE Sox1(+35)_ modules were identified using
primer pairs flanking the insertion borders (1 + 3 and 4 + 2; 1 + 5 and 6 + 2; 1 + 3 and 6 + 2), amplifying potential duplications (4 + 3, 3 + 2 and 4 + 1; 6 + 5, 5 + 2 and 6 + 1) and amplifying a large or small fragment depending on the absence or presence of the insertion (1 + 2), respectively. The PCR results obtained for WT ESCs and for two ESC clonal lines with homozygous insertions of the different _PE Sox1(+35)_ modules in the _Gria1_-TAD are shown. B, Independent biological replicate for the data presented in Fig. 4b. The expression differences between AntNPCs with the TFBS + CGI module and AntNPCs with the other PE modules were calculated using two-sided unpaired t-tests (ns: not significant, fold-change < 2 or P > 0.05). C, Bisulfite sequencing analyses of ESC lines with the indicated _PE Sox1(+35)_ modules inserted in the _Gria1_-TAD. The circles correspond to individual CpG dinucleotides located within the TFBS: unmethylated CpGs (white), methylated CpGs (black) and not-covered CpGs (gray) are shown. The plot on the right summarizes the DNA methylation levels measured within the TFBS in the indicated ESC lines. D, Active enhancers (AEs) identified in ESCs based on the presence of distal H3K27ac peaks were classified into three categories (Methods): Class I (AEs in TADs containing only poorly expressed genes; n = 271 (left), n = 340 (middle, right)); Class II (AEs in TADs with at least one highly expressed gene; n = 271 (left), n = 2353 (middle), n = 340 (right)); Class III (AEs whose closest gene in the same TAD is highly expressed; n = 271 (left), n = 1262 (middle), n = 340 (right)). The violin plots show the H3K27ac and eRNA levels in ESCs for each AE category. P values were calculated using unpaired Wilcoxon tests with Bonferroni correction for multiple testing; the numbers in black indicate the median fold-changes between the indicated groups; the coloured numbers correspond to Cliff's delta effect sizes: negligible (red) and non-negligible (green). In the left and right panels, eRNA levels for the three enhancer classes are compared after correcting for H3K27ac differences (Methods).

EXTENDED DATA FIG. 8 GENERATION AND CHARACTERIZATION OF CELL LINES WITH PE INSERTIONS AT THE _GRIA1_ AND _SOX7/RP1L1_ TADS. A, H2AK119ub and SUZ12 levels at the endogenous _PE Sox1(+35)_, the _Gria1_ promoter and the _Gria1_-TAD insertion site (P1 and P2; Fig. 4d) were measured by ChIP-qPCR
in ESCs with the indicated genotypes. ChIP-qPCR signals were calculated as in Fig. 3. B, ESC clonal lines in which a pCGI was inserted 380 bp upstream of the _Gria1_ TSS in cells carrying the indicated _PE Sox1(+35)_ modules 100 kb upstream of _Gria1_ were identified using the indicated primer pairs. PCR results for clonal ESC lines with the indicated double homozygous insertions are shown. C, eRNA levels at the endogenous _PE Sox1(+35)_ and at the _Gria1_-TAD insertion site (P1 and P2) were measured by RT-qPCR in cells with the indicated genotypes. Expression values were calculated as in Fig. 3. D, Strategy to insert the indicated _PE Sox1(+35)_ modules 380 bp upstream (red triangle) of the _Gria1_ TSS. E, ESC clonal lines with the _PE Sox1(+35)_ modules 380 bp upstream of the _Gria1_ TSS were identified using the indicated primer pairs. PCR results for ESC clonal lines with homozygous insertions of the indicated _PE Sox1(+35)_ modules are shown. F, Independent biological replicate for the data presented in Fig. 5e. G, ESC clonal lines with the _PE Sox1(+35)_ modules within the _Sox7/Rp1l1_-TAD were identified using primers flanking the insertion borders (1 + 3 and 4 + 2; 1 + 3 and 6 + 2), amplifying potential duplications (4 + 3, 3 + 2 and 4 + 1) and amplifying a large or small fragment depending on the absence or presence of the insertion (1 + 2), respectively. PCR results for ESC clonal lines with homozygous insertions of the indicated _PE Sox1(+35)_ modules are shown. H, Independent biological replicate for the data presented in Fig. 5g. In (a) and (c), the bars display the mean of n = 3 technical replicates (black dots). In (f) and (h), the expression differences between AntNPCs with the TFBS + CGI module and AntNPCs with the other PE modules were calculated using two-sided unpaired t-tests (***: fold-change > 2 and P < 0.0001; ns: not significant, fold-change < 2 or P > 0.05).

EXTENDED DATA FIG. 9 GENERATION OF ESC LINES WITH STRUCTURAL VARIANTS. A, ESC lines with the _Six3/Six2_ TAD boundary deletion were identified
using primers flanking the deleted region (1 + 3 and 4 + 2), amplifying the deleted fragment (5 + 6) and amplifying a large or small fragment depending on the absence or presence of the deletion (1 + 2), respectively. The PCR results for two ESC clonal lines with 36 kb homozygous deletions (_del36_) are shown. B, ESC lines with the _Six3/Six2_ inversion were identified using primer pairs flanking the inverted region (1 + 3, 4 + 2, 1 + 4 and 3 + 2) and amplifying potential duplications (4 + 3, 3 + 3 and 4 + 4). The PCR results for two ESC clonal lines with 110 kb homozygous inversions (_inv110_) are shown. C, Epigenomic and genetic features of a CTCF binding site (CBS; highlighted in grey) (ref. 112) located upstream of the _PE Six3(−133)_ (highlighted in yellow). D, ESC lines with the CBS deletion were identified using primers flanking the deleted region (1 + 2) or located in the CBS (3 + 4). The PCR results for two ESC clonal lines with homozygous CBS deletions are shown. E, The expression of _Six3_ and _Six2_ was measured by RT-qPCR in cells with the indicated genotypes. For each of the engineered structural variants, n = 2 independent clonal cell lines were generated (circles and diamonds). In each plot, the number of circles and/or diamonds corresponds to the number of AntNPC differentiations performed. The results obtained in n = 2 independent biological replicates are presented in each panel (Rep1 and Rep2). Expression values are presented as fold-changes with respect to WT ESCs. F, ESC lines with the _Lmx1a_-TAD boundary inversion were identified using primers flanking the inverted region (1 + 3, 4 + 2, 1 + 4 and 3 + 2) and amplifying potential deletions (1 + 4) or duplications (4 + 3, 3 + 3 and 4 + 4). The PCR results for three ESC clonal lines with 260 kb homozygous inversions (_inv260_) are shown.

EXTENDED DATA FIG. 10 EXAMPLES OF HUMAN CONGENITAL DISEASES CAUSED BY STRUCTURAL VARIANTS THAT DISRUPT DEVELOPMENTAL LOCI WITH PE-ASSOCIATED OCGIS. A, Upper panel: heterozygous inversion in a patient with branchio-oculo-facial syndrome
(BOFS) (ref. 5). Lower panel: epigenomic and genetic features of _TFAP2A_ neural crest (NC) cognate enhancers (left), 6q16.2 genes (middle) and _TFAP2A_ (right). In the lower left panel, enhancer reporter assays in chicken embryos are shown for two representative _TFAP2A_ enhancers (ref. 5). Computational CGIs and NMIs are represented as green rectangles. The inversion places one _TFAP2A_ allele into a novel TAD and impairs its normal expression in NC cells because it becomes physically disconnected from its enhancers. The _TFAP2A_ promoter contains a large CGI cluster and is marked with a broad H3K27me3 domain in ESCs. Some _TFAP2A_ NC enhancers are associated with oCGIs and marked with H3K27me3 in ESCs. Moreover, this inversion places genes originally found within the 6q16.2 locus in proximity to the _TFAP2A_ NC enhancers within a shuffled domain. The promoters of these 6q16.2 genes (i.e. _GPR63_ and _NDUFAF4_) contain a short CGI centered on their TSSs. In agreement with our findings, none of the 6q16.2 genes is responsive to the _TFAP2A_ NC enhancers (ref. 5). B, Upper panel: deletion found in families with brachydactyly involving a TAD boundary located between the _EPHA4_ and the _PAX3_ loci (ref. 63). Lower panel: epigenomic and genetic features of the _Epha4_ cognate enhancers in the mouse E11.5 limb (left) and in human ESCs (right). A representative reporter assay in E11.5 mouse embryos for the hs1507 element is shown in the middle (ref. 63). The deletion includes _EPHA4_, a gene highly expressed in the developing limb, and the TAD boundary separating the _EPHA4_ and _PAX3_ TADs. As a result, enhancers that control _EPHA4_ expression in the limb establish ectopic interactions with _PAX3_ (that is, enhancer adoption) and strongly induce its expression in the limb. The _PAX3_ promoter contains a large CGI cluster and is marked with H3K27me3 in ESCs, while one of the major _EPHA4_ enhancers (hs1507) is
associated with an oCGI and is marked with H3K27me3 in ESCs. The high responsiveness of _PAX3_ to the _EPHA4_ enhancers is in agreement with our findings.

SUPPLEMENTARY INFORMATION

REPORTING SUMMARY

PEER REVIEW INFORMATION

SUPPLEMENTARY DATA 1: List of oligonucleotides and antibodies.

SUPPLEMENTARY DATA 2: List of knock-in donor sequences.

CITE THIS ARTICLE

Pachano, T., Sánchez-Gaya, V., Ealo, T. _et al._ Orphan CpG islands amplify poised enhancer regulatory activity and determine target gene
responsiveness. _Nat Genet_ 53, 1036–1049 (2021). https://doi.org/10.1038/s41588-021-00888-x

Received: 05 August 2020; Accepted: 17 May 2021; Published: 28 June 2021; Issue Date: July 2021.
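Extended Data Fig. 1a compares CpG%, observed/expected CpG ratio, GC% and sequence length across random regions, _PE-NMIs_ and _devTSS-NMIs_. The exact definitions are given in the Methods, which are not reproduced here. The minimal sketch below assumes the widely used Gardiner-Garden and Frommer form of the observed/expected ratio. It also assumes that CpG% means CpG dinucleotides per 100 bp. The helper name `cpg_metrics` is hypothetical.

```python
def cpg_metrics(seq: str) -> dict:
    """Sequence-composition metrics for one region (hypothetical helper).

    obs_exp_cpg follows the Gardiner-Garden and Frommer form:
        (CpG count * length) / (C count * G count)
    cpg_pct is CpG dinucleotides per 100 bp, which is an assumed definition.
    """
    s = seq.upper()
    n = len(s)
    if n == 0:
        raise ValueError("sequence must be non-empty")
    c, g = s.count("C"), s.count("G")
    # "CG" cannot overlap itself, so the non-overlapping str.count is exact.
    cpg = s.count("CG")
    return {
        "length": n,
        "cpg_pct": 100.0 * cpg / n,
        "gc_pct": 100.0 * (c + g) / n,
        # The ratio is undefined without both C and G; report 0.0 in that case.
        "obs_exp_cpg": (cpg * n) / (c * g) if c and g else 0.0,
    }
```

For example, `cpg_metrics("CGCGAATT")` counts two CpGs, two Cs and two Gs in 8 bp. This gives an observed/expected ratio of 2 × 8 / (2 × 2) = 4.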
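Extended Data Figs. 1a and 7d report two-sided unpaired Wilcoxon tests with Bonferroni correction. They also report Cliff's delta effect sizes labelled as negligible or non-negligible. The legends do not state the cut-off. The sketch below assumes the commonly used |δ| thresholds of 0.147, 0.33 and 0.474. It computes δ = P(X > Y) − P(X < Y) by binary search over the sorted second sample, which stays fast even for the 436000 random regions in Extended Data Fig. 1a. It then applies the Bonferroni adjustment to a list of P values. The P values themselves could come from a rank-sum test such as `scipy.stats.mannwhitneyu(x, y, alternative="two-sided")`. All helper names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def cliffs_delta(x, y):
    """Cliff's delta = P(X > Y) - P(X < Y) over all (x, y) pairs.

    Sorting y and using binary search takes O((n + m) log m) time,
    instead of comparing every pair explicitly.
    """
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    ys = sorted(y)
    m = len(ys)
    greater = less = 0
    for a in x:
        greater += bisect_left(ys, a)     # y values strictly below a
        less += m - bisect_right(ys, a)   # y values strictly above a
    return (greater - less) / (len(x) * m)


def delta_magnitude(delta):
    """Label |delta| with commonly used thresholds (assumed, not from the legend)."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"


def bonferroni(pvalues):
    """Multiply each P value by the number of tests, capping at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]
```

Under these assumptions, "non-negligible" in the legends corresponds to any label other than `"negligible"`.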
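Several RT-qPCR panels, for example Extended Data Figs. 1f and 5a, normalize expression values to two housekeeping genes (_Eef1a_ and _Hprt_). They present the results as fold-changes relative to WT ESCs. The precise calculation is described in the Methods and is not reproduced here. A minimal sketch, assuming the comparative 2^−ΔΔCt approach with roughly 100% amplification efficiency, could look like this. It also assumes the arithmetic mean of the two reference-gene Ct values. The function name and the Ct values used to exercise it are hypothetical.

```python
from statistics import mean


def fold_change_ddct(ct_target, ct_refs, ct_target_calib, ct_refs_calib):
    """Relative expression by the comparative 2^-ddCt method (a sketch).

    ct_target / ct_refs: Ct of the gene of interest and of the reference
    genes (e.g. Eef1a and Hprt) in the sample; *_calib: the same values in
    the calibrator (here, WT ESCs). Assumes ~100% PCR efficiency for every
    assay. Averaging reference Ct values is equivalent to normalising to the
    geometric mean of their relative quantities.
    """
    d_ct_sample = ct_target - mean(ct_refs)
    d_ct_calib = ct_target_calib - mean(ct_refs_calib)
    return 2.0 ** -(d_ct_sample - d_ct_calib)
```

A target that crosses the threshold three cycles earlier than in the calibrator, with unchanged reference genes, comes out as an 8-fold increase.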
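The RT-qPCR comparisons in Extended Data Figs. 2, 4, 7 and 8 assign significance symbols by combining a two-sided unpaired t-test with a fold-change cut-off of 2. The helper below encodes that rule as written in the legends. The legends do not state how fold-change is computed or how values exactly at the boundaries (fold-change = 2, P = 0.05) are treated, so both are assumptions here. The P value could be obtained, for example, with `scipy.stats.ttest_ind(a, b)`, which by default runs a two-sided test assuming equal variances. Whether the original analysis used Student's or Welch's test is not specified. `significance_label` is a hypothetical name.

```python
# Star cut-offs as listed across the legends, checked from most to least strict.
STAR_CUTOFFS = ((0.0001, "***"), (0.001, "**"), (0.05, "*"))


def significance_label(fold_change, p_value, cutoffs=STAR_CUTOFFS):
    """Return '***', '**', '*' or 'ns' following the legend rule.

    A comparison is starred only when fold_change > 2 AND p_value is below a
    cut-off; otherwise it is 'ns'. Boundary handling is an assumption.
    """
    if fold_change <= 2:
        return "ns"
    for cutoff, stars in sorted(cutoffs):
        if p_value < cutoff:
            return stars
    return "ns"
```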
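In the NOMe-PCR plots of Extended Data Fig. 6c, red bars mark inaccessible stretches large enough to accommodate a nucleosome. The legend does not give the exact criterion, so the sketch below is only an illustration. It assumes a stretch qualifies when consecutive inaccessible GpC sites on one molecule span at least 147 bp, roughly the length of DNA wrapped around a nucleosome core particle. The input format, a list of `(position, accessible)` calls per molecule, and the function name are hypothetical.

```python
def nucleosome_footprints(gpc_calls, min_len=147):
    """Return (start, end) of runs of consecutive inaccessible GpC sites on one
    molecule whose span (first to last inaccessible site, inclusive) is at
    least min_len bp. gpc_calls is an iterable of (position, accessible) pairs.
    """
    runs = []
    start = last = None
    for pos, accessible in sorted(gpc_calls):
        if not accessible:
            if start is None:
                start = pos          # open a new protected run
            last = pos               # extend the current run
        else:
            if start is not None and last - start + 1 >= min_len:
                runs.append((start, last))
            start = None             # an accessible GpC closes any open run
    # Close a run that reaches the end of the amplicon.
    if start is not None and last - start + 1 >= min_len:
        runs.append((start, last))
    return runs
```

Measuring from the first to the last protected GpC underestimates the true footprint, because the boundaries lie somewhere between a protected site and its accessible neighbours. A stricter variant could measure between the flanking accessible sites instead.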