
Recurrent deletions in clonal hematopoiesis are driven by microhomology-mediated end joining

ABSTRACT The mutational mechanisms underlying recurrent deletions in clonal hematopoiesis are not entirely clear. In the current study we inspect the genomic regions around recurrent
deletions in myeloid malignancies, and identify microhomology-based signatures in _CALR_, _ASXL1_ and _SRSF2_ loci. We demonstrate that these deletions are the result of double strand break
repair by a PARP1 dependent microhomology-mediated end joining (MMEJ) pathway. Importantly, we provide evidence that these recurrent deletions originate in pre-leukemic stem cells. While DNA
polymerase theta (POLQ) is considered a key component in MMEJ repair, we provide evidence that pre-leukemic MMEJ (preL-MMEJ) deletions can be generated in _POLQ_ knockout cells. In
contrast, aphidicolin (an inhibitor of replicative polymerases and replication) treatment resulted in a significant reduction in preL-MMEJ. Altogether, our data indicate an association
between POLQ independent MMEJ and clonal hematopoiesis and elucidate mutational mechanisms involved in the very first steps of leukemia evolution. INTRODUCTION Human aged hematopoietic stem and progenitor cells (HSPCs) are prone to clonal expansion due to the acquisition of recurrent somatic mutations1,2.
This phenomenon is known as age related clonal hematopoiesis (ARCH)3,4,5. Somatic pre-leukemic mutations (pLMs) do not usually spread randomly across the possible physical positions of a
gene, but rather occur at apparent mutational hotspots. The majority of pLMs are nonsynonymous single nucleotide variants (SNVs), however other pLMs are due to recurrent insertions or
deletions (indels)3. While the mechanistic explanation for SNVs in cancer has been studied6, the mechanisms leading to recurrent indels in cancer are less understood. While different indel
signatures were previously identified in cancer genomes6, only two main mutational processes for somatic indels in cancer are mechanistically characterized. The first is polymerase
slippage, which frequently occurs in repetitive elements and long repeats (microsatellite (MS) signature)7, while the second is the error prone process of double strand break (DSB) DNA
repair8,9. In a recently published study6, mutation signatures from 4,645 whole-genome and 19,184 exome sequences from different tumor types were analyzed. While 97% of all indels identified
in hypermutated cancer genomes carried MS indel signatures in thymine mononucleotide repeats, signatures associated with defective DSB repair were less abundant and mainly reported in
BRCA-related tumors (ovarian, breast and cervical carcinomas) owing to deficiencies in the homologous recombination pathway. While other specific indel signatures are associated with tobacco
smoking, exposure to UV light and aging, the exact mechanisms underlying them remain to be elucidated. The study of ARCH and pre-leukemia has been mainly focused on the phenotypic
consequences of pLMs, whereas the mutational processes underlying indel signatures in myeloid malignancies and pre-leukemia remain poorly understood. The current study sought to identify
deletion signatures in myeloid malignancies that would shed light on the origins and mutational processes promoting these variants. In this work we demonstrate that the most common recurrent
deletions in clonal hematopoiesis are the result of PARP1 dependent repair of DSBs by a sub-pathway of the MMEJ that is POLQ independent. RESULTS MOST COMMON DELETIONS IN MYELOID
MALIGNANCIES SHARE AN MH-BASED SIGNATURE To study deletion signatures in myeloid malignancies we analyzed targeted sequencing data (COSMIC). This analysis revealed that the most common
somatic deletions in myeloid malignancies share a similar signature (Fig. 1a) in which two pre-existing identical sequences (i.e., microhomologies (MHs)) flank the deletions (Fig. 1b).
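To make the signature concrete, the following minimal Python sketch scores the microhomology around a given deletion. It is an illustration only, not the analysis code used in this study: the function name, the 0-based coordinates and the toy sequence are assumptions introduced here. The idea is that a deletion flanked by an MH of length k can be written at k + 1 equivalent positions, so the MH length is the number of bases by which the deletion can be shifted left or right without changing the resulting sequence.

```python
def microhomology_length(ref: str, start: int, length: int) -> int:
    """Return the microhomology length flanking a deletion.

    ref    : reference sequence (hypothetical toy input)
    start  : 0-based index of the first deleted base
    length : number of deleted bases
    """
    end = start + length  # index of the first base after the deletion

    # Bases at the start of the deleted segment that repeat right after it.
    right = 0
    while (right < length and end + right < len(ref)
           and ref[start + right] == ref[end + right]):
        right += 1

    # Bases just upstream of the deletion that repeat at its 3' end.
    left = 0
    while (left < length - right and start - left - 1 >= 0
           and ref[start - left - 1] == ref[end - left - 1]):
        left += 1

    return left + right


# Toy reference: two "CGT" copies separated by "GGGTTT".
toy_ref = "TTAG" + "CGT" + "GGGTTT" + "CGT" + "CCAA"
print(microhomology_length(toy_ref, 4, 9))  # deletion of one CGT copy plus the spacer
```

Because the score sums the left and right shifts, it does not depend on which of the equivalent representations a variant caller happens to report.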
The most common somatic MH-based deletions were found in _CALR_, _ASXL1_ and _SRSF2_ genes (Fig. 1a). We validated these results by analyzing deletion signatures in a well-defined targeted
sequencing cohort of 1540 adult-AML samples10. In this cohort MH-based deletions in _ASXL1_ and _SRSF2_ were the most recurrent deletions in AML (Fig. 1c). In a sequenced myeloproliferative
neoplasm (MPN) cohort11 of 2035 patients, the authors identified by PCR a _CALR_ MH-based 52 bp deletion in 16.2% of 1321 essential thrombocytosis (ET) patients and 13.6% of 309
myelofibrosis (MF) patients. This confirms that the _CALR_ MH-based 52 bp deletion is the most commonly reported deletion in these two clinical entities. Importantly, analysis of this cohort
identified a recurrent MH-based deletion in the _NFE-2_ gene (Fig. 1d). An analysis of an unbiased whole-exome sequencing dataset derived from 562 AMLs (BeatAML)12 validated the high
occurrence rates of _ASXL1_ and _SRSF2_ MH-based deletions, and exposed additional non-recurrent MH-based deletions in _TET2_, _DNMT3a_, _CEBPA_ and _RUNX1_ genes (Supplementary Fig. 1a).
Taken together, our analyses suggest that the most common deletions in myeloid malignancies are MH-based deletions. We hypothesized that specific mutational mechanisms may underlie these
recurrent deletions together with selective pressures. To elucidate the interplay between selective pressures and specific mutational mechanisms, we analyzed somatic mutations reported in
the _ASXL1_ gene. HIGH RECURRENCE RATES OF _ASXL1_ MH-BASED DELETION IN MYELOID MALIGNANCIES ARE DRIVEN BY SPECIFIC MUTATIONAL MECHANISMS Truncating events occur across the entire exon 12 of
the _ASXL1_ gene and have been suggested to have a gain-of-function role in promoting myeloid malignancies13. However, three truncating events were significantly more abundant than others
(Supplementary Fig. 2): the nonsense mutation p.R693*, the insertion p.G646fs*12 and the MH-based deletion p.E635fs*15. We compared the frequencies of these common events between
hematological and solid tumors. Significant differences were observed in the prevalence of the MH-based deletion (p.E635fs*15) between hematological (153/376 deletion cases) and
non-hematological (4/103 cases) cancers (_P_ < 0.00001) (Fig. 2a, b). While the MH-based deletion leads to a similar truncated ASXL1 protein as the other two variants (650–700 amino
acids), we did not observe similar differences in the frequencies of either p.R693* or p.G646fs*12. While we cannot rule out a selective advantage for the MH-based deletion truncated ASXL1
protein specifically in the hematopoietic system, a possible interpretation of these results is that specific mutational mechanisms contribute to leukemogenesis in the myeloid malignancies’
cell of origin. To address this hypothesis, we first aimed at identifying the recurrent MH-based deletions’ cell of origin. MULTIPOTENT HSCS ARE THE CELLS OF ORIGIN OF SOMATIC MH-BASED
DELETIONS Somatic mutations in _CALR_, _ASXL1_ and _SRSF2_ genes have been shown by others to be pre-leukemic lesions originating in early multipotent hematopoietic stem cells14,15. We
wished to validate that multipotent HSCs are the cells of origin for the three recurrent MH-based deletions in these genes. To this end, we first analyzed published sequencing data from
healthy individuals and pre-AML cases14. We identified the recurrent MH-based deletion in _ASXL1_ in three of 124 pre-AML cases 10.7, 8.8 and 1.7 years prior to AML diagnoses (Supplementary
Table 1) and none among the 676 controls14. Additionally, we purified T-cells derived from five AML samples harboring MH-based deletions in _ASXL1_ and _SRSF2_ genes (Supplementary Table 2).
Somatic MH-based deletions were identified by next-generation sequencing (NGS) in paired T-cells at low allele frequencies (Fig. 3a, Supplementary Table 3). Similarly, we identified the
recurrent MH-based _CALR_ deletion in isolated HSPCs and mature cells from two different Myelofibrosis (MF) cases (Supplementary Table 2). This deletion was identified among HSCs, more
committed progenitors, and mature myeloid and lymphoid cells (Fig. 3b). We further transplanted CD34 positive cells from one of the cases into NOD/SCID/IL-2Rgc-null (NSG) mice. After 16
weeks, a multi-lineage graft was observed, with the _CALR_ deletion found in both myeloid and lymphoid cells (Fig. 3c). Taken together, these data highlight the fact that recurrent
MH-based deletions in _ASXL1_, _SRSF2_ and _CALR_ genes originate in early multipotent HSCs and are part of clonal hematopoiesis. As we provide evidence that these deletions may be the
result of specific mutational mechanisms, we aimed to gain insights into these mechanisms by modeling the generation of recurrent MH-based deletions. CRISPR/CAS9 MEDIATED DSBS RECAPITULATE
RECURRENT MH-BASED DELETIONS IN MYELOID MALIGNANCIES Since MH-based deletion signatures are considered to be the result of mutagenic DSB repair8, we studied the generation of these recurrent
deletions in vitro by introducing DSBs using the CRISPR/Cas9 system. We introduced sequential DSBs along the hotspot regions of the _CALR_, _ASXL1_ and _SRSF2_ genes in the K562 CML cell line.
Specific DSBs successfully recapitulated recurrent MH-based deletions in _ASXL1_ and _SRSF2_ genes in K562 cells (Fig. 4a, b). However, we were unable to recapitulate the _CALR_ recurrent
frameshift MH-based deletion in vitro at high allele frequency (Supplementary Fig. 3). We validated these results by introducing specific DSBs in four different hematologic cell lines of
different genomic and cytogenetic backgrounds. Recurrent MH-based deletions in _ASXL1_ and _SRSF2_ were successfully obtained in all of these cells (Fig. 4c, d, Supplementary Fig. 4). We
further introduced these DSBs in primary human CD34 + HSPCs isolated from six individuals between 30 and 63 years of age (Supplementary Table 4). Remarkably, high frequencies of recurrent
MH-based deletions in _ASXL1_ and _SRSF2_ genes were obtained in all six primary samples (Fig. 4e, f, Supplementary Fig. 4). Our CRISPR/Cas9 experiments of sequential DSBs along the _ASXL1_
and _SRSF2_ genes provide evidence that the relative frequencies of the different deletions (including the recurrent MH-based deletions) are dependent on specific DSB positions (Fig. 4a, b).
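One way to see why the deletion spectrum could depend on the DSB position is to enumerate which microhomology pairs are available around a given break. The sketch below is a simplified illustration under stated assumptions, not the method used in this study: it assumes that each MH arm lies entirely on one side of the break, ignores resection limits beyond a hypothetical `max_del` cutoff, and uses an invented toy sequence. It lists maximal identical sequence pairs with one copy upstream and one copy downstream of the break, each of which defines one candidate MH-based deletion.

```python
def candidate_mmej_deletions(seq: str, cut: int, min_mh: int = 2, max_del: int = 60):
    """List candidate MH-based deletions around a break.

    seq     : local reference sequence (hypothetical toy input)
    cut     : the break lies between seq[cut - 1] and seq[cut]
    min_mh  : shortest microhomology to report (assumed threshold)
    max_del : longest deletion to consider (assumed threshold)

    Returns tuples of (microhomology, deletion_length, left_start, right_start).
    """
    hits = []
    for i in range(cut):  # start of the upstream arm
        for j in range(max(cut, i + 1), min(len(seq), i + max_del + 1)):
            # Skip matches that are extensions of one already reported.
            if i > 0 and j - 1 >= cut and seq[i - 1] == seq[j - 1]:
                continue
            k = 0  # extend the match; the upstream arm must stay left of the break
            while i + k < cut and j + k < len(seq) and seq[i + k] == seq[j + k]:
                k += 1
            if k >= min_mh:
                hits.append((seq[i:i + k], j - i, i, j))
    return hits


toy_seq = "TTAG" + "CGT" + "GGGTTT" + "CGT" + "CCAA"
hits = candidate_mmej_deletions(toy_seq, cut=10)
best = max(hits, key=lambda h: len(h[0]))  # longest microhomology
print(best)
```

Moving `cut` changes which pairs straddle the break, so the set of reachable MH-based deletions changes with it; the actual relative frequencies observed in cells would of course depend on many factors this sketch does not model.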
As we hypothesized that specific DSB repair pathways would also contribute to the obtained indel landscape, we next aimed to understand which repair machinery is involved in generating the
recurrent MH-based deletions. RECURRENT MH-BASED DELETIONS IN MYELOID MALIGNANCIES ARE THE RESULT OF PARP1 MEDIATED MMEJ REPAIR We demonstrated that specific DSBs in _ASXL1_ and _SRSF2_
genes lead to similar indel distributions across many cell lines and primary cells of different genetic backgrounds. We therefore continued to study the DSB repair machineries leading to
MH-based deletions in K562 cells. To address this, we manipulated key players in the MMEJ and the classical non-homologous end-joining (c-NHEJ) mutagenic DSB repair pathways. In vitro
inhibition of the MMEJ pathway by a PARP1 inhibitor (rucaparib camsylate) prior to DSB induction, similar to a previously described method16, resulted in significantly reduced allele fractions
of both _ASXL1_ and _SRSF2_ recurrent MH-based deletions (Fig. 5a–f, Supplementary Fig. 5). This provides evidence that these deletions are the result of PARP1 mediated MMEJ repair. To
further validate our results, we inhibited the c-NHEJ pathway by generating K562 _LIG4_ −/− cells. _LIG4_ knockout resulted in a significant increase in the allele fractions of the recurrent
MH-based deletions, further ruling out the role of c-NHEJ in generating these variants (Fig. 5c, f, Supplementary Fig. 5). Interestingly, _LIG4_ −/− cells did not produce short deletions
and insertions close to the breakpoint, suggesting that these indels are due to LIG4 mediated c-NHEJ repair. Moreover, high dosages of rucaparib treatment reduced insertions at the
breakpoints, while short deletions were obtained. This indicates that PARP1 may not be specific to MMEJ and it may have some role in c-NHEJ repair, as was previously described17. Altogether,
our results suggest that recurrent MH-based deletions in _ASXL1_ and _SRSF2_ are the consequence of PARP1 mediated and LIG4 independent MMEJ repair. We hereafter refer to these deletions
as pre-leukemic MMEJ deletions (preL-MMEJ deletions). Of note, while the _CALR_ recurrent MH-based deletion could be obtained at very low allele fractions in wild type (WT) K562 cells, a 20-fold
increase in allele fraction was observed among _LIG4_ −/− cells (Supplementary Fig. 6). DNA Polymerase theta (POLQ) was shown to be a key participant in MMEJ8,18,19 and also to play a role
in CRISPR/Cas9 mediated MMEJ repair20. We therefore aimed to assess POLQ contribution to the preL-MMEJ deletions. PREL-MMEJ DELETIONS CAN BE OBTAINED IN _POLQ_ KNOCKOUT CELLS We generated
three distinct K562 _POLQ_ −/− cell lines harboring frameshift mutations in exons 14, 16 and 18 of the _POLQ_ gene. Each one of these mutations presumably leads to a premature stop codon upstream
or inside the polymerase domain, previously shown to be required for end joining repair21,22. In _SRSF2_, a significant decrease in total fractions of MH-based deletions together with an
increase in short deletions, validated a role for POLQ in MMEJ (Fig. 6f, g). In contrast, _POLQ_ knockout resulted in a mild and mostly insignificant decrease in the fractions of both
preL-MMEJ deletions (Fig. 6c, h, Supplementary Fig. 7). This suggests that POLQ has a limited role in the pathway leading to preL-MMEJ deletions. We therefore hypothesized that other DNA
polymerases may collaborate with PARP1 and be involved in the pathway leading to preL-MMEJ deletions in humans. In order to identify such an involvement, we analyzed gene expression data of
single human HSCs. INHIBITION OF REPLICATIVE DNA POLYMERASES BY APHIDICOLIN REDUCES THE FORMATION OF PREL-MMEJ DELETIONS We next studied the gene expression profiles of human single
bone-marrow (BM) progenitor cells as was previously described23. We analyzed BM CD34+ profiles from the Human Cell Atlas Consortium’s immune census dataset
(https://preview.data.humancellatlas.org/) (Supplementary Fig. 8) and focused on multipotent HSCs expressing CD34 and AVP markers, and proliferating MPPs (cells of origin of MH-based
deletions) (Supplementary Fig. 9). We noticed that as HSCs enter cell replication, they upregulate components of the c-NHEJ, MMEJ, and HR pathways (Supplementary Fig. 10). Our experimental
results demonstrated that inhibition of _PARP1_ by rucaparib camsylate resulted in a decreased production of preL-MMEJ deletions in vitro. As we also provide evidence that the preL-MMEJ
deletions originate in multipotent HSCs, we assessed possible correlations between the expression levels of _PARPs_ and a list of human DNA polymerases24 specifically in HSCs and MPPs,
for polymerases that are not correlated throughout all progenitor states. Among this sub-population, _PARP1_ expression levels correlated significantly, and only in HSCs, with _POLQ_
as well as with _POLD1_, _POLE_ and _POLE4_ gene expression levels (Fig. 7a). _POLD1_ and _POLE_ genes encode the catalytic subunits of the B-family DNA polymerases delta and epsilon
respectively, which are the major replicases that carry out DNA replication in eukaryotes25. We next aimed to experimentally assess whether these replicative DNA polymerases contribute to
the formation of preL-MMEJ deletions. To address this issue, we treated K562 WT and _POLQ_ −/− cells with low and high doses of aphidicolin, a potent inhibitor of eukaryotic replicative
B-family DNA polymerases26,27,28,29. Remarkably, aphidicolin treatment significantly reduced the fractions of preL-MMEJ deletions in both WT and _POLQ_ −/− cells (Fig. 7b–g, Supplementary
Fig. 11). The relative contribution of _POLQ_ knockout to this reduction seemed to be negligible compared to aphidicolin treatment (Fig. 7b–g). A parallel increase was observed in the
fraction of c-NHEJ associated short deletions in both genomic loci (Supplementary Fig. 11b, e). Altogether, these results suggest a sub-pathway of the MMEJ repair, leading to preL-MMEJ
deletions. This sub-pathway seemed to be mediated by PARP1, active in the absence of POLQ, and successfully inhibited by aphidicolin treatment. DISCUSSION In the current study, we establish
that the three most common pre-leukemic somatic deletions _ASXL1_ c.1900_1922del23, _SRSF2_ c.284_307del24 and _CALR_ c.1092_1143del52 (termed here preL-MMEJ deletions) share a similar
deletion signature suggesting similar underlying mutational processes (Fig. 1). We provide evidence that these hotspot-deletions occur not just due to selective advantage, but also as a
result of unique mutational mechanisms (Fig. 2) and that they originate in multipotent HSCs (Fig. 3). All three preL-MMEJ deletions were successfully recapitulated following DSBs (Fig. 4,
Supplementary Fig. 6) that are repaired by the PARP1 dependent MMEJ (Fig. 5). Knockout of the _POLQ_ gene (which encodes the main polymerase involved in MMEJ) did not significantly reduce
preL-MMEJ deletions (Fig. 6). Single cell RNA-seq data of human HSCs suggest that the MMEJ pathway is activated as HSCs replicate, and exposed a correlation between the gene expression of
_PARP1_ and _POLQ, POLD1_ and _POLE_ (Fig. 7a). Finally, inhibition of the replicative polymerases and consequently cellular replication by aphidicolin resulted in a significant reduction of
preL-MMEJ deletions (Fig. 7). Collectively, our data provide insights into mutational mechanisms in HSCs and the early stages of clonal hematopoiesis. In the current study we provide
evidence that MMEJ is a major driver in early leukemia evolution. HSCs appear to use MMEJ over c-NHEJ repair as reflected by the much higher prevalence of the preL-MMEJ deletions compared to
other short deletions. Our analysis of single cell RNA-seq data together with in vitro experimental results indicate that the MMEJ pathway is active during cell replication. Our findings
demonstrate that synchronizing cells at the G1/S boundary by aphidicolin substantially reduced MMEJ, while c-NHEJ repair was relatively increased. This is in line with previous reports
demonstrating significantly elevated MMEJ activity during S and G2 cell cycle phases owing to CtIP phosphorylation as cells enter S-phase30,31. Phosphorylated CtIP in turn stimulates the
MRN complex mediated end-resection, which is a critical step in the initiation of both MMEJ and HR32. However, the full biological scenario in which DSBs occur in HSCs remains unclear and
is important to understand in order to potentially prevent preL-MMEJ deletions. One possible scenario as was previously suggested33 is that different types of physiological stress lead to
DNA damage and consequently to the exit of HSCs from dormancy. An alternative scenario is that aged HSCs carry more DSBs and Gamma H2AX foci due to altered dynamics of DNA replication
forks34. In both cases, MMEJ may be the preferred repair choice as it is available and efficient during cell replication. Future studies should shed more light on the origins of the DNA
damage leading to preL-MMEJ, either due to extrinsic physiological stress or age related replicative stress. Furthermore, as the mechanism of MMEJ underlying preL-MMEJ deletions is not fully
resolved in the current study, a more accurate description of the sub-pathway responsible for preL-MMEJ deletions is needed. Here we demonstrate that preL-MMEJ deletions are the result of
PARP1 mediated and POLQ independent sub-pathway of the MMEJ. While PARP1 is known to regulate the MMEJ pathway32 it also plays a role in c-NHEJ17 and single strand break (SSB) repair35. We
cannot rule out that human preL-MMEJ might be the result of SSB. Because strand synthesis during MMEJ should require the involvement of DNA polymerases, we propose a model in which replication
associated DNA polymerases are involved in preL-MMEJ (Fig. 8). However, future studies are warranted to assess whether aphidicolin related reduction of preL-MMEJ is due to a direct
inhibition of replicative polymerases or as a consequence of cell cycle arrest. PreL-MMEJ deletions are typically identified among the elderly. An important factor contributing to DSB repair
choice might be the age of the cell of origin in which the DSBs occur. In our CRISPR/Cas9 based model, similar frequencies of preL-MMEJ deletions were obtained in young and aged human
HSPCs. This might be due to the fact that our model system is not mimicking the exact biological context in which preL-MMEJ arise. It remains unclear whether preL-MMEJ deletions can occur in
HSCs at any age and expand due to selective advantage at older age or that preL-MMEJ deletions preferentially occur in aged HSCs. To elucidate this, the phylogenetic origins of preL-MMEJ
deletions can be studied in single cells as was previously done36 to determine the exact age at which they originate. While preL-MMEJ deletions in _ASXL1_ and _SRSF2_ are the most recurrent
deletions in AML, they are identified in a relatively small proportion of AML patients (~2%). However, these deletion signatures are not the sole hallmark of MMEJ repair. It was recently
shown that other genetic alterations such as blunt-end deletions, templated insertions37 and copy number variations (CNV)38 were also the result of POLQ mediated MMEJ. Large CNVs containing
_DNMT3a_ and _TET2_ genes can be found among healthy individuals39, however, the mutational mechanisms promoting them are underexplored. Interestingly, large numbers of AML patients harbor
recurrent somatic insertions that are templated from nearby genomic sequences. These duplications can be found along _CALR_, _ASXL1_ and _SRSF2_ hotspot regions, as well as in _NPM1_, _FLT3_
and _MLL_ genes. Altogether, it is possible that the MMEJ related contribution to genetic alterations in pre-leukemia and leukemia is underestimated. In the current study, we aimed at
understanding the biological processes driving early mutations in myeloid malignancies. Our findings support the growing evidence that some cancer mutations do not occur randomly but rather
their physical positions and patterns are determined by more than the selective advantage they provide. DSBs followed by MMEJ repair might shape the mutational landscape observed in myeloid
malignancies. In line with this, recent studies demonstrated hyperactivity of the MMEJ pathway in _IDH2_40 and _FLT3-ITD_ mutated AMLs41 and the sensitivity of some AML42 and MPN cells43 to
PARP1 inhibition. Such sensitivity could be explained by the dependency of HSPCs on MMEJ and the synthetic lethality of PARP1 inhibitors. Further characterization of these findings is
required to potentially intervene with the MMEJ pathway and prevent somatic mutagenesis associated with clonal hematopoiesis. METHODS SAMPLES De-identified primary peripheral blood samples
were obtained with informed consent from Acute Myeloid Leukemia (AML) and Myelofibrosis (MF) patients through the Leukemia Tissue Bank at Princess Margaret Cancer Centre in accordance with
regulated procedures approved by the Research Ethics Board of the University Health Network (REB 01-0573-C). De-identified mobilized peripheral blood autologous transplant products were
obtained with informed consent from Non-Hodgkin Lymphoma (NHL), Multiple Myeloma (MM) and Amyloidosis patients through the Leukemia Tissue Bank at Princess Margaret Cancer Centre in
accordance with regulated procedures approved by the Research Ethics Board of the University Health Network (ethics committee protocol # 15-9633), and Weizmann Institute of Science (IRB
protocol #337-1). All patients provided written informed consent for the usage of their samples for research purposes and for the usage of their clinical and biological data. We complied
with all relevant ethical regulations for work with human participants. PRIMARY CD34 + ENRICHMENT AND PRE-ELECTROPORATION CULTURING CD34 + cells were isolated from mononuclear cells derived
from mobilized peripheral blood stem cells (PBSC) autologous transplant products by using EasySep Human CD34 Positive Selection Kit II (StemCell Technologies, 18056). Cells were cultured for
48 h before electroporation in StemSpan™ Serum-Free Expansion Medium II (SFEM II) (StemCell Technologies, 09605) with streptomycin (20 mg/mL), penicillin (20 unit/mL) and the following human
cytokines (all from GenScript unless stated otherwise, catalog numbers and dilution used in parentheses): FLT3L (Z02926, 100 ng/mL), G-CSF (Z02980, 10 ng/mL), SCF (Z02692, 100 ng/mL), TPO
(Pepro-Tech, 300-18, 25 ng/mL) and IL-6 (Z03034, 10 ng/mL). Cells were cultured at a density of 2.5*10^5 cells/ml in 96-well U-bottom plates. CELL LINES PRE-ELECTROPORATION CULTURING K562,
Marimo, MOLM-14, OCI-AML-2 and OCI-AML-3 cell lines were used in this study. All cell lines were obtained from ATCC, were authenticated by whole-exome sequencing and tested negative for
Mycoplasma contamination. All cell lines were sub-cultured 2 days before electroporation in RPMI 1640 Medium containing L-Glutamine (Biological Industries, 01-100-1 A) with 10% FBS,
streptomycin (20 mg/mL) and penicillin (20 unit/mL) at a density of 3*10^5 cells/ml. CRISPR/CAS9 EXPERIMENTS 20 bp sgRNA sequences were designed along the genomic loci of interest using
DESKGEN algorithm (https://www.deskgen.com/landing/#/login). Sequential DSBs that are described in Fig. 4a, b and Supplementary Fig. 3 were performed using the px330 plasmid system, similar
to a previously described method44,45. Details of the px330 plasmid preparation are given under PX330 PLASMID PREPARATIONS. Electroporation reactions using px330 plasmids were performed with 2 ug
purified plasmid per reaction. All other CRISPR/Cas9 experiments were done using sgRNA guides 2 (_ASXL1_) and 5 (_SRSF2_), which were synthesized by IDT and are detailed under RNP COMPLEX PREPARATIONS.
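As a rough illustration of what guide placement involves, the sketch below scans a sequence for 20 nt protospacers followed by an NGG PAM, the motif recognized by the SpCas9 expressed from px330, and reports the expected blunt cut position 3 bp upstream of the PAM. This is not the DESKGEN algorithm and does no scoring of on-target or off-target activity; the function name and the toy sequence are assumptions, and only the forward strand is searched for brevity.

```python
import re


def find_protospacers(seq: str, guide_len: int = 20):
    """Find forward-strand protospacers followed by an NGG PAM.

    Returns tuples of (guide_start, guide, pam, cut_index), where the
    expected break lies between seq[cut_index - 1] and seq[cut_index].
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    hits = []
    for m in pattern.finditer(seq):
        start = m.start()
        cut_index = start + guide_len - 3  # 3 bp upstream of the PAM
        hits.append((start, m.group(1), m.group(2), cut_index))
    return hits


# Toy input: one 20 nt guide followed by a TGG PAM.
toy_locus = "AAAAA" + "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"
print(find_protospacers(toy_locus))
```

Tiling such sites along a hotspot region yields a series of break positions, which is the general idea behind the sequential DSBs described above; a reverse-strand search would scan the reverse complement in the same way.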
PX330 PLASMID PREPARATIONS For experiments involving sequential DSBs along the _ASXL1_, _SRSF2_ and _CALR_ sequences, sense and antisense oligonucleotides for each sgRNA with overhangs
compatible with BbsI-digested px330 were designed and ordered from IDT. Each oligo pair was further phosphorylated and annealed using T4 PNK (NEB, M0201S) and T4 Ligation Buffer (NEB,
B0202S). Phosphorylation and annealing reactions were performed at 37 °C for 30 min, followed by 95 °C for 5 min and ramping down to 25 °C at 5 °C/min. Annealed oligo pairs were then
ligated into a previously BbsI-digested px330 plasmid. Per reaction, 50 ng digested px330 was mixed with 1:250 diluted oligo duplex with 2X quick ligation buffer and quick ligase (NEB,
M2200S) at 16 °C overnight. BioSuper DH5α competent cells (Bio-lab, cat. no. 959758026600) were transformed with px330. Bacteria were re-suspended and plated on LB agar ampicillin dishes and
incubated at 37 °C overnight. Colonies were then screened and grown in 2-3 ml LB + Ampicillin at 37 °C overnight in a shaker (250 rpm). For each colony (sgRNA), plasmid DNA was extracted
using the QIAprep Spin Miniprep standard protocol (Qiagen, cat. No. 27104). To validate the presence of the desired inserts, Sanger sequencing reactions were performed for each plasmid using
the U6 promoter primer ACTATCATATGCTTACCGTAAC. Electroporation reactions using purified px330 plasmids were done at 2 ug plasmid per reaction. RNP COMPLEX PREPARATIONS All other CRISPR/Cas9
experiments were done using sgRNA guides 2 (_ASXL1_) and 5 (_SRSF2_) that were synthesized by IDT. Lyophilized sgRNAs were re-suspended in IDTE buffer (pH 7.5) to a final concentration of
100 uM. The RNP complex for each reaction was generated by mixing 1.2 ul sgRNA, 1.7 ul Cas9 protein (IDT) and 2.1 ul PBS followed by incubation for 10 min at 20 °C. ELECTROPORATION
REACTIONS All electroporation reactions were performed using the 16-strip Lonza 4D nucleofector kit. Pre-electroporated cells were washed in PBS and spun down at 350 × g for 10 min. Between
2*10^5 – 1*10^6 cells per reaction were re-suspended in 20 ul SF solution (K562, MARIMO, MOLM-14 and OCI-AML2 cell lines), SE solution (OCI-AML3) or P3 solution (primary CD34 + cells) and
added to the RNP complex or 2ug px330 plasmid. FF-120, DN-100, DP-115, DN-100, EO-100 and DZ-100 electroporation programs were used for K562, MARIMO, MOLM-14, OCI-AML2, OCI-AML3 and primary
CD34 + cells respectively. Immediately after electroporation, pre-warmed media were added and cells were cultured under the same conditions as in the pre-electroporation culturing for an additional 2
days (primary CD34 + ) or 4 days (all other cell lines) before they were lysed for NGS sequencing. INHIBITORS TREATMENT In experiments involving MMEJ inhibition, K562 cells were
sub-cultured 48 or 2 h before electroporation in a medium containing different dosages of rucaparib camsylate (Sigma, PZ0036) or aphidicolin (Sigma, A4487) respectively. Control cells were
sub-cultured in a medium containing vehicle (DMSO). 48 h following electroporation cells were washed with PBS and re-suspended in fresh clean media. Cells were lysed 5 days after
electroporation for subsequent NGS sequencing. K562 KNOCKOUT CELL LINES GENERATION K562 cells were electroporated using sgRNA guide targeting _LIG4_ or _POLQ_ genes followed by a sorting for
live single cells using BD FACSMelody™ Cell Sorter (BD Biosciences). Sorted cells were plated onto 96-well plates containing 100ul/well RPMI 1640 Medium with L-Glutamine (Biological
Industries, 01-100-1 A), 10% FBS, streptomycin (20 mg/mL) and penicillin (20 unit/mL). 7 days after sorting, 100 ul fresh media were added to each well. Cells were further maintained by
replacing 100 ul medium from each well once a week. Cell colonies were lysed 28 days after sorting for subsequent NGS sequencing. For cell lysis, cells were spun at 2000 × g for 10 min, and cell
pellets were mixed with 30 ul of 50 mM NaOH and heated at 99 °C for 10 min. Then, the reactions were cooled down at room temperature and 2 ul 1 M Tris (pH 8) was added to each reaction. NGS
sequencing and analysis were performed as described under NGS LIBRARY AND TARGETED SEQUENCING and VARIANT CALLING. Colonies containing bi-allelic frameshift indels at the genomic loci of interest were further isolated and
expanded (Supplementary Fig. 12). NGS LIBRARY AND TARGETED SEQUENCING For all NGS libraries, we used cell lysis products that served as a template for PCR amplification and library
preparations. Dual-indexed Illumina libraries were generated using a two-step PCR procedure. 1st PCR primer prefix sequences and 2nd PCR primer sequences were used, similar to a previously
described method46. All relevant details are as follows: Target-specific primers were designed by Primer3plus (http://www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi) and were
ordered with the described 5’ prefixes46 (IDT). 1st PCR was applied to target the regions of interest. The reaction mixture was composed of a PCR ready mix (using NEBNext® Ultra™ II Q5®
Master Mix, NEB, M0544L), a cell lysis product and a final primer concentration of 1uM each. PCR protocol was as follows: 98 °C for 30 s, followed by 40 amplification cycles of 98 °C for 10
s,65 °C for 30 s and a final elongation at 65 °C for 5 min. Following dilution of the 1st PCR products with nuclease free water (1:1000), a 2nd PCR was performed using primers composed of
Illumina sequencing primers, indexes and adapters, under the same conditions as the 1st PCR with the exceptions of final primer concentration of 0.5uM each and 20 cycles of amplifications.
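The two PCR rounds described above share one thermocycling program and differ only in primer concentration and cycle number. As a minimal sketch of that relationship, the following Python snippet encodes the stated conditions and derives the 2nd PCR from the 1st. The class name `PcrRound`, its field names and the `hold_time_s` helper are illustrative assumptions and are not part of the original protocol. The 65 °C step is treated here as a combined annealing/extension step because the text lists only two temperatures per cycle, and the computed time ignores ramping between temperatures.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PcrRound:
    """One round of the two-step library PCR (hypothetical helper class)."""
    name: str
    primer_conc_um: float          # final concentration of each primer, in µM
    cycles: int                    # number of amplification cycles
    initial_denature_s: int = 30   # 98 °C for 30 s
    denature_s: int = 10           # 98 °C for 10 s, per cycle
    anneal_extend_s: int = 30      # 65 °C for 30 s, per cycle (assumed combined step)
    final_elongation_s: int = 300  # 65 °C for 5 min

    def hold_time_s(self) -> int:
        """Programmed hold time in seconds, ignoring temperature ramping."""
        per_cycle = self.denature_s + self.anneal_extend_s
        return self.initial_denature_s + self.cycles * per_cycle + self.final_elongation_s


# 1st PCR: target-specific amplification from the cell lysis product.
first_pcr = PcrRound(name="1st PCR (target-specific)", primer_conc_um=1.0, cycles=40)

# 2nd PCR: same conditions as the 1st PCR except primer concentration and cycle number.
second_pcr = replace(first_pcr, name="2nd PCR (indexing)", primer_conc_um=0.5, cycles=20)

# The 1st PCR product is diluted 1:1000 in nuclease-free water before the 2nd PCR.
DILUTION_FACTOR = 1000

for rnd in (first_pcr, second_pcr):
    print(f"{rnd.name}: {rnd.cycles} cycles, {rnd.primer_conc_um} µM each primer, "
          f"{rnd.hold_time_s() / 60:.1f} min programmed hold time")
```

Deriving the second round with `dataclasses.replace` keeps the "same conditions except" wording of the protocol explicit: only the two fields that differ are overridden, and every other step is inherited unchanged.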
Full sgRNA and primer sequences that were used throughout this study are provided in Supplementary Table 5. Barcoded 2nd PCR products were pooled at equal volumes. Pooled libraries
were size-selected (2% gel, BluePippin, Sage Science) and sent for 2 × 150-bp deep sequencing (MiSeq System, Illumina). VARIANT CALLING Paired-end 2 × 150-bp deep sequencing data
(~5000X depth) from the Illumina platform were converted to fastq format. The Minimap2.1 algorithm47 was used to align the processed fastq files to target sequences derived from the hg19 genome,
resulting in sam files that were further sorted and indexed using pysam 0.15.1 (https://github.com/pysam-developers/pysam). All reads from the sorted bam files were assigned to new read groups
using the picard 2.8.3 ‘AddOrReplaceReadGroups’ command (http://broadinstitute.github.io/picard). To avoid misalignments, local realignment was performed using the GATK 3.7
‘RealignerTargetCreator’ and ‘IndelRealigner’ commands48. Mpileup files were generated by samtools 1.8, followed by detection of SNVs and small indels using the varscan 2.3.9 ‘pileup2cns’ command to
generate VCF files containing consensus variant calls49. HSPCS CELL SORTING Mononuclear cells (10^6 cells per 100 µl) from peripheral blood samples of two myelofibrosis (MF) patients
underwent CD34 enrichment by magnetic beads (Miltenyi Biotec). Both CD34-positive and CD34-negative cell fractions underwent fluorescence-activated cell sorting as previously described50. Cells
were stained with the following antibodies (all from BD Biosciences unless stated otherwise, catalog numbers and dilution used in parentheses): anti-CD45RA-FITC (555488, 1:25),
anti-CD38-PE-Cy7 (335790, 1:200), anti-CD10-Alexa-700 (624040, 1:10), anti-CD7-Pacific Blue (642916, 1:50), anti-CD45-V500 (560777, 1:200), anti-CD34-APC-Cy7 (custom made by BD, CD34 clone
581, 1:100), anti-CD34-PerCP-Efluor 710 (e-Bioscience 46-0344-42, 1:100), anti-CD33-PC5 (Beckman Coulter PNIM2647U, 1:100), anti-CD19-PE (340364, 1:200), anti-CD3-FITC (349201, 1:100),
anti-CD56-Alexafluor 647 (557711, 1:100), Streptavidin-QD605 (Invitrogen Q10101MP, 1:200), anti-CD8-APC-H7 (560179, 1:200), anti-light-chain lambda-V450 (561379, 1:200), anti-light-chain
kappa-V450 (561327, 1:200), and anti-CD57-APC (555518, 1:200). Subsequently, cells were sorted on a FACSAria III (BD Biosciences) to a post-sort purity of >95%. The CD34-enriched cell
fraction was gated on CD45+/CD33- and sorted into the following HSPC subpopulations: HSC/MPP (CD38-/CD34+/CD45RA-); MLP (CD38-/CD34+/CD45RA+); CMP/MEP (CD38+/CD34+/CD7-/CD10-/CD45RA-);
and GMP (CD38+/CD34+/CD7-/CD10-/CD45RA+) subsets. The CD34-negative cell fraction was sorted into the following mature cell populations: myeloid cells (CD45dim/CD33+);
T cells (CD45high/CD3+/CD8+); B cells (CD45high/CD19+/light chains lambda or kappa+); and NK cells (CD45high/CD56+/CD57+). DNA from each sorted subpopulation was isolated
and amplified using the REPLI-g whole-genome amplification (WGA) kit (REPLI-g Mini Kit for 16 h). XENOTRANSPLANTATION ASSAYS Animal experiments were performed in accordance with the guidelines and regulations of the IACUC of
the Weizmann Institute (11790319-2) and complied with all relevant ethical regulations for animal testing and research. Eight- to 12-week-old
female NOD/SCID/IL-2Rgc-null (NSG) mice were maintained under a 12 h dark/light cycle, at an ambient temperature of around 22 °C and humidity of 50%. Mice were sub-lethally irradiated
(225 cGy) 24 h before transplantation. CD34+ cells were enriched from peripheral blood mononuclear cells of a myelofibrosis (MF) patient by magnetic beads (Miltenyi Biotec) and 50,000 cells
were injected into the right femur. Mice were euthanized 16 weeks following transplantation and human engraftment in the injected right femur and non-injected bone marrow (left femur,
tibias) was evaluated by flow cytometry analysis using the BD LSR II flow cytometer (BD Biosciences). The threshold for detection of engraftment was 0.1% human CD45+ cells. Human myeloid
(human CD45+/CD33+/CD19-) and B cells (human CD45+/CD33-/CD19+) were sorted out of the xenografts using the following antibodies (all from BD Biosciences unless stated otherwise,
catalog numbers and dilution used in parentheses): anti-CD45-APC (340943, 1:200), anti-CD19-PE (340364, 1:200) and anti-CD33-PE-Cy5 (Beckman Coulter catalog number IM2647U, 1:200). DNA from
each sorted subpopulation was isolated and amplified using the REPLI-g whole-genome amplification (WGA) kit (REPLI-g Mini Kit for 16 h). DDPCR ANALYSIS OF GRAFT SUBPOPULATIONS The ddPCR reaction
was performed using probes designed for the _CALR_ deletion as described elsewhere51. Amplified DNA (2 µl from a 1:20 dilution of a 16 h REPLI-g Mini Kit whole-genome amplification, Qiagen)
from each sorted population was tested in a 96-well plate in duplicate according to the manufacturer’s protocol. Mutant and wild-type sequences were read using a droplet reader with a
two-color fluorescein/HEX fluorescence detector (Bio-Rad). The mutant allele frequency was calculated as the fraction of mutant-positive droplets divided by total droplets containing a
target. As previously reported1, the minimum detection level was 1:1,000 (0.1%). Variants were considered present if there were at least three droplets in the mutant fluorescein channel, resulting
in a VAF > 0.1%. T-CELL ISOLATION AND EXPANSION FROM PRIMARY AML SAMPLES CD3+ cells were isolated from peripheral blood mononuclear cells of AML patients using the EasySep Human CD3
Positive Selection Kit II (StemCell Technologies, 17851) and re-suspended in RPMI 1640 Medium with L-Glutamine (Biological Industries, 01-100-1 A), 10% FBS, 250 IU/ml human IL-2
(Thermo Fisher Scientific, BMS334) and 5 µg/ml anti-CD28 antibody (clone CD28.2, Thermo Fisher Scientific, 16-0289-81). Re-suspended cells were added to a 24-well plate that had been pre-coated for 2
h with PBS containing 5 µg/ml anti-CD3 antibody (clone OKT3, Thermo Fisher Scientific, 16-0037-81). Cells were cultured for 4 days before re-suspension in fresh RPMI 1640 Medium with 10% FBS and
250 IU/ml hIL-2 and re-plating in a 6-well plate. Cells were then cultured for an additional 21 days. Cell purity was assessed by flow cytometry before the cells were lysed for subsequent NGS
sequencing. SINGLE CELL RNA-SEQ ANALYSES HSPC RNA-seq profiles were isolated from the HCA immune census BM data based on CD34 expression. A total of 19,757 profiles were isolated from the
roughly 310,000 BM profiles from 8 different donors. To generate metacells from the profiles, we used the MetaCell package23 with the parameters specified below. Feature genes were selected
using the parameter T_vm = 0.08 and a minimal total UMI count of 100, while excluding genes correlated with lateral effects, such as mitochondrial genes, immunoglobulin genes, highly abundant genes, genes with the prefix
“RP-”, and genes related to cell cycle, type I interferon response and stress. The final set of 527 feature genes was used for the computation of the MetaCell balanced similarity graph,
with parameters _k_ = 60, n_resamp = 500 and min_mc_size = 20. An outlier threshold of T_lfc = 3.5 was used, with 464 profiles deemed outliers. Next, we annotated the metacell model using
hierarchical clustering of the metacell confusion matrix, supervised analysis of enriched genes and analysis of marker genes (Supplementary Fig. 8). The metacells and profiles were projected
and plotted in 2D using mc2d_K = 40 and mc2d_T_edge = 0.02 with a max degree of 6, and colored using thresholds on metacell log enrichment scores (lfp values) for marker genes chosen from
common markers and the above annotation process. To study correlations between the expression of DNA polymerase genes and _PARPs_ in HSPCs, we calculated Pearson correlations using metacell e_gc
values, calculated once using only HSPC metacells and once using all progenitor metacells (Fig. 7a). DATA ANALYSIS ANALYSES OF CRISPR/CAS9 DATA All indels generated by the CRISPR/Cas9 system
were called using varscan2.3.9. Substitutions and short indels identified in both edited and control samples in _ASXL1_, _SRSF2_ and _CALR_ loci were excluded. Allele percent was calculated
as the number of modified reads associated with each variant divided by the sum of all modified reads per experiment. Overall CRISPR/Cas9 modification frequencies were calculated as the sum
of modified reads divided by the mean depth per experiment. We discriminated between three deletion signatures throughout all CRISPR/Cas9 experiments: ≥5 bp deletions with flanking MHs of ≥2 bp,
≥5 bp deletions with flanking MHs of <2 bp, and short deletions of <5 bp. ANALYSES OF PUBLICLY AVAILABLE DATASETS Somatic mutation cohorts were downloaded from publicly available web links
as described under the “Data Availability” section. Only deletions were included in our analyses, and duplicate samples were removed from all cohorts. For the COSMIC dataset, only deletions with
available genomic coordinates were analyzed. Additionally, for COSMIC, myeloid deletions were obtained by filtering the ‘Primary site’ column to include ‘haematopoietic and lymphoid’ tissue,
followed by the exclusion of entries containing the letters ‘lymph’ in the ‘Primary histology’ column. Identical deletions with multiple ‘Mutation CDS’ values due to multiple isoforms were combined under a
uniform name (for example, _ASXL1_ c.1888_1910del23 and c.1900_1922del23 were combined under the name _ASXL1_ c.1900_1922del23). Deletions from the COSMIC dataset that contained common SNPs at
adjacent genomic loci, that were located in intronic or intergenic regions, or that were reported in only a single publication were excluded. To exclude common SNPs from all sequencing cohorts,
minor allele frequencies (MAF) for each deletion were identified using the ANNOVAR tool (https://github.com/WGLab/doc-ANNOVAR) according to the following datasets: AF, ExAC_ALL, Kaviar_AF,
ExAC_nonpsych_ALL and AF_popmax. Variants with a MAF of 0.0001 or above in at least one dataset were filtered out. Deletions with no available data in any of these datasets were included. For
signature detection, deletion coordinates extended by 20 bp on each side (i.e., start − 20 bp to end + 20 bp) were generated and used as input for the bedtools getfasta command to
generate FASTA files of the flanking sequences of all deletions. ‘MH signature’ detection was performed using in-house MATLAB code that analyzed each deletion’s flanking sequences for
microhomologies (MHs). Specifically, we discriminated between three deletion signatures throughout all data analyses: ≥5 bp deletions with flanking MHs of ≥2 bp, ≥5 bp deletions with flanking
MHs of <2 bp, and short deletions of <5 bp. Of note, our detection algorithm did not discriminate between short deletions that are located in microsatellite repeats and those that are not,
as this was beyond the scope of the current manuscript. The MATLAB code is available as described under the “Code Availability” section. All other analyses were performed using R (version
3.5.2). REPORTING SUMMARY Further information on research design is available in the Nature Research Reporting Summary linked to this article. DATA AVAILABILITY Raw Illumina sequencing reads
associated with CRISPR/Cas9 cell line experiments have been deposited in the NCBI Short Read Archive under bioproject ID PRJNA707245. Publicly available datasets used in this study are
available in the following web links: https://cancer.sanger.ac.uk/cosmic/download (COSMIC dataset), https://www.nejm.org/doi/full/10.1056/nejmoa1516192 (1540 adult-AML dataset10),
https://www.nejm.org/doi/full/10.1056/NEJMoa1716614 (2045 MPN dataset11), http://www.vizome.org/aml/ (BeatAML dataset12). For 1540 adult-AML and 2045 MPN datasets, we used annotated
mutational data that are open-access and available for download as part of the supplementary appendixes of these papers (table s5 in AML paper10 and table s4 in MPN paper11). Full data
containing the deletion signatures from the publicly available datasets as well as CRISPR/Cas9 indel data are provided as a Source Data file. All relevant data are also available from the
corresponding author upon reasonable request. Source data are provided with this paper. CODE AVAILABILITY The MATLAB code for MMEJ deletion detection is documented on GitHub
(https://github.com/ShlushLab/MMEJ_detection) and is publicly available under the MIT license from the following Zenodo repository (https://doi.org/10.5281/zenodo.4555395)52. For the
Metacell analysis, we used the previously published MetaCell package23 with parameters specified under the “Methods” section. Metacell code is documented on GitHub
(https://github.com/tanaylab/metacell) and is publicly available under the MIT license from the following Zenodo repository (https://doi.org/10.5281/zenodo.3334525)53.

REFERENCES
* Shlush, L. I. et al. Identification of pre-leukaemic haematopoietic stem cells in acute leukaemia. _Nature_ 506, 328–333 (2014).
* Jan, M. et al. Clonal evolution of preleukemic hematopoietic stem cells precedes human acute myeloid leukemia. _Sci. Transl. Med._ 4, 149ra118 (2012).
* Jaiswal, S. et al. Age-related clonal hematopoiesis associated with adverse outcomes. _N. Engl. J. Med._ 371, 2488–2498 (2014).
* Shlush, L. I. Age-related clonal hematopoiesis. _Blood_ 131, 496–504 (2018).
* Genovese, G. et al. Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence. _N. Engl. J. Med._ 371, 2477–2487 (2014).
* Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. _Nature_ 578, 94–101 (2020).
* Preston, B. D., Albertson, T. M. & Herr, A. J. DNA replication fidelity and cancer. _Semin Cancer Biol._ 20, 281–293 (2010).
* McVey, M. & Lee, S. E. MMEJ repair of double-strand breaks (director’s cut): deleted sequences and alternative endings. _Trends Genet._ 24, 529–538 (2008).
* Mehta, A. & Haber, J. E. Sources of DNA double-strand breaks and models of recombinational DNA repair. _Cold Spring Harb. Perspect. Biol._ 6, a016428 (2014).
* Papaemmanuil, E. et al. Genomic classification and prognosis in acute myeloid leukemia. _N. Engl. J. Med._ 374, 2209–2221 (2016).
* Grinfeld, J. et al. Classification and personalized prognosis in myeloproliferative neoplasms. _N. Engl. J. Med._ 379, 1416–1430 (2018).
* Tyner, J. W. et al. Functional genomic landscape of acute myeloid leukaemia. _Nature_ 562, 526–531 (2018).
* Yang, H. et al. Gain of function of ASXL1 truncating protein in the pathogenesis of myeloid malignancies. _Blood_ 131, 328–341 (2018).
* Abelson, S. et al. Prediction of acute myeloid leukaemia risk in healthy individuals. _Nature_ 559, 400–404 (2018).
* Nangalia, J. et al. Somatic CALR mutations in myeloproliferative neoplasms with nonmutated JAK2. _N. Engl. J. Med._ 369, 2391–2405 (2013).
* Iyer, S. et al. Precise therapeutic gene correction by a simple nuclease-induced double-stranded break. _Nature_ 568, 561–565 (2019).
* Luijsterburg, M. S. et al. PARP1 links CHD2-mediated chromatin expansion and H3.3 deposition to DNA repair by non-homologous end-joining. _Mol. Cell_ 61, 547–562 (2016).
* Beagan, K. & McVey, M. Linking DNA polymerase theta structure and function in health and disease. _Cell Mol. Life Sci._ 73, 603–615 (2016).
* Wyatt, D. W. et al. Essential roles for polymerase theta-mediated end joining in the repair of chromosome breaks. _Mol. Cell_ 63, 662–673 (2016).
* Taheri-Ghahfarokhi, A. et al. Decoding non-random mutational signatures at Cas9 targeted sites. _Nucleic Acids Res._ 46, 8417–8434 (2018).
* Black, S. J. et al. Molecular basis of microhomology-mediated end-joining by purified full-length Poltheta. _Nat. Commun._ 10, 4423 (2019).
* Ceccaldi, R. et al. Homologous-recombination-deficient tumours are dependent on Poltheta-mediated repair. _Nature_ 518, 258–262 (2015).
* Baran, Y. et al. MetaCell: analysis of single-cell RNA-seq data using K-nn graph partitions. _Genome Biol._ 20, 206 (2019).
* Loeb, L. A. & Monnat, R. J. Jr. DNA polymerases and human disease. _Nat. Rev. Genet._ 9, 594–604 (2008).
* Johansson, E. & Dixon, N. Replicative DNA polymerases. _Cold Spring Harb. Perspect. Biol._ 5, a012799 (2013).
* Baranovskiy, A. G. et al. Structural basis for inhibition of DNA replication by aphidicolin. _Nucleic Acids Res._ 42, 14013–14021 (2014).
* Sheaff, R., Ilsley, D. & Kuchta, R. Mechanism of DNA polymerase alpha inhibition by aphidicolin. _Biochemistry_ 30, 8590–8597 (1991).
* Byrnes, J. J. Structural and functional properties of DNA polymerase delta from rabbit bone marrow. _Mol. Cell Biochem._ 62, 13–24 (1984).
* Cheng, C. H. & Kuchta, R. D. DNA polymerase epsilon: aphidicolin inhibition and the relationship between polymerase and exonuclease activity. _Biochemistry_ 32, 8568–8574 (1993).
* Truong, L. N. et al. Microhomology-mediated end joining and homologous recombination share the initial end resection step to repair DNA double-strand breaks in mammalian cells. _Proc. Natl Acad. Sci. USA_ 110, 7720–7725 (2013).
* Wu, W. et al. Repair of radiation induced DNA double strand breaks by backup NHEJ is enhanced in G2. _DNA Repair_ 7, 329–338 (2008).
* Sfeir, A. & Symington, L. S. Microhomology-mediated end joining: a back-up survival mechanism or dedicated pathway? _Trends Biochem. Sci._ 40, 701–714 (2015).
* Walter, D. et al. Exit from dormancy provokes DNA-damage-induced attrition in haematopoietic stem cells. _Nature_ 520, 549–552 (2015).
* Flach, J. et al. Replication stress is a potent driver of functional decline in ageing haematopoietic stem cells. _Nature_ 512, 198–19 (2014).
* Okano, S., Lan, L., Caldecott, K. W., Mori, T. & Yasui, A. Spatial and temporal cellular responses to single-strand breaks in human cells. _Mol. Cell Biol._ 23, 3974–3981 (2003).
* Lee-Six, H. et al. Population dynamics of normal human blood inferred from somatic mutations. _Nature_ 561, 473–478 (2018).
* Yu, A. M. & McVey, M. Synthesis-dependent microhomology-mediated end joining accounts for multiple types of repair junctions. _Nucleic Acids Res._ 38, 5706–5717 (2010).
* Verdin, H. et al. Microhomology-mediated mechanisms underlie non-recurrent disease-causing microdeletions of the FOXL2 gene or its regulatory domain. _PLoS Genet._ 9, e1003358 (2013).
* Laurie, C. C. et al. Detectable clonal mosaicism from birth to old age and its relationship to cancer. _Nat. Genet._ 44, 642–650 (2012).
* Sulkowski, P. L. et al. 2-Hydroxyglutarate produced by neomorphic IDH mutations suppresses homologous recombination and induces PARP inhibitor sensitivity. _Sci. Transl. Med._ 9, eaal2463 (2017).
* Muvarak, N. et al. c-MYC generates repair errors via increased transcription of alternative-NHEJ factors, LIG3 and PARP1, in tyrosine kinase-activated leukemias. _Mol. Cancer Res._ 13, 699–712 (2015).
* Nieborowska-Skorska, M. et al. Gene expression and mutation-guided synthetic lethality eradicates proliferating and quiescent leukemia cells. _J. Clin. Invest._ 127, 2392–2406 (2017).
* Nieborowska-Skorska, M. et al. Ruxolitinib-induced defects in DNA repair cause sensitivity to PARP inhibitors in myeloproliferative neoplasms. _Blood_ 130, 2848–2859 (2017).
* Bak, R. O., Dever, D. P. & Porteus, M. H. CRISPR/Cas9 genome editing in human hematopoietic stem cells. _Nat. Protoc._ 13, 358–376 (2018).
* Ran, F. A. et al. Genome engineering using the CRISPR-Cas9 system. _Nat. Protoc._ 8, 2281–2308 (2013).
* Biezuner, T. et al. A generic, cost-effective, and scalable cell lineage analysis platform. _Genome Res._ 26, 1588–1599 (2016).
* Li, H. Minimap2: pairwise alignment for nucleotide sequences. _Bioinformatics_ 34, 3094–3100 (2018).
* McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. _Genome Res._ 20, 1297–1303 (2010).
* Koboldt, D. C. et al. VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing. _Genome Res._ 22, 568–576 (2012).
* Shlush, L. I. et al. Tracing the origins of relapse in acute myeloid leukaemia to stem cells. _Nature_ 547, 104–108 (2017).
* Mansier, O. et al. Quantification of the mutant CALR allelic burden by digital PCR application to minimal residual disease evaluation after bone marrow transplantation. _J. Mol. Diagnostics_ 18, 68–74 (2016).
* Tzah Feldman, A. B., et al. Recurrent deletions in clonal hematopoiesis are driven by Microhomology-mediated end joining. Zenodo. Available from https://doi.org/10.5281/zenodo.4555395. (2021).
* Yael Baran, et al. MetaCell: analysis of single-cell RNA-seq data using K-nn graph partitions. Zenodo. Available from https://doi.org/10.5281/zenodo.3334525. (2019).

ACKNOWLEDGEMENTS The authors wish to thank Prof. John Dick and Dr. Ayal Hendel for fruitful discussions and support. All
primary patient samples that were used in this study were generously provided by Dr. Mark Minden and through the Leukemia Tissue Bank at Princess Margaret Cancer Centre/ University Health
Network. L.I.S. is the incumbent of the Ruth and Louis Leland Career Development Chair. This research was supported by the EU Horizon 2020 grant project MAMLE ID: 714731, LLS and Rising Tide
Foundation grant ID: RTF6005-19, ISF-NSFC 2427/18, ISF-IPMP-Israel Precision Medicine Program 3165/19, BIRAX 713023, and the Ernest and Bonnie Beutler Research Program of Excellence in Genomic
Medicine, awarded to L.I.S. N.K. is an incumbent of the Applebaum Foundation Research Fellow Chair. This research was
also supported by the Sagol Institute for Longevity Research, the Barry and Eleanore Reznik Family Cancer Research Fund, Steven B. Rubenstein Research Fund for Leukemia and Other Blood
Disorders, the Rising Tide Foundation and the Applebaum Foundation. AUTHOR INFORMATION Author notes * These authors contributed equally: Akhiad Bercovich, Yoni Moskovitz. AUTHORS AND
AFFILIATIONS * Department of Immunology, Weizmann Institute of Science, Rehovot, Israel Tzah Feldman, Yoni Moskovitz, Noa Chapal-Ilani, Tamir Biezuner, Nathali Kaushansky & Liran I.
Shlush * Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel Akhiad Bercovich & Amos Tanay * Princess Margaret Cancer Centre,
University Health Network (UHN), Toronto, ON, Canada Amanda Mitchell, Jessie J. F. Medeiros, Mark D. Minden, Vikas Gupta & Liran I. Shlush * Department of Molecular Genetics, University
of Toronto, Toronto, ON, Canada Jessie J. F. Medeiros * Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada Mark D. Minden * Department of Medicine, University of
Toronto, Toronto, ON, Canada Mark D. Minden & Vikas Gupta * Division of Medical Oncology and Hematology, University Health Network, Toronto, ON, Canada Mark D. Minden & Vikas Gupta *
Department of Pathology, Tel-Aviv University, Tel-Aviv, Israel Michael Milyavsky * Sackler Faculty of Medicine, Tel-Aviv University, Tel-Aviv, Israel Michael Milyavsky * Department of
Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel Zvi Livneh * Division of Hematology, Rambam Healthcare Campus, Haifa, Israel Liran I. Shlush CONTRIBUTIONS T.F. designed and developed the study, performed CRISPR/Cas9 experiments,
cell culture and maintenance, and deep targeted sequencing, analyzed sequencing data, performed variant calling, analyzed the publicly available datasets and wrote the manuscript. A.B.
performed single-cell RNA analysis. Y.M. performed CRISPR/Cas9 experiments, cell culture and maintenance, and deep targeted sequencing. N.I.C. provided bioinformatics support and wrote the
MATLAB code for MMEJ detection. N.K. revised the paper and contributed to data interpretation. T.B. provided sequencing and technical support. A.M. and J.M. performed xenotransplantation
experiments, cell sorting and ddPCR. M.D.M. and G.V. enabled sample acquisition. M.M. and Z.L. helped with the knockout experiments. A.T. supervised single-cell RNA analysis. L.I.S. designed
and supervised the study and wrote the manuscript. CORRESPONDING AUTHOR Correspondence to Liran I. Shlush. ETHICS DECLARATIONS COMPETING INTERESTS The authors declare no competing interests.
ADDITIONAL INFORMATION PEER REVIEW INFORMATION _Nature Communications_ thanks David Kent and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer
reviewer reports are available. PUBLISHER’S NOTE Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
RIGHTS AND PERMISSIONS OPEN ACCESS This article is licensed under a Creative Commons
Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original
author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the
article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use
is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit
http://creativecommons.org/licenses/by/4.0/. ABOUT THIS ARTICLE CITE THIS ARTICLE Feldman, T., Bercovich, A., Moskovitz, Y. _et al._ Recurrent deletions in clonal
hematopoiesis are driven by microhomology-mediated end joining. _Nat Commun_ 12, 2455 (2021). https://doi.org/10.1038/s41467-021-22803-y * Received: 23 June 2020 *
Accepted: 29 March 2021 * Published: 28 April 2021 * DOI: https://doi.org/10.1038/s41467-021-22803-y