Hematolymphoid neoplasms frequently harbor recurrent genetic abnormalities. Some of the best-recognized lesions are chromosomal translocations, many of which are known to play pivotal roles in pathogenesis. In lymphoid malignancies, some translocations result from erroneous V(D)J-type events. However, other translocation junctions appear randomly positioned, and their underlying mechanisms are not understood.
We tested the hypothesis that genomic repeats, including both simple tandem and interspersed repeats, are involved in chromosomal translocations arising in hematopoietic malignancies. Using a database of translocation junctions and RepeatMasker annotations of the reference genome assembly, we measured the proximity of translocation sites to their nearest repeat. We examined 1,174 translocation breakpoints from 10 classifications of hematolymphoid neoplasms. We measured significance using Student’s t-test, and we determined a false discovery rate using a random permutation statistics technique.
Most translocations showed no propensity to involve genomic repeats. However, translocation junctions at the transcription factor 3 (TCF3)/E2A immunoglobulin enhancer binding factors E12/E47 (E2A) locus clustered within, or in proximity to, transposable element sequences. Nearly half of reported TCF3 translocations involve a MER20 DNA transposon. Based on this observation, we propose this sequence is important for the oncogenesis of TCF3-PBX1 acute lymphoblastic leukemia.
Genomic rearrangements can occur in germline nuclei, resulting in inherited diseases, or in somatic nuclei, contributing to tumorigenesis. The latter can vary from complex events such as chromothripsis to relatively simple abnormalities such as recurrent chromosomal translocations; the underlying mechanisms remain unclear. Genomic rearrangements have been induced in mammalian cell cultures in a few systems [1–3]. Although these in vitro generated translocations provide a valuable experimental tool, the engineered translocation partner sequences rarely match known oncogenic translocation sequences.
Most recognized genomic rearrangements in human cancers today are not resolved at the nucleotide level. Widely used assays include karyotyping, fluorescence in situ hybridization, and microarray platforms with probes for comparative genomic hybridization and single nucleotide polymorphism genotyping. None provides nucleotide resolution of translocation breakpoints. Massively parallel short-read sequencing has this ability, particularly when tailored approaches are used to ‘rescue’ alignments of reads spanning the breakpoints. However, highly repetitive intervals at breakpoints may be a confounding factor.
Breakpoints resolved precisely can provide insights into the mechanisms responsible for rearrangements. For example, some hematolymphoid neoplasm breakpoints are marked by the presence of cryptic heptamer/nonamer sequences. Similarly, Translin protein binding sequences have been detected near chromosomal breakpoints in lymphoid neoplasms. In both scenarios, DNA sequence is a key participant in the mechanism of translocation.
We chose to look for evidence of genomic repeat involvement in chromosomal translocations that drive human hematopoietic malignancies. Repetitive sequences comprise nearly half of the human genome; many are interspersed repeats reflecting insertions of mobile DNA sequences. Because of their prevalence in genomes, these repeats are intrinsic substrates for homologous recombination and single strand annealing reactions [8, 9]. For unknown reasons, repeating elements are also disproportionately involved in non-homologous end joining events at specific loci. One example occurs in a mouse model of MYC-induced lymphoma, which shows increased LINE-1 retrotransposon sequences at break sites with no homology or with short microhomologies (1–4 bp) suggestive of non-homologous end joining.
To address the question, we took advantage of two resources: the RepeatMasker annotation of the reference human genome assembly [http://www.repeatmasker.org], and a compilation of more than 1,000 chromosomal translocation-spanning sequences curated by the Lieber laboratory. For each translocation junction, we measured the distance to the nearest repeat. To avoid erroneous associations between translocation junctions and repeats, we compared these distances against those obtained for randomly permuted positions within the translocation gene locus.
Results and discussion
Translocation junctions from ten types of hematolymphoid neoplasm (Table 1) were analyzed to determine whether these occurred within or closer to the nearest repeat than would be expected by chance (Figure 1). The percent of translocation junctions occurring within repeat intervals varied, partly as a reflection of repeat content at the involved gene loci. For example, 67% of translocation junctions in both transcription factor 3/transcription factor E2-alpha (TCF3) and abelson murine leukemia viral oncogene homolog 1 (ABL1) were present in repeats (Table 2). In contrast, only 2–3% of junctions in runt-related transcription factor 1; translocated to, 1 (RUNX1T1) were in repeats (Table 2). The longest average and shortest average observed distances between translocations and their nearest repeat were 684 bp and 1 bp in T-cell receptor alpha chain (TCRA) and TCF3, respectively (Table 2).
*Distinct hematolymphoid neoplasms according to the World Health Organization classification; Pre-B/B-ALL: B lymphoblastic leukemia/lymphoma; CML: chronic myelogenous leukemia; Therapy AML: therapy-related acute myeloid leukemia; sporadic BL: Burkitt lymphoma; Pre-T/T-ALL: T lymphoblastic leukemia/lymphoma.
‡Number of translocation junctions examined.
Table 2. Repeat features at the translocation regions studied. Columns: translocation junction regions; junctions occurring in repeats (%); junction to the nearest repeat‡ (bp); P value for interaction.
E2A = PBX1 (TCF3-PBX1)
‡Distance, expressed in number of nucleotides, from translocation junction to the nearest repeat.
†P value for interaction between translocation junction to the nearest repeat (including repeating elements and tandem repeats).
*P value for interaction between translocation junction to the nearest repeating element.
Next, we calculated ratios of the expected versus observed translocation-to-repeat distances (Figure 2). The largest ratio, reflecting a relative enrichment of translocation junctions in the vicinity of repeats, occurred in the TCF3 translocation junction region (TCF3 locus ratio = 42, average ratio for other loci = 1.15) (Figure 2). Applying permutation-based statistics, as described in the Methods section, confirmed the significance of the enrichment of TCF3 translocation junctions at genomic repeats (n = 30; P <0.001) (Table 2). Using the same approach, we noted a weaker association between translocations and genomic repeats at the ABL1 region (n = 27; P = 0.017) (Table 2).
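As a rough illustration of how such a ratio can be computed, the following Python sketch divides the mean expected (random-position) distance by the mean observed distance, so that values above 1 indicate junctions lying closer to repeats than expected. The function name `enrichment_ratio`, the use of means, and the handling of a zero observed mean are assumptions made for this sketch; the numbers in the example are toy values, not data from the study.

```python
import math
from statistics import mean


def enrichment_ratio(expected_distances, observed_distances):
    """Ratio of mean expected to mean observed junction-to-repeat distance.

    A ratio above 1 means observed junctions lie closer to repeats than
    random positions do. Using means is an assumption of this sketch.
    """
    expected = mean(expected_distances)
    observed = mean(observed_distances)
    if observed == 0:
        # Every junction sits inside a repeat: the ratio is unbounded.
        return math.inf if expected > 0 else math.nan
    return expected / observed


# Toy distances in bp (hypothetical, for illustration only).
print(enrichment_ratio([40, 44], [1, 1]))       # 42.0
print(enrichment_ratio([100, 130], [90, 110]))  # 1.15
```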
The TCF3 translocation junction region encompasses interspersed repeats from three categories: a small nuclear RNA sequence (U6 snRNA), five retrotransposons, and a hAT-Charlie family DNA transposon (MER20). The retrotransposons at the locus include two Short INterspersed Elements (SINEs; AluY and AluJb) and three Long INterspersed Elements (LINEs; two L1M5s and an L2) (Figure 3). Interestingly, 14/30 (47%) of reported TCF3 translocation junctions reside in the MER20 transposon (Figure 3); the distribution of MER20-embedded translocation junctions was non-random (Figure 3, inset).
Recurrent pathologic translocations occur in a wide range of human malignancies, from hematolymphoid cancers to carcinomas and sarcomas. As the genetics of these diseases are better characterized, specific lesions are being related to clinicopathological entities or even incorporated in their definition. Sequence features at breakpoints can lend insights into how these events occur, and so we decided to investigate the prevalence of breakpoints with respect to genomic repeats. There have been other reports of non-uniform distributions of transposable element sequences at sites of chromosomal breaks. For example, nucleotide junctions demarcating the postnatal chromosome 12p deletions in ETV6-RUNX1 leukemia often occur at, or near, retrotransposon sequences.
In our study, we looked at rearrangement sites at 20 gene loci. Only TCF3 translocation sites exhibited clustering at or near transposable element sequences. All other translocation junctions from malignant proliferations of lymphoid and myeloid lineages showed random distributions relative to nearby repeats.
Our study leaves the mechanism unaddressed. How could TCF3 repeats create a site susceptible to breakage or otherwise involve the locus in events leading to the translocation? It is possible that very short sequences, which also occur randomly, are sufficient. Prior work by Tsai et al. has shown that dsDNA breaks at the TCF3/E2A locus leading to translocations occur in clusters at CpG dinucleotides. This is similar to some other hotspots for breaks occurring at the pro-B/pre-B stage of B-cell maturation. Of note, though, CpG dinucleotides are not at break sites seen in the TCF3 fusion partner locus, pre-B-cell leukemia homeobox 1 (PBX1). CpG dinucleotides occurred at 53% of TCF3 translocation junctions, while transposable elements were found at 67% of TCF3 translocation sites.
It is also possible that a lengthier protein recognition sequence is important near the break site. Transposable elements can contain, for example, transcription factor binding sites and other regulatory protein binding sites important for transcriptional control around the repeat [14, 15]. Indeed, MER20 DNA transposons provide cis-regulatory sequences critical for inducing the transcription of prolactin during pregnancy and have been implicated in endometrial gene recruitment in the evolution of placental mammals [14, 16, 17].
In summary, we analyzed 1,174 translocation sequences from ten hematolymphoid neoplasms for proximity to nearby repeats. Of these, TCF3 translocation junctions were seen to cluster at or near transposable elements in a majority of TCF3-PBX1 acute lymphoblastic leukemia cases. It is possible that the involved transposable element sequences are inherently susceptible to dsDNA breaks. Further studies will be needed to address sequence requirements for TCF3-PBX1 and other leukemogenic translocations.
Methods
Translocation junction sequences
Genomic DNA from human clinical samples was extracted and translocations were Sanger sequenced by numerous independent investigators. Published sequences assembled by Tsai et al. are publicly accessible in a repository, herein referred to as the Lieber database (http://lieber.usc.edu/Data.aspx). The Lieber database includes translocation junction sequences, translocation genomic coordinates (hg18), and limited clinical data from various hematolymphoid neoplasms that are associated with recurrent translocations. We downloaded this information (Table 1) and analyzed loci with ten or more translocation breakpoints (Additional file 1).
Mapping breakpoints with respect to repeats
Distances between each translocation junction and its nearest repeat element were determined by a Perl script (Additional file 2). Briefly, each translocation junction was aligned to its corresponding sequence in the March 2006 (NCBI Build 36/hg18) assembly of the human genome. The sequence surrounding each translocation was annotated for repetitive sequences using Tandem Repeats Finder and RepeatMasker. We included the two major categories of genomic repeats: tandem repeats and interspersed repeats. The number of nucleotides between the translocation and its nearest repeat was then calculated, considering both upstream and downstream sequences. For each locus, the observed distribution of distances was compared to the distances found using random positions as substitutes for translocation junctions (Figure 1).
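The distance calculation described above can be sketched in a few lines of Python. This is a minimal illustration and not the Perl script of Additional file 2: the function names, the inclusive `(start, end)` interval representation, the convention that a junction inside a repeat has a distance of 0, and the coordinates in the example are all assumptions made for illustration.

```python
def distance_to_nearest_repeat(position, repeats):
    """Return the distance in bp from a junction to its nearest repeat.

    `repeats` is a non-empty list of (start, end) intervals, inclusive,
    in the same coordinate system as `position`. A junction that falls
    inside a repeat is assigned a distance of 0 (an assumed convention).
    Both upstream and downstream repeats are considered.
    """
    return min(max(start - position, position - end, 0) for start, end in repeats)


def fraction_in_repeats(junctions, repeats):
    """Fraction of junctions lying within any repeat interval."""
    inside = sum(1 for j in junctions if distance_to_nearest_repeat(j, repeats) == 0)
    return inside / len(junctions)


# Hypothetical locus: two annotated repeats and four junction positions.
repeats = [(100, 200), (500, 650)]
junctions = [150, 260, 480, 700]
distances = [distance_to_nearest_repeat(j, repeats) for j in junctions]
print(distances)                                # [0, 60, 20, 50]
print(fraction_in_repeats(junctions, repeats))  # 0.25
```

A linear scan over all repeats is used here for clarity; it also tolerates overlapping annotations, which can arise when tandem and interspersed repeat tracks are merged.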
For each of the twenty translocation intervals analyzed, we compared actual measurements between translocation junctions and their nearest genomic repeats against the distances separating 1,000 random positions and their corresponding nearest repeats. Each translocation was compared to the distribution of distances created by the random sites using a one-sided Student’s t-test to generate a P value; for each permutation, we calculated a Student’s t-value and its P value. Low P values indicate that the translocation is significantly closer to a repeat element than expected by chance.
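The logic of comparing observed junctions with random positions can be illustrated with a simplified Python sketch. Note that it substitutes an empirical permutation P value (the share of random position sets whose mean distance is at least as small as the observed mean) for the one-sided Student's t-test used in the study. The function names, the uniform sampling of positions across the locus, the fixed seed, and the toy coordinates are assumptions of this sketch.

```python
import random
from statistics import mean


def distance_to_nearest_repeat(position, repeats):
    """Distance in bp to the nearest (start, end) repeat; 0 if inside one."""
    return min(max(start - position, position - end, 0) for start, end in repeats)


def permutation_p_value(junctions, repeats, locus_start, locus_end,
                        n_permutations=1000, seed=0):
    """Empirical one-sided P value for junctions being close to repeats.

    Each permutation draws as many random positions as there are junctions,
    uniformly within [locus_start, locus_end], and records whether their
    mean distance to the nearest repeat is <= the observed mean.
    """
    rng = random.Random(seed)
    observed = mean([distance_to_nearest_repeat(j, repeats) for j in junctions])
    as_close = 0
    for _ in range(n_permutations):
        sites = [rng.randint(locus_start, locus_end) for _ in junctions]
        simulated = mean([distance_to_nearest_repeat(s, repeats) for s in sites])
        if simulated <= observed:
            as_close += 1
    # Add-one correction keeps the estimate away from an impossible P of 0.
    return (as_close + 1) / (n_permutations + 1)


# Hypothetical 10 kb locus with one 100 bp repeat.
repeats = [(5000, 5099)]
clustered = list(range(5005, 5100, 10))  # ten junctions inside the repeat
distant = [0] * 10                       # ten junctions far from the repeat
p_clustered = permutation_p_value(clustered, repeats, 0, 9999)
p_distant = permutation_p_value(distant, repeats, 0, 9999)
print(p_clustered < 0.01, p_distant)
```

With 1,000 permutations the smallest attainable value is 1/1001, which is why such results are usually reported as P <0.001 rather than as an exact figure.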
ABL1: Abelson murine leukemia viral oncogene homolog 1
Alu: (Arthrobacter luteus) element
BCL6: B-cell lymphoma 6
BCR: Breakpoint cluster region
E2A: Immunoglobulin enhancer binding factors E12/E47
ETV6: ets variant gene 6
FGF8: Fibroblast growth factor 8
IGH: IgG heavy chain locus
L1: Long INterspersed element 1
L2: Long INterspersed element 2
LMO2: LIM domain only 2
MER20: Medium reiteration frequency repetitive 20 element
This work was supported by the Resident Research Fund grant from the Department of Pathology at the Johns Hopkins Hospital to NR.
Department of Pathology, Johns Hopkins University, School of Medicine
Department of Dermatology, Johns Hopkins University, School of Medicine
Howard Hughes Medical Institute
Department of Oncology, Division of Biostatistics and Bioinformatics, Johns Hopkins University, School of Medicine
McKusick-Nathans Institute of Genetic Medicine
Sidney Kimmel Comprehensive Cancer Center
High Throughput Biology Center, Johns Hopkins University School of Medicine
Lin C, Yang L, Tanasa B, Hutt K, Ju BG, Ohgi K, Zhang J, Rose DW, Fu XD, Glass CK, Rosenfeld MG: Nuclear receptor-induced chromosomal proximity and DNA breaks underlie specific translocations in cancer. Cell 2009, 139(6):1069-1083. 10.1016/j.cell.2009.11.030
Elliott B, Richardson C, Jasin M: Chromosomal translocation mechanisms at intronic alu elements in mammalian cells. Molecular cell 2005, 17(6):885-894. 10.1016/j.molcel.2005.02.028
Aplan PD, Chervinsky DS, Stanulla M, Burhans WC: Site-specific DNA cleavage within the MLL breakpoint cluster region induced by topoisomerase II inhibitors. Blood 1996, 87(7):2649-2658.
Libura J, Ward M, Solecka J, Richardson C: Etoposide-initiated MLL rearrangements detected at high frequency in human primitive hematopoietic stem cells with in vitro and in vivo long-term repopulating potential. Eur J Haematol 2008, 81(3):185-195. 10.1111/j.1600-0609.2008.01103.x
Lee E, Iskow R, Yang L, Gokcumen O, Haseley P, Luquette LJ 3rd, Lohr JG, Harris CC, Ding L, Wilson RK, Wheeler DA, Gibbs RA, Kucherlapati R, Lee C, Kharchenko PV, Park PJ, Cancer Genome Atlas Research Network: Landscape of somatic retrotransposition in human cancers. Science 2012, 337(6097):967-971. 10.1126/science.1222077
Schlissel MS, Kaffer CR, Curry JD: Leukemia and lymphoma: a cost of doing business for adaptive immunity. Gen Dev 2006, 20(12):1539-1544. 10.1101/gad.1446506
Gajecka M, Pavlicek A, Glotzbach CD, Ballif BC, Jarmuz M, Jurka J, Shaffer LG: Identification of sequence motifs at the breakpoint junctions in three t(1;9)(p36.3;q34) and delineation of mechanisms involved in generating balanced translocations. Hum Genet 2006, 120(4):519-526. 10.1007/s00439-006-0222-1
Tsai AG, Lu H, Raghavan SC, Muschen M, Hsieh CL, Lieber MR: Human chromosomal translocations at CpG sites and a theoretical basis for their lineage and stage specificity. Cell 2008, 135(6):1130-1142. 10.1016/j.cell.2008.10.035
Beck CR, Garcia-Perez JL, Badge RM, Moran JV: LINE-1 elements in structural variation and disease. Ann Rev Genom Hum Genet 2011, 12:187-215. 10.1146/annurev-genom-082509-141802
Kidd JM, Graves T, Newman TL, Fulton R, Hayden HS, Malig M, Kallicki J, Kaul R, Wilson RK, Eichler EE: A human genome structural variation sequencing resource reveals insights into mutational mechanisms. Cell 2010, 143(5):837-847. 10.1016/j.cell.2010.10.027
Liu P, Erez A, Nagamani SC, Dhar SU, Kołodziejska KE, Dharmadhikari AV, Cooper ML, Wiszniewska J, Zhang F, Withers MA, Bacino CA, Campos-Acevedo LD, Delgado MR, Freedenberg D, Garnica A, Grebe TA, Hernández-Almaguer D, Immken L, Lalani SR, McLean SD, Northrup H, Scaglia F, Strathearn L, Trapane P, Kang SH, Patel A, Cheung SW, Hastings PJ, Stankiewicz P, Lupski JR, Bi W: Chromosome catastrophes involve replication mechanisms generating complex genomic rearrangements. Cell 2011, 146(6):889-903. 10.1016/j.cell.2011.07.042
Wiemels JL, Hofmann J, Kang M, Selzer R, Green R, Zhou M, Zhong S, Zhang L, Smith MT, Marsit C, Loh M, Buffler P, Yeh RF: Chromosome 12p deletions in TEL-AML1 childhood acute lymphoblastic leukemia are associated with retrotransposon elements and occur postnatally. Cancer Res 2008, 68(23):9935-9944. 10.1158/0008-5472.CAN-08-2139
Jaffe ES: The 2008 WHO classification of lymphomas: implications for clinical practice and translational research. Hematol/Educ Prog Am Soc Hematol 2008, 2009:523-531.
Haldar M, Hancock JD, Coffin CM, Lessnick SL, Capecchi MR: A conditional mouse model of synovial sarcoma: insights into a myogenic origin. Cancer Cell 2007, 11(4):375-388. 10.1016/j.ccr.2007.01.016
Lynch VJ, Leclerc RD, May G, Wagner GP: Transposon-mediated rewiring of gene regulatory networks contributed to the evolution of pregnancy in mammals. Nature genetics 2011, 43(11):1154-1159. 10.1038/ng.917
Sasaki T, Nishihara H, Hirakawa M, Fujimura K, Tanaka M, Kokubo N, Kimura-Yoshida C, Matsuo I, Sumiyama K, Saitou N, Shimogori T, Okada N: Possible involvement of SINEs in mammalian-specific brain formation. Proc Nat Acad Sci USA 2008, 105(11):4220-4225. 10.1073/pnas.0709398105
Emera D, Wagner GP: Transformation of a transposon into a derived prolactin promoter with function during human pregnancy. Proc Nat Acad Sci USA 2012, 109(28):11246-11251. 10.1073/pnas.1118566109
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.