
Translocation junctions in TCF3-PBX1 acute lymphoblastic leukemia/lymphoma cluster near transposable elements



Hematolymphoid neoplasms frequently harbor recurrent genetic abnormalities. Some of the most well recognized lesions are chromosomal translocations, and many of these are known to play pivotal roles in pathogenesis. In lymphoid malignancies, some translocations result from erroneous V(D)J-type events. However, other translocation junctions appear randomly positioned and their underlying mechanisms are not understood.


We tested the hypothesis that genomic repeats, including both simple tandem and interspersed repeats, are involved in chromosomal translocations arising in hematopoietic malignancies. Using a database of translocation junctions and RepeatMasker annotations of the reference genome assembly, we measured the proximity of translocation sites to their nearest repeat. We examined 1,174 translocation breakpoints from 10 classifications of hematolymphoid neoplasms. We measured significance using Student’s t-test, and we determined a false discovery rate using a random permutation statistics technique.


Most translocations showed no propensity to involve genomic repeats. However, translocation junctions at the transcription factor 3 (TCF3)/E2A immunoglobulin enhancer binding factors E12/E47 (E2A) locus clustered within, or in proximity to, transposable element sequences. Nearly half of reported TCF3 translocations involve a MER20 DNA transposon. Based on this observation, we propose this sequence is important for the oncogenesis of TCF3-PBX1 acute lymphoblastic leukemia.


Genomic rearrangements can occur in germline nuclei, resulting in inherited diseases, or in somatic nuclei, contributing to tumorigenesis. The latter can vary from complex events such as chromothripsis to relatively simple abnormalities such as recurrent chromosomal translocations; the underlying mechanisms remain unclear. Genomic rearrangements have been induced in mammalian cell cultures in only a few systems [1–3]. Although these in vitro generated translocations provide a valuable experimental tool, the engineered translocation partner sequences rarely match known oncogenic translocation sequences [4].

Most recognized genomic rearrangements in human cancers today are not resolved at the nucleotide level. Widely used assays include karyotyping, fluorescence in situ hybridization, and microarray platforms with probes for comparative genomic hybridization and single nucleotide polymorphism genotyping. None provides nucleotide resolution of translocation breakpoints; massively parallel short-read sequencing has this ability, particularly when tailored approaches are used to ‘rescue’ alignments of reads spanning the breakpoints. However, highly repetitive intervals at breakpoints may be a confounding factor.

Precisely resolved breakpoints can provide insights into the mechanisms responsible for rearrangements. For example, some hematolymphoid neoplasm breakpoints are marked by the presence of cryptic heptamer/nonamer sequences [5]. Similarly, Translin protein binding sequences have been detected near chromosomal breakpoints in lymphoid neoplasms [6]. In both scenarios, DNA sequence is a key participant in the mechanism of translocation.

We chose to look for evidence of genomic repeat involvement in chromosomal translocations that drive human hematopoietic malignancies. Repetitive sequences comprise nearly half of the human genome; many are interspersed repeats reflecting insertions of mobile DNA sequences [7]. Because of their prevalence in genomes, these repeats are intrinsic substrates for homologous recombination and single strand annealing reactions [8, 9]. For unknown reasons, repeating elements are also disproportionately involved in non-homologous end joining events at specific loci. One example of this occurs in a mouse model of MYC-induced lymphoma, which shows increased LINE-1 retrotransposon sequences at break sites with no homology or short microhomologies (1–4 bp) suggestive of non-homologous end joining [10].

To address the question, we took advantage of two resources: the RepeatMasker annotation of the reference human genome assembly, and a compilation of more than 1,000 chromosomal translocation-spanning sequences curated by the Lieber laboratory [11]. For each translocation junction, we measured the distance to the nearest repeat. To avoid erroneous associations between translocation junctions and repeats, we compared the observed junctions with randomly permuted positions within the same translocation gene locus.

Results and discussion

Translocation junctions from ten types of hematolymphoid neoplasm (Table 1) were analyzed to determine whether these occurred within or closer to the nearest repeat than would be expected by chance (Figure 1). The percentage of translocation junctions occurring within repeat intervals varied, partly as a reflection of repeat content at the involved gene loci. For example, 67% of translocation junctions in both transcription factor 3/transcription factor E2-alpha (TCF3) and Abelson murine leukemia viral oncogene homolog 1 (ABL1) were present in repeats (Table 2). In contrast, only 2–3% of junctions in runt-related transcription factor 1; translocated to, 1 (RUNX1T1) were in repeats (Table 2). The longest and shortest average observed distances between translocations and their nearest repeat were 684 bp and 1 bp, in T-cell receptor alpha chain (TCRA) and TCF3, respectively (Table 2).

Table 1 Translocation regions studied
Figure 1

Experimental outline depicting a hypothetical translocation region encompassing three translocation junctions. An illustration on the left represents the hypothesis, where there is a spatial association (symbol X) between the three observed translocation junctions (red triangles) and the nearest repeated sequence (blue arrow). Similarly, an illustration on the right represents the null hypothesis, where there is no spatial association (symbol X’) between three randomly generated translocation junctions (broken triangles) and their nearest repeat (blue arrow). We compared actual translocation junctions to 1,000 randomly generated positions to identify translocation regions whose junctions consistently occur near repeats.

Table 2 Repeat features at the translocation regions studied

Next, we calculated ratios of the expected versus observed translocation-to-repeat distances (Figure 2). The largest ratio, reflecting a relative enrichment of translocation junctions in the vicinity of repeats, occurred in the TCF3 translocation junction region (TCF3 locus ratio = 42, average ratio for other loci = 1.15) (Figure 2). Applying permutation-based statistics, as described in the Methods section, confirmed the significance of the enrichment of TCF3 translocation junctions at genomic repeats (n = 30; P < 0.001) (Table 2). Using the same approach, we noted a weaker association between translocations and genomic repeats at the ABL1 region (n = 27; P = 0.017) (Table 2).
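One plausible reading of the expected-versus-observed ratio, offered here as an assumption rather than as the authors' exact definition, is the mean nearest-repeat distance of the randomly generated positions divided by the mean nearest-repeat distance of the observed junctions. The symbols below are introduced only for illustration: d(x) is the distance from position x to its nearest repeat, j_1 ... j_n are the observed junctions at a locus, and r_1 ... r_m are the random positions at the same locus.

```latex
R_{\text{locus}}
  = \frac{\bar{d}_{\text{expected}}}{\bar{d}_{\text{observed}}}
  = \frac{\dfrac{1}{m}\sum_{k=1}^{m} d(r_k)}{\dfrac{1}{n}\sum_{i=1}^{n} d(j_i)}
```

Under this reading, a ratio near 1 indicates no spatial association with repeats, and a ratio well above 1 indicates that observed junctions lie closer to repeats than random positions do.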

Figure 2

Translocation junctions in TCF3 occur at or near repeats. The Y-axis denotes the expected versus observed ratio of distances between translocation junctions and their nearest repeats. The X-axis denotes translocation loci analyzed. Other translocations examined were independent of local repeat content; expected versus observed ratios for these loci approach one (1). See Table 1 for abbreviations.

The TCF3 translocation junction region encompasses interspersed repeats from three categories: a small nuclear RNA sequence (U6 snRNA), five retrotransposons, and a hAT-Charlie family DNA transposon (MER20). The retrotransposons at the locus include two Short INterspersed Elements (SINEs; AluY and AluJb) and three Long INterspersed Elements (LINEs; two L1M5s and an L2) (Figure 3). Interestingly, 14/30 (47%) of reported TCF3 translocation junctions reside in the MER20 transposon (Figure 3); the distribution of MER20-embedded translocation junctions was non-random (Figure 3, inset).

Figure 3

Schematic representation of a TCF3 locus including translocations and transposable elements. The red triangles represent individual translocation junctions, the blue arrows indicate transposable elements within TCF3, and the black rectangles identify TCF3 exons. Inset, TCF3 translocation junction density map within the MER20 transposon. Genome coordinates correspond to March 2006, NCBI36/hg18 human genome assembly. TCF3: Transcription factor 3; MER20: Medium reiteration frequency repetitive 20; L1: LINE-1 Long INterspersed Element 1; L2: Long INterspersed Element 2; Alu: Alu SINE; U6: Small nuclear RNA.

Recurrent pathologic translocations occur in a wide range of human malignancies, from hematolymphoid cancers to carcinomas and sarcomas. As the genetics of these diseases are better characterized, specific lesions are being related to clinicopathological entities or even incorporated in their definition [12]. Sequence features at breakpoints can lend insights into how these events occur, and so we decided to investigate the prevalence of breakpoints with respect to genomic repeats. There have been other reports of non-uniform distributions of transposable element sequences at sites of chromosomal breaks. For example, nucleotide junctions demarking the postnatal chromosome 12p deletions in ETV6-RUNX1 leukemia often occur at, or near, retrotransposon sequences [13].

In our study, we looked at rearrangement sites at 20 gene loci. Only TCF3 translocation sites exhibited clustering at or near transposable element sequences. All other translocation junctions from malignant proliferations of lymphoid and myeloid lineages showed random distributions relative to nearby repeats.

Our study leaves the mechanism unaddressed. How could TCF3 repeats create a site susceptible to breakage or otherwise involve the locus in events leading to the translocation? It is possible that very short sequences, which also occur randomly, are sufficient. Prior work by Tsai et al. has shown that dsDNA breaks at the TCF3/E2A locus that lead to translocations occur in clusters at CpG dinucleotides [11]. This is similar to some other hotspots for breaks occurring at the pro-B/pre-B stage of B-cell maturation. Of note, though, CpG dinucleotides are not found at break sites in the TCF3 fusion partner locus, pre-B-cell leukemia homeobox 1 (PBX1). CpG dinucleotides occurred at 53% of TCF3 translocation junctions, while transposable elements were found at 67% of TCF3 translocation sites.

It is also possible that a lengthier protein recognition sequence is important near the break site. Transposable elements can contain, for example, transcription factor binding sites and other regulatory protein binding sites important for transcriptional control around the repeat [14, 15]. Indeed, MER20 DNA transposons provide cis-regulatory sequences critical for inducing the transcription of prolactin during pregnancy and have been implicated in endometrial gene recruitment in the evolution of placental mammals [14, 16, 17].


In summary, we analyzed 1,174 translocation sequences from ten hematolymphoid neoplasms for proximity to nearby repeats. Of these, TCF3 translocation junctions were seen to cluster at or near transposable elements in a majority of TCF3-PBX1 acute lymphoblastic leukemia cases. It is possible that the involved transposable element sequences are inherently susceptible to dsDNA breaks. Further studies will be needed to address sequence requirements for TCF3-PBX1 and other leukemogenic translocations.


Translocation junction sequences

Genomic DNA from human clinical samples was extracted and translocations were Sanger sequenced by numerous independent investigators [11]. Published sequences assembled by Tsai et al. are publicly accessible in a repository, herein referred to as the Lieber database [11]. The Lieber database includes translocation junction sequences, translocation genomic coordinates (hg18), and limited clinical data from various hematolymphoid neoplasms that are associated with recurrent translocations. We downloaded this information (Table 1) and analyzed loci with ten or more translocation breakpoints (Additional file 1).

Mapping breakpoints with respect to repeats

Distances between each translocation junction and its nearest repeat element were determined by a Perl script (Additional file 2). Briefly, each translocation junction was aligned to its corresponding sequence in the March 2006 NCBI36/hg18 assembly of the human genome. Translocation regions were annotated for repetitive sequences using Tandem Repeats Finder and RepeatMasker. We included the two major categories of genomic repeats: tandem repeats and interspersed repeats. The number of nucleotides between the translocation and its nearest repeat was then calculated, considering upstream and downstream sequences. For each locus, the observed distribution of distances was compared to distances found using random positions as substitutes for translocation junctions (Figure 1).
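The distance measurement described above can be sketched as follows. This is a minimal Python illustration, not the authors' Perl script (Additional file 2); the function name, the interval representation, and the coordinates in the usage example are all hypothetical. It assumes repeats are given as sorted, non-overlapping, inclusive (start, end) intervals on one chromosome, and it treats a junction inside a repeat as distance 0.

```python
from bisect import bisect_left


def nearest_repeat_distance(position, repeats):
    """Return the distance in bp from `position` to the nearest repeat.

    `repeats` is a non-empty list of inclusive (start, end) intervals,
    sorted by start and non-overlapping (an assumption of this sketch).
    A position that falls inside a repeat has distance 0.
    """
    if not repeats:
        raise ValueError("at least one repeat interval is required")
    starts = [start for start, _ in repeats]
    i = bisect_left(starts, position)
    candidates = []
    if i < len(repeats):
        # Nearest repeat starting at or downstream of the position.
        candidates.append(repeats[i][0] - position)
    if i > 0:
        # Nearest repeat starting upstream; 0 if the position lies within it.
        _, end = repeats[i - 1]
        candidates.append(max(0, position - end))
    return min(candidates)


# Hypothetical repeat annotation and junction coordinates.
example_repeats = [(100, 200), (500, 650)]
example_junctions = [150, 250, 480]
example_distances = [nearest_repeat_distance(j, example_repeats)
                     for j in example_junctions]
```

Considering both the upstream and the downstream neighbor mirrors the statement that distances were calculated in both directions; real RepeatMasker annotations can overlap, so a production version would need to merge intervals first.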

Statistical methods

For each of the twenty translocation intervals analyzed, we compared actual measurements between translocation junctions and their nearest genomic repeats against the distances separating 1,000 random positions and their corresponding nearest repeats. For each permutation, we calculated a Student’s t-value and its P value. Each translocation was compared to the distribution of distances created by the random sites using a one-sided Student’s t-test to generate a P value; low P values indicate that the translocation is significantly closer to a repeat element than expected by chance.
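The comparison against random positions can be illustrated with a short Python sketch. Note the substitution: instead of the one-sided Student's t-test described above, this example computes a simpler empirical permutation P value, namely the fraction of randomly placed junction sets whose mean nearest-repeat distance is at most the observed mean. All names, the uniform sampling within the locus, and the add-one correction are assumptions made for illustration.

```python
import random
from statistics import mean


def nearest_distance(position, repeats):
    """Distance (bp) from position to the closest (start, end) repeat; 0 if inside."""
    return min(max(start - position, position - end, 0) for start, end in repeats)


def permutation_p_value(junctions, repeats, locus_start, locus_end,
                        n_permutations=1000, seed=0):
    """One-sided empirical P value that junctions lie closer to repeats than chance."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(nearest_distance(j, repeats) for j in junctions)
    hits = 0
    for _ in range(n_permutations):
        # Draw as many random positions in the locus as there are junctions.
        sample = [rng.randint(locus_start, locus_end) for _ in junctions]
        if mean(nearest_distance(p, repeats) for p in sample) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical locus with a single repeat and two contrasting junction sets.
toy_repeats = [(1000, 1100)]
p_clustered = permutation_p_value([1010, 1020, 1050, 1075, 1090],
                                  toy_repeats, 0, 10000)
p_distant = permutation_p_value([9000, 9500, 10000], toy_repeats, 0, 10000)
```

With 1,000 permutations the smallest attainable value is 1/1001, so a result at that floor corresponds to a report of P < 0.001.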



Abbreviations

ABL1: Abelson murine leukemia viral oncogene homolog 1
Alu: (Arthrobacter luteus) element
BCL6: B-cell lymphoma 6
BCR: Breakpoint cluster region
E2A: Immunoglobulin enhancer binding factors E12/E47
ETV6: ets variant gene 6
FGF8: Fibroblast growth factor 8
IGH: IgG heavy chain locus
L1: Long INterspersed element 1
L2: Long INterspersed element 2
LMO2: LIM domain only 2
MER20: Medium reiteration frequency repetitive 20 element
MLL: Myeloid/lymphoid or mixed lineage leukemia gene
MYC: v-myc avian myelocytomatosis viral oncogene homolog
PBX1: Pre B-Cell leukemia transcription factor
RUNX1: Runt-related transcription factor 1
RUNX1T1: Runt-related transcription factor 1, translocated to, 1
SCL: Stem cell leukemia hematopoietic transcription factor
SINE: Short INterspersed element
snRNA: Small nuclear RNA
TCRA: T-cell antigen receptor, alpha subunit
TCF3: Transcription factor 3.


1. Lin C, Yang L, Tanasa B, Hutt K, Ju BG, Ohgi K, Zhang J, Rose DW, Fu XD, Glass CK, Rosenfeld MG: Nuclear receptor-induced chromosomal proximity and DNA breaks underlie specific translocations in cancer. Cell 2009, 139(6):1069-1083. 10.1016/j.cell.2009.11.030

2. Elliott B, Richardson C, Jasin M: Chromosomal translocation mechanisms at intronic alu elements in mammalian cells. Molecular cell 2005, 17(6):885-894. 10.1016/j.molcel.2005.02.028

3. Aplan PD, Chervinsky DS, Stanulla M, Burhans WC: Site-specific DNA cleavage within the MLL breakpoint cluster region induced by topoisomerase II inhibitors. Blood 1996, 87(7):2649-2658.

4. Libura J, Ward M, Solecka J, Richardson C: Etoposide-initiated MLL rearrangements detected at high frequency in human primitive hematopoietic stem cells with in vitro and in vivo long-term repopulating potential. Eur J Haematol 2008, 81(3):185-195. 10.1111/j.1600-0609.2008.01103.x

5. Lee E, Iskow R, Yang L, Gokcumen O, Haseley P, Luquette LJ 3rd, Lohr JG, Harris CC, Ding L, Wilson RK, Wheeler DA, Gibbs RA, Kucherlapati R, Lee C, Kharchenko PV, Park PJ, Cancer Genome Atlas Research Network: Landscape of somatic retrotransposition in human cancers. Science 2012, 337(6097):967-971. 10.1126/science.1222077

6. Schlissel MS, Kaffer CR, Curry JD: Leukemia and lymphoma: a cost of doing business for adaptive immunity. Gen Dev 2006, 20(12):1539-1544. 10.1101/gad.1446506

7. Gajecka M, Pavlicek A, Glotzbach CD, Ballif BC, Jarmuz M, Jurka J, Shaffer LG: Identification of sequence motifs at the breakpoint junctions in three t(1;9)(p36.3;q34) and delineation of mechanisms involved in generating balanced translocations. Hum Genet 2006, 120(4):519-526. 10.1007/s00439-006-0222-1

8. Tsai AG, Lu H, Raghavan SC, Muschen M, Hsieh CL, Lieber MR: Human chromosomal translocations at CpG sites and a theoretical basis for their lineage and stage specificity. Cell 2008, 135(6):1130-1142. 10.1016/j.cell.2008.10.035

9. Beck CR, Garcia-Perez JL, Badge RM, Moran JV: LINE-1 elements in structural variation and disease. Ann Rev Genom Hum Genet 2011, 12:187-215. 10.1146/annurev-genom-082509-141802

10. Kidd JM, Graves T, Newman TL, Fulton R, Hayden HS, Malig M, Kallicki J, Kaul R, Wilson RK, Eichler EE: A human genome structural variation sequencing resource reveals insights into mutational mechanisms. Cell 2010, 143(5):837-847. 10.1016/j.cell.2010.10.027

11. Liu P, Erez A, Nagamani SC, Dhar SU, Kołodziejska KE, Dharmadhikari AV, Cooper ML, Wiszniewska J, Zhang F, Withers MA, Bacino CA, Campos-Acevedo LD, Delgado MR, Freedenberg D, Garnica A, Grebe TA, Hernández-Almaguer D, Immken L, Lalani SR, McLean SD, Northrup H, Scaglia F, Strathearn L, Trapane P, Kang SH, Patel A, Cheung SW, Hastings PJ, Stankiewicz P, Lupski JR, Bi W: Chromosome catastrophes involve replication mechanisms generating complex genomic rearrangements. Cell 2011, 146(6):889-903. 10.1016/j.cell.2011.07.042

12. Wiemels JL, Hofmann J, Kang M, Selzer R, Green R, Zhou M, Zhong S, Zhang L, Smith MT, Marsit C, Loh M, Buffler P, Yeh RF: Chromosome 12p deletions in TEL-AML1 childhood acute lymphoblastic leukemia are associated with retrotransposon elements and occur postnatally. Cancer Res 2008, 68(23):9935-9944. 10.1158/0008-5472.CAN-08-2139

13. Jaffe ES: The 2008 WHO classification of lymphomas: implications for clinical practice and translational research. Hematol/Educ Prog Am Soc Hematol 2008, 2009:523-531.

14. Haldar M, Hancock JD, Coffin CM, Lessnick SL, Capecchi MR: A conditional mouse model of synovial sarcoma: insights into a myogenic origin. Cancer Cell 2007, 11(4):375-388. 10.1016/j.ccr.2007.01.016

15. Lynch VJ, Leclerc RD, May G, Wagner GP: Transposon-mediated rewiring of gene regulatory networks contributed to the evolution of pregnancy in mammals. Nature genetics 2011, 43(11):1154-1159. 10.1038/ng.917

16. Sasaki T, Nishihara H, Hirakawa M, Fujimura K, Tanaka M, Kokubo N, Kimura-Yoshida C, Matsuo I, Sumiyama K, Saitou N, Shimogori T, Okada N: Possible involvement of SINEs in mammalian-specific brain formation. Proc Nat Acad Sci USA 2008, 105(11):4220-4225. 10.1073/pnas.0709398105

17. Emera D, Wagner GP: Transformation of a transposon into a derived prolactin promoter with function during human pregnancy. Proc Nat Acad Sci USA 2012, 109(28):11246-11251. 10.1073/pnas.1118566109


This work was supported by the Resident Research Fund grant from the Department of Pathology at the Johns Hopkins Hospital to NR.

Author information



Corresponding author

Correspondence to Nemanja Rodić.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

NR conceived of the study. NR, JZ, TCC, and KHB wrote the code to produce genomic distances. NR, SJW, and KHB performed statistical analyses. NR, SJW, and KHB drafted the manuscript. All authors read and approved the final manuscript.

Electronic supplementary material

Nucleotide positions of translocation junctions examined.

Additional file 1: Column A depicts a gene symbol that specifies one of the two translocation partners within a given hematolymphoid neoplasm with recurrent genetic abnormality. Column B denotes sequence used to determine translocation junction. Columns C and D denote chromosomal position and nucleotide position of translocation junction, relative to March 2006 Human Genome Assembly (hg18). (XLSX 53 KB)

Additional file 2: Program used to calculate translocation junction to repeat distance and to generate 1,000 random positions for each translocation region. (TXT 5 KB)


Rights and permissions

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Rodić, N., Zampella, J.G., Cornish, T.C. et al. Translocation junctions in TCF3-PBX1 acute lymphoblastic leukemia/lymphoma cluster near transposable elements. Mobile DNA 4, 22 (2013).



  • Translocation Breakpoint
  • Retrotransposon Sequence
  • Transposable Element Sequence
  • Genomic Repeat
  • Translocation Site