L1 retrotransposons inserted within genes in the human genome show a strong bias against sense orientation with respect to the gene. One suggested explanation for this observation was the possibility that L1 inserted randomly, but that there was negative selection against sense-oriented insertions. However, multiple studies have now found that de novo and polymorphic L1 insertions, which have little opportunity for selection to act, also show the same bias.
Here we show that the transcription-coupled sub-pathway of nucleotide excision repair does not affect the overall rate of insertion of L1 elements, which is in contrast with the regulation by the global sub-pathway of nucleotide excision repair. The transcription-coupled subpathway does cause a strong bias against insertion in the sense orientation relative to genes.
This suggests that a major portion of the L1 orientation bias might be generated during the process of insertion through the action of transcription-coupled nucleotide excision repair.
Sequencing of the human genome revealed that transposable elements make up almost half of the genome [1, 2]. The long interspersed element L1 is the only active, autonomous retrotransposon in human cells and constitutes 17% of the genome. L1 inserts are relatively randomly distributed in genic and intergenic regions, with the elements showing a genomic preference for AT-rich regions. However, L1 copies within genes show a significant enrichment for the antisense orientation. It has been proposed that this orientation bias may be caused by a selection process limiting transcriptional interference with gene expression. However, a similar trend is observed both with published de novo inserts recovered in HeLa cells using an engineered L1 element, although there are insufficient data for that to reach significance [4,5,6], and with somatic L1 insertions identified in brain cells. These latter insertions would be expected to be subject to much less selection, which raises the possibility of an insertion-related mechanism that controls L1 insertion in actively transcribed genes in a gene-orientation-dependent manner.
Transcription-coupled repair (TCR), a sub-pathway of nucleotide excision repair (NER), is a DNA repair pathway that excises helix-distorting lesions. These lesions are typically caused by UV-light exposure or chemical compounds and they block RNA polymerase II (RNAPII) processivity on the template strand of transcribed genes (Fig. 1). CSA and CSB (Cockayne Syndrome proteins A and B), the sensor proteins of the pathway, are recruited to the stalled RNAPII complex and initiate the excision of the damaged strand. If the bulky DNA lesion is located on the coding strand of the gene or in an untranscribed genomic region, it does not interfere with transcription and is instead subject to the slower, global genome repair (GGR) NER sub-pathway ([9, 10] and Fig. 1). After lesion recognition, the TCR and GGR mechanisms converge on a common series of steps. Briefly, the DNA helix is opened by the helicase proteins, XPD and XPB, of the TFIIH complex. The open DNA structure is then stabilized by XPA-RPA proteins [12, 13]. The ERCC1-XPF and XPG endonucleases cleave the damaged strand at the 5′ and 3′ ends of the lesion [14, 15].
We have recently shown that several proteins of the NER pathway, notably two central proteins of the repair process, XPD and XPA, as well as the endonuclease ERCC1-XPF and the lesion-binding protein XPC of the GGR pathway, limit L1 retrotransposition. In cells with mutations in these genes, the L1 retrotransposition rate increased and the insertions generated abnormally large target site duplications (TSDs) at the insertion site [4,5,6, 16, 18]. As the GGR pathway can inhibit L1 insertions, we hypothesized that the TCR sub-pathway may serve the same role. The TCR sub-pathway is only active on the portion of the genome that is actively transcribed in any given cell (see Discussion), while GGR sub-pathway activity would continue to protect the majority of the genome. Because of this, we expect that TCR would not greatly affect the overall rate of retrotransposition. However, we hypothesize that it might generate a strong bias against L1 insertions in the template strand of transcribed genes, thereby helping to explain the observed bias in orientation of L1 elements within genes.
Cell lines and culture conditions
HeLa cells (ATCC CCL2) were grown in eMEM supplemented with 10% Fetal Bovine Serum, 0.1 mM non-essential amino acids (Life Technologies) and 1 mM sodium pyruvate (Life Technologies) at 37° in a 5% carbon dioxide environment. The following cell lines were obtained from the Coriell Cell Repository: CSA- SV40-transformed fibroblasts (GM16094), XPC- SV40-transformed fibroblasts (GM15983), XPD- SV40-transformed fibroblasts (GM08207), and the stably complemented version of the XPD- cell line (XPD+) (GM15877). XPC-, XPD- and CSA- cell lines were grown in eMEM supplemented with 10% Fetal Bovine Serum and 0.1 mM non-essential amino acids (Life Technologies) at 37° in a 5% carbon dioxide environment. The XPD+ cell line was grown in DMEM supplemented with 10% Fetal Bovine Serum (Life Technologies). A stably complemented version of the CSA- cell line (CSA+) was generated in this study by transfecting CSA- cells with a CSA cDNA expression vector (# EX-S0507-M67, GeneCopoeia) along with a hygromycin selection vector to allow selection for integration. CSA+ cells were maintained in eMEM medium supplemented with 10% Fetal Bovine Serum (Life Technologies), 0.1 mM non-essential amino acids (Life Technologies) and 200 μg/mL hygromycin at 37° in a 5% carbon dioxide environment.
JM102/L1.3 contains the CMV promoter upstream of the L1.3 element deleted for the 5′ UTR and the mneo indicator cassette cloned in the pCEP4 plasmid.
JM102/D702A/L1.3 derives from JM102/L1.3 and contains the reverse transcriptase deficient mutant of an L1.3 element and the mneo retrotransposition cassette cloned in the pCEP4 vector.
TAM102/L1.3 contains the CMV promoter upstream of the L1.3 element deleted for the 5′ UTR and the mblastI indicator cassette cloned in the pCEP4 vector.
TAM102/D702A/L1.3 derives from TAM102/L1.3 and contains the reverse transcriptase deficient mutant of an L1.3 element and the mblastI indicator cassette cloned in the pCEP4 vector.
TAM102/H230A/L1.3 derives from TAM102/L1.3 and contains the endonuclease deficient mutant of the L1.3 element and the mblastI indicator cassette cloned in the pCEP4 vector.
# EX-S0507-M67 (GeneCopoeia) contains the CSA cDNA driven by CMV promoter and a hygromycin resistance gene in pReceiverM67 vector.
The synL1_neo vector used for the recovery of de novo L1 inserts was previously described.
The pIRES2-EGFP vector (Clontech) contains a neomycin resistance gene expressed from a SV40 promoter. The vector contains a multi-cloning site upstream of an IRES and eGFP marker. The cloned gene and eGFP marker are expressed from the CMV promoter on the same transcript.
All plasmid DNAs were purified using a Maxiprep kit (Qiagen). DNA quality was also evaluated by visual assessment of aliquots electrophoresed on ethidium bromide-stained agarose gels.
Briefly, 5 × 10⁶ CSA+ and CSA- cells were seeded in T75 flasks. Cells were transfected the next day at about 90% confluence using Lipofectamine 2000 (Life Technologies) following the manufacturer’s protocol. Cells were transfected with 3 μg of L1.3 or L1.3-RT (−) construct tagged with the mneo retrotransposition cassette (JM102/L1.3 or JM102/D702A/L1.3). Two days after transfection, cells were selected for retrotransposition events in medium containing 500 μg/mL Geneticin (Life Technologies). After 14 days, cells were fixed and stained with crystal violet solution (0.2% crystal violet in 5% acetic acid and 2.5% isopropanol) (Fig. 2b). Each assay was performed in triplicate. The number of neoR colonies was counted in each flask.
L1 toxicity and colony formation assay
L1 toxicity and colony formation assays were performed using the L1 episomal and the pIRES2-EGFP vectors. Briefly, 5 × 10⁶ CSA+ and CSA- cells were seeded in T75 flasks. Cells were transfected the next day at about 90% confluence using Lipofectamine 2000 (Life Technologies) following the manufacturer’s protocol. Cells were transfected with 3 μg of L1.3 or L1.3-EN (−) construct tagged with the mblastI retrotransposition cassette (TAM102/L1.3 or TAM102/H230A/L1.3) and 0.5 μg of pIRES2-EGFP vector (pIRES2-EGFP was used because it contains a G418 resistance cassette). Cells were selected for the presence of the pIRES2-EGFP plasmid in selective medium containing 500 μg/mL Geneticin (Life Technologies) for 14 days. The cells were then fixed and stained with crystal violet solution (0.2% crystal violet in 5% acetic acid and 2.5% isopropanol). The number of neoR colonies was counted in each flask.
Total RNA was extracted from a confluent T75 flask using TRIzol Reagent (Life Technologies), followed by chloroform extraction and isopropanol precipitation. RNA was suspended in 100 μL of DEPC-treated water. The cDNA was synthesized using the Reverse Transcription System (Promega), following the manufacturer’s protocol. Briefly, 1 μg of total RNA was denatured at 65° for 5 min. The reverse transcription reaction was primed with Oligo(dT)15 primers and incubated at 42° for 1 h in a thermocycler (BioRad, C1000 Touch). The enzyme was then heat-inactivated at 85° for 5 min. The PCR amplification of CSA cDNA was performed using previously published primers. PCR amplification of beta-actin cDNA was performed in parallel as a control for the assay. The PCR products were analyzed on a 1% agarose gel and the bands were gel extracted and cloned into TOPO-TA (Life Technologies). Cloned PCR products were Sanger sequenced using M13 forward and reverse primers. Samples were sent for Sanger sequencing to Elim Biopharmaceuticals, Inc., Hayward, California. Lasergene 10 SeqBuilder software was used for sequence analysis and the sequences were compared to the reference cDNA using BLAST software (website: https://blast.ncbi.nlm.nih.gov/Blast.cgi).
Recovery of de novo L1 insertions
De novo L1 insertion recovery was performed as previously described. Briefly, 5 × 10⁶ CSA- and CSA+ cells were transfected with 3 μg synL1_neo rescue vector using Lipofectamine 2000 reagent (Life Technologies). Cells were selected with 500 μg/mL of Geneticin (Life Technologies) for 14 days to allow for colony formation. NeoR cells were harvested by trypsinization and genomic DNA was extracted using a Qiagen DNeasy Blood and Tissue kit. Genomic DNA was digested with 100 U of HindIII (NEB) overnight at 37°. The following day, digested genomic DNA was self-ligated using 1200 U T4 DNA ligase (NEB) in a volume of 1 mL overnight at room temperature. DNA was purified and concentrated using centrifugal filters (Amicon Ultra, 0.5 mL, 50 K, Millipore). Purified DNA was transformed by electroporation into competent DH5α E. coli (Life Technologies). Individual kanamycin-resistant colonies were grown and plasmid DNA was harvested using the SV Wizard miniprep kit (Promega). The 5′ end of the de novo L1 insertion was sequenced using primers specific to the L1 rescue plasmid, with primer walking until the 5′ end of the insert was recovered, as previously described. Because sequencing through a long adenosine tract at the 3′ end of the L1 insertions is not effective, the 3′ flanking genomic region was sequenced by ligation-mediated PCR based on [23, 24]. Briefly, a pool of five to six L1 rescue vectors was digested with StuI (NEB) to relax supercoils, and then sheared by sonication using a Bioruptor (Diagenode, high, 30 s on, 90 s off, for 12 min). Sheared plasmid DNA was primer extended using an oligo specific to the 3′ end of the synL1_neo rescue plasmid (3′_rescue_1: 5′ ATATATGAGTAACCTGAGGC 3′ or 3′_rescue_1_secondpA: 5′ GTGGGCATTCTGTCTTGTTC 3′). Duplexed T-linkers were ligated using 10 U T4 DNA ligase and PCR was performed using the linker-specific primer (5′ ACACTCTTTCCCTACACGACGCTCTTCCGATCT 3′) and the 3′_rescue_1 (or 3′_rescue_1_secondpA) primer.
PCR was carried out with these steps: initial denaturation at 94°, 20 cycles of 94° for 30 s, 60° for 1 min, 72° for 1 min, and a final extension for 10 min at 72°. PCR reactions were run on a 1% agarose gel and a light smear between 400 and 700 nt was gel extracted with the Qiaquick gel extraction kit (Qiagen). One μL of gel-extracted DNA was subjected to an additional 15 cycles of PCR amplification as described above using linker-specific and nested 3′ rescue vector primers (3′_rescue_2: 5′ TGAGTAACCTGAGGCTATGCTG 3′ or 3′_rescue_2_secondpA: 5′ TTCTGTCTTGTTCCGGTTCTTAAT 3′). The nested PCR product was run on a 1% agarose gel and the resulting smear was gel extracted and cloned into TOPO-TA (Life Technologies). Cloned PCR products were Sanger sequenced using M13 forward and reverse primers to determine 3′ end junctions. Samples were sent for sequencing to Elim Biopharmaceuticals, Inc., Hayward, California. Lasergene 10 SeqBuilder software was used for sequence analysis. Flanking regions were mapped on the human reference genome hg19 (build 37) using the Blat tool (https://genome.ucsc.edu/cgi-bin/hgBlat). The sequence data related to these insertions are included in Additional file 1: Table S3.
To evaluate expression of CSA protein in the cells, HeLa, XPC-, XPD-, XPD+, CSA- and CSA+ cells were harvested in 300 μL of lysis buffer (50 mM Tris, pH 7.2, 150 mM NaCl, 0.5% Triton X-100, 10 mM EDTA, 0.5% SDS). After 10 min of sonication (Bioruptor, Diagenode, manufacturer’s recommended settings), lysates were clarified by centrifugation for 15 min at 4° at 13,000 rpm and the protein concentration was determined by Bradford assay (Biorad). 40 μg of protein was fractionated on a 4–12% bis-tris polyacrylamide gel (Life Technologies). Proteins were transferred to a nitrocellulose membrane using the iBlot gel transfer system from Life Technologies (manufacturer’s settings). The membrane was blocked for 1 h at room temperature in PBS (pH 7.4), 0.1% Tween 20 (Sigma), 5% skim milk powder (OXOID) and then incubated overnight at 4° with an anti-CSA monoclonal antibody (D-2, sc-376981, Santa Cruz Biotechnology) diluted at 1:500 and an anti-GAPDH antibody (FL-335, sc-25778, Santa Cruz Biotechnology) diluted at 1:1000 in PBS, 0.1% Tween 20, 3% non-fat dry milk. The membrane was then incubated for 1 h at room temperature with the secondary goat anti-mouse or donkey anti-rabbit HRP-conjugated antibody (sc-2005, sc-2313, Santa Cruz Biotechnology) diluted at 1:100,000 in PBS, 0.1% Tween 20, 3% non-fat milk. Signals were detected using Super Signal West Femto Chemiluminescent Substrate (Pierce).
UV sensitivity assay
The protocol was adapted from a previously published method. Briefly, 5 × 10⁵ cells were seeded in 6-cm plates and grown in growth medium for 24 h. The growth medium was removed and the cells were irradiated in the presence of 1 mL of 1X phosphate buffered saline (PBS) with a bactericidal UVC lamp (254 nm, 1.57 J/m²/s) at UVC doses of 0, 3, 6, 9 and 12 J/m². The PBS was removed and replaced with growth medium. After 4 days, cells were counted with a hemocytometer to determine cell survival. Cell survival was calculated as the percent of live cells in the irradiated sample relative to the untreated sample.
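The two calculations implied by this protocol can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the dose rate comes from the text, while the function names and the cell counts in the usage lines are hypothetical.

```python
# Dose rate reported in the protocol; everything else here is illustrative.
DOSE_RATE = 1.57  # J/m^2/s, output of the UVC lamp


def exposure_seconds(dose, dose_rate=DOSE_RATE):
    """Seconds of irradiation needed to deliver `dose` (in J/m^2)."""
    return dose / dose_rate


def percent_survival(irradiated_count, untreated_count):
    """Live cells in the irradiated sample as a percentage of the untreated sample."""
    if untreated_count <= 0:
        raise ValueError("untreated count must be positive")
    return 100.0 * irradiated_count / untreated_count


if __name__ == "__main__":
    # Exposure time for each dose used in the assay.
    for dose in (0, 3, 6, 9, 12):
        print(f"{dose:>2} J/m^2 -> {exposure_seconds(dose):.1f} s")
    # Hypothetical hemocytometer counts: irradiated vs. untreated plate.
    print(percent_survival(2.0e5, 8.0e5))
```

At the stated dose rate, the highest dose (12 J/m²) corresponds to under 8 s of exposure, so timing precision matters for the low-dose points.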
RNA-Seq analysis of HeLa gene expression
RNA was isolated from HeLa cells as described for RT-PCR. 5 μg of RNA was submitted to the University of Wisconsin Biotechnology Center (http://www.biotech.wisc.edu/services/dnaseq/services/Illumina) for polyA selection and strand-specific 2 × 100 bp RNA sequencing on an Illumina HiSeq2000. Approximately 40 million reads were subjected to RSEM analysis on the human GRCh38 reference genome and output was calculated for all of the ENCODE coding gene alignments in FPKM (fragments per kilobase per million reads).
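For readers unfamiliar with the unit, the textbook FPKM definition can be sketched as below. This is only the basic formula; RSEM itself estimates expected fragment counts and uses an effective transcript length, so its reported values will differ somewhat. The function name and the example numbers are hypothetical.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    fragments: fragments assigned to the gene
    gene_length_bp: transcript length in base pairs
    total_mapped_fragments: all mapped fragments in the library
    """
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # Hypothetical gene: 1000 fragments, 2 kb transcript, 40 million fragments.
    print(fpkm(1000, 2000, 40_000_000))
```

Normalizing by both transcript length and library size is what allows expression levels of different genes to be compared within a sample.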
CSA protein does not control the rate of L1 retrotransposition
In GGR-deficient cells, we have observed an increase of 3–10-fold in L1 retrotransposition rate in comparison to the complemented cell lines, suggesting that the NER repair pathway limits L1 insertion into the genome. We therefore wondered if the L1 retrotransposition rate would also increase in TCR-deficient cells. SV40-transformed, CSA-deficient (CSA-) skin fibroblasts derived from a patient with Cockayne syndrome were obtained from the Coriell Cell Repository (see Materials and Methods). These cells express a truncated CSA mRNA that does not produce functional CSA protein and the cells are remarkably sensitive to UV light exposure (Additional file 2: Figure S1). We stably complemented the cells by transfection with a CSA cDNA expression vector under selection and assessed the efficiency of the complementation with a functional UV sensitivity assay (Materials and Methods). The data revealed that the stably complemented (CSA+) cells are less sensitive to UV light exposure (Additional file 2: Figure S1A). RT-PCR and immunoblot assays confirmed the overexpression of CSA mRNA and protein in the stably complemented cells (Additional file 2: Figs. S1A and S1B).
To test the activity level of the L1 retrotransposon in CSA-deficient and complemented cells, we performed an L1 retrotransposition assay by transfecting the cells with the JM102/L1.3 vector expressing the L1.3 element tagged at the 3′ end with the mneoI retrotransposition cassette. The retrotransposition cassette contains an antisense neomycin resistance gene, interrupted by a sense-oriented intron that is spliced only in L1 mRNA (Fig. 2a). Therefore, the neoR gene becomes expressed and functional only after retrotransposition. The assay allows for an estimation of L1 retrotransposition rate by counting NeoR colonies 14 days after selection (Fig. 2b). In contrast to the results obtained in GGR-deficient cells, the retrotransposition assays do not show a rate increase in CSA- cells in comparison to isogenic CSA+ cells (Fig. 2c and Additional file 2: Figs. S2A-C). There were also no measurable differences in L1-caused toxicity in the cells or in cell growth, as shown in Fig. 2d and Additional file 2: Figs. S2D-F. These results suggest that if there is a difference in L1 retrotransposition rate between these cells, it is relatively minor, as we would have predicted based on the relatively small portion of the genome under surveillance by the TCR-NER pathway at any one time (see RNA-Seq gene expression data, Additional file 2: Figure S3).
De novo L1 inserts do not generate large duplications at the target site in CSA-deficient cells
In GGR-deficient cells, we also observed that abnormally large duplications (over 1 kb) were formed at the L1 insertion site. We therefore decided to investigate the features of L1 de novo insertions in CSA-deficient and complemented cells (Additional file 1: Tables S1 and S2). We recovered 60 and 75 L1 de novo insertions from CSA-deficient and complemented cells, respectively (Additional file 1: Tables S1 and S2), using the synL1_neo rescue vector and the previously published method (Materials and Methods section and [16, 20, 27]). Surprisingly, the characteristics of L1 de novo insertions were very similar in CSA- and CSA+ cells. No chromosome was specifically targeted by L1 de novo insertions. No significant difference was identified in the median length of the inserts in CSA+ and CSA- cells (3401 and 3642 bp respectively) (Fig. 3a). Additionally, we found that about 21% of L1 de novo insertions were full length in both cell lines, consistent with the 10%–30% observed in previous studies [1, 4, 28,29,30]. Except for one recovered insert in CSA- cells, all L1 de novo insertions had a poly-A tail and their target site sequences were T-rich, close to the TTTT/A consensus sequence (Additional file 1: Tables S1 and S2; [4, 6, 20]). Deletions (2 to 2000 bp) at the target site of L1 de novo insertion were identified in 19 out of 60 insertions (31.6%) in CSA-deficient cells and in 21 out of 77 insertions (27%) in the complemented cells (Additional file 1: Tables S1 and S2). A high rate of genomic deletions was also reported in XPD+ and HeLa cells (47% and 26%, respectively) [4, 16]. Typical target-site duplications (TSDs) were primarily observed at the target site of L1 de novo insertions recovered from CSA- and CSA+ cells (Additional file 1: Tables S1 and S2). The TSD size ranged from 1 to 29,902 bp in CSA- cells and from 1 to 3450 bp in CSA+ cells, with a median length of 13 and 12 bp in CSA+ and CSA- cells, respectively (Fig. 3b).
These data corresponded to the typical observations reported in HeLa cells or complemented NER cells (15 bp on average) [4, 16, 18] and were very different from the abnormally large TSDs (over 1 kb on average) observed in the other GGR-deficient cells.
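The distinction between a target-site duplication and a target-site deletion follows from how the two mapped flanks sit relative to each other on the reference genome. The sketch below shows one way to derive it; the coordinate convention (1-based positions, insertion reported on the plus strand) and the function name are assumptions for illustration, not the procedure used to build Tables S1 and S2.

```python
def classify_target_site(five_prime_flank_end, three_prime_flank_start):
    """Classify the junction of one insertion from its mapped flanks.

    Assumed convention (hypothetical): 1-based reference coordinates, with the
    insertion reported on the + strand.
      five_prime_flank_end: last reference base of the flank upstream of the insert
      three_prime_flank_start: first reference base of the flank downstream
    Returns a (kind, size_bp) tuple.
    """
    overlap = five_prime_flank_end - three_prime_flank_start + 1
    if overlap > 0:
        # The same reference bases appear on both sides of the insert.
        return ("TSD", overlap)
    if overlap == 0:
        # Flanks are exactly adjacent: no duplication, no loss.
        return ("blunt", 0)
    # Reference bases between the flanks are missing at the insertion site.
    return ("deletion", -overlap)


if __name__ == "__main__":
    print(classify_target_site(1000, 987))   # flanks overlap by 14 bp
    print(classify_target_site(1000, 1001))  # adjacent flanks
    print(classify_target_site(1000, 1501))  # 500 bp missing between flanks
```

For insertions reported on the minus strand, the two coordinates would need to be swapped before applying the same rule.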
Does TCR-NER influence the insertional bias of L1 elements in genes?
We then investigated the distribution of L1 de novo insertions from our tagged vector in the genomes of CSA-deficient and complemented cells. Because the TCR pathway specifically excises the DNA lesion that interrupts the transcription process, it seems likely that a nascent L1 insert in the template strand would block transcription, and possibly trigger TCR to remove the inhibiting L1 retrotransposition event.
As observed in the reference genome and in many cell lines (Additional file 2: Figure S4A), L1 de novo insertions were almost equally dispersed in genic and intergenic regions of the genome of both CSA- and CSA+ cells (Additional file 1: Tables S1 and S2 and Fig. 4a). Nevertheless, when L1 de novo insertions were integrated within genes in the complemented CSA+ cells, we found roughly 1.6 times as many antisense-oriented as sense-oriented insertions (62.1% and 37.9% respectively) (Fig. 4b). This observation agreed with the previously reported trends for the genomic orientation of L1 elements in genes [4, 21, 31] (see Additional file 2: Figure S3), L1 de novo insertions in HeLa cells (see Additional file 1: Table S4) and brain cells [4, 7]. In contrast, L1 de novo insertions showed no significant bias in sense versus antisense orientation in CSA-deficient cells (Fig. 4b).
We reasoned that if the TCR sub-pathway influences L1 orientation in genes, any step in the pathway downstream from the sensor (CSA) would similarly influence L1 de novo insertions, while the sensor for the GGR sub-pathway (XPC) would not have the same effect. XPC-deficient cells showed an orientation bias for L1 de novo insertions similar to that seen in other TCR-proficient cells (Additional file 2: Figure S4B). However, L1 de novo insertions were equally sense and antisense oriented in XPD-deficient cells, which are defective for a downstream NER pathway factor that affects both TCR and GGR (Additional file 2: Figure S4B). In XPD+ cells, the complemented version of XPD- cells, the orientation bias was again observed for L1 de novo insertions (Additional file 2: Figure S4B).
In conclusion, our results revealed that L1 de novo insertions were preferentially antisense oriented in cells proficient for the TCR pathway (TCR+, Fig. 4c), such as HeLa, CSA+, XPD+ cells as well as XPC- cells. In TCR-deficient cells (TCR-, Fig. 4c), such as CSA- and XPD- cells, the orientation of L1 de novo insertions within genes was random (Fig. 4c).
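The sense/antisense calls discussed in this section amount to comparing the strand of each insert with the strand of the gene it lands in. A minimal sketch of that classification follows; the `Gene` record, the function name, and the example coordinates are all hypothetical, and a real analysis would take gene models from an annotation file and decide how to handle overlapping genes on opposite strands (here the first matching gene wins).

```python
from typing import NamedTuple, Sequence


class Gene(NamedTuple):
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def classify_orientation(chrom: str, pos: int, insert_strand: str,
                         genes: Sequence[Gene]) -> str:
    """Return 'sense', 'antisense', or 'intergenic' for one insertion.

    An insert is 'sense' when the L1 sequence lies on the same strand as the
    gene it falls within, and 'antisense' when it lies on the opposite strand.
    """
    for gene in genes:
        if gene.chrom == chrom and gene.start <= pos <= gene.end:
            return "sense" if insert_strand == gene.strand else "antisense"
    return "intergenic"


if __name__ == "__main__":
    # Hypothetical annotation with two genes on chr1.
    annotation = [Gene("chr1", 1_000, 5_000, "+"), Gene("chr1", 8_000, 9_000, "-")]
    print(classify_orientation("chr1", 2_500, "-", annotation))
    print(classify_orientation("chr1", 8_500, "-", annotation))
    print(classify_orientation("chr1", 6_000, "+", annotation))
```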
Expression of genes in which L1 inserted in HeLa cells
Because TCR is only active when a transcription complex hits a DNA lesion on the template strand, we predicted that sense-strand L1 insertions (occurring in the template strand) would be depleted in actively transcribing genes relative to the antisense-oriented insertions that would not be predicted to be affected by TCR (see Additional file 2: Figure S5). We therefore carried out a quantitation of gene expression for the ENCODE coding sequences in the human genome from HeLa cells. HeLa cells were chosen because they have an intact TCR pathway and because there is more available data on de novo inserts in HeLa cells than any other cell line.
In this study, approximately 80% of the cellular genes had little or no transcription (Additional file 2: Figure S3), confirming that they would be unlikely targets for TCR. Many of the expressed genes had expression levels less than 1% of the level of GAPDH, suggesting that they might be less subject to TCR than more actively transcribed genes.
When we examined HeLa de novo inserts recovered with the rescue approach used in this manuscript, both from Gilbert et al. and from this study, we found 39 inserts in the antisense orientation relative to ENCODE genes and 17 in the sense orientation (Additional file 1: Table S1). This ratio of antisense to sense is very similar to the ratio seen in the genome. When we looked at the expression levels of those genes, we found that the genes with antisense inserts have an average FPKM expression value of almost 25, while the sense inserts are in genes with less than 10 FPKM. This difference is significant at the 0.04 level in a two-tailed t-test. Furthermore, given that the majority of ENCODE genes have no measured expression, it is interesting that even though the genes in which the insertions occurred are not highly expressed, there is also a depletion of insertions in non-expressed genes. We are not sure if this represents a preferred target for insertion or the requirement for open chromatin to allow the selectable marker in the L1 element to be expressed.
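As a complementary check that is not part of the reported analysis (the text reports only a t-test on FPKM values), the 39 antisense versus 17 sense counts can be compared with the 50:50 split expected if orientation were random. The sketch below computes an exact two-sided binomial p-value with the standard library only; the function name is hypothetical.

```python
from math import comb


def binomial_two_sided_p(k, n):
    """Exact two-sided p-value for k successes in n trials under p = 0.5.

    Because the null distribution is symmetric, the two-sided p-value is twice
    the probability of the smaller tail, capped at 1.
    """
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    tail = min(k, n - k)
    tail_count = sum(comb(n, i) for i in range(tail + 1))
    return min(1.0, 2 * tail_count / 2**n)


if __name__ == "__main__":
    antisense, sense = 39, 17  # counts reported in the text for HeLa inserts
    p = binomial_two_sided_p(antisense, antisense + sense)
    print(f"two-sided binomial p = {p:.4f}")
```

This tests only whether the orientation counts depart from random; it says nothing about the expression levels of the genes involved, which is what the t-test in the text addresses.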
Although L1 retrotransposons are inserted throughout the human genome, these autonomous mobile elements have been found to be located with a strong antisense bias within genes. This orientation bias is a characteristic of reference and established L1 elements as well as polymorphic and de novo insertions (Additional file 1: Table S1) [4, 7]. Although it has been suggested that the bias may be the result of selection eliminating the insertions in the sense orientation that might be more disruptive of gene expression, this seems unlikely to have a strong influence on the de novo insertions. Thus, it is worth considering whether there is a specific mechanism limiting sense insertion in genes, possibly limiting the mutagenic impact of these insertion events. In the present study, we have demonstrated that recovered L1 de novo insertions are equally sense and antisense oriented within active genes in CSA- and XPD-deficient cells, both defective in the TCR pathway. These results suggest that the TCR pathway is responsible for much of the orientation bias of L1 elements in the human genome, although we cannot rule out some post-insertional selection influences as well. This indicates that in addition to the influence of GGR on L1 retrotransposition rate, the TCR sub-pathway also influences the distribution of inserts.
In cells proficient for TCR, the pathway is recruited to the stalled RNAPII complex and excises DNA lesions blocking RNAPII processivity on the template strand (Fig. 1). After the repair, transcription is re-initiated. If L1 elements insert in the template strand of a gene, they would end up in the same orientation as the gene. Insertions in the coding strand, which would result in antisense insertions, would not be expected to stall RNAPII and induce TCR (Fig. 5a). The data presented in our study suggest that the TCR pathway may prevent the insertion of L1 elements in the template strand of actively transcribed genes, but not in the coding strand, leading to the observed orientation bias of L1 inserts in the genome. This is supported both by the ratios of sense to antisense inserts (Fig. 4) and by the tendency for sense inserts to be present in less expressed genes (Additional file 1: Table S1) than antisense inserts in HeLa cells with active TCR. Conversely, if the L1 machinery targets the coding strand, there would be no interference with the RNAPII complex and an L1 de novo insertion would be able to occur (Fig. 5b). The de novo insertion would be in the antisense orientation within the gene.
Our data are consistent with the model that the TCR pathway may minimize interference with gene expression by new L1 retrotransposition events. We did not observe a strong effect of TCR on the overall L1 retrotransposition rate because only a small part of the genome is actively and efficiently transcribed at any given time in a cell (see Additional file 2: Figure S3) and the rest can still be protected from de novo L1 insertion by the GGR pathway (Additional file 2: Figure S5). The TCR pathway, which is essential for the protection of gene expression, represents a unique mechanism in the regulation of L1 retrotransposition, especially during embryonic development when L1 activity is high and L1-caused mutations could be detrimental for cell survival.
Although L1 elements are distributed throughout the genome, there are likely to be multiple factors that influence their distribution. L1 elements preferentially insert into a locally A + T-rich target sequence [6, 35, 36]. Thus, it is likely that the relative density of such A + T-rich target sequences may influence the rate of insertion in those regions. In addition, insertion of L1 sequences into genes may provide various signals that either fully or partially disrupt expression of the gene [37,38,39], resulting in negative selective pressure that will eventually lead to loss of the alleles in which L1 insertions have occurred [40,41,42,43]. This is likely a contributor to the relative paucity of L1 within genes that increases over evolutionary time [3, 41]. The insertion of L1 sequences may be more disruptive in one orientation relative to another, which could also lead to selection for more L1 elements in one orientation relative to another within genes. However, our finding that TCR can contribute strongly to such an insertion bias provides a mechanism that may establish such a bias immediately, without requiring time for selective pressure to alter the frequency.
This work shows that the previously observed bias against sense-oriented L1 elements in genes is primarily due to transcription-coupled nucleotide excision repair being able to block sense insertions, rather than principally being due to selection post insertion. This would serve to minimize the negative impact of L1 insertions on gene expression.
CSA: Cockayne syndrome protein A
CSB: Cockayne syndrome protein B
eGFP: Enhanced green fluorescent protein
ERCC1: Excision repair 1
FPKM: Fragments per kilobase per million reads
GGR: Global genome repair sub-pathway of NER
IRES: Internal ribosome entry site
J/m²/s: Joules per meter squared per second
L1: LINE-1 or Long, INterspersed Element-1
mblast: Blasticidin resistance cassette
mneo: Neomycin resistance cassette
neoR: Neomycin or geneticin resistance
NER: Nucleotide excision repair
PCR: Polymerase chain reaction
RNAPII: RNA polymerase II
RNA-Seq: Next generation sequencing protocol for RNA
RT-PCR: Reverse transcription – PCR
SV40: Simian virus 40
TSD: Target site duplication
XPA: Xeroderma pigmentosum protein A
XPC: Xeroderma pigmentosum protein C
XPD: Xeroderma pigmentosum protein D
XPF: Xeroderma pigmentosum protein F
We wish to thank other members of the Deininger laboratory and the Consortium for Mobile Elements at Tulane for their constructive criticism of these studies, including manuscript editing by Tiffany Kaul. We wish to acknowledge the expert help of Melody Baddoo and the Cancer Crusaders Bioinformatics Core.
This work was supported by USPHS grants R01 GM045668 and R01 GM121812.
Availability of data and materials
All reagents are available by request. All DNA sequences are provided in the supplementary material.
GS carried out most of the experiments, performed their preliminary analysis, and prepared the first draft of the manuscript. VS helped with some of the rescue experiments and participated in manuscript revisions. PD provided guidance and oversight of experimental design, and contributed to data analysis and editing of the manuscript. All authors read and approved the final manuscript.
Authors and Affiliations
Tulane University, Tulane Cancer Center and the Department of Epidemiology, 1430 Tulane Ave, New Orleans, LA, 70112, USA
Geraldine Servant, Vincent A. Streva & Prescott L. Deininger
Present Address: Division of Infectious Diseases, Boston Children’s Hospital and Harvard Medical School, 300 Longwood Ave, Boston, MA, 02115, USA
Vincent A. Streva
Tulane Cancer Center, SL66, Tulane University Health Sciences Center, 1430 Tulane Ave., New Orleans, LA, 70112, USA
Table S1. Characteristics of recovered de novo L1 inserts in CSA-deficient cells. This table describes the general characteristics of the L1 inserts isolated from the CSA- cells. Table S2. Characteristics of recovered de novo L1 inserts in stably complemented CSA+ cells. This table describes the general characteristics of the L1 inserts isolated from the cells that have been complemented to be CSA+. Table S3A&B. DNA sequences flanking rescued L1 inserts. S3A has the sequence data from the L1 insertion rescues for the CSA- cells, while S3B has similar data for the complemented CSA+ cells. Table S4. FPKM values for de novo L1 inserts in HeLa cells that inserted within genes. (ZIP 130 kb)
Figure S1. Control for the efficiency of the complementation of CSA-deficient cells. Figure S2. L1 retrotransposition rate is not significantly different in CSA-deficient cells (CSA-) and in the stably complemented CSA-deficient cells (CSA+). Figure S3. FPKM counts for ENCODE genes expressed in HeLa. Figure S4. The tendency of de novo L1 elements to insert in the antisense orientation within genes is lost in the cells deficient in the TCR pathway (CSA- and XPD- cells). Figure S5. Model of regulation of L1 insertion in genes by the TCR pathway. (ZIP 241 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Servant, G., Streva, V.A. & Deininger, P.L. Transcription coupled repair and biased insertion of human retrotransposon L1 in transcribed genes. Mobile DNA 8, 18 (2017). https://doi.org/10.1186/s13100-017-0100-5