Properties of LINE-1 proteins and repeat element expression in the context of amyotrophic lateral sclerosis

Background Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease involving loss of motor neurons and having no known cure and uncertain etiology. Several studies have drawn connections between altered retrotransposon expression and ALS. Certain features of the LINE-1 (L1) retrotransposon-encoded ORF1 protein (ORF1p) are analogous to those of neurodegeneration-associated RNA-binding proteins, including formation of cytoplasmic aggregates. In this study we explore these features and consider possible links between L1 expression and ALS. Results We first considered factors that modulate aggregation and subcellular distribution of LINE-1 ORF1p, including nuclear localization. Changes to some ORF1p amino acid residues alter both retrotransposition efficiency and protein aggregation dynamics, and we found that one such polymorphism is present in endogenous L1s abundant in the human genome. We failed, however, to identify either CRM1-mediated nuclear export signals in ORF1p or strict involvement of the cell cycle in endogenous ORF1p nuclear localization in human 2102Ep germline teratocarcinoma cells. Some proteins linked with ALS bind and colocalize with L1 ORF1p ribonucleoprotein particles in cytoplasmic RNA granules. Increased expression of several ALS-associated proteins, including TAR DNA Binding Protein (TDP-43), strongly limits cell culture retrotransposition, while some disease-related mutations modify these effects. Using quantitative reverse transcription PCR (RT-qPCR) of ALS tissues and reanalysis of publicly available RNA-Seq datasets, we asked if changes in expression of retrotransposons are associated with ALS. We found minimal altered expression in sporadic ALS tissues but confirmed a previous report of differential expression of many repeat subfamilies in C9orf72 gene-mutated ALS patients. Conclusions Here we extended understanding of the subcellular localization dynamics of the aggregation-prone LINE-1 ORF1p RNA-binding protein. 
However, we failed to find compelling evidence for misregulation of LINE-1 retrotransposons in sporadic ALS or a clear effect of ALS-associated TDP-43 protein on L1 expression. In sum, our study reveals that the interplay between active retrotransposons and the molecular features of ALS is more complex than anticipated. Thus, the potential consequences of altered retrotransposon activity for ALS and other neurodegenerative disorders are worthy of continued investigation. Electronic supplementary material The online version of this article (10.1186/s13100-018-0138-z) contains supplementary material, which is available to authorized users.


Background
With the discovery in 1950 of transposable elements (TEs), genomes began to seem far more dynamic than previously conceived [1]. It is now clear that TEs have been important long-term drivers of genome evolution.
Year by year, more and more ways in which mobile DNA impacts gene expression and integrity, cell variability and viability, and ultimately human health are revealed. With recent discoveries that TEs are active not only in the germline but also in somatic cells, it is evident that each of us is a mosaic of different genomes that now seem dynamic indeed (reviewed by [2] and many others).
Retrotransposon TEs include long terminal repeat (LTR) and non-LTR class elements. Both retrotranspose by a "copy and paste" mechanism involving reverse transcription of an RNA intermediate and insertion of its cDNA copy at a new site in the genome. LTR-retrotransposons, including human endogenous retroviruses (HERVs), are remnants of past germ line infections by retroviruses that subsequently lost their ability to reinfect cells. While the HERV-K(HML-2) group includes some polymorphic proviral loci [3,4], human LTR retrotransposons generally are insertionally inactive, although many remain capable of transcription. Long Interspersed Element-1 (LINE-1, L1) retrotransposons are the only active autonomous mobile DNA in humans. Alone they occupy at least 17% of our genome and have also been responsible for the insertion in trans of thousands of processed pseudogenes and a million non-autonomous Short Interspersed Elements (SINEs), including Alu and SVA (composite SINE/VNTR/Alu) elements [5]. The 6.0 kilobase (kb) bicistronic human L1 has a 5' untranslated region (UTR) that functions as an internal promoter, two open reading frames (ORF1 and ORF2), and a 3' UTR. A weak promoter also exists on the antisense strand of the human L1 5' UTR [6]. ORF2 encodes a 150-kilodalton (kD) protein with DNA endonuclease and reverse transcriptase (RT) activities. While the 40 kD ORF1p RNA-binding protein is essential for retrotransposition, its exact role in retrotransposition is unclear, although it possesses RNA chaperone and packaging properties [7][8][9]. The great majority of L1s in the genome are 5' truncated and otherwise rearranged or mutated and so incapable of autonomous transcription.
There are 145 fully intact L1s in the human genome of which cell culture retrotransposition assays suggest about 100 remain potentially mobile in any individual diploid genome [10][11][12]. There are also hundreds of full length L1s lacking intact ORFs but possibly capable of generating protein [13,14]. While L1 expression is normally suppressed by a host of cellular factors, the suppression is relaxed in embryonic stem cells, the early embryo, and some cancers (reviewed in [15,16]).
Notably, retrotransposons are also active in some brain cells [17]. Loss of piRNA pathway proteins correlates with elevated retrotransposon expression in Drosophila brain [18], although in mammals this pathway seems to act primarily in the germ line to control retrotransposon activity (see [19] for review). Early studies showed L1 retrotransposition in dividing neuronal progenitor cells (NPCs), especially those of the hippocampus [20,21], and subsequently in non-dividing neurons [22,23]. High-throughput sequencing of single neurons confirmed endogenous L1 retrotransposition in the human brain, although frequency estimates differed significantly (reviewed in [24][25][26][27]). Thus, it has been proposed that L1 activity contributes to neuronal plasticity [28]. Elevated L1 retrotransposition has also been reported for several human neurological conditions, including ataxia telangiectasia [29], Rett syndrome [30], autism [31], schizophrenia [32,33], and major depressive disorder [34], as well as in neuronal cell lines or brains of patients exposed to opioids [35][36][37], and in brains of a mouse model of Huntington's disease [38] and hippocampi of mice following novel exploration [39] or diminished maternal care [40]. However, some of these studies relied solely on DNA amplification by quantitative (q)PCR or digital droplet PCR to compare L1 insertion copy differences between test and normal states, strategies that may fail to distinguish between bona fide genomic L1 insertions and contaminating extrachromosomal L1-derived nucleic acids. Some studies may therefore warrant additional verification (see also [41,42] and discussion).
Several studies have also drawn connections between altered expression of LTR retrotransposons and amyotrophic lateral sclerosis (ALS). ALS is a fatal neurodegenerative disease involving loss of upper and lower motor neurons and afflicts 2 in 100,000 people each year. Death typically follows 2 to 3 years after onset and, while about 90% of cases are sporadic, the rest have a family history of the disease. There are no current means to reverse the course of ALS, and treatment involves efforts to slow progression of symptoms [43]. ALS has overlapping clinical presentations with frontotemporal lobar degeneration (FTLD) and its most common subtype frontotemporal dementia (FTD), a neurological condition affecting the frontal and temporal lobes and marked by cognitive and behavioral impairment. About 20% of ALS patients also exhibit FTLD, and ALS and FTLD have been seen as part of a continuous disease spectrum [44].
Increased reverse transcriptase activity from an unknown source is detectable in sera and cerebrospinal fluids of non-HIV-infected ALS patients [45][46][47][48]. Douville et al. [49] correlated this RT activity with elevated expression of a few HERV-K(HML-2) loci and increased amounts of pol gene transcripts and RT protein in cortical neurons of some ALS patients. Hadlock et al. [50] noted elevated immune response to HERV-K(HML-2) Gag protein in serum samples from ALS patients, and recently it was shown that overexpression of the HERV-K(HML-2) envelope protein causes motor neuron toxicity and motor dysfunction in transgenic mice [51]. However, it cannot be excluded that some of the elevated RT activity observed is also due to increased expression of LINE-1 retrotransposons. It is reasonable to presume that cellular changes that increase HERV-K(HML-2) expression in ALS patients may similarly activate other retrotransposons. Indeed, a recent study reported global increases in expression of selected families of both LTR and non-LTR retrotransposons in ALS and FTLD patients with a hexanucleotide expansion in the Chromosome 9 Open Reading Frame 72 (C9orf72) ALS gene but not in sporadic ALS cases or controls [52].
The accumulation of neuronal RNA and protein aggregates, including cytoplasmic stress granules (SGs), is a pathogenic hallmark of a number of neurodegenerative diseases. Pathological aggregation of RNA-binding proteins has been implicated in ALS, FTLD, Alzheimer's disease, spinocerebellar ataxia, Huntington's disease, and inclusion body myositis. In the case of ALS, there is increasing evidence that abnormal RNA processing and abnormal self-aggregation of proteins, leading to altered RNA granule formation and malfunction of protein pathways, contribute to motor neuron death [53]. A key pathological feature of ALS is the presence of cytoplasmic inclusions in degenerating motor neurons and oligodendrocytes. Inclusions are not restricted to the spinal cord and motor cortex but are present in other brain regions, such as the frontal and temporal cortices, hippocampus, and cerebellum, and are especially evident in patients with accompanying FTD [54]. What triggers protein aggregation and what it means for cell pathology and progression of the disease remain unclear.
Aggregation of TAR DNA binding protein 43 (TDP-43, product of the TARDBP gene) is especially interesting as a unifying pathological marker of both FTLD and ALS. Mutations in TARDBP are involved in about 4% of familial (fALS) and 1% of sporadic ALS (sALS) cases. However, even lacking a mutation, TDP-43 protein, while typically nuclear in healthy cells, is cleaved and hyperphosphorylated and accumulates in ubiquitinated cytoplasmic inclusions in almost all ALS and almost half of FTLD patients (reviewed in [55]). TDP-43 protein aggregation pathology also characterizes other neurodegenerative disorders, including Parkinson's [56], Alzheimer's [57] and Huntington's [58] diseases, and inclusion body myopathies [59].
Several features of LINE-1-encoded ORF1p are reminiscent of neurodegeneration-associated proteins. ORF1p is a ubiquitinated and phosphorylated RNA-binding protein prone to forming cytoplasmic aggregates, including SGs [60][61][62][63]. It is therefore conceivable that abnormal expression of ORF1p in neuronal cells might aggravate formation of cytoplasmic aggregates and contribute to disease pathology. Here we analyzed subcellular localization and aggregation features of LINE-1 ORF1p and ways they may be analogous to or differ from those of neurodegeneration-associated RNA-binding proteins. We show that some ALS-associated RNA-binding protein mutants closely associate with ORF1p in cytoplasmic RNA granules of tumor cell lines, and that increasing the expression of some ALS proteins, including TDP-43, inhibits L1 retrotransposition in a cell culture reporter assay. We also considered the possibility that LINE-1 retrotransposon activity may be associated with ALS disease. Reverse transcription (RT)-qPCR analyses failed to detect significantly altered expression of non-LTR Alu or L1 elements in sALS tissues. However, by reanalyzing publicly available RNA-Seq datasets, one previously examined for TE levels [52] and one hitherto untested, we confirmed misregulation of selected TE subfamilies in C9orf72 gene-related ALS samples. While so far the evidence is not compelling for ALS, we believe the potential for altered non-LTR retrotransposon expression playing a role in neurodegenerative disorders is worthy of continued investigation.

A common LINE-1 polymorphism alters formation of ORF1p cytoplasmic RNA granules
Early studies showed that endogenous ORF1p concentrates in cytoplasmic granules in cultured cells or fixed tissues [60,64]. Overexpression or cell stress causes ORF1p to enter stress granules (Fig. 1F) [60,61,65,66]. Here we used the monoclonal α-human ORF1p antibody α-4H1-ORF1 (Millipore Sigma; [67]), which targets ORF1p N-terminal amino acids (aa) 35 to 44, to examine expression of endogenous ORF1p in 2102Ep cells, an embryonal germ cell teratocarcinoma line that has unchanging human embryonal stem cell (hESC) characteristics [68,69]. By Western blotting, α-4H1-ORF1 detected a robust band of approximately 42 kD, a size consistent with the ORF1p monomer, a much weaker 85 kD band that likely marks ORF1p dimers, and sporadically a band of 65 kD of unknown identity (Additional file 1: Figure S1A). Single, less intense 42 kD bands were seen in human embryonic kidney HEK 293T and neuroblastoma SH-SY5Y cells. SH-SY5Y is a thrice-cloned sub-line of the bone marrow-derived neural line SK-N-SH, which shows significantly less ORF1p signal. No endogenous ORF1p was detected in human cervical adenocarcinoma HeLa cells (Additional file 1: Figure S1A).
Confirming previous results [60], constitutive aggregation of ORF1p in the cytoplasm was detected in multiple cell lines by multiple α-ORF1p antibodies (Additional file 1: Figure S1B-E). However, the pattern and degree of granule formation by ORF1p can vary significantly with cell type. In human neuroblastoma SH-SY5Y cells, for example, ORF1p granules are rare in the main cytoplasm but evident in neurite outgrowths (Additional file 1: Figure S1C). Notably, endogenous ORF1p cytoplasmic aggregates differ from SGs in certain ways. In the absence of external stress, in 2102Ep cells small ORF1p aggregates are numerous but only faintly and rarely marked by SG proteins such as cytotoxic granule associated RNA binding protein (TIA1) and eukaryotic initiation factor 3 (eIF3η) (Fig. 1A, C). Furthermore, unlike SGs [70], endogenous ORF1p granules do not obviously disassemble during cell mitosis (Fig. 1E). As previously reported, when exposed to sodium arsenite, an inducer of oxidative stress, ORF1p redistributes to larger-sized aggregates that now mostly colocalize with SG proteins (Fig. 1B, D). Thus ORF1p aggregates form constitutively but do not chronically induce a cellular stress state that is marked by redistribution of SG proteins.
Fig. 1 [...] and eIF3η (c, d) show minimal colocalization with endogenous ORF1p in cytoplasmic granules (shown by arrows) of unstressed human embryonal carcinoma 2102Ep cells (a, c) but colocalize in large stress granules of cells treated with Na-arsenite (0.5 mM for 1 hour) (b, d). Cell nuclei are stained with Hoechst. Size bars are 10 μm. e ORF1p cytoplasmic granules retain integrity during cellular mitosis. Patient sera-derived α-ANA-N marks nucleoli [60]. Endogenous ORF1p is mostly excluded from metaphase chromatin plates (arrow), as shown by Hoechst staining. f Ectopically expressed EGFP-tagged human ORF1p induces prominent cytoplasmic granules, but (g) deletion of its Q-N-rich region abolishes granule formation. h The ORF1p R159H point mutation reduces cytoplasmic granule formation by 50%. Approximately 500 cells were examined. i The R159H mutation abolishes cell culture LINE-1 retrotransposition. pc6-RPS-EGFP-ΔCMV wild-type or R159H mutant retrotransposition reporter constructs were transfected in HEK 293T cells and 5 days later the percentages of EGFP-positive cells were determined by flow cytometry. The construct 99-PUR-JM111-EGFP served as a negative control for retrotransposition [84]. Each construct was tested in quadruplicate wells with results for one biological replicate shown.
Previously, it was shown that overexpressed ORF2p and L1 RNA also colocalize with ORF1p in cytoplasmic aggregates [60,61,63,71]. Interestingly, we noted [72], and others confirmed [73], that ORF2p is visible in only a minor percentage of ORF1p-positive cells when the two proteins are coexpressed from an L1 construct. The reason for this is unknown but may relate to an unconventional translation mechanism of LINE-1 ORF2 [74]. Unfortunately, although α-ORF2p antibodies exist [75][76][77][78][79], they are not widely available or are ineffective in detecting endogenous ORF2p, and so we did not examine ORF2p localization in this study.
Many RNA-binding proteins that form SGs have intrinsically disordered prion-like domains rich in glutamine and asparagine (Q-N) residues. Aggregation of prion-like domain proteins is characteristic of various neurodegenerative disorders including ALS (reviewed in [80]). No prion-like domain is predicted in ORF1p using the PrionW [81] or PLAAC [82,83] algorithms. However, human (but not mouse) L1 ORF1p contains a Q-N-rich internal region (36% Q or N between residues 179 and 205, numbering according to accession number AF148856.1). Deletion of this region abolishes granule formation (Fig. 1F, G), indicating it is critical for human L1 ORF1p aggregation properties.
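The Q-N content of a protein window, as quoted above for the internal region of human ORF1p, is a simple residue count. The following is a minimal sketch of that arithmetic, assuming 1-based inclusive residue numbering. The function names and the toy peptide are hypothetical illustrations; the peptide is not the ORF1p sequence, and this is not the PrionW or PLAAC algorithm.

```python
def qn_fraction(seq: str, start: int, end: int) -> float:
    """Fraction of Q or N residues in the 1-based inclusive window [start, end]."""
    window = seq[start - 1:end].upper()
    if not window:
        raise ValueError("empty window")
    return sum(aa in "QN" for aa in window) / len(window)


def richest_window(seq: str, width: int) -> tuple:
    """Return (start, end, fraction) for the most Q/N-rich window of a given width.

    Ties are resolved in favor of the window closest to the N-terminus.
    """
    if width <= 0 or width > len(seq):
        raise ValueError("width must be between 1 and the sequence length")
    starts = range(1, len(seq) - width + 2)
    best = max(starts, key=lambda s: qn_fraction(seq, s, s + width - 1))
    return best, best + width - 1, qn_fraction(seq, best, best + width - 1)


# Hypothetical toy peptide for illustration only (NOT the ORF1p sequence).
toy_peptide = "MKLAEQNQNQAELKRT"

if __name__ == "__main__":
    print(qn_fraction(toy_peptide, 1, len(toy_peptide)))
    print(richest_window(toy_peptide, 5))
```

For a real analysis, the sequence under accession AF148856.1 would be supplied in place of the toy peptide, with the window bounds taken from the text.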
Previously, we and others [60,61] showed that mutations in the N-terminus leucine zipper domain or the C-terminal domain double-point mutation R261/262A (the so-called JM111 mutation that abrogates cell culture retrotransposition; [84]) also alter ORF1p cytoplasmic aggregation. We also reported that a non-conservative mutation, R159G, inhibits ORF1p granule formation. This residue was subsequently shown to be important for RNA-binding and is within the RNP2 sequence of the ORF1p RRM (RNA recognition motif) [85]. In the present study, to ascertain the prevalence of L1s in the human genome with R159 polymorphisms, we queried the L1Base2 database [13]. L1Base2 is subdivided into 3 categories: L1s with intact ORF1 and ORF2 (FLI-L1s), L1s with intact ORF2 but disrupted ORF1 (ORF2-L1s), and non-intact L1s >4500 nucleotides in length (FLnI-L1s). Although the R159G variant was detected at only very low frequency (0.47% of 6346 alignable FLnI-L1 sequences), many other R159 polymorphisms were found, with R159H being most common. In all, we identified R159 changes to histidine, cysteine or proline residues in 4.8% of FLI-L1s, 11.5% of ORF2-L1s, and 40.3% of FLnI-L1s (Additional file 1: Figure S2A). Thus, sequence variation in the aggregation-control R159 codon of human L1 ORF1p is common in endogenous L1s.
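The tally of residue-159 variants across a set of aligned ORF1p sequences can be sketched as a column count. This is a minimal illustration, assuming the sequences are already aligned so that one column corresponds to the residue of interest. The function name, the toy alignment and the column index are hypothetical and do not reproduce the L1Base2 query.

```python
from collections import Counter


def tally_position(aligned_proteins, position, reference="R"):
    """Count residues at a 1-based alignment column.

    Gaps ('-'), unknown residues ('X') and stops ('*') are skipped.
    Returns (counts, percent of each non-reference residue among scored sequences).
    """
    column = [
        seq[position - 1].upper()
        for seq in aligned_proteins
        if len(seq) >= position and seq[position - 1] not in "-Xx*"
    ]
    if not column:
        raise ValueError("no scorable residues at this position")
    counts = Counter(column)
    total = sum(counts.values())
    variant_pct = {
        aa: 100.0 * n / total for aa, n in counts.items() if aa != reference
    }
    return counts, variant_pct


# Hypothetical toy alignment; column 3 stands in for residue 159.
toy_alignment = ["MGRK", "MGHK", "MGRK", "MGHK", "MGCK", "MG-K", "MGRK", "MGPK"]

if __name__ == "__main__":
    counts, variant_pct = tally_position(toy_alignment, 3)
    print(dict(counts))
    print({aa: round(pct, 1) for aa, pct in variant_pct.items()})
```

As a check on scale, 0.47% of 6346 sequences corresponds to roughly 30 sequences carrying R159G.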
We introduced the R159H change into ORF1-EGFP-L1-RP, a construct with CMV promoter and ORF1 C-terminally tagged with EGFP followed by intact downstream L1 sequence, and as expected observed a 50% decline in the number of HEK 293T cells with ORF1p cytoplasmic granules (Fig. 1H). We next tested the effect of the R159H polymorphism in a cell culture retrotransposition assay. In this assay, an enhanced green fluorescent protein (EGFP) reporter gene cassette, interrupted by a backwards γ-globin intron, is inserted in opposite transcriptional orientation into the 3' UTR of L1-RP (a highly active human L1 [86]). The EGFP reporter gene can be expressed from its own promoter only after the L1 is transcribed, the γ-globin intron is removed by splicing, the L1-reporter cassette hybrid transcript is reverse-transcribed, and its cDNA inserted in the genome [84,87]. The R159H mutation abolished cell culture retrotransposition, reducing it to levels similar to those observed for an L1 containing the ORF1p JM111 mutation that cannot form a functional L1 ribonucleoprotein (RNP) complex [88] (Fig. 1I).
Finally, we considered the possibility that the abundance of R159 polymorphisms might be due to a CpG dinucleotide methylation hotspot. Following genome bisulfite conversion, PCR amplification, cloning of the amplicons and Sanger sequencing, we queried the methylation status of nine CpGs within a 436-nt stretch (1169-1604) of ORF1 surrounding the R159 codon. Although CpGs were methylated (16 to 64%), we observed no preference for methylation at the ORF1 R159 codon (Additional file 1: Figure S2B).
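Per-CpG methylation from cloned bisulfite amplicons is a count of clones that retain C versus those read as T at each CpG cytosine, since bisulfite converts unmethylated cytosines so that they are read as T after PCR while methylated cytosines remain C. A minimal sketch follows, assuming clone reads are gap-free, on the same strand as the reference and start at the same coordinate. The function names and the toy sequences are hypothetical and are not the ORF1 amplicon.

```python
def cpg_positions(reference: str) -> list:
    """0-based indices of the C in every CpG dinucleotide of the reference."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def methylation_percent(reference: str, clones) -> dict:
    """Percent methylation per CpG, keyed by 1-based position of the cytosine.

    A clone base of C is scored methylated, T unmethylated; any other base
    (e.g. a sequencing error or N) is treated as uninformative and ignored.
    """
    result = {}
    for pos in cpg_positions(reference):
        calls = [clone[pos].upper() for clone in clones if len(clone) > pos]
        methylated = calls.count("C")
        unmethylated = calls.count("T")
        informative = methylated + unmethylated
        if informative:
            result[pos + 1] = 100.0 * methylated / informative
    return result


# Hypothetical toy reference with two CpGs and four bisulfite-converted clones.
toy_reference = "ATCGGACGTT"
toy_clones = ["ATCGGACGTT", "ATTGGACGTT", "ATTGGATGTT", "ATTGGATGTT"]

if __name__ == "__main__":
    print(methylation_percent(toy_reference, toy_clones))
```

Comparing the value at the CpG of interest with those at neighboring CpGs in the same amplicon is one way to ask whether that site is preferentially methylated.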
Thus, L1 ORF1 polymorphisms can alter not only retrotransposition efficiency but also ORF1p aggregation dynamics for a subset of L1s abundant in the human genome.

LINE-1 ORF1 protein concentrates in nuclear aggregates
Previously we showed that both overexpressed and endogenous L1 ORF1p are not only cytoplasmic but colocalize with nucleoli of a subset of cells (Fig. 2A). Overexpressed ORF2p also enters nucleoli [76]. Here we report that exogenously expressed GFP-tagged human ORF1p also strongly concentrates at the nuclear membrane and forms small discrete perinucleolar foci in 5% or fewer of human osteosarcoma U2OS or HEK 293T cells. These cells show an attendant reduction in the size and number of cytoplasmic granules (Fig. 2B,C). Consistently, endogenous ORF1p nuclear foci are also seen in a small fraction of 2102Ep cells (Fig. 2D); presence of the foci in nuclei was confirmed by z-series confocal imaging. Recently, De Luca et al. [79] also showed in human melanoma cells both endogenous ORF1p and ORF2p in nuclear puncta that partially colocalized.
Using the MS2-NLS-GFP detection protocol [89], we previously reported that overexpressed Alu SINE RNA forms small distinct nuclear foci that partially associate with coiled (Cajal) body marker proteins [72]. Coiled bodies are nuclear non-membrane RNP suborganelles involved in the processing of non-coding RNAs and have been linked with the rare motor neuron disease spinal muscular atrophy (SMA) [90]. In our present experiments, we show that in the minor percentage of HEK 293T cells that form ORF1p nuclear foci, these foci closely colocalize with coexpressed MS2-tagged Alu RNA (Fig. 2E, detected here by fluorescent in situ hybridization (FISH)). Thus, ORF1 protein and Alu RNA may directly interact in the nucleus. SVA SINE RNA expressed from plasmid pcDNA SVA SPTA1 -MS2 is mainly cytoplasmic but also forms nuclear foci [72]. Interestingly, these foci do not colocalize with ORF1-EGFP foci, despite the fact that both Alu and SVA RNAs depend upon L1 for their retrotransposition and insert in the genome by a common mechanism (Fig. 2F). As previously reported, L1 RNA failed to form nuclear foci in our experiments [61,72].
Certain neurodegenerative conditions, including myotonic dystrophy, fragile X-associated tremor ataxia syndrome and spinocerebellar ataxias, are associated with genes that undergo long simple repeat expansion mutations. RNAs transcribed from these mutant genes accumulate in nuclear foci [91]. A pathogenic GGGGCC (G4C2) hexanucleotide expansion in intron 1 of the C9orf72 gene is the most common mutation associated with both ALS and FTD [92,93] and is implicated in Huntington's disease [94]. Mutant C9orf72 gene transcripts form toxic RNA foci in affected neuronal cells and are associated with the disease pathology [92,95]. We transiently coexpressed in HEK 293T cells ORF1p-EGFP together with C9orf72 RNA having 31 tandem G4C2 repeats, the latter detected by RNA FISH using a Cy3-conjugated (C4G2)4 probe [96]. As with Alu SINE RNAs, ORF1p-EGFP granules directly overlapped or juxtaposed with (G4C2)31 RNA granules in nuclei and cytoplasm of some cells (Fig. 2G).
Fig. 2 [...] ORF1p is mostly cytoplasmic where it concentrates in granules and occasionally at the nuclear membrane. It is faintly seen in the nucleoplasm and concentrates in nucleoli of a subset of cells. b, c Exogenously expressed EGFP-tagged ORF1p strongly concentrates at the nuclear membrane and in perinucleolar foci of 5% or fewer human (b) U2OS or (c) HEK 293T cells with attendant reduction in size and number of cytoplasmic granules. Cotransfected mCherry-PSP1 marks nuclei and is excluded from nucleoli. d Endogenous ORF1p detected by α-4H1-ORF1 also forms discrete nuclear foci in a minor percentage of 2102Ep cells. Selected foci are enlarged in panels to the right. e Alu RNA, tagged with six MS2 coat protein recognition stem loops and expressed from construct pBS 7SL Alu-MS2 (Ya5), was detected by FISH using a Cy3-tagged DNA probe to the MS2 stem loops. Alu RNA colocalizes with nuclear foci marked by EGFP-tagged ORF1p in HEK 293T cells. f Nuclear foci of MS2 stem loop-tagged full-length SVA RNA detected by the Cy3-MS2 DNA probe do not colocalize with foci marked by ORF1p-EGFP. g RNA having 31 tandem G4C2 repeats detected by FISH using a Cy3-conjugated (C4G2)4 DNA probe induces intense intranuclear or cytoplasmic RNA aggregates that colocalize with ORF1p-EGFP in a minor percentage of HEK 293T cells (nuclear granules are marked by small arrows and cytoplasmic granules by large arrows). Size bars are 10 μm.
Thus, as a promiscuous RNA-binding protein, L1 ORF1p may be able to bind and sequester many cellular RNAs in granules present in both the cytoplasm and nucleus.

The control of LINE-1 ORF1p nuclear localization in 2102Ep cells
Several studies have reported that cell division facilitates efficient retrotransposition, citing a failure of L1s to retrotranspose in cultured primary and tumor cells blocked at G0 phase but disagreeing on the extent of retrotransposition loss in G1/S-arrested cells (reporting a 3-fold to 10-fold decline; [97][98][99]). Mita et al. [100] recently reported that cell culture retrotransposition occurs preferentially in S-phase. On the other hand, we previously showed significant retrotransposition in non-dividing neuronal cells differentiated from hESCs [23], and similar data were earlier observed in transformed cell lines [97]. Since ORF1 protein is essential for active retrotransposition [84], we chose to examine two mechanisms postulated to control ORF1p subcellular localization, cell cycle and active nuclear export.
2102Ep cells are nullipotent, manifesting a stable phenotype; indeed, they are used as a reference to characterize newly derived hESC lines [69]. However, we noticed considerable variation in the percentage of 2102Ep cells showing obvious nucleolar localization of endogenous ORF1p when examining clusters of cells across a single slide using immunofluorescence (IF) and the α-4H1-ORF1 antibody (between 0.6% and 36% of total cells randomly examined in three separate experiments). Less densely clustered cells more frequently showed ORF1p nucleolar concentration. Germline tumor cells, including 2102Ep cells, are altered for cell cycling by cell-to-cell contact inhibition or serum depletion [101], and we therefore wondered if concentration of ORF1p in the nucleus might relate to cell cycle status.
Accordingly, we seeded 2102Ep cells at low densities and blocked G1/S phase transition using aphidicolin, an inhibitor of DNA polymerase α, or hydroxyurea, which inhibits ribonucleotide reductase causing a loss of deoxyribonucleotides [102,103], and examined effects on endogenous ORF1p nucleolar localization. Cell cycle blockage was confirmed by propidium iodide staining followed by flow cytometry (Fig. 3A). Blocking the cell cycle at G1/S had no significant effect on the average percentage of cells with endogenous ORF1p nucleolar localization (Fig. 3B), or on nuclear-cytoplasmic levels following cell fractionation and Western blotting (Fig. 3C). For both treated and non-treated 2102Ep cells, Western blotting showed a major amount of endogenous ORF1p in the nuclear fraction, in agreement with Sokolowski et al. [104] who reported nuclear fraction concentration of plasmid-expressed ORF1p in human HeLa and mouse NIH3T3 cells. However, it is likely that some of the ORF1p we detect in the nuclear fraction is due to copurification of insoluble ORF1p cytoplasmic aggregates [105]. Significantly, however, the amounts of ORF1p detected by Western blotting in both cytoplasmic and nuclear fractions remained unaltered by cell cycle blockage.
We next stained untreated 2102Ep cells with antibodies to chromatin licensing and DNA replication factor 1 (CDT1), a G1 phase nuclear protein lost after initiation of S phase [106], or Geminin (GMNN), a protein expressed only in S/G2/M phases [107], and then examined cells for L1 ORF1p nucleolar concentration. Immunocytochemistry showed that both CDT1 and Geminin marked 2102Ep cells with or without endogenous ORF1p visible in nucleoli (Fig. 3D, E). However, a majority of cells showing nucleolar ORF1p failed to stain with CDT1, while the opposite was true for Geminin (Fig. 3F). This suggests partial nucleolar exclusion of ORF1p during G1 phase but without stringent cell cycle control. Our results in part contradict a recently published observation [100] that overexpressed LINE-1 ORF1p is nuclear in HeLa cells expressing CDT1 (G1 phase) and almost completely cytoplasmic in cells expressing Geminin (see Discussion).
Next, we considered if LINE-1 ORF1p shuttles between the nucleus and cytoplasm, as is the case with TDP-43 and some other prion-domain RNA-binding proteins associated with neurodegenerative diseases [108]. Shuttling proteins often contain nuclear export signals (NESs), consisting of a short stretch of hydrophobic leucine-rich residues [109]. We previously reported that subcellular localization of overexpressed GFP-tagged ORF1p in HEK 293T cells was unaltered by leptomycin B (LMB), a chemical inhibitor of the chromosomal region maintenance 1 (XPO1/CRM1) nuclear export pathway [76]. We now observed that treatment of 2102Ep cells with 55 nM LMB for 18 hours also had no obvious effect on endogenous ORF1p localization (Additional file 1: Figure S3A). On the other hand, controls revealed that LMB efficiently inhibited nuclear export of endogenous cyclin B1, which contains an NES responsive to CRM1 (Additional file 1: Figure S3B) [110], as well as of a GFP-tagged phosphorylation mimetic mutant of MAPKAP kinase 2 (MK2-mut T205/317E) that remains in the cytoplasm once exported from the nucleus (Additional file 1: Figure S3C) [111].
Previously, we fused a suspected ORF1p leucine-rich NES (aa 87-93, LKELMEL) and linker to the C-terminus of EGFP. While a functional NES should cause EGFP, which is normally both cytoplasmic and nuclear, to become more cytoplasmic [112], we failed to observe increased concentration of EGFP-LKELMEL in the cytoplasm [76]. For the present study, we used the NetNES 1.1 Server [113] to predict a second NES site at the C-terminus of ORF1p (ORF1 aa 313-321, LKELLKEAL). Fusing this sequence to the N-terminus of EGFP also failed to alter distribution of EGFP (Additional file 1: Figure S3D). Moreover, altering the sequence to encode LKEAAAAAL in construct ORF1-EGFP-L1-RP failed to visibly affect its ORF1p localization (Additional file 1: Figure S3E).
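A first-pass scan for leucine-rich NES-like motifs can be sketched with a regular expression. The pattern below encodes a commonly cited simplified consensus, Φ-x(2,3)-Φ-x(2,3)-Φ-x-Φ with Φ taken as L, I, V, F or M. This is an assumption chosen for illustration and is not the NetNES 1.1 method, which is a trained predictor. The function name and the toy peptide are hypothetical.

```python
import re

# Simplified hydrophobic residue class used for the illustrative consensus.
HYDROPHOBIC = "LIVFM"

# A lookahead lets overlapping candidates be reported; group 1 holds the motif.
NES_PATTERN = re.compile(
    rf"(?=([{HYDROPHOBIC}].{{2,3}}[{HYDROPHOBIC}].{{2,3}}[{HYDROPHOBIC}].[{HYDROPHOBIC}]))"
)


def find_nes_candidates(protein: str) -> list:
    """Return (1-based start, matched motif) for every consensus match."""
    return [
        (m.start() + 1, m.group(1))
        for m in NES_PATTERN.finditer(protein.upper())
    ]


# Hypothetical toy peptide containing one leucine-rich consensus match.
toy_peptide = "MSLALKLAGLDINK"

if __name__ == "__main__":
    print(find_nes_candidates(toy_peptide))
```

A consensus match is neither necessary nor sufficient for export activity, which is why candidate sequences are tested functionally, as in the EGFP fusion experiments described above.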
In contrast to our results, Mita et al. [100] reported a 20 to 35% increase in nuclear retention of exogenous ORF1p overexpressed in HeLa cells treated with LMB. While we failed to detect NES sequences in ORF1p or obvious sensitivity to the CRM1 export pathway, we cannot exclude the possibility that LMB causes nuclear ORF1p nuclear localization in 2102Ep cells is not strictly influenced by cell cycle status. a Cell cycle arrest was induced for 22 hours with 10 μg/ml aphidicolin or 3 mM hydroxyurea and confirmed by propidium iodide staining and flow cytometry. The percentages of cells in G 0 /G 1 , S or G 2 /M phases were determined using BD CELLQuest software (BD Biosciences). FL2-Area is plotted against cell counts. b Percentage of cells having visible nucleolar localization when not treated (NT) or treated with aphidicolin to induce cell cycle arrest at G 0 /G 1 phase. c Endogenous ORF1p was detected by Western blotting in both nuclear and cytoplasmic cell fractions left untreated (NT) or treated with aphidicolin (APH) or hydroyxurea (HU). Purities of nuclear and cytoplasmic fractions are shown using α-Lamin A/C and α-MEK1/2 antibodies, respectively. WCL: whole cell lysate. d,e) Immunofluorescence of 2102Ep cells showing that cells both with or without nucleolar ORF1p localization can express CDT1 or Geminin. f The percentages of untreated 2102Ep cells having visible ORF1p nucleolar localization that are marked (+) or unmarked (-) by α-CDT1 (red bars) or Geminin (green bars) staining. The data summarizes three replicate experiments with at least 400 cells scored for each experiment. Statistical significance was calculated by Student's t-test (** p<0.01; *** p<0.001) retention of minor amounts of endogenous ORF1p not visibly obvious in our system. 
Also, despite previous reports of attenuated cell culture retrotransposition following G1/S phase arrest, our results suggest this is not due to failure of ORF1p RNPs to enter nuclei, at least in 2102Ep cells, which are known to accommodate cell culture retrotransposition [114]. Moreover, despite the previous suggestion that nuclear membrane breakdown is required for nuclear entry of L1 ORF1p [100], this does not appear to be the case for this cell line.

ALS-related protein mutants colocalize with ORF1p in cytoplasmic granules
To date at least 25 genes have been linked to ALS [43,53]. The first ALS gene discovered, superoxide dismutase (SOD1) [115], is mutated in about 20% of familial cases. C9orf72 is by far the most frequently mutated gene, accounting for about 35% of fALS, 25% of fFTD, and 6% of sALS cases [92,93,116]. RNA-binding protein FUS (FUS) and TARDBP mutations each account for about 4% of fALS cases. Other ALS-associated genes, including alsin (ALS2), angiogenin (ANG), heterogeneous nuclear ribonucleoprotein A1 (HNRNPA1), optineurin (OPTN), sequestosome-1 (SQSTM1/P62), ubiquilin 2 (UBQLN2), TANK binding kinase 1 (TBK1), valosin-containing protein (VCP) and VAMP-associated protein B and C (VAPB) among others, account for only a small percentage of cases, which confounds treatment strategies. ALS animal models of neurodegeneration have mostly examined the toxic effects of overexpressing disease-related aggregation-prone proteins. Mutants of several ALS-associated RNA-binding proteins are known to shift localization from the nucleus to the cytoplasm and form RNA foci in the disease state [117,118].
Previously, in a yeast two-hybrid screen we identified FUS protein as an ORF1p interaction partner, and we confirmed that the two wild-type proteins colocalized in cytoplasmic granules of a minor percentage of stressed human nTERA-2 embryonal carcinoma cells [60]. Here we generated in-house or obtained from other sources tagged constructs for selected ALS disease-associated mutants and transfected these in HEK 293T or 2102Ep cells in the presence of EGFP-tagged ORF1p or endogenous ORF1p alone. Various ALS-associated mutant but not wild-type FUS, TDP-43, and SOD1 proteins strongly colocalized with ORF1p in granules of unstressed cells (Fig. 4A-C). When cellular oxidative stress was induced by application of sodium arsenite, endogenous TDP-43 formed numerous cytoplasmic granules that strongly colocalized with endogenous ORF1p in most cells (Fig. 4D). Wild-type TDP-43 is known to form SGs in ALS neurons in response to cellular stress [119][120][121].
HNRNPA1 and HNRNPA2B1 are prion domain proteins that bind TDP-43 and have been linked with some ALS cases [122,123]. Both proteins bind the L1 RNP, and HNRNPA1 colocalizes with ORF1p in SGs, as previously reported ([60,124]; Fig. 4E). Wild-type TIA1, recently found mutated in cases of ALS and FTD [125], also strongly colocalizes with ORF1p in stressed cells as noted above (Fig. 1B). However, some ALS-related proteins, including OPTN and ANG (Fig. 4F, G), fail to colocalize in the same granules with ORF1p.
Expanded hexanucleotide repeats within transcripts of the C9orf72 ALS gene can undergo non-conventional repeat-associated non-ATG (RAN) translation and generate dipeptide repeats that form inclusions in cerebellum, neocortex, and hippocampal neurons of C9 patients and toxic cytoplasmic aggregates in cultured neuronal cells or Drosophila models ([126][127][128][129], reviewed in [130]). To determine if these aggregates also colocalize with those of L1 ORF1p, we coexpressed in cultured cells a C9orf72 RAN translation product consisting of 50 GA repeats tagged with EGFP [131] and full length L1 with FLAG-HA-tagged ORF1. However, while overexpressed dipeptide proteins formed one to three large cytoplasmic aggregates in each cell, these did not colocalize with and generally excluded ORF1p (Fig. 4H).
Thus, a subset of RNA-binding proteins mutated in ALS bind and colocalize with L1 ORF1p RNP in cytoplasmic RNA granules.

Overexpression of some ALS-associated proteins inhibits cell culture retrotransposition
We previously showed that retrotransposition occurs in non-dividing mature neurons [23]. Here we extended these analyses and tested retrotransposition in two cell lines often used to study neurodegeneration, human SH-SY5Y neuroblastoma and mouse NSC-34 neuronal cells, the latter a hybrid line generated by fusing spinal cord motor neurons with neuroblastoma cells (Fig. 5A). Due to inefficient plasmid transfection, we infected cells with the adenovirus-retrotransposon hybrid virus, A/RT-pgk-L1RP-EGFP (Ad-L1) [23,97]. This viral construct contains L1-RP tagged with the EGFP retrotransposition reporter cassette. Retrotransposition detected by flow cytometry was 1.1% and 0.6% of gated SH-SY5Y and NSC-34 cells, respectively. Thus, both primary and transformed neuronal cell lines are competent for retrotransposition.
We next asked if ALS-related proteins alter L1 retrotransposition in the cell culture assay described above [84,87]. Briefly, we transfected HEK 293T cells with the retrotransposition reporter construct 99-PUR-RPS-EGFP together with constructs expressing tagged ALS-related proteins. 99-PUR-RPS-EGFP includes full-length L1-RP with the EGFP reporter cassette in its 3' UTR cloned in a modified version of pCEP4 vector (Invitrogen) lacking a cytomegalovirus (CMV) promoter. All constructs were expressed in HEK 293T cells (Fig. 5B, top). At least 3 biological replicates were performed. Test proteins did not cause significant cell death during the course of the experiment as determined by trypan blue exclusion staining (Additional file 1: Figure S4A). Three of the 14 proteins tested (SQSTM1, TDP-43, and TBK1 kinase) reduced cell culture retrotransposition by 50% or more when compared with cells transfected with empty vector only as control (Fig. 5B, bottom). SQSTM1/P62 is an autophagy receptor that targets bound proteins for selective degradation. Autophagy has previously been linked with retrotransposon restriction, and it was shown that SQSTM1 colocalizes with L1 RNA in stress granules, and that its knockdown causes increased accumulation of L1 and Alu RNAs and genomic insertions in cultured cells [71]. Autophagy misregulation has also been linked with numerous neurodegenerative disorders, including ALS.
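The comparison with empty vector used in this assay can be illustrated with a minimal Python sketch. It assumes that each well yields one percentage of EGFP-positive cells and that replicate wells are simply averaged before dividing by the control mean; the function name and the toy values are hypothetical and are not data from this study.

```python
from statistics import mean


def percent_of_control(test_wells, control_wells):
    """Express mean %EGFP-positive cells of test wells as a percentage
    of the mean of empty-vector control wells (hypothetical helper)."""
    control_mean = mean(control_wells)
    if control_mean <= 0:
        raise ValueError("control mean must be positive")
    return 100.0 * mean(test_wells) / control_mean


# Toy values for four replicate wells each (illustration only).
vector_only = [1.0, 1.0, 1.0, 1.0]
test_protein = [0.5, 0.5, 0.5, 0.5]
print(percent_of_control(test_protein, vector_only))  # prints 50.0
```

A value of 50 or less in this scale corresponds to the "50% or more" reduction threshold mentioned in the text.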
TDP-43 is a multifunctional RNA-binding protein with roles in mRNA transcription, translation, transport, splicing, and stability [132][133][134]. Studies in model organisms have shown that overexpression of wild-type TDP-43 mimics loss-of-function phenotypes of neurodegeneration and motor dysfunction [135,136]. Several other studies have considered how endogenous TDP-43 levels affect expression of TEs but with inconsistent results ([49,51,52,137,138,139]; see discussion). In the HEK 293T cell culture retrotransposition assay, ectopic expression of TDP-43 with an N-terminal Myc-tag inhibits L1 retrotransposition over 90% (Fig. 5B).

a Exogenously expressed wild-type 3XFLAG-tagged FUS protein is nuclear and only rarely colocalizes with ORF1-EGFP in the cytoplasm, but some mutants strongly overlap in cytoplasmic granules of unstressed HEK 293T cells. b GFP-tagged wild-type TDP-43 is mostly nuclear with only faint colocalization with endogenous ORF1p in cytoplasmic foci of some unstressed cells (arrows, top panels). However, mutations to the TDP-43 NLS (K82/84A and A90V) and some ALS-associated mutations (for example, A315T) show colocalization with ORF1p in cytoplasmic granules. c Cherry tomato-tagged wild-type SOD1 protein is diffusely cytoplasmic (top), but some ALS mutants are present with ORF1p in cytoplasmic foci of 293T cells. d Endogenous TDP-43 strongly colocalizes with endogenous ORF1p in cytoplasmic granules of Na-arsenite stressed but not untreated 2102Ep cells. ORF1p foci are much increased in size in stressed cells. e Red fluorescent protein (RFP)-tagged hnRNPA1 colocalizes with ORF1-EGFP cytoplasmic granules in unstressed HEK 293T cells. f, g Cytoplasmic granules formed by exogenously expressed OPTN or ANG do not colocalize with ORF1-EGFP granules. h ORF1p is generally excluded from GFP-(GA)31 dipeptide aggregates. The full-length L1 construct pc-L1-1FH expresses ORF1p with HA-FLAG tags. Size bars are 10 μm
As this was the ALS-related gene that most altered retrotransposition levels, we next sought to characterize the effect of TDP-43 on L1 activity in more detail. To determine if ORF1p and TDP-43 interact, we co-expressed a construct containing L1-RP with T7-tagged ORF1 (ORF1-T7-L1RP) and TDP-43 with a C-terminal FLAG-tag: the two proteins co-immunoprecipitated on α-FLAG agarose (Fig. 5C). This association was RNA-dependent and was lost upon treatment with RNase, similar to almost all other proteins previously identified within the L1 ORF1p RNP [73,124,140,141].
Over-expression of TDP-43 is toxic to neurons and cell toxicity has been associated with increased cytoplasmic mislocalization of some TDP-43 mutant proteins [142,143]. We therefore thoroughly tested for TDP-43-induced toxicity of HeLa or HEK 293T cells by three methods: 1) comparison of the effect of TDP-43 overexpression on constitutive expression of antibiotic resistance in HeLa cells (Additional file 1: Figure S4B, C), 2) trypan blue staining for cell viability in HEK 293T cells (Additional file 1: Figure S4D), and 3) MultiTox-Fluor Multiplex Cytotoxicity Assay kit (Promega) analysis in HEK 293T cells (Additional file 1: Figure S4E). Overexpression of TDP-43 had no significant effect on cell viability during the time course of our assays, indicating the drop in retrotransposition efficiency is not a reflection of cellular toxicity.

Fig. 5 Increasing expression of some ALS-associated proteins alters retrotransposition in cell culture assays. n=number of biological replicates. a L1-RP is retrotransposition-competent in human SH-SY5Y neuroblastoma and mouse NSC-34 motor neuron-like cells. Cells in 6-well plates were infected with A/RT-pgk-L1RP-EGFP (Ad-L1) L1-reporter adenovirus [97] at about 8 × 10^12 viral particles/ml. Flow cytometry analysis was performed at 9 days post-infection. b 99-PUR-RPS-EGFP was co-transfected in HEK 293T cells with empty vector (pcDNA3) or test constructs expressing tagged ALS-related proteins. Five days later, percentages of EGFP-positive cells were determined by flow cytometry. Each plasmid pair was transfected in four replicate wells, with at least 3 replicate experiments performed for each construct. Results are normalized to pcDNA3 vector control (lighter bar). Statistical significances compared with vector control were calculated by Student's t-test (* p<0.05; ** p<0.01; *** p<0.001). All test proteins were expressed as confirmed by Western blotting of whole cell lysates using α-DYKDDDDK (FLAG)-tag, α-Myc-tag, or α-V5-tag antibodies as indicated (top). Four-fold less V5-SQSTM1- and three-fold more V5-TBK1-transfected lysates were loaded on the gel. Full-length TBK1 expressed poorly, most of the protein existing as a high molecular weight smear. c FLAG-tagged TDP-43 co-immunoprecipitates T7-tagged L1 ORF1p complexes from HEK 293T cell lysates after α-FLAG-M2 affinity gel purification. Interaction is lost following treatment of lysates with RNase. d Expression of V5- or Myc-tagged TDP-43 strongly inhibits mouse IAP element retrotransposition in HeLa-JVM cell culture. Cells were treated with neomycin (G418) to select for retrotransposition events. Colony counts are not normalized. On the right are representative T75 flask images with Giemsa-stained IAP retrotransposition-positive colonies in the absence or presence of TDP-43. The apparent diminished effect on retrotransposition of Myc-TDP-43 compared with TDP-43-V5-WT is likely because its plasmid backbone does not replicate and is diluted out of cells during the course of antibiotic selection which spans a couple of weeks
We next tested if overexpression of TDP-43 might also inhibit the mobilization of LTR TEs. Human endogenous retroviruses are thought to be incapable of replication due to the presence of inactivating mutations in their ORFs [4]. However, mouse intracisternal A particle (IAP) LTR retrotransposons actively replicate and cause new mutations by insertional mutagenesis. Using an established cell culture assay [144], we found that in HeLa cells overexpression of C-terminal V5- or N-terminal Myc-tagged TDP-43 strongly restricted retrotransposition of an IAP element tagged with a neomycin phosphotransferase reporter cassette (Fig. 5D).
In a reciprocal assay, we next asked whether loss of endogenous TDP-43 affects L1 cell culture retrotransposition. We confirmed by Western blotting that two different siRNAs efficiently repressed endogenous TDP-43 protein when transfected in HEK 293T cells. However, TDP-43 depletion had no obvious effect on L1 retrotransposition, at least using the EGFP-based retrotransposition assay (Additional file 1: Figure S5A). We note that an inherent limitation of these assays is the transient nature of the siRNA-mediated protein depletion.
We also wondered if TDP-43 expression might affect the methylation status of the CpG island within the L1 5' UTR promoter [145]. We performed bisulfite conversion of genomic DNA from HEK 293T cells in which TDP-43 was either overexpressed (1 experiment; Additional file 1: Figure S6A) or depleted (2 independent experiments; Additional file 1: Figure S6B, C). PCR-amplified fragments containing the CpG island were cloned and at least 15 amplicons were sequenced for each sample [146]. Unexpectedly, when compared with controls, the 17 CpG residues of this region showed a significant overall increase in methylation in all experiments, although fully unmethylated sequences were found in all conditions. Therefore, one might speculate that perturbing steady-state TDP-43 protein levels alters DNA methylation status, a function for TDP-43 not to our knowledge previously reported. However, changes in L1 promoter methylation associated with TDP-43 expression were not accompanied by significant change in activity of either the L1 sense or antisense promoter in luciferase assays, at least using a plasmid-based assay (Additional file 1: Figure S6D). Moreover, TDP-43 overexpression failed to alter levels of endogenous or ectopically expressed ORF1p in cell culture, as determined by Western blotting (Additional file 1: Figure S5B, C), and did not consistently affect levels of endogenous L1 RNA in HEK 293T cells as detected by RT-PCR (Additional file 1: Figure S5D).
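The way clone-level bisulfite data of this kind are typically summarized can be sketched in Python. After bisulfite conversion and PCR, an unmethylated cytosine reads as T while a methylated cytosine remains C, so each sequenced clone can be reduced to a string of C/T calls at the CpG positions. The string encoding, the function names, and the toy clones below are assumptions for illustration; they are not the analysis code or data of this study.

```python
def clone_methylation(calls):
    """Fraction of CpG sites called methylated ('C') in one sequenced clone.

    `calls` holds one character per CpG site: 'C' = methylated (protected
    from conversion), 'T' = unmethylated (converted). Hypothetical encoding.
    """
    calls = calls.upper()
    if not calls or set(calls) - {"C", "T"}:
        raise ValueError("calls must be a non-empty string of C/T characters")
    return calls.count("C") / len(calls)


def sample_methylation(clones):
    """Return (mean percent methylation, number of fully unmethylated clones)."""
    fractions = [clone_methylation(c) for c in clones]
    percent = 100.0 * sum(fractions) / len(fractions)
    fully_unmethylated = sum(1 for f in fractions if f == 0.0)
    return percent, fully_unmethylated


# Toy clones with 4 CpG sites each (the region in the text has 17).
toy_clones = ["CCCC", "TTTT", "CCTT"]
print(sample_methylation(toy_clones))  # prints (50.0, 1)
```

Reporting the count of fully unmethylated clones alongside the mean mirrors the observation in the text that such sequences were present in all conditions despite an overall increase in methylation.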
To complement these analyses, we next re-analyzed two available RNA-Seq datasets not previously examined for TE expression (Fig. 6A, B). The first study (SRA SRP057819) generated single-replicate paired-end 100-bp RNA-Seq data of control and TDP-43-depleted HeLa cells (using the same esiTARDBP siRNA shown in Additional file 1: Figure S5A) [147]. The second dataset (GEO GSE77702) includes single-end 50-bp RNA-Seq data (two replicates each) of wild-type human iPSC-derived motor neurons depleted by shRNAs of TDP-43, TAF15, FUS, or combined TAF15-FUS [148]. FUS and TAF15 are both members of the FET family of RNA-binding proteins, which are linked with several neurodegenerative disorders [149]. In the HEK 293T cell culture retrotransposition assay, overexpressing FUS had no effect, while TAF15 reduced retrotransposition by 38% (Fig. 5B).
To detect changes in TE expression, we plotted RPKM (Reads Per Kilobase of transcripts per Million mapped reads) values for a subset of mostly evolutionarily young primate-specific non-LTR retrotransposons, including L1, Alu and SVA, and LTR5_Hs (HERV-K(HML-2)) and LTR7Y (HERV-H) subfamilies. Study SRP057819 showed an increase (approximately 15%) in RPKM values for the TDP-43 knockdown (KD) versus control HeLa cell lines for L1s only (including L1PA2 and human-specific L1-Ta and L1-pre-Ta subfamilies) (Fig. 6A, left). However, there was a slight overall decrease in the percentage of retrotransposon (LINE, SINE, SVA, and LTR)-related RNA-Seq reads among the total number of mappable (gene and TE) reads in TDP-43 KD cells (Fig. 6A, right). For TDP-43-depleted motor neurons of study GSE77702, there was a modest but consistent increase in RPKM values for all retrotransposon subfamilies when compared with scrambled shRNA control (Fig. 6B, left). In addition, there was a modest increase from 7.6 to 9.8% in the percentage of retrotransposon RNA-Seq reads among total mapped reads (TEs and genes) for TDP-43 KD versus control motor neurons (Fig. 6B, right). A scatter plot, however, showed minimal change in the expression profile of all mapped TE subfamilies versus genes for the GSE77702 dataset (Fig. 6B, bottom).
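For reference, the two quantities compared above can be written out as a short Python sketch. The RPKM formula follows the definition given in the text; the function names and the toy numbers are hypothetical and are not taken from either dataset.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # reads / (length in kb) / (mapped reads in millions)
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


def percent_te_reads(te_reads, gene_reads):
    """Percentage of retrotransposon reads among all mapped (TE + gene) reads."""
    total = te_reads + gene_reads
    if total <= 0:
        raise ValueError("no mapped reads")
    return 100.0 * te_reads / total


# Toy numbers: 600 reads on a 6-kb element in a library of 20 million mapped reads.
print(rpkm(600, 6000, 20_000_000))   # prints 5.0
print(percent_te_reads(50, 950))     # prints 5.0
```

Because RPKM is scaled by the total number of mapped reads, a subfamily's RPKM can rise while the overall share of retrotransposon reads falls, as seen for the SRP057819 dataset.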
We then analyzed the GSE77702 dataset with TEtranscripts [150], a software package that uses short-read alignment files to identify differentially expressed (DE) TE subfamilies listed in RepBase, a database of representative repeat sequences in eukaryote genomes [151,152]. A total of 192 retrotransposon subfamilies were expressed at significantly different levels in TDP-43 KD cells at an adjusted P-value (padj) <0.05 (Additional file 2: Table S1). However, only 55 retrotransposon TEs were significantly DE in the TDP-43 KD but in neither the TAF15 nor FUS KD datasets. Notably, only three DE retrotransposon TEs (HERVK3-int, MamGyp-int, and MER51D) were unique to TDP-43 KD cells and absent in the TAF15, FUS, and combined TAF15-FUS KD groups.
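The comparison across knockdown groups amounts to a set difference, sketched here in Python. The group labels and subfamily names are placeholders, and the helper is hypothetical; it only illustrates how subfamilies DE in one knockdown and in none of the others could be extracted from per-group lists of significant results.

```python
from typing import Dict, Set


def unique_to(target: str, de_sets: Dict[str, Set[str]]) -> Set[str]:
    """Subfamilies DE in `target` and in none of the other groups."""
    others = set().union(
        *(names for group, names in de_sets.items() if group != target)
    )
    return de_sets[target] - others


# Placeholder subfamily names; real input would be the padj<0.05 lists per group.
de_by_group = {
    "TDP43_KD": {"TE_a", "TE_b", "TE_c"},
    "TAF15_KD": {"TE_b"},
    "FUS_KD": {"TE_c", "TE_d"},
    "TAF15_FUS_KD": {"TE_d"},
}
print(sorted(unique_to("TDP43_KD", de_by_group)))  # prints ['TE_a']
```

Restricting `de_sets` to a subset of groups (for example, omitting the combined knockdown) gives the less stringent comparison.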
Therefore, and in contrast to some reports (see discussion), our analyses do not indicate a clear TDP-43-specific link with elevated activity of TEs, particularly LINE-1 retrotransposons. In fact, overexpression of wild-type TDP-43 strongly inhibits cell culture retrotransposition of both human L1 and mouse IAP elements.

Mutation of some ALS-associated proteins alters cell culture retrotransposition
TDP-43 contains an NLS and NES, two RNA-recognition motifs (RRM1 and RRM2) that bind nucleic acids, and a C-terminal glycine-rich region that mediates protein interactions (Fig. 6C, top) [133]. A review in 2009 identified 70 pathogenic mutations in TDP-43, a majority in the glycine-rich domain [153]. We wished to determine if ALS-associated TDP-43 mutations might relieve the inhibition of retrotransposition caused by the wild-type protein, and so we tested the effects on L1 cell culture retrotransposition of a subset of mutant constructs. While all constructs expressed at levels similar to the wild-type, most mutations had no significant effect (Fig. 6C).
We also considered the effect of angiogenin mutations on cell culture L1 retrotransposition. ANG is a member of the pancreatic RNase A superfamily and a potent mediator of neovascularization, as well as being a host defense factor against some microorganisms [156] and an enhancer of motor neuron survival [157,158]. To date 33 ANG mutations have been implicated in ALS and Parkinson's disease [159]. Overexpression of V5-tagged ANG protein reduced cell culture retrotransposition to 62% of empty vector control without obvious cytotoxicity (Fig. 5B, 6D, S4C). We then introduced two disease-associated mutations known to abolish ANG RNase activity [159][160][161]. Notably, mutation H138R (H114R in the mature protein after signal peptide cleavage) had no effect, while H37R (H13R) restored retrotransposition to 87% when compared with vector-only control (Fig. 6D).
Similarly, we examined the effect of mutations in FUS protein. Exogenous expression of wild-type FUS had no effect on cell culture retrotransposition nor obvious cytotoxicity, but ALS-related mutations in its C-terminal NLS (R514G and H517Q) inhibited retrotransposition over 20% (Fig. 5B, 6E, S4D, E). Finally, we tested the effect of mutations in TBK1 on L1 retrotransposition. TBK1 is a member of the IκB kinase family, and an important player in innate immune signaling. Mutations in TBK1 also impair autophagy (153). Two mutants of TBK1, the kinase-dead mutant S172A and the ALS-associated mutant E696K (152), were tested, but neither showed any change from the 50% reduction of cell culture L1 retrotransposition caused by overexpression of the wild-type protein (Additional file 1: Figure S4F).
In sum, increased expression of some neurodegeneration-related proteins may decrease retrotransposon activity, while some disease-related mutations can modify these effects.

Retrotransposon expression in tissues of ALS patients and controls
To further determine if changes in expression of non-LTR class retrotransposons are associated with ALS, we performed RT-qPCR analyses of 108 bulk spinal cord and brain tissue samples of 38 ALS patients and 27 non-affected controls (Additional file 3: Table S2) according to methods described in [146]. We assayed 30 thoracic or cervical spinal cord samples (15 ALS, 15 controls), 16 cerebellum samples (9 ALS, 7 controls), 35 motor cortex samples (23 ALS, 12 controls), 19 occipital cortex samples (14 ALS, 5 controls), and 8 hippocampal samples, all of the latter from ALS patients. Most samples were from sALS patients or patients of unknown etiology; only 5 patients had a known gene mutation. RT-qPCR primer pairs targeted the ORF1 and ORF2 regions of the young human-specific and retrotranspositionally active L1Hs subfamily (Additional file 1: Figure S7A, B) and two Alu subfamilies, AluS and AluY (Additional file 1: Figure S7C, D). Younger than AluJ elements, the AluS subfamily arose about 40 million years ago and may include some retrotransposition-competent elements [162,163]. AluY, the youngest lineage, has the most retrotranspositionally active elements, and many genetic disorders in humans have been generated by AluY insertions [162,164,165]. Only L1Hs (L1P1)-type L1s are known to be retrotransposition-competent in the human genome [12,166]. For the purposes of comparison, transcript levels were also determined for H9-hESCs [167], human embryonic fibroblasts (HEFs), and HeLa cells.
Transcript levels were determined in duplicate for each brain and spinal cord sample, normalized to GAPDH internal control, and averaged. We considered all measurements of sample-specific transcript levels as real and did not omit possible outliers from analyses. Averaged RT-qPCR reactions within each experiment were normalized to expression of H9-hESCs as these cells strongly express endogenous L1 RNAs [168]; means and standard deviations are shown in Additional file 1: Figure S7. As previously observed, H9-hESCs expressed 5 to 25 times more L1 RNA than differentiated cultured cells such as HEFs or HeLa [146]. Expression levels of Alu and L1 element-related sequences detected in tissue samples were as high or higher than in H9 cells. Average expression in cerebellum was 2- to >3-fold higher than in other tissues for both Alu subfamilies and for L1s (except for ORF1 in occipital cortex); however, transcript levels in cerebellum and for L1 ORF1 in all brain tissue regions varied considerably between samples. Comparing expression of Alu and L1 elements in ALS versus unaffected controls, only expression of AluS elements in occipital cortex was significantly elevated for the 14 ALS versus 5 control samples (p=0.02) (Additional file 1: Figure S7C).
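The two-step normalization described above (first to GAPDH, then to H9-hESCs) resembles the widely used comparative Ct calculation, which can be sketched in Python as follows. The text does not state the exact formula used (methods are referred to [146]), so the 2^-ΔΔCt form, the assumption of roughly 100% primer efficiency, the averaging of duplicate Ct values before normalization, and all names and values below are assumptions for illustration.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference, cal_ct_target, cal_ct_reference):
    """Relative expression by the comparative Ct (2^-ddCt) method.

    Each argument is a list of replicate Ct values:
      ct_target / ct_reference         -> tissue sample, TE amplicon / GAPDH
      cal_ct_target / cal_ct_reference -> calibrator (e.g. H9-hESCs), same amplicons
    Assumes ~100% amplification efficiency for both primer pairs.
    """
    d_ct_sample = mean(ct_target) - mean(ct_reference)
    d_ct_calibrator = mean(cal_ct_target) - mean(cal_ct_reference)
    return 2.0 ** -(d_ct_sample - d_ct_calibrator)


# Toy duplicate Ct values (illustration only).
value = relative_expression(
    ct_target=[24.0, 24.0], ct_reference=[18.0, 18.0],
    cal_ct_target=[26.0, 26.0], cal_ct_reference=[18.0, 18.0],
)
print(value)  # prints 4.0, i.e. 4-fold the calibrator level
```

With the calibrator defined as 1, values at or above 1 correspond to expression "as high or higher than in H9 cells".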
We next examined ORF1p expression by Western blotting of 60 brain and spinal cord tissue lysates (Additional file 1: Figure S8, Additional file 3: Table S2).
There are very few studies of endogenous L1 protein expression in the brain. Baudin de The et al. [22] detected L1 ORF1p in ventral mid-brain tissues of mouse. Using a commercial antibody, Moszcynska et al. [169] showed by immunocytochemistry putative ORF2p expression in several rat brain regions, although antibody specificity was not assessed. Sur et al. [170] detected ORF1p by immunohistochemistry of various brain regions, and antibody detection by Western blotting was confirmed for a single frontal cortex sample. Here, using Western blotting and the α-4H1-ORF1 antibody, we were, surprisingly, unable to detect an ORF1p band of appropriate size in frontal cortex, cerebellum, or hippocampal brain tissue samples, and only very faintly in some motor cortex samples, even when 50 μg of whole cell lysate was loaded in a well (Additional file 1: Figure S8A-D) and despite the detection of L1-related RNAs expressed in these tissue types by RT-qPCR (Additional file 1: Figure S7). In contrast, we could detect a very robust full-length ORF1p signal from an equal amount of 2102Ep cell protein lysate (Additional file 1: Figure S8). Distinct bands consistent in size with full-length ORF1p were observed in some spinal cord samples; bands of smaller size were also seen, including a robust 38 kD signal of unknown origin (Additional file 1: Figure S8E). However, no overall differences in expression of ORF1p were evident in ALS compared with control spinal cord samples. Testing two different antibodies showed that failure to detect ORF1p signal in the brain was not limited to the α-4H1-ORF1 antibody (Additional file 1: Figure S8F, G).
In general, interpretation of TE expression from RT-qPCR data may be influenced by the presence of exonized TE-derived sequences in genes, the possible presence in the cell of non-integrated TE-derived cDNAs (see discussion below), and the cellular heterogeneity of the tissues analyzed. TE activation may occur in only a subset of cells within bulk tissue samples, thereby limiting sensitivity of detection, and in the case of motor neurons these cells may be progressively eliminated in the disease state. In sum, however, no major differences in TE expression were detected in ALS patients when compared with controls.

Retrotransposon expression in ALS RNA-Seq datasets
Prudencio et al. [171] generated a paired-end total RNA-Seq data set (GSE67196) from cerebellum and frontal cortex samples of 9 healthy, 8 C9orf72-associated ALS (C9ALS), and 10 sALS individuals and analyzed these for differentially expressed genes. A subsequent reanalysis of the same datasets using the HOMER analyzeRepeats program revealed significantly increased global expression of repetitive element types in frontal cortex but not cerebellum of C9ALS compared with sALS patients and healthy controls [52]. Setting FDR<0.1, the authors reported 300 DE TE subfamilies in the C9ALS samples: LTR class elements predominated (46%), followed by DNA elements (19%) and LINEs (18%). Notably, 91% of significant C9ALS DE repetitive elements had increased expression.
We sought to assess further the degree to which TEs are differentially expressed in sALS-associated tissues by using TEtranscripts [150] to analyze two RNA-Seq datasets, SRP064478 and GSE76220, both publicly available in sequence read archives and neither previously examined for repeat expression (see also the Methods section). SRP064478 includes paired-end 150-bp sequence derived from total RNA of 7 sALS and 8 healthy control post-mortem cervical spinal cord samples. No significant difference in the percentage of averaged retrotransposon-derived reads among the total number of mappable reads (TEs and genes) was detected for ALS vs control samples; 1.72 to 3.41 million sample reads mapped to retrotransposon subfamilies (Fig. 7A, left; Additional file 2: Table S1). Only one significant DE retrotransposon subfamily was detected for SRP064478 (padj<0.05; Additional file 2: Table S1). However, overall mapping efficiency was low: about 25 million reads were alignable to the genome and 30% of these mapped to mitochondrial genes.
The GSE76220 dataset consists of single-end 50-bp sequence of total RNA isolated from laser-capture microdissected lumbar spinal cord sections of 13 sALS patients and 8 control individuals [172]. There was no significant change in the percentage of retrotransposon reads among total mappable reads in the sALS vs control samples (Fig. 7A, right; Additional file 2: Table S1). Between 0.27 and 1.26 million sample reads mapped to retrotransposons. TEtranscripts detected only four significant DE retrotransposon subfamilies (Additional file 2: Table S1).
We then reanalyzed the GSE67196 RNA-Seq dataset of Prudencio et al. [52,171] using TEtranscripts. Significant increases in retrotransposon reads as a percentage of total mapped reads were seen for frontal cortex C9ALS vs control (p=0.04) and C9ALS vs sALS (p=0.01) samples (between 0.24 and 0.61 million sample reads mapped to retrotransposons) (Fig. 7B). As expected, multidimensional scaling (MDS) plots showed weak clustering of C9ALS samples in frontal cortex but not cerebellum samples (Fig. 7C). TEtranscripts detected no significant DE TEs in cerebellum samples of the GSE67196 dataset (padj<0.05). In the case of the frontal cortex samples however, and supporting Prudencio et al. [52], there were 3 DE TEs (DNAs, LTRs, LINEs, SINEs, and SVAs) in sALS vs controls, 10 DE TEs in C9ALS vs controls, and 133 DE TE subfamilies in C9ALS vs sALS samples, all increased in expression and including 36% LTR, 32% DNA, 15% LINE, and 17% SINE elements (Additional file 2: Table S1).
We also analyzed for the first time TE expression for the NeuroLINCS dbGaP Study phs001231 (SRP098831). This dataset consists of poly(A)+ non-stranded mRNA of iPSC-derived motor neurons from 4 C9ALS and 3 SMA patients (3 sequencing replicates each) and 3 unaffected controls (2 or 3 replicates each). Transcripts of some TE types are not polyadenylated and so are likely underrepresented in this dataset following poly(A)+ selection. However, although Alu elements are transcribed by RNA polymerase III and not polyadenylated, they contain both internal and 3'-end poly(A) stretches that permit capture of their transcripts. An MDS plot showed C9ALS samples clustered away from SMA and control samples, while clustering of SMA from control samples was less evident (Fig. 7D, left). There was a significant increase (p=0.02) in the percentage of retrotransposon reads among total mappable reads in the C9ALS vs control dataset (Fig. 7D, right). The increase was also significant for C9ALS vs SMA (p=0.002) but not significant for SMA vs control samples (p=0.46) (not shown). TEtranscripts analysis showed that at padj<0.05, significant DE TE subfamilies (DNAs, LTRs, LINEs, SINEs, and SVAs) numbered 536 for C9ALS vs controls, 232 for C9ALS vs SMA, and 304 for SMA vs controls, most TEs being increased in expression (Additional file 2: Table S1). Three SVA and 30 Alu TEs were upregulated for C9ALS vs controls, including 6 AluY subfamilies. (The human-specific L1Hs/L1P1 subfamily was not detected.) Interestingly, a recent literature review noted at least 37 neurological and neurodegenerative disorders linked with misregulated Alu retrotransposon activity [173]. A caveat of this dataset analysis is that sample numbers were small. Algorithms such as HOMER and TEtranscripts map sequencing reads to TE consensus sequences only and locus-specific information is lost.
The ability to map individual transcribed retrotransposons to their source loci can reveal (i) the particular loci that contribute to repeat family transcription differences between diseased and healthy states, (ii) the coding capacity of transcribed repeat loci of possible relevance for a specific disease, and (iii) potentially variant retrotransposon proteins and RNAs that should be considered when studying disease relevance. We therefore applied a recently developed locus-specific TE mapping pipeline (PT, EP, DT, unpublished data), as described in the Methods section, to reanalyze the GSE67196 data set [171]. The numbers of mapped loci are summarized in Additional file 1: Figure S9A. Principal component analysis (PCA) and heatmap plots again showed cerebellum TE expression to be as variable within C9ALS, sALS, and control groups as between groups and without significant clustering (Additional file 1: Figure S9B, C, left). However, as expected, clustering of frontal cortex C9ALS samples distinct from control and sALS samples was evident (Additional file 1: Figure S9B, C, right). Supporting Prudencio et al. [52], the greatest number of DE TE loci (determined as having padj<0.05 and greater than 2-fold differential transcription) were identified for C9ALS vs sALS (3963 loci), followed by C9ALS vs controls (652 loci), and sALS vs controls (109 loci). However, these DE TE loci comprised only 1.8%, 0.3%, and 0.06%, respectively, of a total of 2.12 × 10^5 TE loci mapped (Additional file 4: Table S3). Caveats of this type of analysis should be noted. For example, most of the significant DE L1 loci were likely not transcribed from their own promoters, since 1) almost 95% of those mapped were less than 5600 bp in length and so lacked much of their 5' UTRs, and 2) 70% of DE L1s were within genes, and so may be transcribed as part of a longer gene transcript.
Moreover, only 2 younger primate-specific L1P1 and 4 L1PA2 L1 loci were differentially expressed (among a total of 164 L1P1 and 135 L1PA2 individual elements mapped). As for DE Alu loci, 81% were within genes (98% of all Alu loci being upregulated). Furthermore, most mapped Alu loci were older elements, with only 5% of them AluY subfamily members, a bias likely due to the inability of currently available algorithms to confidently map short sequence reads to young highly similar TEs. In general, designing RNA-Seq analysis pipelines that efficiently map short sequence reads of young highly similar TEs to their source loci has to date been difficult for reasons discussed below.
In summary, RNA-Seq analysis of the SRP098831 NeuroLINCS dataset suggests widespread upregulation of TE sequences from numerous subfamilies in C9orf72 ALS patients, as previously reported for the GSE67196 dataset [52]. However, additional locus-specific analysis of the GSE67196 dataset suggests that many loci contributing mappable reads were not autonomously transcribed from their own promoters and were likely part of longer gene transcripts. More detailed transcription analyses targeting a selected cohort of full-length intact intergenic TE loci are needed to validate misregulation of retrotransposon expression in C9orf72-associated ALS disease.

Discussion
Self-aggregation of RNA-binding proteins is a leitmotif of neurodegenerative diseases, including amyotrophic lateral sclerosis. The ORF1 protein of LINE-1 retrotransposons is also an aggregation-prone RNA-binding protein. Of the approximately 500,000 L1s in the human genome, about 5000 are full-length, or about one percent of DNA [10,174]. Many of these L1s have the potential to be transcribed and translated, although different tissues may express different L1s [175]. We speculate that misregulation of even a small number of these, leading perhaps to mislocalization and augmented aggregation of ORF1p, could have negative effects on some cells, including neurons. In this study, to increase our understanding of the role of ORF1p in disease, we extended previous investigations of its subcellular localization and aggregation properties. We then considered potential interactions of amyotrophic lateral sclerosis-related proteins and the ORF1p RNP and the possibility of misregulated L1 activity in the ALS state.
Analogous to gene products associated with certain neurodegenerative diseases, L1 ORF1p RNPs are prone to forming cytoplasmic RNA granules. In unstressed cells of some lines, ORF1p constitutively forms cytoplasmic granules that are only faintly and partially marked by canonical stress granule proteins. As shown in Fig. 1, stress to the cell increases both the size of ORF1p cytoplasmic aggregates and colocalization with SG proteins, and deleting a Q-N-rich region of human ORF1p abolishes aggregation. Furthermore, we showed here that L1 elements with a variant ORF1 R159 codon, a residue that controls both retrotransposition and the ability of ORF1p to seed cytoplasmic RNA aggregates, are common in the human genome. Thus, cell stress promotes and certain sequence polymorphisms alter cytoplasmic aggregation of L1 ORF1p.
A functional role for L1-associated cytoplasmic RNA granules in retrotransposition remains unknown. This raises the question: what are these constitutively expressed ORF1p aggregates? ORF1p fails to associate with Golgi, lysosome, or endoplasmic reticulum marker proteins [60,176,177]. Endogenous ORF1p aggregates in unstressed cells occasionally abut P-bodies but generally do not overlap ([60]; Additional file 1: Figure S1F). Guo et al. [71] found that exogenous and endogenous ORF1p colocalized with autophagosome marker LC3 protein in HEK 293T cells, an association that increased with inhibition of autophagy. We here confirmed that endogenous ORF1p granules of unstressed 2102Ep cells are also partially marked by red fluorescent protein (RFP)-tagged LC3 (Additional file 1: Figure S1G), but we failed to detect their colocalization with endogenous autophagy marker proteins ATG12 or ATG16L1 (Additional file 1: Figure S1H, I). It has also been reported that ORF1p co-IPs and colocalizes in some cytoplasmic granules with IGF2BP1/IMP1 [124], part of a multi-protein complex found in granules of neuronal axons [178][179][180]. IMP1 granules have been reported as distinct from P-bodies and stress granules [181]. It is therefore possible that ORF1p aggregates in more than one type of cytoplasmic structure.
Endogenous ORF1p may concentrate in perinuclear, nuclear, or nucleolar locations. We showed in Fig. 2 that in some cells ORF1p also forms small nuclear foci distinct from nucleoli; ectopically expressed Alu and likely other RNAs colocalize with these foci. Other studies have also reported endogenous ORF1p nuclear localization, specifically in some human cancer cell lines and tissues [60,76,77,79,182], and murine germline, chloroleukemia, and cardiomyocyte cells [183][184][185][186][187]. Why ORF1p is cytoplasmic in some cells and nuclear in others is unclear but suggests a dynamic aspect of ORF1p biology that is starting to be appreciated in the retrotransposon field [100].
We therefore examined the cell cycle as one possible mechanism controlling ORF1p subcellular localization. Blocking 2102Ep cell cycling at G1/S phase transition did not obviously alter ORF1p nuclear localization. Non-blocked cells showed significant concentration of ORF1p in nucleoli, whether the cells were in G1 phase or not. Our results in part contradict a recent study proposing a strong cell cycle bias for ORF1p accumulation in the nucleus during mitosis where it remains during G1 phase [100]. This discrepancy may in part be due to the fact that we queried strictly nucleolar accumulation as the most obvious feature of ORF1p nuclear localization in 2102Ep cells; failure to observe ORF1p in nucleoli does not necessarily preclude its diffuse presence in the nucleoplasm. Also, the Mita et al. study [100] tested a different cell line (HeLa) and ORF1p overexpressed from plasmids, while here we examined endogenous ORF1p localization as more biologically relevant. Indeed, 2102Ep cells mimic early human embryogenesis, where heritable L1 insertions accumulate [69,168], while HeLa cancer cells mimic L1 activity in human cancers. While both cellular niches are known to support L1 retrotransposition, it stands to reason that differences may exist with respect to L1 regulation. Thus, the mechanisms that control L1 ORF1p nuclear localization clearly require further investigation in a range of cell lines and with care paid to their growth conditions (for as noted above, the frequency of ORF1p nucleolar localization varies with 2102Ep cell density).
Previously, we showed that ORF1p point mutations and C-terminal and N-terminal deletions increased nuclear accumulation (see text and Supplemental data of [60]). Thus, maintaining the integrity of ORF1p structure seems to be important for cytoplasmic retention and aggregation. Furthermore, concentration of ORF1p at the nuclear membrane of some cells (Fig. 2), its reported RNA-dependent association with karyopherin subunit alpha 2 (KPNA2; [124]), and the detection by mass spectrometry of importin 7 (IPO7) within an ORF2p complex [141], suggest that L1 RNPs interact with the nuclear import machinery. Indeed, it was recently shown that loss of transportin 1 (TNPO1), the beta subunit of the karyopherin receptor complex, reduces nuclear localization of epitope-tagged ORF1p [188]. It was also recently proposed that ORF1p expression is required for nuclear ORF2p localization [141]. However, this is not supported by our earlier findings that ORF2p overexpressed alone efficiently enters nucleoli of human osteosarcoma 143B TK- cells; at that time we also mapped a functional nuclear localization signal to the N-terminus of ORF2p [76].
Might there be cellular consequences for misregulated expression or mislocalization of aggregation-prone LINE-1 proteins? L1 ORF1p is a promiscuous RNA-binding protein able to capture many cellular RNAs. Co-IP experiments with tagged L1 RNPs have identified numerous bound RNAs, including SVA and Alu SINEs and small non-coding RNAs of importance for the cell [124,189]. Direct co-IP experiments also confirmed over 60 proteins that associate with tagged L1 ORF1p RNPs, mostly in an RNA-dependent manner [60,73,124,190]. Among these were several RNA-binding proteins associated with ALS and FTLD, including FUS, HNRNPA1, HNRNPA2B1, and TDP-43. As we have shown here, pathogenic mutants of ALS proteins FUS, SOD, and TDP-43 also colocalize with ORF1p in cytoplasmic RNP aggregates. As with certain neurodegeneration-associated proteins, increased expression or mislocalization of ORF1p, whether through mutation or loss of L1 suppression, could seed protein aggregation, co-sequester other cellular proteins or RNAs, disrupt normal patterns of protein degradation or RNA processing, and trigger cytotoxicity. Retrotransposon-encoded proteins can also induce cellular stress responses. Gasior et al. [191] showed that overexpression of L1 ORF2p causes double-strand chromosome breaks. These results are consistent with observations that L1 overexpression can induce apoptosis and senescence or potentially an immune response in some cell lines [192][193][194][195]. Perhaps these are reasons for the evolution of so many cellular factors that restrict L1 activity [42].
Previous studies have considered links between TDP-43 and retrotransposon expression. TDP-43 was first identified as a transcriptional repressor that binds the RNA regulatory element TAR of HIV-1 proviruses to inhibit their expression [196]. However, a role for TDP-43 in regulating HIV or HERV transcription is not clear [197]. Douville et al. [49] found expression of HERV-K(HML2) pol and TARDBP genes to be strongly and positively correlated, and their encoded proteins colocalized in ALS neurons. Douville and Nath [198] also linked TDP-43 with altered HERV-K(HML-2) RT expression in brain tissues. Data-mining rodent and human interaction experiments, Li et al. [137] found that TDP-43 protein targets and binds LTR and non-LTR TE transcripts and that this association is reduced in cortical tissues of FTLD patients. Furthermore, reanalysis of RNA-Seq datasets of human TDP-43 overexpressed in transgenic mice [199] and endogenous TDP-43 depleted in mouse striatum [200] showed a general increase in expression of LTR, non-LTR and DNA TEs under both conditions, with concordance between TE transcripts upregulated and those bound by TDP-43 protein [137]. While it was reported [51] that TDP-43 protein bound the HERV-K LTR with an attendant increase in HERV-K(HML-2) transcription and RT activity, Manghera et al. [138] found wild-type TDP-43 bound the HERV-K(HML-2) promoter without activating its transcription, while overexpressed ALS-associated TDP-43 mutants promoted HERV-K(HML-2) protein aggregation and clearance from astrocytes (but not neurons) by stress granule formation and autophagy. Overexpression of a human TDP-43 transgene in Drosophila was accompanied by motor problems and derepression of retrotransposons in general and glial cell-specific upregulation of gypsy elements in particular, along with an increase in programmed cell death induced by DNA-damage [139].
We found that, while TDP-43 binds and colocalizes with the L1 ORF1p RNP, its increased expression strongly represses rather than derepresses human L1 and mouse IAP cell culture retrotransposition (without attendant cytotoxicity). On the other hand, inhibition of endogenous TDP-43 had no effect on L1 retrotransposition in HEK 293T cells. Although altered levels of TDP-43 were associated with modestly increased methylation of endogenous L1 promoters, this was not accompanied by a change in exogenous ORF1p expression or promoter effects in a luciferase assay. Moreover, a recent study by Prudencio et al. [52] found no significant association between levels of TDP-43 RNA or protein and TE expression in frontal cortex samples of a large cohort of ALS/FTLD patients. Our reanalysis of two RNA-Seq datasets [147,148] also failed to detect strong TDP-43-specific changes in expression of retroelement subfamilies in cell lines depleted of endogenous TDP-43 protein. Therefore, a role for TDP-43 protein in aberrant retroelement activity begs further investigation.
We also reanalyzed for TE subfamily expression two RNA-Seq datasets of sALS tissue samples not previously examined for TE expression (GSE76220 and SRP064478) and one previously tested dataset of both C9orf72 and sporadic ALS tissue samples (GSE67196). In all three datasets, we failed to find significant misregulation of TE subfamilies in sALS vs controls, consistent with the previous analysis of GSE67196 [52] and with our RT-qPCR and Western blot analyses of ALS and control brain and spinal cord tissues. However, our analysis of a NeuroLINCS dataset (SRP098831) found both SMA and C9ALS vs non-ALS patient-derived iPSC cell lines differentiated to motor neurons to have significant numbers of DE TEs, including young SINE Alu subfamilies: this was in line with the previous findings of Prudencio et al. [52] that TE expression is misregulated in C9ALS vs sALS samples of the GSE67196 dataset. However, our locus-specific analysis of the GSE67196 dataset suggested that many of the reads contributing to the retrotransposon subfamily analyses did not originate from TE sequences transcribed from their own endogenous promoters but rather from sequences contained within longer transcripts.
Several pitfalls exist for RNA-Seq analyses of differential TE expression: conclusions should be drawn with care. High copy number, close sequence similarity, and especially the frequent embedment of TE sequences in longer gene transcripts (i.e., exonization) can lead to misinterpretation. While expression of a TE subfamily may appear misregulated, a change in expression observed may in fact be due to altered expression of a gene in which a member of that TE subfamily resides. In their analysis of RNA-Seq data from HEK 293T cells, for example, Deininger et al. [174] mapped greater than 99 percent of L1-derived sequence reads within other RNAs unrelated to retrotransposition. Moreover, bona fide L1 transcripts originating from L1 5' UTR promoters were limited to only a small number of highly expressed full-length L1 loci.
Furthermore, we have speculated that cell conditions that induce elevated expression of L1s or HERVs, and therefore their encoded reverse transcriptases, could induce promiscuous reverse transcription of cytoplasmic RNAs ( [42], see also [201]). Indeed, a recent report has demonstrated the accumulation of cytoplasmic L1-related ssDNAs in neurons derived from hESCs lacking the exonuclease TREX1, a gene mutated in Aicardi-Goutières syndrome patients [195,202]. The ectopic cytoplasmic cDNAs so generated would be amenable to amplification during RNA-Seq or RT-qPCR protocols and so bias upwards estimates of expression from their source loci. Although as yet an unverified concern, recent studies have reported elevated levels of TE-derived cytoplasmic cDNAs or hybrid RNA/DNA molecules in cancer and other disease states [195,201,203]. The retrotransposon field is working to control such possible sources of error when interpreting TE expression data [174,175]. In general, further improvements in transcriptomics, and especially single-cell based approaches, will eventually clarify the degree of deregulation of TE expression in ALS and other neurodegenerative disorders.

Conclusions
In considering links between retrotransposon expression and neurodegenerative conditions, we expanded previous knowledge of the aggregation properties of the LINE-1-encoded ORF1 protein and factors that control its accumulation. We also presented data that the cell cycle does not strongly alter nuclear localization of endogenous L1 ORF1p in nullipotent embryonal carcinoma cells. We showed that some ALS-associated protein mutants associate with ORF1p in cytoplasmic aggregates and that increased expression of some ALS-linked proteins limits LINE-1 retrotransposition. We emphasized especially TDP-43, a protein that accumulates in the cytoplasm of a majority of ALS patients, but failed to find consistent evidence in cell culture for an effect on retrotransposon activity, in contrast to some previous reports.
By means of RT-qPCR and Western blotting of ALS tissues and reanalysis of available RNA-Seq datasets, we also sought a link between sporadic ALS and retrotransposon misregulation. In sum, clear-cut evidence is so far lacking for involvement of non-LTR retrotransposon expression in sALS. Using the same tissue samples as in the present study, we also recently profiled transcription of HERV-K(HML-2) and HERV-W LTR retrotransposons by direct Sanger sequencing of cloned cDNAs and RT-qPCR and Western blot analyses, but failed to find significant differences when comparing ALS and controls [204]. It is conceivable that previous observations of differential TE expression levels may relate to altered global DNA methylation status and other epigenetic changes observed in some ALS patients [205][206][207][208], which could in consequence cause selected TE loci to be differentially transcribed. At least, analyses of additional C9orf72-mutated ALS RNA-Seq datasets seem warranted. We believe examining neurodegenerative disease-affected tissues for perturbations in the aggregation dynamics of L1-encoded proteins could also prove informative. It is also reasonable to continue to apply improving methods of next-generation sequencing analysis to examine neurodegenerative and other brain diseases for misregulated activity of TEs in general and the L1 in particular, a mobile element with hundreds of thousands of copies and which through long evolution has been directly responsible for generating over a quarter of the DNA in the human genome [209].
siRNAs (Additional file 1: Figure S5A) [167] were obtained from Wicell (RRID: CVCL_9773) and cultured and passaged as previously described [168]. Post-mortem brain and spinal cord frozen tissues were obtained from the University of Maryland Brain and Tissue Bank of the NIH NeuroBioBank, the Target ALS Multicenter Postmortem Tissue Core at Johns Hopkins University, and the Department of Neurosciences of the University of California San Diego School of Medicine, as indicated in Additional file 3: Table S2. All tissues were obtained following approval of the Institutional Review Boards of the UCSD School of Medicine (to JR) and the JHU School of Medicine (IRB00066246 to JLG).
Western blotting, IF, and RNA FISH were performed as described [60,72]. All Western blots were run on NuPAGE 4-12% Bis-Tris gels (ThermoFisher). Cells were examined using a Nikon Eclipse Ti-A1 confocal microscope with NIS-Elements AR software.

Whole-cell protein and RNA extraction
For protein extracts, tissues or cells were lysed in RIPA buffer (Sigma) with Mammalian Protease Inhibitor Cocktail and phenylmethanesulfonyl fluoride (Sigma) and homogenized with a Diagenode Bioruptor. In the case of tissues, 2 mm zirconium silicate beads (Next Advance) were added to the tubes. Samples were centrifuged at 11K at 4 °C for 15 minutes to recover supernatant and resuspended in 3X SDS loading buffer. Isolation of HEK 293T cell nuclear and cytoplasmic extracts utilized the NE-PER kit (Thermo Scientific).
For RNA extracts, all brain tissue and some spinal cord tissues were disrupted and homogenized in 500 μl of Trizol (Invitrogen) using the TissueLyser LT (Qiagen). Briefly, 30 mg of sample were transferred to a 2 ml tube containing 250 μl of Trizol and one 5 mm stainless steel bead. The TissueLyser LT program used was 50 Hz for 1 min. After a spin, the supernatant was collected and another 250 μl of Trizol were added to the sample to repeat the same procedure. Finally, both fractions were combined and RNA purification with Trizol followed the manufacturer's instructions. Some spinal cord samples were homogenized in 500 μl of Trizol and zirconium silicate beads using a Benchmark BeadBlaster24. Following centrifugation, the supernatant was further purified using an RNeasy Mini Kit with On-column DNase digestion with RNase-Free DNase Set (Qiagen).
Next, the RNA was treated with RQ1 RNase-free DNase (Promega) for 30 min, purified with ultrapure phenol:chloroform:isoamyl alcohol mixed at 25:24:1 (v/v/v) (Ambion) and precipitated with 3 volumes of ice cold 100% ethanol and 0.1 volume 3M sodium acetate. To assure absence of cross-contaminating genomic DNA, 1 μg of total RNA was treated again with another round of RNase-free DNase I (Invitrogen) for 15 min.
RNA integrity numbers (RINs) were determined using an Agilent BioAnalyzer and Agilent RNA 6000 Nano Kit following the manufacturer's recommendations and are shown in Additional file 3: Table S2 (range: 2.1-10; median 6.6). We attribute low RINs in some samples to long post-mortem intervals affecting tissue quality and to the rigorous DNase-treatments of RNA that were required to remove residual contaminating genomic DNA, a strategy necessary for our sensitive PCR amplification of multi-copy repeat cDNAs. To assess effects of RNA quality on our analyses we plotted RIN values versus RT-qPCR Ct-values of GAPDH and could detect no significant effect of RIN when the various tissue types were considered separately. However, a mild effect (R² = 0.38) of RNA quality on Ct-levels is acknowledged when combining RIN and Ct values from all samples. Importantly, omission of samples with lower RINs did not affect our conclusions.
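The RIN versus Ct comparison described above amounts to a simple linear correlation. The following Python sketch illustrates how a coefficient of determination of this kind can be computed; it is not the analysis code used in this study, the function name `r_squared` is hypothetical, and the RIN and Ct values are invented for illustration only.

```python
def r_squared(x, y):
    """Coefficient of determination (squared Pearson r) for paired values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Invented example values (NOT data from this study):
# RIN per sample and the matching GAPDH Ct value.
rin_values = [2.0, 4.0, 6.0, 8.0]
gapdh_ct = [30.0, 28.0, 26.0, 24.0]
print(r_squared(rin_values, gapdh_ct))
```

A value near 1 would indicate that Ct tracks RNA quality closely, whereas a value near 0 would indicate little dependence.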

Retrotransposition assay
The EGFP L1 cell culture retrotransposition assay was conducted as previously described [87,221,222]. The IAP retrotransposition assay was carried out essentially as described in [144]. One μg of IAP-neo TNF element reporter plasmid was cotransfected with 0.5 μg of empty vector or test plasmid in HeLa-JVM cells. At eighteen hours post-transfection, the cells were expanded from six-well plates to T75 flasks, and two days later selection for retrotransposition events with 500 μg/ml of G418 was begun. After 15 days of selection, cells were fixed, stained with Giemsa, and colonies were counted.

Assessment of toxicity
To test potential protein toxicity (Additional file 1: Figure S4), we co-transfected in HeLa-JVM cells pcDNA6/myc-His B, a blasticidin S-resistance gene (bsr)-containing vector, together with empty vector (pcDNA3) or test expression constructs. On day 2, cells were expanded to T75 flasks and selection with 5 μg/ml blasticidin was begun. After 12 days, cells were fixed, stained with Giemsa and colonies were counted. Similarly, we co-transfected in HeLa cells pcDNA3, a neomycin (neo)-resistant vector, together with either empty vector (pcDNA6/myc-His B) or test expression constructs, followed by selection of cells with 500 μg/ml Geneticin (G418, Thermo Fisher).
Trypan Blue exclusion assays were performed in HEK 293T cells. Following staining, live and dead cells were counted using a Countess II Automated Cell Counter (Thermo Fisher Scientific). Use of the MultiTox-Fluor Multiplex Cytotoxicity Assay kit (Promega) followed manufacturer's instructions. This assay simultaneously measures cell viability and cytotoxicity in a single-reagent reaction, permitting ratios of live to dead cell readings to be calculated.

RT-qPCR
RT-qPCRs were conducted as previously described [146,224]. A High-Capacity cDNA Reverse Transcription Kit (Applied Biosystems) was used to generate cDNA. RT-negative controls were run in parallel for all qPCR reactions. Duplicate samples were analyzed in a StepOne Real-Time PCR system (Applied Biosystems) using GoTaq qPCR Master Mix (Promega) and PCR primers at 200 nM each. We used two sets of primers to analyze endogenous L1 expression directed against L1Hs ORF1 (N-51-Fwd: GAATGATTTTGACGAGCTGAGAGAA; N-51-Rev: GTCCTCCCGTAGCTCAGAGTAATT) or L1Hs ORF2 sequence (N-22 Fwd: CAAACACCGCATATTCTCACTCA; N-22 Rev: CTTCCTGTGTCCATGTGATCTC). We also analyzed expression of AluS (AluS-Fwd: GCCGAGGCGGGCGGATCACC; AluS-Rev: GCCTCCCGAGTAGCTGGGAT) and AluY (AluY-Fwd: AGATCGAGACCATCCTGGCT; AluY-Rev: CCGCCTCCCGGGTTCACGCC). In all cases, GAPDH was used as an internal normalization control (primers: GAPDH-Fwd: TGCACCACCAACTGCTTAGC; GAPDH-Rev: GGCATGGACTGTGGTCATGAG). qPCR cycling parameters were as follows: 10 min at 95°C, 40 cycles of 15 sec at 95°C, followed by 60 sec at 60°C. Melting curve analysis was performed to confirm the identity of the amplified product. We employed the ΔΔCt method [225] to determine relative differences in transcript levels. L1 and Alu transcript levels were plotted as "Fold change in transcript level" with respect to the transcript level in H9-hESCs (=1). Standard deviations were calculated based on 4 data points per sample derived from duplicate measurements and a technical replicate for each sample.
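As an illustration of the ΔΔCt calculation, the following minimal Python sketch computes a fold change relative to a calibrator sample (here standing in for H9-hESCs, set to 1) after normalization to a reference gene (GAPDH). It assumes the common 2^-ΔΔCt form with equal amplification efficiencies for target and reference amplicons; the function name and the Ct values are hypothetical and are not taken from this study.

```python
def fold_change(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Relative transcript level by the 2^-ddCt method.

    ct_target / ct_ref: Ct of the repeat amplicon and of GAPDH in the sample.
    ct_target_cal / ct_ref_cal: the same two Cts in the calibrator sample.
    Assumes ~100% amplification efficiency for both amplicons.
    """
    d_ct_sample = ct_target - ct_ref        # normalize sample to GAPDH
    d_ct_cal = ct_target_cal - ct_ref_cal   # normalize calibrator to GAPDH
    dd_ct = d_ct_sample - d_ct_cal
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values for illustration only.
print(fold_change(ct_target=24.0, ct_ref=18.0,
                  ct_target_cal=22.0, ct_ref_cal=18.0))  # 0.25
```

In this invented example the target amplifies two cycles later in the sample than in the calibrator after GAPDH normalization, corresponding to a four-fold lower transcript level.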

Bioinformatic analysis
Occurrence of R159 polymorphisms in human L1 elements
L1Base 2 [13] was used for counting R159 polymorphism-containing human L1 elements. In brief, chromosome coordinates for human Full-Length, Intact LINE-1 elements (FLI-L1), human ORF2 Intact LINE-1 elements (ORF2-L1), and human Full-Length >4500nt LINE-1 elements (FLnI-L1) (Ens84.38) were obtained from L1Base 2. Corresponding sequences of L1 elements were retrieved using UCSC Table Browser [227]. Sequences of each subset were multiply aligned using MAFFT online [228] or as implemented in Geneious (Biomatters Ltd.; https://www.geneious.com; [229]). Occurrences within L1 ORF1 of a codon for R159 and its non-synonymous variants, including the most frequently observed codons for histidine (H), cysteine (C) or proline (P), were counted and their respective percentages calculated. Only a subset of the 13,671 L1 elements in the FLnI-L1 dataset were multiply aligned due to limitations of both the local and online versions of MAFFT. Also for the FLnI-L1 dataset, which included a greater number of evolutionarily older L1 sequences, a minority of aligned L1 sequences displayed structural rearrangements and/or higher sequence divergence in the R159 codon region resulting in unreliable prediction of sequence at the R159 codon position: these were excluded. In all, 6346 FLnI-L1 elements were included in the analysis.
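The codon counting step can be sketched in Python as follows. This is an illustrative simplification rather than the procedure actually used: it assumes ORF1 sequences that are already aligned in frame so that codon N occupies alignment columns 3(N-1) to 3N-1, and it skips any sequence with a gapped or ambiguous codon at that position. The function name and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def residue_percentages(aligned_orf1, codon_number):
    """Percentage of each encoded residue at a 1-based codon position.

    aligned_orf1: in-frame aligned ORF1 nucleotide sequences ('-' for gaps).
    Sequences with a gapped or ambiguous codon at the position are excluded.
    """
    start = 3 * (codon_number - 1)
    counts = Counter()
    for seq in aligned_orf1:
        aa = CODON_TABLE.get(seq[start:start + 3].upper())
        if aa is not None:
            counts[aa] += 1
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.items()} if total else {}


# Toy alignment (invented); codon 2 is inspected here instead of codon 159.
toy = ["ATGCGT", "ATGCAT", "ATGCGC", "ATG---"]
print(residue_percentages(toy, codon_number=2))
```

For real L1 alignments the column holding the R159 codon would first have to be located relative to a reference element, since insertions and deletions shift alignment coordinates.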

RNA-Seq datasets
Publicly available RNA-Seq datasets were analyzed with the TEtranscripts software package [150]. TDP-43-related datasets SRP057819 and GSE77702 have been previously described [147,148]. Dataset SRP064478, submitted by the Bennett Lab at Virginia Commonwealth University, consists of RNA-Seq data for total stranded RNA with >50 million 2x150 bp sequencing reads from 15 postmortem cervical spinal cord sections (7 ALS and 8 healthy controls). GSE76220 includes 20-30 million mappable 1x50 bp reads from total stranded RNA isolated from laser capture microdissected motor neurons from post-mortem lumbar spinal cords [172]. GSE67196 consists of on average 83 million 1x100 bp reads per sample (91.5 million for cerebellum and 73.6 million for frontal cortex), as described [52,171].
The Library of Integrated Network-Based Cellular Signatures (LINCS)-NeuroLINCS dbGaP dataset (accession number phs001231.v1.p1, SRP098831) includes RNA-Seq of iPSC-derived motor neurons from 4 C9ALS and 3 SMA patients (3 sequencing replicates each), and 3 unaffected healthy controls (2 or 3 replicates each). It has been reported that L1 activity in iPSCs can vary with cell passage, increasing during reprogramming but subsequently subsiding [224,226,230]. However, passage numbers of the NeuroLINCS cell lines fall within similar ranges, from 25 to 27 for ALS and healthy control and 21 to 30 for SMA samples.
MDS plots were generated in R using the edgeR package [232].

Use of TEtranscripts
TEtranscripts is a software package that estimates both gene and TE transcript abundances in RNA-Seq data and conducts differential expression analysis on the resultant count tables [150]. Sequences were aligned to human genome assembly GRCh38 using STAR [233]. Alignment parameters were outFilterMultimapNmax 100 and winAnchorMultimapNmax 200, which allow up to 100 alignments per read. TE annotation files were downloaded from http://labshare.cshl.edu/shares/mhammelllab/www-data/TEToolkit/ (including 1181 TE types). Following the generation of a count table for gene and TE transcripts, the differential expression analysis closely followed the DESeq2 package [234] for modeling the counts data with a negative binomial distribution and computing adjusted P-values. In addition to the standard transcript abundance normalization approach used by the DESeq2 package, TEtranscripts offers two additional options, reads per mapped million (RPM) and quantile normalization. All other procedures exactly followed the DESeq2 method. TEtranscripts runs the DESeq2 method with a default set of general parameters. When there were no (or very few) replicates, we used the blind method for variance estimation and fit-only for SharingMode. Otherwise, we used pooled or per-condition methods and maximum SharingMode, as suggested by the DESeq2 package.
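For orientation, the multi-mapper settings above correspond to a STAR invocation along the lines of the following Python sketch, which only assembles the argument list. The two multi-mapping values are those stated in the text; the helper name, index directory, FASTQ file names, output prefix, thread count, and the choice of unsorted BAM output are placeholders assumed for illustration and are not reported in this study.

```python
import shlex


def star_command(genome_dir, fastq_files, out_prefix, threads=4):
    """Build a STAR argument list that retains multi-mapped reads."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,            # placeholder index location
        "--readFilesIn", *fastq_files,
        "--outFilterMultimapNmax", "100",     # value stated in the text
        "--winAnchorMultimapNmax", "200",     # value stated in the text
        "--outSAMtype", "BAM", "Unsorted",    # assumed output format
        "--outFileNamePrefix", out_prefix,
    ]


# Placeholder paths; the command is printed, not executed.
cmd = star_command("GRCh38_star_index", ["sample_R1.fastq"], "sample_")
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run` on a system where STAR and a GRCh38 index are available.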

Locus-specific mapping of TEs
The pipeline to map TEs to individual genomic loci used the alignment algorithm HISAT2 [235] to map sequence reads to the human genome. Reads that mapped to more than one genomic position were discarded. Counts per TE integrant (genomic locus) were generated using the multiBamCov tool from the BEDtools software [236]. Normalisation for sequencing depth was performed using voom [237], with total number of reads on genes as size factors. RepeatMasker 4.0.5 (Library 20140131), a newer version than RepeatMasker 4.0 used by Prudencio et al. [52], was used to generate a list of TE subfamilies. In the case of HERVs, we re-assembled fragmented internal and LTR sequences to generate full-length HERV integrants: this step avoids bias in counts due to the highly fragmented nature of the annotated HERVs. We removed from our analyses very small and abundant repeats (low complexity and simple repeats). Any TEs with a low number of reads across all samples or which overlapped exons were also omitted from our analyses. Differential expression was performed as implemented in the voom library of Bioconductor [238]. A TE locus was considered to be differentially expressed if its fold change was greater than 2 and FDR smaller than 0.05. The Benjamini-Hochberg procedure was used to compute the FDR. Hierarchical clustering of the heatmap was performed with Pearson correlation as distance and complete agglomeration method for both rows and columns. Any raw data files will be provided upon request to the authors.
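The differential expression criteria can be illustrated with a short Python sketch of the Benjamini-Hochberg adjustment followed by the fold change and FDR filter. The analysis itself was run with voom in R; this block is only a restatement of the thresholds. It assumes that "fold change greater than 2" applies in either direction (absolute log2 fold change above 1), and the function names, locus names, and values are invented.

```python
import math


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def de_loci(names, log2_fc, pvalues, min_fold=2.0, max_fdr=0.05):
    """Loci with |fold change| > min_fold and BH-adjusted p < max_fdr."""
    fdr = benjamini_hochberg(pvalues)
    threshold = math.log2(min_fold)
    return [n for n, lfc, q in zip(names, log2_fc, fdr)
            if abs(lfc) > threshold and q < max_fdr]


# Invented loci and statistics for illustration only.
names = ["locus_a", "locus_b", "locus_c", "locus_d"]
print(de_loci(names, [1.5, 0.5, -2.0, 3.0], [0.01, 0.04, 0.03, 0.5]))
```

In this toy input only the first locus is retained: the third locus has a large fold change but its adjusted p-value rises above 0.05 once the four tests are corrected together.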

Additional files
Additional file 1: Figure S1. Patterns of L1 ORF1p expression in various cell lines using multiple antibodies for detection. Figure S2. Frequency of polymorphisms and DNA methylation at the L1 ORF1p R159 residue. Figure S3. ORF1p does not contain a CRM1-dependent nuclear export signal. Figure S4. Cell culture toxicity assays. Figure S5. Overexpression or knockdown of TDP-43 protein does not alter L1 expression. Figure S6. Methylation analyses of the CpG island of the 5' UTR promoter of endogenous L1 elements show effects of altered levels of TDP-43. Figure S7. Expression levels of L1 and Alu TEs determined by RT-qPCR. Figure S8. Representative gels showing expression of ORF1p in normal and ALS-associated tissues determined by Western blotting. Figure S9. TE locus-specific analyses of the GSE67196 RNA-Seq dataset [171]. (PDF 25088 kb) Additional file 2: