Revisiting the evolution of mouse LINE-1 in the genomic era
© Sookdeo et al.; licensee BioMed Central Ltd. 2013
Received: 19 July 2012
Accepted: 25 October 2012
Published: 3 January 2013
LINE-1 (L1) is the dominant category of transposable elements in placental mammals. L1 has significantly affected the size and structure of all mammalian genomes and understanding the nature of the interactions between L1 and its mammalian host remains a question of crucial importance in comparative genomics. For this reason, much attention has been dedicated to the evolution of L1. Among the most studied elements is the mouse L1 which has been the subject of a number of studies in the 1980s and 1990s. These seminal studies, performed in the pre-genomic era when only a limited number of L1 sequences were available, have significantly improved our understanding of L1 evolution. Yet, no comprehensive study on the evolution of L1 in mouse has been performed since the completion of this genome sequence.
Using the Genome Parsing Suite we performed the first evolutionary analysis of mouse L1 over the entire length of the element. This analysis indicates that the mouse L1 has recruited novel 5’UTR sequences more frequently than previously thought and that the simultaneous activity of non-homologous promoters seems to be one of the conditions for the co-existence of multiple L1 families or lineages. In addition the exchange of genetic information between L1 families is not limited to the 5’UTR as evidence of inter-family recombination was observed in ORF1, ORF2, and the 3’UTR. In contrast to the human L1, there was little evidence of rapid amino-acid replacement in the coiled-coil of ORF1, although this region is structurally unstable. We propose that the structural instability of the coiled-coil domain might be adaptive and that structural changes in this region are selectively equivalent to the rapid evolution at the amino-acid level reported in the human lineage.
The pattern of evolution of L1 in mouse shows some similarity with human suggesting that the nature of the interactions between L1 and its host might be similar in these two species. Yet, some notable differences, particularly in the evolution of ORF1, suggest that the molecular mechanisms involved in host-L1 interactions might be different in these two species.
Keywords: Retroposon, Retrotransposon, LINE-1, L1, Mus musculus, Recombination
Long interspersed nuclear element-1 (LINE-1 or L1) constitutes the dominant category of transposable elements in mammalian genomes. L1s have accumulated in the genomes of their mammalian hosts in extremely large numbers and contribute to more than 20% of genome size in human and mouse [1, 2]. L1s have been a rich source of evolutionary novelties by providing motifs that can be recruited by the host either for the regulation of its own genes or within its coding sequences [3–6]. However, L1 activity can also be detrimental to the fitness of the host [7, 8], either by inserting within genes [9, 10] or by mediating chromosomal rearrangements through ectopic (non-allelic) recombination [11, 12]. L1 elements replicate using a copy-and-paste mechanism that involves the reverse-transcription of the L1 RNA at the insertion site [13–15]. L1 encodes the replicative machinery necessary for the retrotransposition reaction. It contains two open-reading frames (ORFs) that are both indispensable for L1 retrotransposition. ORF1 encodes a trimeric protein with RNA-binding properties and nucleic-acid chaperone activity [16–20]. ORF2 encodes an endonuclease that makes the first nick at the insertion site and a reverse-transcriptase that copies L1 RNA into DNA at the site of insertion [21, 22]. L1 has a 5’ untranslated region (UTR) that acts as an internal promoter [23, 24] and a 3’ UTR with a conserved poly-G tract of unknown function. The L1 retrotransposition reaction produces mostly 5’ truncated elements that are transpositionally inactive [26, 27]. As the vast majority of L1 insertions do not serve a function for the host, they accumulate mutations at the neutral rate so that young families of L1 elements are less divergent than older ones [28–32].
The pattern of L1 element evolution in mammals is very unusual. In most species analyzed so far, L1 evolves as a single lineage: a family of elements emerges, amplifies to hundreds or thousands of copies and then becomes extinct, being replaced by a more recently evolved family [30, 33–35]. This process is exemplified in human where a single lineage of replicatively dominant families has evolved over the last 40 MY. The reason(s) why L1 evolves as a single lineage remains unclear but the similarity between L1 and H3N2 influenza A virus evolution [36–38] suggests that the single lineage mode of evolution could result from a co-evolutionary arms race between L1 and its host. This hypothesis is supported by the observation that the coiled-coil domain of ORF1 harbors the signature of adaptive evolution, possibly in response to host repression, and that adaptive evolution apparently correlates with the replicative success of L1 families. However, in early primate evolution (from 70 to 40 MY), multiple L1 lineages have co-existed in the human genome. Interestingly, co-existing lineages always had non-homologous 5’UTRs suggesting that their co-existence could be due to their reliance on different host factors for their transcription.
The patterns described above result mostly from the analysis of the human genome and it is unclear how patterns of evolution in human recapitulate L1 evolution in other species. It is thus important to examine in greater detail the evolution of L1 lineages in other mammals. Pre-genomics studies in the house mouse (Mus musculus) have demonstrated the presence of multiple concurrently active L1 families with non-homologous promoters [33, 40–48]. Recently active families are classified into two groups based on their promoter types (A or F types), whereas ancestral L1 families carry a third promoter, the V type. The co-existence of multiple L1 families with different promoters in extant mice recapitulates the situation in early primate evolution and provides a unique opportunity to investigate the interactions between concurrent L1 families and the molecular properties that would allow for such co-existence.
Previous L1 studies in mice were limited to sequence analysis performed on a few L1 loci, the majority of which were fragments of L1 inserts. No detailed study of L1 evolution in mouse has been performed since the completion of the mouse genome sequence. With the availability of this genome, we decided to perform a comprehensive analysis of full-length L1 elements to investigate the evolutionary dynamics of L1 in mouse. We present evidence that the diversification of mouse L1 has been influenced by frequent events of recombination across the entire length of the element, rapid structural changes in ORF1, as well as lateral transfer by inter-specific hybridization.
A total of 20,459 L1 inserts with complete reverse transcriptase (RT) domains were identified using the Genome Parsing Suite (GPS). L1 elements were first grouped based on their 5’UTR. This was done by comparing the 5’ end of all elements with a library of previously described mouse 5’UTRs using the RepeatMasker program. The A, F, V, and Lx 5’UTR types have long been characterized [33, 50, 51] and the majority of elements could be assigned to one of these 5’UTR sequences. A number of elements, however, carried 5’UTRs distinct from these four types. These elements were aligned to each other and grouped into three novel types of 5’UTR: (1) a 5’UTR with similarity to the F type but with distinctive features, named Fanc (for F ancestral); (2) a 5’UTR that was not characterized before, named Mus (because it is absent from the rat genome); and (3) a 5’UTR that shows no similarity with any others, named N (for novel).
Once elements were sorted based on their 5’UTRs, they were further categorized into families using a phylogenetic analysis of the 3’ terminus. A family is defined as a collection of elements that result from the activity of a highly homogenous group of progenitors, which are characterized by a unique combination of characters. In the first step of the phylogenetic analysis, neighbor joining trees of elements sharing similar 5’UTRs were built. Distinct clusters of sequences were provisionally considered families and were validated by a second round of phylogenetic analysis based on the principle that elements belonging to the same family should yield a star phylogeny (that is, a phylogenetic tree devoid of structure) because these elements result from the activity of very similar progenitors. These families were further confirmed by phylogenetic analysis performed on other regions of L1 to ensure that the homogeneity of the families extends over the entire length of the element.
Table 1. Copy number, divergence, and age of mouse L1 families
Columns: RepeatMasker classification; genomic copy number; number of FL elements; average pairwise divergence (% ± S.E.).
Average pairwise divergence values, in table order (family labels and the other columns were lost): 0.376 ± 0.073; 2.939 ± 0.294; 3.916 ± 0.304; 4.346 ± 0.414; 5.167 ± 0.341; 8.554 ± 0.434; 8.346 ± 0.414; 0.462 ± 0.095; 0.496 ± 0.087; 2.233 ± 0.196; 1.356 ± 0.250; 3.929 ± 0.421; 3.853 ± 0.278; 4.537 ± 0.271; 8.040 ± 0.400; 11.627 ± 0.503; 11.683 ± 0.487; 12.366 ± 0.610; 16.795 ± 0.821; (RepeatMasker classification: L1VL1, L1Md_F, L1Md_F3) 3.447 ± 0.212; 15.257 ± 0.647; 18.318 ± 0.855; 17.575 ± 0.968; 12.068 ± 0.590; 14.971 ± 0.521; 19.864 ± 0.846; 23.907 ± 0.998; 18.595 ± 0.841; 25.642 ± 1.237.
Phylogenetic analysis of L1 families based on ORF2
One of the most striking features visible on the tree is that families with similar 5’UTRs do not form monophyletic groups, indicating that L1 families have frequently recruited novel 5’UTRs, either from unknown sources or from ancient families. The oldest families in our study carried an Lx promoter, which was replaced three times: once by the Fanc promoter (L1MdFanc_II) and twice by the V promoter (L1MdV_II and III). The Fanc promoter was replaced independently twice by the Mus promoter, as L1MdMus_I and L1MdMus_II do not form a monophyletic group. The Mus promoter was eventually replaced by the V promoter (L1MdV_I) and went extinct. The F promoter was then resuscitated approximately 6.4 MY ago and gave rise to families L1MdF_I to V. Approximately 4.6 MY ago the A promoter was recruited, yielding the modern A lineage which extends from families L1MdA_VII to I. Within this lineage, an additional recruitment occurred resulting in the L1MdN_I family. Finally, the F promoter was recently recruited twice, approximately 2.2 MY ago by the L1MdGf_II family and approximately 1.2 MY ago by the Tf/Gf_I lineage. Thus we estimate that L1 in mouse has experienced 11 replacements of its 5’UTR.
The topology of the ORF2 tree indicates that mouse L1 families evolved mostly as a single lineage. This does not mean that a single family or single lineage was active at a time. In fact, the co-existence of multiple active families characterizes the evolution of L1 for the last 13 MY of mouse evolution. For instance, between 1 and 2.5 MY ago, six families (L1MdTf_III, L1MdA_II, L1MdA_III, L1MdGf_II, L1MdN_I, and L1MdF_I) were active in the mouse genome as attested by the overlap in their average pairwise divergence (Table 1). In some cases, several families evolved into lineages that diversified and co-existed with the dominant lineage for several MY. The lineage composed of L1MdF_I, II, and III is the one that co-existed the longest with the lineage that yielded the currently active families. L1MdF_I was active 2.12 MY ago, at about the same time as families L1MdA_III and L1MdN_I. These families, however, are all descendants of family L1MdF_IV which was active 6.4 MY ago (Figure 1 and Table 1). Thus the lineage consisting of L1MdF_I, II, and III co-existed with the lineage that produced L1MdA_III and L1MdN_I for more than 4 MY. Eventually the L1MdF lineage became extinct. Thus the cascade structure of the ORF2 tree, typical of the single lineage mode of evolution reported in other mammals, is consistent with a model in which multiple families are concurrently active until one of them attains replicative supremacy, coinciding with the extinction of its competitors.
Detection of recombination among murine L1 families
Because L1 families have frequently recruited novel promoters, we decided to examine whether L1 lineages have exchanged genetic information in other regions of the element. To this end, several methods implemented in the RDP 3.0 software were used: two substitution-based approaches, MaxChi and Chimaera, and two phylogenetic approaches, Bootscan and RDP. Breakpoints and statistically significant events of genetic recombination detected by RDP were verified by visual inspection of the FL consensus alignment (see Additional file 3) and phylogenetic analyses. A minimum of six recombination events was detected.
The next oldest recombination event is between the ancestor of L1MdA_IV (which is the ancestor of L1MdA_I, II, and III) and L1MdF_II, near the 3’ end of the element (Figure 2D). A 666 bp region was transferred from L1MdF_II to the L1MdA_IV family. This fragment is also found in all L1MdA sequences derived from L1MdA_IV as well as the Gf and Tf families since they also acquired their ORF2 and 3’UTR from an ancestral L1MdA family. Similarly, a segment located in the coiled-coil domain of ORF1 was transferred from L1MdMus_II to L1MdA_VII and L1MdA_VI (Figure 2E). Subsequently an overlapping region was transferred from L1MdA_VII or L1MdA_VI to L1MdF_III. This segment is also found in L1MdGf_II as this family got its ORF1 from L1MdF_III.
It should be noted that our criteria for identifying recombination events were stringent, as we only considered the recombination of large segments to be significant. Thus it is plausible that exchanges of sequences of shorter length have occurred between L1 families but were not detected due to the small number of defining characters in some conserved regions of L1, such as ORF2. The number of recombination events reported here suggests that recombination has played a significant role in the evolution of novel L1 families in mouse and can occur across the entire length of L1.
Evolution of the ORFs
Table 2. Summary of selection detection tests
Columns recovered: positively selected sites; number of branches with positive selection.
dN/dS values (± S.E.), in table order (most row labels were lost): 0.494 ± 0.275; 0.608 ± 0.401; 0.354 ± 0.371; 5' terminus (1–1,170): 0.308 ± 0.411; 3' terminus (1171-end): 0.229 ± 0.353.
We examined the level of conservation of domains of ORF1 that are known to be functionally important [19, 59, 60]. Three domains have been identified: a coiled coil (CC) domain that mediates the formation of ORF1p trimers, an RNA-recognition motif (RRM), and a C-terminal domain (CTD). The 3’ half of ORF1, which contains the RRM and CTD domains, as well as approximately the first 50 amino acids of ORF1, are highly conserved across families, in contrast with the CC domain that shows a high level of structural variation. We analyzed independently the 5’ terminus, the CC domain, and the 3’ half of ORF1 for evidence of selection using recombination breakpoints as boundaries. All the methods used strongly indicated that the 5’ terminus and the 3’ half of ORF1 are evolving under purifying selection. The PARRIS method rejected the hypothesis that a subset of amino acids is evolving under positive selection and the GABranch method showed that dN/dS has remained significantly lower than 1 in these regions during the entire evolutionary span covered by the analysis. This is not surprising, especially for the 3’ half of ORF1, as the RRM and CTD motifs were shown to be conserved across mammals. The SLAC, FEL, and REL programs failed to identify a single amino acid under positive selection at the 5’ end. At the 3’ end, the REL method identified two amino acids under positive selection but these residues are likely to be false positives as the changes in amino acid result from independent events of mutation at CpG nucleotides, which are known for their unusually high mutation rate.
More surprising is the degree of conservation at the amino acid level of the CC domain. Previous studies have shown that the CC domain of ORF1 has evolved under positive selection in primates [30, 39]. In the case of the mouse, surprisingly, the PARRIS method rejected the hypothesis that some amino acids evolved under positive selection, although a moderately high dN/dS ratio was obtained (0.608), and the GABranch method failed to identify a single branch in the evolution of the coiled coil with a dN/dS >1. Out of the three methods (SLAC, FEL, and REL) used to detect selection at specific amino acids, only one (REL) identified two amino acids that could have evolved under positive selection. It is thus plausible that these two sites are false positives as they have been identified by a single method. Even if these sites are evolving under positive selection, it remains true that the signature of positive selection in the mouse CC is much weaker than it is in human [30, 39].
Evidence for the lateral transfer of L1 families
We performed the first comprehensive analysis of L1 evolution since the completion of the mouse genome. The analysis is limited to the most recently active L1 families and covers approximately the last 13 MY of mouse evolution. As murine rodents evolve approximately eight times faster than hominoids, the amount of evolutionary change investigated here is similar to previous studies in humans that covered more than 80 MY of primate evolution [30, 35]. The results are consistent with the large number of analyses performed in the pre-genomic era [32, 33, 41–45, 50, 65–68] but, by focusing solely on intact FL elements, we were able to provide for the first time a complete picture of the evolution of mouse L1 families over the entire length of the element.
Evolution of L1 as a single lineage
The evolution of L1 in mouse fits the single lineage mode of evolution described previously in other mammals and particularly in human [30, 35, 63, 69]. This is exemplified by the similarity between the tree in Figure 1 and the tree based on the human ORF2 (Figure 2 in ). This model is based on the observation that L1 phylogenies have a typical cascade structure that is best explained by the successive activity of L1 families: a single family, or a group of closely related families, is active at a given point in time until a new family emerges and replaces the pre-existing family, which usually becomes extinct. In some instances, however, several lineages may co-exist until one eventually becomes extinct. This is the case of the L1MdF_I, II, and III lineage which co-existed with the dominant lineage for approximately 4 MY and of the Tf and L1MdA_I, II, and III lineages that have co-existed for about 2 MY and are still active in the mouse genome. In ancestral primates a similar situation occurred but over a much longer period of evolutionary time, as the L1PB and L1PA lineages co-existed for 30 MY. We previously observed that, in human, L1 lineages that co-exist for extended periods always have different promoter sequences. We proposed that families with different promoter sequences rely on different host factors for their transcription and are consequently not relying on the same host-encoded resources. This situation allows them to co-exist as they are not using the same genomic ‘niche’. In mouse the same observation can be made. The lineage composed of L1MdF_I, II, and III co-existed with the main lineage when this one was dominated by families carrying the A promoter (L1MdA_III to VI). Similarly, the two lineages that are currently active, the L1MdA_I, II, and III and the L1MdTf/Gf, carry different, non-homologous 5’UTRs. Thus, it is possible that the conditions that allow for multiple lineages to co-exist are the same in mouse and in human.
Unlike in modern human, where a single family is currently active (the Ta family), the modern house mouse genome harbors several families with different 5’UTRs and consequently presents an excellent model to test experimentally the hypothesis that the activity of different 5’UTRs is one of the conditions for the co-existence of families and lineages.
Acquisition and exchange of sequence during L1 evolution
The analysis of FL elements has revealed the extraordinary ability of L1 families to acquire novel motifs and to exchange sequences (Figures 2 and 3). The recruitment of novel 5’UTR sequences [30, 33] as well as the recombinant nature of some L1 families in mouse [45, 46] and rat [34, 69, 70] have long been described. Three mechanisms have been proposed to account for the mosaic nature of some families. First, recombination between genomic copies, that is, at the level of DNA templates, could result in the formation of a novel transpositionally competent family. This hypothesis has been discounted on the basis that a chance recombination event between two replicatively competent elements is highly unlikely, whereas recombination between any of the hundreds of thousands of L1 pseudogenes, the majority of which have suffered inactivating mutations, is much more likely and would produce an inactive element. Second, recombination could occur at the time the L1 RNA is reverse-transcribed and could result from the formation of an RNA/DNA heteroduplex between the L1 RNA and a genomic copy at the insertion site. This model is supported by the observation that the recruitment of novel motifs seems to be directional, as it is always a chronologically young 3’ end that recruits an older 5’ terminus. Third, mosaic elements could be produced if the L1-encoded reverse transcriptase switches RNA strand at the time of insertion. Polymerase strand-switching is a well-known feature of RNA viruses [72, 73]. This mechanism ensures that recombination occurs between replicatively competent elements, that is, elements that carry a 5’UTR capable of driving their transcription. The third model predicts that recombination occurs only between families that are simultaneously active whereas the first and second models do not have such a requirement.
We found that the exchange of genetic information occurs both between simultaneously active families and by resuscitation of motifs from extinct families. For instance, the coiled-coil domain of L1MdMus_II was recruited by L1MdA_VII about 4.6 MY ago, long after the extinction of L1MdMus_II which was active 8.23 MY ago. The L1MdGf_II family is also the product of a recombination between two families that were not active simultaneously, the L1MdF_III and the L1MdA_III families (which amplified 4.42 and 2.15 MY ago, respectively). All other instances of recombination occurred between families that were simultaneously active, which is consistent with the polymerase strand-switching model. Similarly, the acquisition of novel 5’UTRs tends to result from the transfer of 5’ termini between families that were active at the same time. This is exemplified by the evolution of the F-type which was transferred from L1MdFanc_I (active 6.80 MY ago) to the ancestor of L1MdF_V (at 6.43 MY) and subsequently transferred from L1MdF_I (active 2.12 MY ago) to the recently active L1MdTf and L1MdGf families.
Evolution of ORF1
The first ORF is arguably the least understood region of L1, although it has been the subject of much attention in the past few years [17–20, 59, 60, 74–78]. Its structure has been resolved as a dumbbell shape resulting from the formation of a trimeric structure mediated by the coiled coil domain. It is established that it has RNA-binding abilities, mediated by the RRM, can act as a nucleic acid chaperone [19, 20] and can form multimers in the presence of nucleic acids. Previous studies have shown that the 3’ half of ORF1 is highly conserved and our analysis confirms this is the case in mouse. In contrast, studies in human have demonstrated that the coiled-coil domain is evolving under strong positive selection as indicated by the high values of dN/dS reported in the evolution of this region [30, 39]. Such a rapid evolution at the amino-acid level is certainly adaptive and it was proposed that this was the result of an arms race between L1 and its human host. This hypothesis was further supported by the fact that periods of adaptive evolution in the coiled coil coincide with periods of intense L1 activity. However, we failed to find strong evidence of adaptive evolution in the mouse coiled coil. Instead, we found an extraordinary level of structural instability in this region (Figure 4), unexpected in a protein-coding region critical for the multimeric structure of the functional protein. Instability in this region has also been described in the rat L1, suggesting a common role for these structural changes in these two species [34, 69]. Structural changes in the coiled coil occur so frequently that it is tempting to speculate that they are adaptive, and are evolutionarily equivalent to periods of intense amino acid replacement in humans.
We performed a comprehensive analysis of L1 evolution in mouse. This analysis covered the last 13 MY of mouse evolution, since the split between mouse and rat. The mouse L1 has evolved as a single lineage for most of its evolution, although co-existence between families carrying different promoter sequences was observed. L1 families have frequently acquired novel 5’UTRs and have exchanged sequences over the entire length of the element. No evidence of rapid amino acid replacement in ORF1 was detected, although it is likely that the structural instability of the CC domain is adaptive. The general pattern of evolution of mouse L1 is similar to the one in human, suggesting that the nature of the interactions between L1 and its host might be similar in these two species. There are, however, some intriguing differences between mouse and human, particularly in the evolution of ORF1. These differences suggest that the molecular mechanisms involved in host-L1 interactions might be different in these two species.
Collection and classification of full-length L1 elements
Full-length (FL) elements were collected from the Mus musculus 2006 (mm8) genome build using the GPS. GPS conducted a BLAST-type search (WU-tBLASTn) of the genome using the conserved Reverse Transcriptase (RT) domain of ORF2 as a query. GPS then cut 7,000 bp upstream and downstream of the RT domain yielding a 14,000 bp fragment. A second WU-tBLASTn was then performed on the 14,000 bp cutouts to identify regions characteristic of L1 (ORF1, the endonuclease domain of ORF2, the RT domain, and the 3’UTR). In this analysis, GPS did not search for sequence identity at the 5’ end since L1 is known to frequently recruit novel sequences as 5’UTR [30, 33]. Thus, a file containing 3,000 bp upstream of ORF1 was generated for further analyses. The FL sequences were first sorted based on their 5’UTRs. Once elements were sorted based on their 5’UTRs, they were further categorized into families using a phylogenetic analysis of the 3’ terminus. A family is defined as a collection of elements that result from the activity of a highly homogenous group of progenitors, which are characterized by a unique combination of characters. In the first step of the phylogenetic analysis, neighbor joining trees of elements sharing similar 5’UTRs were built. Distinct clusters were provisionally considered families and were validated by a second round of phylogenetic analysis based on the principle that elements belonging to the same family should yield a star phylogeny because they result from the activity of similar progenitors. These families were further confirmed by phylogenetic analysis performed on other regions of L1 to verify that the homogeneity of the families extends over the entire length of the element. Full-length consensus sequences were derived for each family and are available on Repbase. Phylogenetic analyses were performed using the neighbor joining (NJ) method based on the maximum composite likelihood distance included in the MEGA 5.01 software package.
The model that best fits the data was determined for each alignment using MEGA. The robustness of each phylogenetic tree was assessed using a bootstrap procedure with 1,000 replicates. Families were named after their 5’ promoter type (A, F, Fanc, V, Lx, Mus, or N; see Results) followed by a Roman numeral. The smaller the numeral, the younger the family. For instance, families L1MdA_I, L1MdA_II, and L1MdA_III are subsets of the previously described L1MdA family; family L1MdA_I is younger than family L1MdA_II and family L1MdA_III is the oldest of the three. We kept the Gf and Tf names for the recently active Tf and Gf families because these names have been widely used in the literature.
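The cut-out step of the collection procedure above can be illustrated with a short sketch. It is not the GPS code: the function names are hypothetical, coordinates are assumed to be 0-based and half-open, and the window is assumed to be centered on the midpoint of the RT hit so that an unclipped cut-out is exactly 14,000 bp.

```python
def cutout_window(hit_start, hit_end, chrom_length, flank=7000):
    """Return (start, end) of a window spanning `flank` bp on each side of
    the midpoint of a hit, clipped to the chromosome boundaries.

    Assumption: 0-based, half-open coordinates; centering on the hit
    midpoint is an illustrative choice, not a documented GPS behavior.
    """
    if not (0 <= hit_start < hit_end <= chrom_length):
        raise ValueError("hit coordinates must lie within the chromosome")
    mid = (hit_start + hit_end) // 2
    start = max(0, mid - flank)
    end = min(chrom_length, mid + flank)
    return start, end


def extract_cutout(sequence, hit_start, hit_end, flank=7000):
    """Slice the cut-out around a hit from a chromosome sequence string."""
    start, end = cutout_window(hit_start, hit_end, len(sequence), flank)
    return sequence[start:end]
```

Hits that lie close to a chromosome end yield shorter cut-outs under this sketch, so any downstream step that expects 14,000 bp would need to handle that case.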
Analysis of FL elements
NJ, maximum parsimony (MP), and maximum likelihood (ML) trees were calculated for each region of L1. Phylogenetic trees were reconstructed using the MEGA 5.01 package. The RDP3.0 program (Recombination Detection Program 3.0, available at http://darwin.uvigo.es/rdp/rdp.html) was used to search for evidence of recombination among families. RDP allows for the use of several recombination detection methods including substitution and phylogeny-based methods. Two substitution-based methods, MaxChi and Chimaera, as well as a phylogenetic method, Bootscan, were used to analyze the datasets. The RDP software also includes its own algorithm, termed ‘RDP’, which is also a phylogenetic approach to detecting recombination. A window size of 50 bp was used to detect breakpoints between consensus sequences. Statistically significant events of recombination were verified by comparing phylogenetic trees on each side of the putative breakpoint.
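The idea behind a windowed breakpoint scan can be shown with a deliberately simplified sketch. It is not the RDP, MaxChi, Chimaera, or Bootscan algorithm and performs no statistical test; it only labels each window of an aligned query consensus by the candidate parent it matches best, so that a change of label along the alignment hints at a putative breakpoint. The function names and the `step` parameter are assumptions made for illustration.

```python
def window_identity(a, b):
    """Fraction of identical positions, ignoring columns with a gap ('-')."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def closest_parent_by_window(query, parent_a, parent_b, window=50, step=10):
    """Return a list of (window_start, label) where label is 'A', 'B', or '='
    depending on which aligned parent the query resembles most in that window.
    """
    if not (len(query) == len(parent_a) == len(parent_b)):
        raise ValueError("sequences must be aligned to the same length")
    labels = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        id_a = window_identity(query[start:end], parent_a[start:end])
        id_b = window_identity(query[start:end], parent_b[start:end])
        label = "A" if id_a > id_b else "B" if id_b > id_a else "="
        labels.append((start, label))
    return labels
```

A switch from 'A' to 'B' labels would still need to be confirmed, as described above, by comparing phylogenetic trees on each side of the candidate breakpoint.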
To test for evidence of selection in the evolution of L1, several methods implemented in the web server http://www.datamonkey.com of the HyPhy program were used. The first method uses a maximum likelihood approach (PARRIS) to determine if a proportion of sites in an alignment evolves with a ratio dN/dS>1. A ratio significantly >1 is indicative of positive selection whereas a ratio <1 is indicative of purifying selection. The second method, GABranch, can detect lineage-specific variation in selective pressure and requires no a priori specification of branches in a phylogeny that may have evolved under different values of dN/dS. The dN/dS test is however not very sensitive, particularly if selection acts on a few codons. For this reason we used three methods designed to detect the action of positive or negative selection at specific sites in an alignment: Single Likelihood Ancestor Counting (SLAC), Random Effects Likelihood (REL), and Fixed Effects Likelihood (FEL). For each dataset, the model that best fits the data was determined using the tool available at datamonkey.com. As selection detection methods are sensitive to recombination, we performed our analyses independently for each segment of L1 flanked by recombination breakpoints. Previous studies on human L1 have documented positive selection in the coiled-coil (CC) domain of ORF1 [30, 39]. CC structures are formed from two or more α-helical peptide chains that contain a distinct arrangement of non-polar side chains. Domains that can form CC consist of heptads (or seven-residue repeats) with non-polar or hydrophobic residues in the first and fourth positions. The program COILS was used to identify the position of the CC domain in each consensus sequence as well as the number of constitutive heptads.
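The heptad rule stated above (hydrophobic residues in the first and fourth positions) can be made concrete with a small sketch. This is not the COILS algorithm, which scores sequences against position-specific profiles; it is a naive check, and both the function names and the residue set treated as hydrophobic are assumptions.

```python
# Assumed set of non-polar residues (one-letter codes); other definitions exist.
HYDROPHOBIC = set("AVLIMFW")


def heptad_score(peptide, offset=0):
    """Fraction of complete heptads, read from `offset`, whose first and
    fourth residues are both in HYDROPHOBIC. Returns 0.0 if no heptad fits."""
    heptads = [peptide[i:i + 7] for i in range(offset, len(peptide) - 6, 7)]
    if not heptads:
        return 0.0
    hits = sum(h[0] in HYDROPHOBIC and h[3] in HYDROPHOBIC for h in heptads)
    return hits / len(heptads)


def best_register(peptide):
    """Return (offset, score) for the heptad frame (offset 0 to 6) with the
    highest score; ties resolve to the smallest offset."""
    return max(((off, heptad_score(peptide, off)) for off in range(7)),
               key=lambda pair: pair[1])
```

Scanning all seven frames is needed because the register of the repeat within a protein sequence is not known in advance.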
Age and copy number of L1 families
The age of each subfamily was estimated by calculating the average pairwise divergence based on the 3’UTR. CpG dinucleotides and the highly mutable polypurine tract located in the 3’UTR were removed from the alignment. The average divergence between copies as well as the standard error was calculated using the maximum likelihood parameter distance (using the MEGA 5.01 software). Divergences were converted to time assuming a neutral rodent genomic substitution rate of 1.1%/MY (calculated using the data presented in Table 5 of and assuming a Mus/Rattus divergence at 13 MY).
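The logic of this estimate can be sketched as follows. The sketch swaps in an uncorrected p-distance for the likelihood-based distance computed in MEGA, masks any alignment column that belongs to a CG dinucleotide in at least one sequence, and converts divergence to time by simple division by the substitution rate. All three choices, and the function names, are assumptions; in particular, the exact conversion applied to pairwise divergence is not spelled out here, so the output should not be expected to reproduce the ages quoted in the text.

```python
from itertools import combinations


def cpg_columns(alignment):
    """Columns that are part of a CG dinucleotide in at least one sequence."""
    masked = set()
    for seq in alignment:
        for i in range(len(seq) - 1):
            if seq[i] == "C" and seq[i + 1] == "G":
                masked.update((i, i + 1))
    return masked


def p_distance(a, b, skip=frozenset()):
    """Proportion of differing sites, ignoring gaps and masked columns."""
    compared = diffs = 0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in skip or x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 0.0


def mean_pairwise_divergence(alignment, mask_cpg=True):
    """Average pairwise p-distance over all sequence pairs, in percent."""
    if len(alignment) < 2:
        raise ValueError("need at least two aligned sequences")
    skip = cpg_columns(alignment) if mask_cpg else frozenset()
    dists = [p_distance(a, b, skip) for a, b in combinations(alignment, 2)]
    return 100.0 * sum(dists) / len(dists)


def divergence_to_age(divergence_pct, rate_pct_per_my=1.1):
    """Naive conversion: divergence (%) divided by rate (%/MY) gives MY."""
    return divergence_pct / rate_pct_per_my
```

The polypurine tract would have to be trimmed from the alignment before calling these functions, since the sketch does not attempt to locate it.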
Availability of supporting data
The consensus sequences are available in Repbase (http://www.girinst.org/repbase/).
Abbreviations
- LINE-1 (L1): Long Interspersed Nuclear Element-1
- MY: Million years
- ORF: Open reading frame
- Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C: Initial sequencing and analysis of the human genome. Nature. 2001, 409: 860-921. 10.1038/35057062.
- Mouse Genome Sequencing Consortium, Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, Antonarakis SE, Attwood J, Baertsch R, Bailey J, Barlow K, Beck S, Berry E, Birren B, Bloom T, Bork P, Botcherby M, Bray N, Brent MR, Brown DG, Brown SD, Bult C, Burton J, Butler J, Campbell RD: Initial sequencing and comparative analysis of the mouse genome. Nature. 2002, 420: 520-562. 10.1038/nature01262.
- Han JS, Boeke JD: LINE-1 retrotransposons: Modulators of quantity and quality of mammalian gene expression?. Bioessays. 2005, 27: 775-784. 10.1002/bies.20257.
- Han JS, Szak ST, Boeke JD: Transcriptional disruption by the L1 retrotransposon and implications for mammalian transcriptomes. Nature. 2004, 429: 268-274. 10.1038/nature02536.
- Horie K, Saito ES, Keng VW, Ikeda R, Ishihara H, Takeda J: Retrotransposons influence the mouse transcriptome: implication for the divergence of genetic traits. Genetics. 2007, 176: 815-827.
- Akagi K, Li J, Stephens RM, Volfovsky N, Symer DE: Extensive variation between inbred mouse strains due to endogenous L1 retrotransposition. Genome Res. 2008, 18: 869-880. 10.1101/gr.075770.107.
- Boissinot S, Davis J, Entezam A, Petrov D, Furano AV: Fitness cost of LINE-1 (L1) activity in humans. Proc Natl Acad Sci U S A. 2006, 103: 9590-9594. 10.1073/pnas.0603334103.
- Boissinot S, Entezam A, Furano AV: Selection against deleterious LINE-1-containing loci in the human lineage. Mol Biol Evol. 2001, 18: 926-935. 10.1093/oxfordjournals.molbev.a003893.
- Kazazian HH, Wong C, Youssoufian H, Scott AF, Phillips DG, Antonarakis SE: Haemophilia A resulting from de novo insertion of L1 sequences represents a novel mechanism for mutation in man. Nature. 1988, 332: 164-166. 10.1038/332164a0.
- Chen JM, Stenson PD, Cooper DN, Ferec C: A systematic analysis of LINE-1 endonuclease-dependent retrotranspositional events causing human genetic disease. Hum Genet. 2005, 117: 411-427. 10.1007/s00439-005-1321-0.
- Burwinkel B, Kilimann MW: Unequal homologous recombination between LINE-1 elements as a mutational mechanism in human genetic disease. J Mol Biol. 1998, 277: 513-517. 10.1006/jmbi.1998.1641.
- Song M, Boissinot S: Selection against LINE-1 retrotransposons results principally from their ability to mediate ectopic recombination. Gene. 2007, 390: 206-213. 10.1016/j.gene.2006.09.033.
- Cost GJ, Feng Q, Jacquier A, Boeke JD: Human L1 element target-primed reverse transcription in vitro. EMBO J. 2002, 21: 5899-5910. 10.1093/emboj/cdf592.
- Luan DD, Eickbush TH: RNA template requirements for target DNA-primed reverse transcription by the R2 retrotransposable element. Mol Cell Biol. 1995, 15: 3882-3891.
- Luan DD, Korman MH, Jakubczak JL, Eickbush TH: Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell. 1993, 72: 595-605. 10.1016/0092-8674(93)90078-5.
- Januszyk K, Li PW, Villareal V, Branciforte D, Wu H, Xie Y, Feigon J, Loo JA, Martin SL, Clubb RT: Identification and solution structure of a highly conserved C-terminal domain within ORF1p required for retrotransposition of long interspersed nuclear element-1. J Biol Chem. 2007, 282: 24893-24904. 10.1074/jbc.M702023200.
- Martin SL: Nucleic acid chaperone properties of ORF1p from the non-LTR retrotransposon, LINE-1. RNA Biol. 2010, 7: 706-711. 10.4161/rna.7.6.13766.
- Martin SL, Branciforte D, Keller D, Bain DL: Trimeric structure for an essential protein in L1 retrotransposition. Proc Natl Acad Sci U S A. 2003, 100: 13815-13820. 10.1073/pnas.2336221100.
- Martin SL, Bushman FD: Nucleic acid chaperone activity of the ORF1 protein from the mouse LINE-1 retrotransposon. Mol Cell Biol. 2001, 21: 467-475. 10.1128/MCB.21.2.467-475.2001.
- Martin SL, Cruceanu M, Branciforte D, Wai-Lun Li P, Kwok SC, Hodges RS, Williams MC: LINE-1 retrotransposition requires the nucleic acid chaperone activity of the ORF1 protein. J Mol Biol. 2005, 348: 549-561. 10.1016/j.jmb.2005.03.003.
- Feng Q, Moran JV, Kazazian HH, Boeke JD: Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell. 1996, 87: 905-916. 10.1016/S0092-8674(00)81997-2.
- Mathias SL, Scott AF, Kazazian HH, Boeke JD, Gabriel A: Reverse transcriptase encoded by a human transposable element. Science. 1991, 254: 1808-1810. 10.1126/science.1722352.
- Minakami R, Kurose K, Etoh K, Furuhata Y, Hattori M, Sakaki Y: Identification of an internal cis-element essential for the human L1 transcription and a nuclear factor(s) binding to the element. Nucleic Acids Res. 1992, 20: 3139-3145. 10.1093/nar/20.12.3139.
- Swergold GD: Identification, characterization, and cell specificity of a human LINE-1 promoter. Mol Cell Biol. 1990, 10: 6718-6729.
- Howell R, Usdin K: The ability to form intrastrand tetraplexes is an evolutionarily conserved feature of the 3' end of L1 retrotransposons. Mol Biol Evol. 1997, 14: 144-155. 10.1093/oxfordjournals.molbev.a025747.
- Martin SL, Li W-HP, Furano AV, Boissinot S: The structures of mouse and human L1 elements reflect their insertion mechanism. Cytogenet Genome Res. 2005, 110: 223-228. 10.1159/000084956.
- Ostertag EM, Kazazian HH: Twin priming: a proposed mechanism for the creation of inversions in L1 retrotransposition. Genome Res. 2001, 11: 2059-2065. 10.1101/gr.205701.
- Boissinot S, Chevret P, Furano AV: L1 (LINE-1) retrotransposon evolution and amplification in recent human history. Mol Biol Evol. 2000, 17: 915-928. 10.1093/oxfordjournals.molbev.a026372.
- Hardies SC, Martin SL, Voliva CF, Hutchison CA: An analysis of replacement and synonymous changes in the rodent L1 repeat family. Mol Biol Evol. 1986, 3: 109-125.
- Khan H, Smit A, Boissinot S: Molecular evolution and tempo of amplification of human LINE-1 retrotransposons since the origin of primates. Genome Res. 2006, 16: 78-87.
- Pascale E, Liu C, Valle E, Usdin K, Furano AV: The evolution of long interspersed repeated DNA (L1, LINE 1) as revealed by the analysis of an ancient rodent L1 DNA family. J Mol Evol. 1993, 36: 9-20. 10.1007/BF02407302.
- Voliva CF, Martin SL, Hutchison CA, Edgell MH: Dispersal process associated with the L1 family of interspersed repetitive DNA sequences. J Mol Biol. 1984, 178: 795-813. 10.1016/0022-2836(84)90312-7.
- Adey NB, Schichman SA, Graham DK, Peterson SN, Edgell MH, Hutchison CA: Rodent L1 evolution has been driven by a single dominant lineage that has repeatedly acquired new transcriptional regulatory sequences. Mol Biol Evol. 1994, 11: 778-789.
- Cabot EL, Angeletti B, Usdin K, Furano AV: Rapid evolution of a young L1 (LINE-1) clade in recently speciated Rattus taxa. J Mol Evol. 1997, 45: 412-423. 10.1007/PL00006246.
- Smit AF, Toth G, Riggs AD, Jurka J: Ancestral, mammalian-wide subfamilies of LINE-1 repetitive sequences. J Mol Biol. 1995, 246: 401-417. 10.1006/jmbi.1994.0095.
- Ferguson NM, Galvani AP, Bush RM: Ecological and immunological determinants of influenza evolution. Nature. 2003, 422: 428-433. 10.1038/nature01509.
- Holmes EC, Grenfell BT: Discovering the phylodynamics of RNA viruses. PLoS Comput Biol. 2009, 5: e1000505. 10.1371/journal.pcbi.1000505.
- Fitch WM, Leiter JM, Li XQ, Palese P: Positive Darwinian evolution in human influenza A viruses. Proc Natl Acad Sci U S A. 1991, 88: 4270-4274. 10.1073/pnas.88.10.4270.
- Boissinot S, Furano AV: Adaptive evolution in LINE-1 retrotransposons. Mol Biol Evol. 2001, 18: 2186-2194. 10.1093/oxfordjournals.molbev.a003765.
- Adey NB, Comer MB, Edgell MH, Hutchison CA: Nucleotide sequence of a mouse full-length F-type L1 element. Nucleic Acids Res. 1991, 19: 2497. 10.1093/nar/19.9.2497.
- Casavant NC, Hardies SC: The dynamics of murine LINE-1 subfamily amplification. J Mol Biol. 1994, 241: 390-397. 10.1006/jmbi.1994.1515.
- DeBerardinis RJ, Goodier JL, Ostertag EM, Kazazian HH: Rapid amplification of a retrotransposon subfamily is evolving the mouse genome. Nat Genet. 1998, 20: 288-290. 10.1038/3104.
- Goodier JL, Ostertag EM, Du K, Kazazian HH: A novel active L1 retrotransposon subfamily in the mouse. Genome Res. 2001, 11: 1677-1685. 10.1101/gr.198301.
- Hardies SC, Wang L, Zhou L, Zhao Y, Casavant NC, Huang S: LINE-1 (L1) lineages in the mouse. Mol Biol Evol. 2000, 17: 616-628. 10.1093/oxfordjournals.molbev.a026340.
- Mears ML, Hutchison CA: The evolution of modern lineages of mouse L1 elements. J Mol Evol. 2001, 52: 51-62.
- Saxton JA, Martin SL: Recombination between subtypes creates a mosaic lineage of LINE-1 that is expressed and actively retrotransposing in the mouse genome. J Mol Biol. 1998, 280: 611-622. 10.1006/jmbi.1998.1899.
- Schichman SA, Adey NB, Edgell MH, Hutchison CA: L1 A-monomer tandem arrays have expanded during the course of mouse L1 evolution. Mol Biol Evol. 1993, 10: 552-570.
- Wincker P, Jubier-Maurin V, Roizes G: Unrelated sequences at the 5' end of mouse LINE-1 repeated elements define two distinct subfamilies. Nucleic Acids Res. 1987, 15: 8593-8606. 10.1093/nar/15.21.8593.
- Smit AFA, Hubley R, Green P: RepeatMasker Open-3.0. 2010
- Jubier-Maurin V, Cuny G, Laurent A-M, Paquereau L, Roizes G: A new 5' sequence associated with mouse L1 elements is representative of a major class of L1 termini. Mol Biol Evol. 1992, 9: 41-55.
- Jubier-Maurin V, Wincker P, Cuny G, Roizes G: The relationships between the 5' end repeats and the largest members of the L1 interspersed repeated family in the mouse genome. Nucleic Acids Res. 1987, 15: 7395-7410. 10.1093/nar/15.18.7395.
- Saitou N, Nei M: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987, 4: 406-425.
- Jaeger J-J, Tong H, Buffetaut E: The age of Mus-Rattus divergence: paleontological data compared with the molecular clock. C R Acad Sci III. 1986, 302: 917-922.
- Maynard Smith J: Analyzing the mosaic structure of genes. J Mol Evol. 1992, 34: 126-129.
- Posada D, Crandall KA: Evaluation of methods for detecting recombination from DNA sequences: computer simulations. Proc Natl Acad Sci U S A. 2001, 98: 13757-13762. 10.1073/pnas.241370698.
- Martin DP, Posada D, Crandall KA, Williamson C: A modified bootscan algorithm for automated identification of recombinant sequences and recombination breakpoints. AIDS Res Hum Retroviruses. 2005, 21: 98-102. 10.1089/aid.2005.21.98.
- Martin D, Rybicki E: RDP: detection of recombination amongst aligned sequences. Bioinformatics. 2000, 16: 562-563. 10.1093/bioinformatics/16.6.562.
- Moran JV, Holmes SE, Naas TP, DeBerardinis RJ, Boeke JD, Kazazian HH: High frequency retrotransposition in cultured mammalian cells. Cell. 1996, 87: 917-927. 10.1016/S0092-8674(00)81998-4.
- Hohjoh H, Singer MF: Sequence-specific single-strand RNA binding protein encoded by the human LINE-1 retrotransposon. EMBO J. 1997, 16: 6034-6043. 10.1093/emboj/16.19.6034.
- Khazina E, Weichenrieder O: Non-LTR retrotransposons encode noncanonical RRM domains in their first open reading frame. Proc Natl Acad Sci U S A. 2009, 106: 731-736. 10.1073/pnas.0809964106.
- Schichman SA, Severynse DM, Edgell MH, Hutchison CA: Strand-specific LINE-1 transcription in mouse F9 cells originates from the youngest phylogenetic subgroup of LINE-1 elements. J Mol Biol. 1992, 224: 559-574. 10.1016/0022-2836(92)90544-T.
- Lupas A, Van Dyke M, Stock J: Predicting coiled coils from protein sequences. Science. 1991, 252: 1162-1164. 10.1126/science.252.5009.1162.
- Kordis D, Lovsin N, Gubensek F: Phylogenomic analysis of the L1 retrotransposons in Deuterostomia. Syst Biol. 2006, 55: 886-901. 10.1080/10635150601052637.
- Casavant NC, Hardies SC: Shared sequence variants of Mus spretus LINE-1 elements tracing dispersal to within the last 1 million years. Genetics. 1994, 137: 565-572.
- Rikke BA, Zhao Y, Daggett LP, Reyes R, Hardies SC: Mus spretus LINE-1 sequences detected in the Mus musculus inbred strain C57BL/6J using LINE-1 DNA probes. Genetics. 1995, 139: 901-906.
- Casavant NC, Lee RN, Sherman AN, Wichman HA: Molecular evolution of two lineages of L1 (LINE-1) retrotransposons in the california mouse, Peromyscus californicus. Genetics. 1998, 150: 345-357.
- Martin SL, Voliva CF, Hardies SC, Edgell MH, Hutchison CA: Tempo and mode of concerted evolution in the L1 repeat family of mice. Mol Biol Evol. 1985, 2: 127-140.
- Padgett RW, Hutchison CA, Edgell MH: The F-type 5' motif of mouse L1 elements: a major class of L1 termini similar to the A-type in organization but unrelated in sequence. Nucleic Acids Res. 1988, 16: 739-749. 10.1093/nar/16.2.739.
- Furano AV: The biological properties and evolutionary dynamics of mammalian LINE-1 retrotransposons. Prog Nucleic Acid Res Mol Biol. 2000, 64: 255-294.
- Hayward BE, Zavanelli M, Furano AV: Recombination creates novel L1 (LINE-1) elements in Rattus norvegicus. Genetics. 1997, 146: 641-654.
- Symer DE, Connelly C, Szak ST, Caputo EM, Cost GJ, Parmigiani G, Boeke JD: Human L1 retrotransposition is associated with genetic instability in vivo. Cell. 2002, 110: 327-338. 10.1016/S0092-8674(02)00839-5.
- Coffin JM: Structure, replication, and recombination of retrovirus genomes: some unifying hypotheses. J Gen Virol. 1979, 42: 1-26. 10.1099/0022-1317-42-1-1.
- Gilboa E, Mitra SW, Goff S, Baltimore D: A detailed model of reverse transcription and tests of crucial aspects. Cell. 1979, 18: 93-100. 10.1016/0092-8674(79)90357-X.
- Kolosha VO, Martin SL: In vitro properties of the first ORF protein from mouse LINE-1 support its role in ribonucleoprotein particle formation during retrotransposition. Proc Natl Acad Sci U S A. 1997, 94: 10155-10160. 10.1073/pnas.94.19.10155.
- Kolosha VO, Martin SL: High-affinity, non-sequence-specific RNA binding by the open reading frame 1 (ORF1) protein from long interspersed nuclear element 1 (LINE-1). J Biol Chem. 2003, 278: 8112-8117. 10.1074/jbc.M210487200.
- Martin SL: Ribonucleoprotein particles with LINE-1 RNA in mouse embryonal carcinoma cells. Mol Cell Biol. 1991, 11: 4804-4807.
- Martin SL: The ORF1 protein encoded by LINE-1: structure and function during L1 retrotransposition. J Biomed Biotechnol. 2006, 2006: 45621.
- Callahan KE, Hickman AB, Jones CE, Ghirlando R, Furano AV: Polymerization and nucleic acid-binding properties of human L1 ORF1 protein. Nucleic Acids Res. 2012, 40: 813-827. 10.1093/nar/gkr728.
- McClure MA, Richardson HS, Clinton RA, Hepp CM, Crowther BA, Donaldson EF: Automated characterization of potentially active retroid agents in the human genome. Genomics. 2005, 85: 512-523. 10.1016/j.ygeno.2004.12.006.
- Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S: MEGA5: Molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance and maximum parsimony methods. Mol Biol Evol. 2011, 28: 2731-2739. 10.1093/molbev/msr121.
- Delport W, Poon AF, Frost SD, Kosakovsky Pond SL: Datamonkey 2010: a suite of phylogenetic analysis tools for evolutionary biology. Bioinformatics. 2010, 26: 2455-2457. 10.1093/bioinformatics/btq429.
- Pond SL, Frost SD, Muse SV: HyPhy: hypothesis testing using phylogenies. Bioinformatics. 2005, 21: 676-679. 10.1093/bioinformatics/bti079.
- Scheffler K, Martin DP, Seoighe C: Robust inference of positive selection from recombining coding sequences. Bioinformatics. 2006, 22: 2493-2499. 10.1093/bioinformatics/btl427.
- Pond SL, Frost SD: A genetic algorithm approach to detecting lineage-specific variation in selection pressure. Mol Biol Evol. 2005, 22: 478-485.
- Kosakovsky Pond SL, Frost SD: Not so different after all: a comparison of methods for detecting amino acid sites under selection. Mol Biol Evol. 2005, 22: 1208-1222. 10.1093/molbev/msi105.
- Gibbs RA, Weinstock GM, Metzker ML, Muzny DM, Sodergren EJ, Scherer S, Scott G, Steffen D, Worley KC, Burch PE, Okwuonu G, Hines S, Lewis L, DeRamo C, Delgado O, Dugan-Rocha S, Miner G, Morgan M, Hawes A, Gill R, Celera, Holt RA, Adams MD, Amanatides PG, Baden-Tillson H, Barnstead M, Chin S, Evans CA, Ferriera S, Fosler C: Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature. 2004, 428: 493-521.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.