Revisiting the evolution of mouse LINE-1 in the genomic era
© Sookdeo et al.; licensee BioMed Central Ltd. 2013
Received: 19 July 2012
Accepted: 25 October 2012
Published: 3 January 2013
LINE-1 (L1) is the dominant category of transposable elements in placental mammals. L1 has significantly affected the size and structure of all mammalian genomes and understanding the nature of the interactions between L1 and its mammalian host remains a question of crucial importance in comparative genomics. For this reason, much attention has been dedicated to the evolution of L1. Among the most studied elements is the mouse L1 which has been the subject of a number of studies in the 1980s and 1990s. These seminal studies, performed in the pre-genomic era when only a limited number of L1 sequences were available, have significantly improved our understanding of L1 evolution. Yet, no comprehensive study on the evolution of L1 in mouse has been performed since the completion of this genome sequence.
Using the Genome Parsing Suite we performed the first evolutionary analysis of mouse L1 over the entire length of the element. This analysis indicates that the mouse L1 has recruited novel 5’UTR sequences more frequently than previously thought and that the simultaneous activity of non-homologous promoters seems to be one of the conditions for the co-existence of multiple L1 families or lineages. In addition the exchange of genetic information between L1 families is not limited to the 5’UTR as evidence of inter-family recombination was observed in ORF1, ORF2, and the 3’UTR. In contrast to the human L1, there was little evidence of rapid amino-acid replacement in the coiled-coil of ORF1, although this region is structurally unstable. We propose that the structural instability of the coiled-coil domain might be adaptive and that structural changes in this region are selectively equivalent to the rapid evolution at the amino-acid level reported in the human lineage.
The pattern of evolution of L1 in mouse shows some similarity with human suggesting that the nature of the interactions between L1 and its host might be similar in these two species. Yet, some notable differences, particularly in the evolution of ORF1, suggest that the molecular mechanisms involved in host-L1 interactions might be different in these two species.
Long interspersed nuclear element-1 (LINE-1 or L1) constitutes the dominant category of transposable elements in mammalian genomes. L1s have accumulated in the genomes of their mammalian hosts in extremely large numbers and contribute to more than 20% of genome size in human and mouse [1, 2]. L1s have been a rich source of evolutionary novelties by providing motifs that can be recruited by the host either for the regulation of its own genes or within its coding sequences [3–6]. However, L1 activity can also be detrimental to the fitness of the host [7, 8], either by inserting within genes [9, 10] or by mediating chromosomal rearrangements through ectopic (non-allelic) recombination [11, 12]. L1 elements replicate using a copy-and-paste mechanism that involves the reverse-transcription of the L1 RNA at the insertion site [13–15]. L1 encodes the replicative machinery necessary for the retrotransposition reaction. It contains two open-reading frames (ORFs) that are both indispensable for L1 retrotransposition. ORF1 encodes a trimeric protein with RNA-binding properties and nucleic-acid chaperone activity [16–20]. ORF2 encodes an endonuclease that makes the first nick at the insertion site and a reverse-transcriptase that copies L1 RNA into DNA at the site of insertion [21, 22]. L1 has a 5’ untranslated region (UTR) that acts as an internal promoter [23, 24] and a 3’ UTR with a conserved poly-G tract of unknown function. The L1 retrotransposition reaction produces mostly 5’ truncated elements that are transpositionally inactive [26, 27]. As the vast majority of L1 insertions do not serve a function for the host, they accumulate mutations at the neutral rate so that young families of L1 elements are less divergent than older ones [28–32].
The pattern of L1 element evolution in mammals is very unusual. In most species analyzed so far, L1 evolves as a single lineage: a family of elements emerges, amplifies to hundreds or thousands of copies and then becomes extinct, being replaced by a more recently evolved family [30, 33–35]. This process is exemplified in human where a single lineage of replicatively dominant families has evolved over the last 40 MY. The reason(s) why L1 evolves as a single lineage remains unclear but the similarity between L1 and H3N2 influenza A virus evolution [36–38] suggests that the single lineage mode of evolution could result from a co-evolutionary arms race between L1 and its host. This hypothesis is supported by the observation that the coiled-coil domain of ORF1 harbors the signature of adaptive evolution, possibly in response to host repression, and that adaptive evolution apparently correlates with the replicative success of L1 families. However, in early primate evolution (from 70 to 40 MY ago), multiple L1 lineages co-existed in the human genome. Interestingly, co-existing lineages always had non-homologous 5’UTRs, suggesting that their co-existence could be due to their reliance on different host factors for their transcription.
The patterns described above result mostly from the analysis of the human genome and it is unclear how patterns of evolution in human recapitulate L1 evolution in other species. It is thus important to examine in greater detail the evolution of L1 lineages in other mammals. Pre-genomics studies in the house mouse (Mus musculus) have demonstrated the presence of multiple concurrently active L1 families with non-homologous promoters [33, 40–48]. Recently active families are classified into two groups based on their promoter types (A or F types), whereas ancestral L1 families carry a third promoter, the V type. The co-existence of multiple L1 families with different promoters in extant mice recapitulates the situation in early primate evolution and provides a unique opportunity to investigate the interactions between concurrent L1 families and the molecular properties that would allow for such co-existence.
Previous L1 studies in mice were limited to sequence analysis performed on a few L1 loci, the majority of which were fragments of L1 inserts. No detailed study of L1 evolution in mouse has been performed since the completion of the mouse genome sequence. With the availability of this genome, we decided to perform a comprehensive analysis of full-length L1 elements to investigate the evolutionary dynamics of L1 in mouse. We present evidence that the diversification of mouse L1 has been influenced by frequent events of recombination across the entire length of the element, rapid structural changes in ORF1, as well as lateral transfer by inter-specific hybridization.
A total of 20,459 L1 inserts with complete reverse transcriptase (RT) domains were identified using the Genome Parsing Suite (GPS). L1 elements were first grouped based on their 5’UTR. This was done by comparing the 5’ end of all elements with a library of previously described mouse 5’UTRs using the RepeatMasker program. The A, F, V, and Lx 5’UTR types have long been characterized [33, 50, 51] and the majority of elements could be assigned to one of these 5’UTR sequences. A number of elements, however, carried 5’UTRs distinct from these four types. These elements were aligned to each other and grouped into three novel types of 5’UTR: (1) a 5’UTR with similarity to the F type but with distinctive features, named Fanc (for F ancestral); (2) a 5’UTR that was not characterized before, named Mus (because it is absent from the rat genome); and (3) a 5’UTR that shows no similarity with any others, named N (for novel).
Once elements were sorted based on their 5’UTRs, they were further categorized into families using a phylogenetic analysis of the 3’ terminus. A family is defined as a collection of elements that result from the activity of a highly homogeneous group of progenitors, which are characterized by a unique combination of characters. In the first step of the phylogenetic analysis, neighbor joining trees of elements sharing similar 5’UTRs were built. Distinct clusters of sequences were provisionally considered families and were validated by a second round of phylogenetic analysis based on the principle that elements belonging to the same family should yield a star phylogeny (that is, a phylogenetic tree devoid of structure) because these elements result from the activity of very similar progenitors. These families were further confirmed by phylogenetic analysis performed on other regions of L1 to ensure that the homogeneity of the families extends over the entire length of the element.
Table 1. Copy number, divergence, and age of mouse L1 families
Columns: RepeatMasker classification; genomic copy number; number of FL elements; average pairwise divergence (% ± S.E.)
0.376 ± 0.073; 2.939 ± 0.294; 3.916 ± 0.304; 4.346 ± 0.414; 5.167 ± 0.341; 8.554 ± 0.434; 8.346 ± 0.414; 0.462 ± 0.095; 0.496 ± 0.087; 2.233 ± 0.196; 1.356 ± 0.250; 3.929 ± 0.421; 3.853 ± 0.278; 4.537 ± 0.271; 8.040 ± 0.400; 11.627 ± 0.503; 11.683 ± 0.487; 12.366 ± 0.610; 16.795 ± 0.821
L1VL1, L1Md_F, L1Md_F3
3.447 ± 0.212; 15.257 ± 0.647; 18.318 ± 0.855; 17.575 ± 0.968; 12.068 ± 0.590; 14.971 ± 0.521; 19.864 ± 0.846; 23.907 ± 0.998; 18.595 ± 0.841; 25.642 ± 1.237
One of the most striking features visible on the tree is that families with similar 5’UTRs do not form monophyletic groups, indicating that L1 families have frequently recruited novel 5’UTRs, either from unknown sources or from ancient families. The oldest families in our study carried an Lx promoter, which was replaced three times: once by the Fanc promoter (L1MdFanc_II) and twice by the V promoter (L1MdV_II and III). The Fanc promoter was replaced independently twice by the Mus promoter, as L1MdMus_I and L1MdMus_II do not form a monophyletic group. The Mus promoter was eventually replaced by the V promoter (L1MdV_I) and went extinct. The F promoter was then resuscitated approximately 6.4 MY ago and gave rise to families L1MdF_I to V. Approximately 4.6 MY ago the A promoter was recruited, yielding the modern A lineage which extends from families L1MdA_VII to I. Within this lineage, an additional recruitment occurred, resulting in the L1MdN_I family. Finally, the F promoter was recently recruited twice, approximately 2.2 MY ago by the L1MdGf_II family and approximately 1.2 MY ago by the Tf/Gf_I lineage. Thus we estimate that L1 in mouse has experienced 11 replacements of 5’UTR.
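The count of 11 replacements can be checked by tallying the events listed in the paragraph above. The short sketch below only restates that narrative as data; the grouping of events into entries is our own reading of the text, not a table from the analysis.

```python
# Promoter (5'UTR) replacement events as narrated in the text.
# Each entry: (description, number of independent events).
# The grouping into entries is an illustrative reading of the paragraph.
replacement_events = [
    ("Lx replaced by Fanc (L1MdFanc_II)", 1),
    ("Lx replaced by V (L1MdV_II and L1MdV_III)", 2),
    ("Fanc replaced by Mus (L1MdMus_I and L1MdMus_II)", 2),
    ("Mus replaced by V (L1MdV_I)", 1),
    ("F resuscitated (L1MdF_I to V)", 1),
    ("A recruited (L1MdA_VII to I)", 1),
    ("N recruited within the A lineage (L1MdN_I)", 1),
    ("F recruited again (L1MdGf_II and Tf/Gf_I)", 2),
]

total_replacements = sum(count for _, count in replacement_events)
print(total_replacements)
```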
The topology of the ORF2 tree indicates that mouse L1 families evolved mostly as a single lineage. This does not mean that a single family or single lineage was active at a time. In fact, the co-existence of multiple active families characterizes the evolution of L1 for the last 13 MY of mouse evolution. For instance, between 1 and 2.5 MY ago, six families (L1MdTf_III, L1MdA_II, L1MdA_III, L1MdGf_II, L1MdN_I, and L1MdF_I) were active in the mouse genome as attested by the overlap in their average pairwise divergence (Table 1). In some cases, several families evolved into lineages that diversified and co-existed with the dominant lineage for several MY. The lineage composed of L1MdF_I, II, and III is the one that co-existed the longest with the lineage that yielded the currently active families. L1MdF_I was active 2.12 MY ago, at about the same time as families L1MdA_III and L1MdN_I. These families, however, are all descendants of family L1MdF_IV which was active 6.4 MY ago (Figure 1 and Table 1). Thus the lineage consisting of L1MdF_I, II, and III co-existed with the lineage that produced L1MdA_III and L1MdN_I for more than 4 MY. Eventually the L1MdF lineage became extinct. Thus the cascade structure of the ORF2 tree, typical of the single lineage mode of evolution reported in other mammals, is consistent with a model in which multiple families are concurrently active until one of them attains replicative supremacy, coinciding with the extinction of its competitors.
Because L1 families have frequently recruited novel promoters, we decided to examine whether L1 lineages have exchanged genetic information in other regions of the element. To this end, several methods implemented in the RDP 3.0 software were used: two substitution-based approaches, MaxChi and Chimaera, and two phylogenetic approaches, Bootscan and RDP. Breakpoints and statistically significant events of genetic recombination detected by RDP were verified by visual inspection of the FL consensus alignment (see Additional file 3) and phylogenetic analyses. A minimum of six recombination events was detected.
The next oldest recombination event is between the ancestor of L1MdA_IV (which is the ancestor of L1MdA_I, II, and III) and L1MdF_II, near the 3’ end of the element (Figure 2D). A 666 bp region was transferred from L1MdF_II to the L1MdA_IV family. This fragment is also found in all L1MdA sequences derived from L1MdA_IV as well as the Gf and Tf families since they also acquired their ORF2 and 3’UTR from an ancestral L1MdA family. Similarly, a segment located in the coiled-coil domain of ORF1 was transferred from L1MdMus_II to L1MdA_VII and L1MdA_VI (Figure 2E). Subsequently an overlapping region was transferred from L1MdA_VII or L1MdA_VI to L1MdF_III. This segment is also found in L1MdGf_II as this family got its ORF1 from L1MdF_III.
It should be noted that our criteria for identifying recombination events were stringent, as we only considered the recombination of large segments to be significant. Thus it is plausible that exchanges of sequences of shorter length have occurred between L1 families but were not detected due to the small number of defining characters in some conserved regions of L1, such as ORF2. The number of recombination events reported here suggests that recombination has played a significant role in the evolution of novel L1 families in mouse and can occur across the entire length of L1.
Summary of selection detection tests
Columns: positively selected sites; number of branches with positive selection
0.494 ± 0.275; 0.608 ± 0.401; 0.354 ± 0.371
5' terminus (1–1,170): 0.308 ± 0.411
3' terminus (1171–end): 0.229 ± 0.353
We examined the level of conservation of domains of ORF1 that are known to be functionally important [19, 59, 60]. Three domains have been identified: a coiled coil (CC) domain that mediates the formation of ORF1p trimers, an RNA-recognition motif (RRM), and a C-terminal domain (CTD). The 3’ half of ORF1, which contains the RRM and CTD domains, as well as approximately the first 50 amino acids of ORF1, are highly conserved across families, in contrast with the CC domain, which shows a high level of structural variation. We analyzed independently the 5’ terminus, the CC domain, and the 3’ half of ORF1 for evidence of selection using recombination breakpoints as boundaries. All the methods used strongly indicated that the 5’ terminus and the 3’ half of ORF1 are evolving under purifying selection. The PARRIS method rejected the hypothesis that a subset of amino acids is evolving under positive selection and the GABranch method showed that dN/dS has remained significantly lower than 1 in these regions during the entire evolutionary span covered by the analysis. This is not surprising, especially for the 3’ half of ORF1, as the RRM and CTD motifs were shown to be conserved across mammals. The SLAC, FEL, and REL programs failed to identify a single amino acid under positive selection at the 5’ end. In the 3’ half, the REL method identified two amino acids under positive selection but these residues are likely to be false positives, as the changes in amino acid result from independent events of mutation at CpG nucleotides, which are known for their unusually high mutation rate.
More surprising is the degree of conservation at the amino acid level of the CC domain. Previous studies have shown that the CC domain of ORF1 has evolved under positive selection in primates [30, 39]. In the case of the mouse, surprisingly, the PARRIS method rejected the hypothesis that some amino acids evolved under positive selection, although a moderately high dN/dS ratio was obtained (0.608), and the GABranch method failed to identify a single branch in the evolution of the coiled coil with a dN/dS >1. Out of the three methods (SLAC, FEL, and REL) used to detect selection at specific amino acids, only one (REL) identified two amino acids that could have evolved under positive selection. It is thus plausible that these two sites are false positives, as they have been identified by a single method. Even if these sites are evolving under positive selection, it remains true that the signature of positive selection in the mouse CC is much weaker than it is in human [30, 39].
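The reasoning that a site reported by a single method is plausibly a false positive can be expressed as a simple cross-method support filter. The sketch below is illustrative only: the function name, the `min_methods=2` threshold, and the site numbers are hypothetical, and the analysis described here did not necessarily apply a fixed cut-off.

```python
from collections import Counter

def sites_by_support(calls, min_methods=2):
    """Return codon sites reported as positively selected by at least
    `min_methods` of the site-level methods (threshold is an assumption)."""
    support = Counter(site for sites in calls.values() for site in set(sites))
    return sorted(site for site, n in support.items() if n >= min_methods)

# Hypothetical site numbers, for illustration only: REL reports two sites,
# SLAC and FEL report none, mirroring the situation described for the CC.
example_calls = {"SLAC": [], "FEL": [], "REL": [57, 112]}

robust_sites = sites_by_support(example_calls)          # no site has 2+ methods
single_method_sites = sites_by_support(example_calls, min_methods=1)
```

Requiring agreement between methods trades sensitivity for specificity, which matches the cautious interpretation given in the text.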
We performed the first comprehensive analysis of L1 evolution since the completion of the mouse genome. The analysis is limited to the most recently active L1 families and covers approximately the last 13 MY of mouse evolution. As murine rodents evolve approximately eight times faster than hominoids, the amount of evolutionary change investigated here is similar to previous studies in humans that covered more than 80 MY of primate evolution [30, 35]. The results are consistent with the large number of analyses performed in the pre-genomic era [32, 33, 41–45, 50, 65–68] but, by focusing solely on intact FL elements, we were able to provide for the first time a complete picture of the evolution of mouse L1 families over the entire length of the element.
The evolution of L1 in mouse fits the single lineage mode of evolution described previously in other mammals and particularly in human [30, 35, 63, 69]. This is exemplified by the similarity between the tree in Figure 1 and the tree based on the human ORF2 (Figure 2 in ). This model is based on the observation that L1 phylogenies have a typical cascade structure that is best explained by the successive activity of L1 families: a single family, or a group of closely related families, is active at a given point in time until a new family emerges and replaces the pre-existing family, which usually becomes extinct. In some instances, however, several lineages may co-exist until one eventually becomes extinct. This is the case of the L1MdF_I, II, and III lineage, which co-existed with the dominant lineage for approximately 4 MY, and of the Tf and L1MdA_I, II, and III lineages, which have co-existed for about 2 MY and are still active in the mouse genome. In ancestral primates a similar situation occurred but over a much longer period of evolutionary time, as the L1PB and L1PA lineages co-existed for 30 MY. We previously observed that, in human, L1 lineages that co-exist for extended periods always have different promoter sequences. We proposed that families with different promoter sequences rely on different host factors for their transcription and are consequently not relying on the same host-encoded resources. This situation allows them to co-exist as they are not using the same genomic ‘niche’. In mouse the same observation can be made. The lineage composed of L1MdF_I, II, and III co-existed with the main lineage when this one was dominated by families carrying the A promoter (L1MdA_III to VI). Similarly, the two lineages that are currently active, the L1MdA_I, II, and III and the L1MdTf/Gf, carry different, non-homologous 5’UTRs. Thus, it is possible that the conditions that allow for multiple lineages to co-exist are the same in mouse and in human.
Unlike in modern human, where a single family is currently active (the Ta family), the modern house mouse genome harbors several families with different 5’UTRs and consequently presents an excellent model to test experimentally the hypothesis that the activity of different 5’UTRs is one of the conditions for the co-existence of families and lineages.
The analysis of FL elements has revealed the extraordinary ability of L1 families to acquire novel motifs and to exchange sequences (Figures 2 and 3). The recruitment of novel 5’UTR sequences [30, 33] as well as the recombinant nature of some L1 families in mouse [45, 46] and rat [34, 69, 70] have long been described. Three mechanisms have been proposed to account for the mosaic nature of some families. First, recombination between genomic copies, that is at the level of DNA templates, could result in the formation of a novel transpositionally competent family. This hypothesis has been discounted on the basis that a chance recombination event between two replicatively competent elements is highly unlikely, whereas recombination between any of the hundreds of thousands of L1 pseudogenes, the majority of which have suffered the effect of inactivating mutations, is much more likely to produce an inactive element. Second, recombination could occur at the time the L1 RNA is reverse-transcribed and could result from the formation of an RNA/DNA heteroduplex between the L1 RNA and a genomic copy at the insertion site. This model is supported by the observation that the recruitment of novel motifs seems to be directional, as it is always a chronologically young 3’ end that recruits an older 5’ terminus. Third, mosaic elements could be produced if the L1-encoded reverse transcriptase switches RNA strand at the time of insertion. Polymerase strand-switching is a well-known feature of RNA viruses [72, 73]. This mechanism ensures that recombination occurs between replicatively competent elements, that is, elements that carry a 5’UTR capable of driving their transcription. The third model predicts that recombination occurs only between families that are simultaneously active whereas the first and second models do not have such a requirement.
We found that the exchange of genetic information occurs both between simultaneously active families and by resuscitation of motifs from extinct families. For instance, the coiled-coil domain of L1MdMus_II was recruited by L1MdA_VII about 4.6 MY ago, long after the extinction of L1MdMus_II, which was active 8.23 MY ago. The L1MdGf_II family is also the product of a recombination between two families that were not active simultaneously, the L1MdF_III and the L1MdA_III families (which amplified 4.42 and 2.15 MY ago, respectively). All other instances of recombination occurred between families that were simultaneously active, which is consistent with the polymerase strand-switching model. Similarly, the acquisition of novel 5’UTRs tends to result from the transfer of 5’ termini between families that were active at the same time. This is exemplified by the evolution of the F-type, which was transferred from L1MdFanc_I (active 6.80 MY ago) to the ancestor of L1MdF_V (at 6.43 MY) and subsequently transferred from L1MdF_I (active 2.12 MY ago) to the recently active L1MdTf and L1MdGf families.
The first ORF is arguably the least understood region of L1, although it has been the subject of much attention in the past few years [17–20, 59, 60, 74–78]. Its secondary structure has been resolved as a dumbbell shape resulting from the formation of a trimeric structure mediated by the coiled coil domain. It is established that it has RNA-binding abilities, mediated by the RRM, can act as a nucleic acid chaperone [19, 20] and can form multimers in the presence of nucleic acids. Previous studies have shown that the 3’ half of ORF1 is highly conserved and our analysis confirms this is the case in mouse. In contrast, studies in human have demonstrated that the coiled-coil domain is evolving under strong positive selection as indicated by the high values of dN/dS reported in the evolution of this region [30, 39]. Such a rapid evolution at the amino-acid level is certainly adaptive and it was proposed that this was the result of an arms race between L1 and its human host. This hypothesis was further supported by the fact that periods of adaptive evolution in the coiled coil coincide with periods of intense L1 activity. However, we failed to find strong evidence of adaptive evolution in the mouse coiled coil. Instead, we found an extraordinary level of structural instability in this region (Figure 4), unexpected in a protein coding region critical for the multimeric structure of the functional protein. Instability in this region has also been described in the rat L1, suggesting a common role for these structural changes in these two species [34, 69]. Structural changes in the coiled coil occur so frequently that it is tempting to speculate that they are adaptive, and are evolutionarily equivalent to periods of intense amino acid replacement in humans.
We performed a comprehensive analysis of L1 evolution in mouse. This analysis covered the last 13 MY of mouse evolution, since the split between mouse and rat. The mouse L1 has evolved as a single lineage for most of its evolution, although co-existence between families carrying different promoter sequences was observed. L1 families have frequently acquired novel 5’UTR and have exchanged sequences over the entire length of the element. No evidence of rapid amino acid replacement in the ORF1 was detected, although it is likely that the structural instability of the CC domain is adaptive. The general pattern of evolution of mouse L1 is similar to the one in human suggesting that the nature of the interactions between L1 and its host might be similar in these two species. There are however some intriguing differences between mouse and human, particularly in the evolution of ORF1. These differences suggest that the molecular mechanisms involved in host-L1 interactions might be different in these two species.
Full-length (FL) elements were collected from the Mus musculus 2006 (mm8) genome build using the GPS. GPS conducted a BLAST-type search (WU-tBLASTn) of the genome using the conserved Reverse Transcriptase (RT) domain of ORF2 as a query. GPS then cut 7,000 bp upstream and downstream of the RT domain, yielding a 14,000 bp fragment. A second WU-tBLASTn was then performed on the 14,000 bp cutouts to identify regions characteristic of L1 (ORF1, the endonuclease domain of ORF2, the RT domain, and the 3’UTR). In this analysis, GPS did not search for sequence identity at the 5’ end since L1 is known to frequently recruit novel sequences as 5’UTR [30, 33]. Thus, a file containing 3,000 bp upstream of ORF1 was generated for further analyses. The FL sequences were first sorted based on their 5’UTRs. Once elements were sorted based on their 5’UTRs, they were further categorized into families using a phylogenetic analysis of the 3’ terminus. A family is defined as a collection of elements that result from the activity of a highly homogeneous group of progenitors, which are characterized by a unique combination of characters. In the first step of the phylogenetic analysis, neighbor joining trees of elements sharing similar 5’UTRs were built. Distinct clusters were provisionally considered families and were validated by a second round of phylogenetic analysis based on the principle that elements belonging to the same family should yield a star phylogeny because they result from the activity of similar progenitors. These families were further confirmed by phylogenetic analysis performed on other regions of L1 to verify that the homogeneity of the families extends over the entire length of the element. Full-length consensus sequences were derived for each family and are available on Repbase. Phylogenetic analyses were performed using the neighbor joining (NJ) method based on the maximum composite likelihood parameters distance included in the MEGA 5.01 software package.
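The flanking-window step can be illustrated with a few lines of coordinate arithmetic. This is a minimal sketch and not the GPS implementation: the function name is hypothetical, coordinates are assumed to be 0-based and half-open, and the window is assumed to be measured from a single anchor position within the RT hit so that an unclipped window spans exactly 14,000 bp.

```python
def flank_window(anchor, chrom_length, flank=7000):
    """Return (start, end) of the region extending `flank` bp on each side
    of `anchor`, clipped to the chromosome (0-based, half-open coordinates).

    Assumption: the window is measured from one anchor position inside the
    RT hit; how GPS anchors the cut is not specified in the text.
    """
    start = max(0, anchor - flank)
    end = min(chrom_length, anchor + flank)
    return start, end

# An interior hit gives a full 14,000 bp cutout.
start, end = flank_window(anchor=100_000, chrom_length=1_000_000)
cutout_length = end - start
```

Hits close to a chromosome end yield shorter cutouts under this scheme, so a real pipeline would need to decide whether to keep or discard them.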
The model that best fits the data was determined for each alignment using MEGA. The robustness of each phylogenetic tree was assessed using a bootstrap procedure with 1,000 replicates. Families were named after their 5’ promoter (A, F, Fanc, V, Lx, Mus, or N; see Results) followed by a Roman numeral. The smaller the Roman numeral, the younger the family. For instance, families L1MdA_I, L1MdA_II, and L1MdA_III are subsets of the previously described L1MdA family; family L1MdA_I is younger than family L1MdA_II, and family L1MdA_III is the oldest of the three. We kept the Gf and Tf names for the recently active Tf and Gf families because these names have been widely used in the literature.
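The naming convention lends itself to a small parser. The sketch below is illustrative: the helper names are hypothetical, the numeral table only covers I to VII (the range used in the text), and the numeral is treated as a relative age rank within one promoter type only.

```python
# Roman numerals used in the family names of this study (I to VII).
ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6, "VII": 7}

def parse_family(name):
    """Split a family name such as 'L1MdA_III' into (promoter, rank)."""
    prefix, _, numeral = name.rpartition("_")
    if not prefix.startswith("L1Md") or numeral not in ROMAN:
        raise ValueError(f"unrecognized family name: {name!r}")
    return prefix[len("L1Md"):], ROMAN[numeral]

def youngest_first(names):
    """Order families by promoter type, then from youngest to oldest.

    The rank is only meaningful when comparing families that share
    the same promoter type.
    """
    return sorted(names, key=parse_family)

ordered = youngest_first(["L1MdA_III", "L1MdA_I", "L1MdA_II"])
```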
NJ, maximum parsimony (MP), and maximum likelihood (ML) trees were calculated for each region of L1. Phylogenetic trees were reconstructed using the MEGA 5.01 package. The RDP 3.0 program (Recombination Detection Program 3.0, available at http://darwin.uvigo.es/rdp/rdp.html) was used to search for evidence of recombination among families. RDP allows for the use of several recombination detection methods including substitution and phylogeny-based methods. Two substitution-based methods, MaxChi and Chimaera, as well as a phylogenetic method, Bootscan, were used to analyze the datasets. The RDP software also includes its own unique algorithm termed ‘RDP’, which is also a phylogenetic approach to detecting recombination. A window size of 50 bp was used to detect breakpoints between consensus sequences. Statistically significant events of recombination were verified by comparing phylogenetic trees on each side of the putative breakpoint.
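The idea of scanning an alignment in windows to locate a breakpoint can be illustrated with a crude similarity scan. The sketch below is not MaxChi, Chimaera, Bootscan, or the RDP algorithm; it simply reports, for each 50 bp window, which of two putative parental consensus sequences is closer to a query. The function names, the `step` parameter, and the toy sequences are assumptions made for illustration.

```python
def window_identity(seq_a, seq_b, window=50, step=10):
    """Fraction of identical, ungapped positions per window of an alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for start in range(0, len(seq_a) - window + 1, step):
        pairs = [
            (a, b)
            for a, b in zip(seq_a[start:start + window], seq_b[start:start + window])
            if a != "-" and b != "-"
        ]
        identity = sum(a == b for a, b in pairs) / len(pairs) if pairs else float("nan")
        scores.append((start, identity))
    return scores

def closest_parent_by_window(query, parent_1, parent_2, window=50, step=10):
    """Label each window with the parent that is more similar to the query."""
    labels = []
    scan_1 = window_identity(query, parent_1, window, step)
    scan_2 = window_identity(query, parent_2, window, step)
    for (start, id_1), (_, id_2) in zip(scan_1, scan_2):
        if id_1 > id_2:
            labels.append((start, "parent_1"))
        elif id_2 > id_1:
            labels.append((start, "parent_2"))
        else:
            labels.append((start, "tie"))
    return labels

# Toy data: the query matches parent_1 on its 5' half and parent_2 on its 3' half.
toy_parent_1 = "A" * 200
toy_parent_2 = "C" * 200
toy_query = "A" * 100 + "C" * 100
toy_labels = closest_parent_by_window(toy_query, toy_parent_1, toy_parent_2, step=50)
```

A switch in the closest parent along the alignment marks a candidate breakpoint, which would then need the statistical and phylogenetic confirmation described above.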
To test for evidence of selection in the evolution of L1, several methods implemented in the web server http://www.datamonkey.com of the HyPhy program were used. The first method uses a maximum likelihood approach (PARRIS) to determine if a proportion of sites in an alignment evolves with a ratio dN/dS >1. A ratio significantly >1 is indicative of positive selection whereas a ratio <1 is indicative of purifying selection. The second method, GABranch, can detect lineage-specific variation in selective pressure and requires no a priori specification of branches in a phylogeny that may have evolved under different values of dN/dS. The dN/dS test is, however, not very sensitive, particularly if selection acts on a few codons. For this reason we used three methods designed to detect the action of positive or negative selection at specific sites in an alignment: Single Likelihood Ancestor Counting (SLAC), Random Effects Likelihood (REL), and Fixed Effects Likelihood (FEL). For each dataset, the model that best fits the data was determined using the tool available at datamonkey.com. As selection detection methods are sensitive to recombination, we performed our analyses independently for each segment of L1 flanked by recombination breakpoints. Previous studies on human L1 have documented positive selection in the coiled-coil (CC) domain of ORF1 [30, 39]. CC structures are formed from two or more α-helical peptide chains that contain a distinct arrangement of non-polar side chains. Domains that can form CC consist of heptads (or seven residue repeats) with non-polar or hydrophobic residues in the first and fourth positions. The program COILS was used to identify the position of the CC domain in each consensus sequence as well as the number of constitutive heptads.
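The heptad rule described above (non-polar residues at the first and fourth positions of each seven-residue repeat) can be checked with a toy scan. This sketch is not the COILS algorithm, which scores sequences against a profile of known coiled coils; the residue set, the function name, and the example peptides are assumptions chosen for illustration.

```python
# Assumption: one common choice of non-polar / hydrophobic residues.
HYDROPHOBIC = set("AILMFVWY")

def heptad_core_fraction(peptide, register=0):
    """Fraction of complete heptads whose first and fourth residues are
    both hydrophobic, reading heptads from offset `register`."""
    heptads = [peptide[i:i + 7] for i in range(register, len(peptide) - 6, 7)]
    if not heptads:
        return 0.0
    hits = sum(h[0] in HYDROPHOBIC and h[3] in HYDROPHOBIC for h in heptads)
    return hits / len(heptads)

# Toy peptides (not L1 sequences): an idealized repeat and a poly-lysine control.
ideal_fraction = heptad_core_fraction("LKELEEK" * 3)
control_fraction = heptad_core_fraction("K" * 14)
```

Because the register of a real coiled coil is unknown in advance, a scan like this would have to be repeated for each of the seven possible offsets.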
The age of each subfamily was estimated by calculating the average pairwise divergence based on the 3’UTR. CpG dinucleotides and the highly mutable polypurine tract located in the 3’UTR were removed from the alignment. The average divergence between copies as well as the standard error was calculated using the maximum likelihood parameter distance (using the MEGA 5.01 software). Divergences were converted to time assuming a neutral rodent genomic substitution rate of 1.1%/MY (calculated using the data presented in Table 5 of  and assuming a Mus/Rattus divergence at 13 MY).
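The conversion from divergence to time can be sketched as a simple division by the substitution rate. This is the simplest reading of the sentence above and is an assumption: the text does not state whether any further correction (for example for pairwise versus consensus-based distances) was applied, so the function below should not be taken as a reproduction of the published ages. The input values are hypothetical.

```python
RATE_PERCENT_PER_MY = 1.1  # neutral rodent substitution rate given in the text

def divergence_to_age(divergence_percent, se_percent=0.0, rate=RATE_PERCENT_PER_MY):
    """Convert a divergence (in %) and its standard error to an age in MY.

    Assumption: age = divergence / rate, with the standard error scaled
    by the same factor.
    """
    return divergence_percent / rate, se_percent / rate

# Hypothetical family with 2.2% ± 0.22% average pairwise divergence.
age_my, age_se_my = divergence_to_age(2.2, 0.22)
```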
The consensus sequences are available in Repbase (http://www.girinst.org/repbase/).
LINE-1: Long interspersed nuclear element-1
MY: Million years
ORF: Open reading frame
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.