Modular organization and reticulate evolution of the ORF1 of Jockey superfamily transposable elements
© Metcalfe and Casane; licensee BioMed Central Ltd. 2014
Received: 12 March 2014
Accepted: 30 May 2014
Published: 1 July 2014
Long interspersed nuclear elements (LINEs) are the most common transposable elements (TEs) in almost all metazoan genomes examined. In most LINE superfamilies there are two open reading frames (ORFs), and both are required for transposition. The ORF2 is well characterized, while the structure and function of the ORF1 are less well understood. ORF1s have been classified into five types based on structural organization and the domains identified. Here we perform a large-scale analysis of ORF1 domains of 448 elements from the Jockey superfamily using multiple alignments and Hidden Markov Model (HMM)-HMM comparisons.
Three major lineages, Chicken repeat 1 (CR1), LINE2 (L2) and Jockey, were identified. All Jockey lineage elements have the same type of ORF1. In contrast, in the L2 and CR1 lineage elements, all five ORF1 types are found, with no one type of ORF1 predominating. A plant homeodomain (PHD) is much more prevalent than previously suspected. ORF1 type variations involving the PHD domain were found in many subgroups of the L2 and CR1 lineages. A Jockey lineage-like ORF1 with a PHD domain was found in both lineages. A phylogenetic analysis of this ORF1 suggests that it has been horizontally transferred. Likewise, an esterase-containing ORF1 type was found only in two exclusively vertebrate L2 and CR1 groups, indicating that it may have been acquired in a vertebrate common ancestor and then transferred between the lineages.
The ORF1 of the CR1 and L2 lineages is very structurally diverse. The presence of a PHD domain in many ORF1s of the L2 and CR1 lineages is suggestive of domain shuffling. There is also evidence of possible horizontal transfer of entire ORF1s between lineages. In conclusion, while the structure of the ORF2 appears to be highly constrained and its evolution tree-like, the structure of the ORF1 within the CR1 and L2 lineages is much more variable and its evolution reticulate.
Keywords: Long interspersed nucleotide elements; Non-long terminal retrotransposon; Open reading frame 1; Plant homeodomain; RNA recognition motif
Transposable elements (TEs) are mobile genetic elements found in nearly all eukaryotic genomes and are the major contributor to variation in genome size. They are genomic ‘invaders’, one type of genomic component involved in genomic conflict with the host genome. There is an increasing body of evidence suggesting that the evolution of TEs is reticulate [2–6]. For example, the envelope domain has been independently acquired by three Gypsy lineages.
The ORF1 classification is based on a sample of 14 ORF1s from 10 LINE clades. Clade allocation was based on the Repbase sequence title, which theoretically indicates the clade that the element belongs to. Here we explore the structure and evolution of the ORF1s of Jockey superfamily/group elements in more depth within a phylogenetic framework. We used all full-length Jockey superfamily/group sequences from the Repbase database for two reasons. First, Repbase is the most comprehensive and widely used TE database. Second, many entries are consensus sequences, allowing us to examine a wide range of elements. We examined 448 full-length Jockey superfamily/group elements. ORF1 structures were determined by multiple alignment and HMM-HMM comparison against three protein databases. The structures were then mapped onto an APE and RT phylogeny. We identified ORF1 types in clades where they had not been previously described. We also identified structural variations of the ORF1 types. We propose that there has been ORF1 domain shuffling in Jockey superfamily/group elements, and that in some instances entire ORF1s may have been horizontally acquired.
Sequence retrieval from Repbase and Repeatmasker classification
One thousand two hundred forty-nine Jockey superfamily/group sequences from the Jockey, Rex1, CR1, L2, L2A, L2B, Daphne, and Crack clades were downloaded from the Repbase database. These were classified by Repeatmasker as 536 CR1, 422 L2, 54 RexBarber, 206 Jockey, 20 L1 and 1 R1 type sequences. The L1 and R1 sequences were removed. Only one complete RexBarber sequence was found, so this was also removed. After aligning and removing all incomplete sequences, 451 sequences remained: 235 CR1, 87 Jockey and 129 L2 type sequences. Three sequences that did not fall clearly into a subgroup (see next section), one in the CR1 lineage and two in the L2 lineage, were not analyzed further.
[Table: Phylogenetic analysis, clade assignment and ORF1 domains identified. Table body not recovered; surviving column headers: Av. RT nt% pairwise identity; Av. aa% pairwise identity.]
Identification of ORF1 domains
The lineages and subgroups identified were compared with the clade assignment based on the Repbase sequence name and the RTclass1 tool. Clade assignments were concordant with our phylogenetic analysis and Repeatmasker type except for the L2 sequences (Figure 3). These were split into four clades, Daphne, Kiri, L2 and L2B, by the RTclass1 tool. The Repbase sequence names did not always reflect clade assignments [see Additional file 1].
The beginning of the ORF1 was identified in all sequence alignments except for two of the four L2 subgroup 8 sequences, which also lack the 5’ untranslated region (UTR). Three main domains were identified: a gag-like CCHC domain, an RRM and a PHD (Table 1). A sequence logo of the PHD and CCHC domains for all sequences in which they were found is shown in Additional file 2. Alignments for two examples of RRM domains (CR1 subgroup 3 and L2 subgroup 6) are shown in Additional file 3. The number of sequences in each subgroup from the three lineages (CR1, L2 and Jockey) and the domains identified are summarized in Table 1. Pairwise identity at the amino acid level for the ORF1 domain sequence alignments ranges from 21.7 to 85.9%, and top-hit probabilities range from 37.1 to 100% (Table 1). Only four domains have probabilities less than 85%: the RRM domain in L2 subgroup 2, the zinc finger in CR1 subgroup 7, the RRM domain in CR1 subgroup 4 and the RRM + CTD domain in CR1 subgroup 5.
Five ORF1 types were identified by Khazina and Weichenrieder using a sample of 14 ORF1s from 10 LINE clades. Taking the ORF1 structures described and the elements used by Khazina and Weichenrieder as a basis, we classified the ORF1s identified in this study into the same types, and added subtype categories A, B and C to describe variations (Figure 2). We have classified the ORF1 from CR1 subgroup 2 as type V, which is an unclassified ORF1, because we were able to identify less than 10% of the entire ORF. Type I has at least one RRM domain immediately upstream of a CCHC zinc knuckle. In our analysis, this ORF1 type was found in all Jockey lineage elements, CR1 subgroup 3 and L2 subgroups 2, 8 and 10 (Figures 2, 4, 5, 6 and Table 1). Type II is found in the human L1 element and has a CC domain, a single RRM domain and a CTD domain. In the Research Collaboratory for Structural Bioinformatics (RCSB) and the Protein families (Pfam) databases the three domains are submitted as a single entry, transposase 22 (2yko_A and PF02994, respectively). This type was identified in several CR1 and L2 subgroups (Figures 2, 4, 6 and Table 1). In L2 subgroup 5 a PHD domain was found downstream from the transposase 22, after a stop and a start codon, and is therefore probably at the beginning of the ORF2. A PHD domain was also identified in L2 subgroup 6 and CR1 subgroup 5, at the N-terminus of the ORF1 (Figures 2, 4, 6 and Table 1). For type III, Khazina and Weichenrieder predicted an occasional C-terminal RRM in addition to the PHD domain. We found a single PHD domain in CR1 subgroup 6 and L2 subgroup 4, and an RRM domain associated with a PHD domain in CR1 subgroup 4. Type IV is an ORF1 with an esterase domain, sometimes associated with a zinc finger/leucine zipper, and was identified in CR1 subgroup 7 and L2 subgroup 9.
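The type definitions above can be read as rules applied to the ordered list of domains found in an ORF1. The Python sketch below is one possible encoding of those rules, given only as an illustration: the function name `classify_orf1`, the domain labels and the order in which the rules are tested are assumptions of this sketch, and the subtype letters (A, B and C) are not modeled.

```python
def classify_orf1(domains):
    """Assign a coarse ORF1 type from an N- to C-terminal list of domain labels.

    The labels ("RRM", "CCHC", "TRANSPOSASE_22", "PHD", "ESTERASE") are
    hypothetical names chosen for this sketch, not identifiers from any database.
    """
    # Type I: at least one RRM immediately upstream of a CCHC zinc knuckle.
    for left, right in zip(domains, domains[1:]):
        if left == "RRM" and right == "CCHC":
            return "I"
    # Type II: CC + RRM + CTD, reported as a single transposase 22 entry.
    if "TRANSPOSASE_22" in domains:
        return "II"
    # Type IV: an esterase domain is present.
    if "ESTERASE" in domains:
        return "IV"
    # Type III: a PHD domain, occasionally accompanied by an RRM.
    if "PHD" in domains:
        return "III"
    # Type V: nothing recognizable, treated as unclassified.
    return "V"


if __name__ == "__main__":
    # A PHD upstream of two RRMs and three CCHC knuckles is still type I here.
    print(classify_orf1(["PHD", "RRM", "RRM", "CCHC", "CCHC", "CCHC"]))
```

Testing the type I rule first means that a PHD domain upstream of an RRM-CCHC arrangement does not change the type, which matches how the PHD-containing variants are treated as variations of types I and II in the text.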
ORF1 Clans clustering and phylogenetic analysis of Type I ORF1 domains
In most LINE superfamily/group elements there are two ORFs, in which the ORF2 codes for at least two domains, an APE and RT (Figure 1). In contrast to the ORF2, the structure of the ORF1 is not only less well characterized but also more structurally variable. Based on a sample of 14 ORF1s from 10 LINE clades, ORF1s have been previously classified into five types, depending on the organization and type of domain present (Figure 2). Elements from the Jockey superfamily/group exhibit the highest ORF1 diversity. This diversity is chiefly found in the CR1 elements, in which three of the five types have been identified. A large-scale analysis of the ORF1 of Jockey superfamily/group elements has not been previously attempted. Here we map the structure of the ORF1 from 448 Jockey superfamily/group elements onto a phylogenetic framework.
ORF2 phylogenetic and clade analysis
Full-length elements from the eight clades of the Jockey superfamily/group, Jockey, Rex1, CR1, L2, L2A, L2B, Daphne and Crack, were assigned by phylogenetic analysis to three well supported lineages, L2, CR1 and Jockey (Figure 3). This assignment is consistent with the ‘type’ classification by Repeatmasker [see Additional file 1]. Elements were further assigned to clades using the RTclass1 tool. Repbase sequence names theoretically reflect the clade that they are assigned to. Clade assignments were concordant with our phylogenetic analysis and Repeatmasker type except for the L2 sequences (Figure 3). These were split into four clades, Daphne, Kiri, L2 and L2B, by the RTclass1 tool (Figure 4). For these four clades, the Repbase sequence names did not consistently reflect clade assignments [see Additional file 1].
Diversity in ORF1 domains and structure
The ORF1s of the L2 and CR1 elements were found to be highly diverse, both in terms of structure and the number of types of ORF1s found (Figures 2, 4 and 6). All five ORF1 types were identified in the L2 and CR1 lineages; in contrast, all elements in the Jockey lineage have a single type of ORF1 (Figure 5). Three structural variations of ORF1 types I and II were identified that contained a PHD domain (Figure 2). A total of eight differently structured ORF1s were found in the L2 lineage, and seven in the CR1 lineage. While the type I and II ORF1s predominate in the L2 lineage and the ORF1 type III B was only found in the CR1 lineage, there is no clear ‘CR1-like’ or ‘L2-like’ ORF1.
For the ORF1 types II and III, the type classification is somewhat concordant with a clustering analysis of the RRM domains (Figure 7) and the top hits from the HMM-HMM analysis (Table 1). However, the RRM domains from type I do not all cluster together and the top hits are not the same, suggesting similarity at the structural but not the amino acid sequence level (Figure 7). A major homology region (MHR) has been previously identified in the TART, TAHRE and DOC elements of the Jockey lineage. In our analysis, these elements have a type IB ORF1 [see Additional file 1]. A visual comparison of the amino acid alignment of the MHR in the TART and DOC elements of the Jockey lineage with our alignment of the RRM domain identifies the MHR as an RRM domain (data not shown).
Functions and putative functions of ORF1 domains
Current evidence suggests that the RRM, esterase and CCHC zinc-knuckle domains are all involved in transcript binding, stabilization and chaperoning. The L1 ORF1 RRM domain is a single-stranded nucleotide binding protein with nucleic acid chaperone activity, preferentially binding to RNA [24, 31]. Many RNA binding proteins have a modular structure, and the RRM domain itself is often found in multiple copies, as in the type I ORF1s (Figure 2). The CTD domain has been shown experimentally to assist the RRM domain in nucleic acid binding. The CTD and CCHC domains therefore probably act as accessory domains in RNA binding. In the type I ORF1s, the CCHC zinc-knuckles resemble those of gag, a gene found in long terminal repeat-retrotransposons and retroviruses. In the HIV retrovirus the role of the CCHC domain has been demonstrated to include the chaperoning of the transcript as well as the full-length cDNA. Consistent with these findings, in SART1, a telomere-specific LINE R1 element, all three CCHC zinc-knuckle motifs are involved in the specific packaging of the mRNA into the ribonucleoprotein (RNP) complex. TART elements are Jockey clade elements that form the telomeres in Drosophila and therefore presumably perform an essential host cellular function. Intriguingly, RNP complexes for TART were found to be efficiently transported into the nucleus, unlike those of non-telomeric Jockey clade elements, suggesting that there may be a host control system that distinguishes ‘friendly’ from ‘unfriendly’ RNP complexes. The structure of the esterase domain has recently been elucidated. The authors suggest that the esterase domain is involved in membrane targeting, possibly driving RNP assembly on membrane surfaces. As far as we can tell, there have been no functional studies specifically on the PHD domain in LINE elements. However, the PHD domain in other proteins has been well studied and has been shown to recognize modified histones [36, 37]. Domains in this class are known as ‘epigenetic readers’.
Although there are examples of LINEs that target specific genome regions, such as tRNA genes, telomeres or microsatellites, in most LINEs with an APE domain the target specificity of host sequences has been relaxed [10, 38, 39]. This suggests that the PHD domain may be involved in general targeting of the host genome during integration. In some subgroups there is apparently a single ORF, with a PHD domain at the N-terminus (L2 subgroup 4 and CR1 subgroup 6). The 5’ UTR of LINE elements is widely variable, so it is difficult to generalize about its structure. However, some of these elements are reported by Repbase as ‘autonomous’ and the region 5’ to the PHD domain is highly repetitive, suggesting that these may be full-length elements. These elements may therefore represent a reversion to R2-like elements with a single ORF, or they may be TE parasites that use the machinery of other elements to transpose.
Reticulate evolution and horizontal ORF1 acquisition
The possibility of horizontal ORF1 acquisition has been proposed to explain, for example, the presence of the esterase type ORF1 in elements from diverse phyla in phylogenetically disjunct LINE clades. In our analysis the esterase type ORF1 was found only in two exclusively vertebrate subgroups, L2 subgroup 9 and CR1 subgroup 7. This suggests that in the Jockey superfamily/group this ORF1 type may have been acquired in a vertebrate common ancestor and then transferred between the lineages. Our results also suggest that the ORF1 of CR1 subgroup 3 and L2 subgroup 2 may have been horizontally transferred, possibly within a mosquito host. This ORF1 has three CCHC zinc fingers downstream from two RRM domains. In a clustering analysis of individual RRM domains, the upstream RRM domains from CR1 subgroup 3 and L2 subgroup 2 cluster together (Cluster 4 in Figure 7), while the downstream RRM domains cluster together in a separate subgroup (Cluster 3 in Figure 7). In a phylogenetic analysis of all five domains, the two RRM and three CCHC domains, from all sequences with this type of ORF1, CR1 subgroup 3 sequences cluster with those of L2 subgroup 2 (Figure 8). All CR1 subgroup 3 sequences are from the mosquito Aedes aegypti, and all except one sequence in L2 subgroup 2 are from mosquitoes, including Aedes aegypti (Table 1). All other CR1 mosquito sequences (subgroup 4) have the type III ORF1. We therefore speculate that the ORF1 has been horizontally acquired within a mosquito host. Recombination at the DNA or RNA level is one way in which ORF1s may be horizontally acquired. Due to their mode of replication, LINEs are often 5’ truncated upon insertion. This suggests a simple way in which an ORF1 may become associated with an unrelated ORF2, resulting in ORF1 shuffling. If a TE is 5’ truncated in such a way that it has a complete ORF2 but no ORF1, and the insertion occurs into another type of TE between its ORF1 and ORF2, the result would be a hybrid TE with the ORF1 of one type of TE and the ORF2 of another.
Modular organization and domain shuffling
A protein domain can be defined as an independent evolutionary unit that can either have an independent function or contribute to the function of a multidomain protein. The major molecular mechanism that leads to multidomain proteins and novel combinations is non-homologous recombination, sometimes referred to as ‘domain shuffling’. The variability in domain type and organization in ORF1s identified here in the Jockey superfamily/group is also suggestive of domain shuffling. Within ORF1 types the chief difference we identified (Figure 2) is the variable presence and position of a PHD domain in the CR1 and L2 lineage elements. From our data, we cannot determine the direction of domain shuffling. The RRM and CCHC domains are found in the ORF1 of L1, I and Jockey superfamily/group elements [11, 13], indicating that they are ancient components of LINEs. The variability in ORF1 structure that results from the combination of various modules, seen here in Jockey superfamily/group elements, is concordant with an increasing body of data indicating that the origin and evolution of TEs is reticulate, that is, it involves extensive domain shuffling [2–6].
We inferred a phylogeny based on the APE and RT domains for full-length Jockey superfamily/group elements from the Repbase database. ORF1 structures were mapped onto the ORF2 phylogeny. All Jockey lineage elements have the same type of ORF1, with one to two RRM domains upstream of three CCHC domains. In contrast, in the L2 and CR1 lineage elements, all five ORF1 types are found, with no one type of ORF1 predominating. The structure of these ORF1s is indicative of domain shuffling. The PHD domain is much more prevalent than previously suspected; it was identified in four ORF1 types in many subgroups within the L2 and CR1 lineages and both upstream and downstream of the RRM domain. There was also evidence of reticulate evolution and possibly horizontal transfer of entire ORF1s. The ORF1 of CR1 subgroup 3 and L2 subgroup 2 is unusual: a Jockey-like ORF1 with a PHD domain upstream of the RRM domains. Our analyses suggest that this ORF1 has been horizontally transferred. From our data we could not determine the direction or origin of this transfer. The esterase domain type ORF1 was found only in two exclusively vertebrate subgroups from the L2 and CR1 lineages, indicating that it has been acquired in a vertebrate common ancestor and then may have been transferred between the lineages. Within the Jockey superfamily/group, while the structure of the ORF2 appears to be highly constrained and its evolution tree-like, the ORF1 structure of the L2 and CR1 lineages is much more variable and its evolution reticulate.
Sequence retrieval from Repbase, Repeatmasker classification and alignment
All 1,249 sequences from the Jockey superfamily/group were downloaded from the Repbase database in April 2014. The two lungfish sequences were taken from Metcalfe et al. The sequences were classified into ‘type’ by screening against a database of transposable element encoded proteins as implemented by the web-based Repeatmasker program. Sequences were then conceptually translated, aligned using ClustalW as implemented in BioEdit and adjusted by eye. Incomplete sequences were removed.
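Conceptual translation, as used in the step above, simply reads a nucleotide sequence codon by codon in a chosen frame. The following is a minimal Python sketch of that idea using the standard genetic code; it is not the procedure used in BioEdit, and the function names `translate` and `three_forward_frames` are hypothetical names introduced here.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate one forward frame; '*' marks a stop, 'X' an ambiguous codon."""
    dna = dna.upper().replace("U", "T")
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_forward_frames(dna):
    """Return the translations of the three forward reading frames."""
    return [translate(dna, frame) for frame in range(3)]


if __name__ == "__main__":
    for frame, protein in enumerate(three_forward_frames("ATGGCCTAA")):
        print(frame, protein)
```

Codons containing ambiguity characters are reported as `X` rather than guessed, which is a design choice of this sketch that keeps consensus sequences with unresolved positions usable in an alignment.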
Phylogenetic analysis and identification of subgroups
The ORF2 RT domain is typically used to classify TEs at both the superfamily/group and clade levels [7, 9]. Phylogenies based on the APE are generally concordant with RT phylogeny, but with less resolution. We therefore inferred two phylogenies, one based on the RT domain alone, and one based on a concatenation of the APE and RT domains. For both regions, the optimal model of amino acid substitution was estimated using MEGA 6 with default settings. A neighbor-joining tree was inferred using the highest-ranked substitution model (JTT matrix) and the robustness of the nodes estimated by 500 bootstrap replicates. The topology of the two trees was similar. The chief difference between the two was that in the tree based on APE and RT domains the sequences fell into three well-defined groups consistent with the Repeatmasker ‘type’ classification, whereas in the tree based on the RT domain alone, the Repeatmasker L2 ‘type’ sequences fell into two groups with poor bootstrap support for the relationship between the groups (data not shown). All subsequent analyses were therefore based on the tree inferred from a concatenation of the APE and RT domains.
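Bootstrap support of the kind described above rests on resampling alignment columns with replacement and re-inferring a tree from each pseudo-replicate. The Python sketch below illustrates only the resampling step, under the assumption that the alignment is held as a dictionary of equal-length strings; tree inference itself (done in MEGA 6 in this study) is not reproduced, and the function names are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudo-replicate: columns sampled with replacement.

    `alignment` maps sequence names to aligned strings of equal length.
    The same sampled columns are used for every sequence.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {
        name: "".join(seq[c] for c in cols)
        for name, seq in alignment.items()
    }


def bootstrap_replicates(alignment, n_replicates=500, seed=0):
    """Generate `n_replicates` pseudo-replicate alignments reproducibly."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


if __name__ == "__main__":
    toy = {"seq1": "ACDEFG", "seq2": "ACDEYG", "seq3": "TCDEYG"}
    replicates = bootstrap_replicates(toy, n_replicates=3, seed=1)
    for replicate in replicates:
        print(replicate)
```

The support value for a node would then be the percentage of replicate trees in which that grouping is recovered.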
Subgroups within the three large ‘type’ groups were identified based on ORF1 alignment and support by the phylogenetic analysis. Sequences were renamed according to the type identified by Repeatmasker and subgroup. The RTclass1 tool was used to classify subgroups into clades. Because the RTclass1 tool allows the analysis of a single sequence at a time, at least two sequences from each subgroup were assigned to a clade. Percent pairwise identity within the reverse transcriptase at the amino acid level for both the lineages and the clades was estimated using Geneious.
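As an illustration of what an average percent pairwise identity measures, the Python sketch below computes it over every pair of sequences in an alignment. It assumes that columns with a gap in either sequence are ignored; Geneious may treat gaps differently, so the numbers need not match the values reported in this study, and the function names are hypothetical.

```python
from itertools import combinations

GAP = "-"


def pairwise_identity(seq_a, seq_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == GAP or b == GAP:
            continue  # assumption of this sketch: gapped columns are skipped
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def average_pairwise_identity(aligned_seqs):
    """Mean of the percent identities of all sequence pairs."""
    pairs = list(combinations(aligned_seqs, 2))
    if not pairs:
        raise ValueError("at least two sequences are required")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    print(average_pairwise_identity(["AAAA", "AAAT", "AATT"]))
```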
Open reading frame 1 analysis
For each subgroup identified, the region 5’ to the endonuclease domain and 3’ to the 5’ UTR was extracted as an alignment. For simplicity’s sake this region will be referred to as ‘ORF1’, although some domains identified are most likely at the beginning of ORF2 or are at the 5’ end of a single ORF. The beginning of the ORF1 was identified by a methionine and checked against the Repbase EMBL file if the translation was available. Each subgroup was analyzed for similarity to known domains using HMM-HMM comparisons as implemented in HHpred against the following databases: the RCSB Protein Data Bank as at 27 December 2012, the Pfam database as at 2 December 2011 and the Panther Classification System as at 1 May 2012. For each region the top hit was taken as the hit with the highest probability, or the hit with the highest coverage with a high probability (>85%).
For sequences with top hits against the RCSB Protein Data Bank, the RCSB record was checked to determine the type of the domain identified. For RRM domains, the publication associated with the top hit at the RCSB Protein Data Bank was used to find the RNP consensus sequences. The JnetPred secondary structure prediction software as implemented in JalView was used to identify beta-sheets and alpha-helices. Pcoils was used to infer coiled-coil domains and to confirm the position of coiled-coil domains in transposase 22 domains. For each subgroup the pairwise percent identity at the amino acid level for the ORF1 was estimated using Geneious.
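The top-hit rule stated above can be read in more than one way. The Python sketch below encodes one reading: among hits with a probability above 85%, the hit with the highest coverage is preferred, and if no hit passes that threshold the hit with the highest probability is taken. The `Hit` record, its field names and the function `select_top_hit` are assumptions made for this illustration and do not correspond to any HHpred output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    name: str           # hypothetical label for a database entry
    probability: float  # probability of the hit, in percent
    coverage: int       # number of query residues covered by the hit


def select_top_hit(hits, min_probability=85.0):
    """Pick a top hit under one reading of the rule in the text.

    Hits above `min_probability` are ranked by coverage (ties broken by
    probability); otherwise the hit with the highest probability is returned.
    """
    if not hits:
        return None
    confident = [h for h in hits if h.probability > min_probability]
    if confident:
        return max(confident, key=lambda h: (h.coverage, h.probability))
    return max(hits, key=lambda h: h.probability)


if __name__ == "__main__":
    example = [
        Hit("hit_a", 99.0, 80),
        Hit("hit_b", 90.0, 150),
        Hit("hit_c", 60.0, 300),
    ]
    print(select_top_hit(example).name)
```

Under this reading a long, confident match is preferred over a shorter match with a marginally higher probability, while low-probability hits are never chosen for their coverage alone.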
Clans clustering and phylogenetic analysis of ORF1 domains
Domains identified as RRM were extracted from the ORF1 sequences. The region between the RNP2 and RNP1 consensus sequences was used because this was the only region shared by all sequences. For sequences where two RRM domains were identified, each domain was extracted separately, the first domain labeled ‘U’ for upstream and the second domain labeled ‘D’ for downstream. The RRM domains were clustered using CLANS and Blastp with default values.
For subgroups where the ORF1 structure was two RRM domains upstream of three CCHC domains, the entire region containing the RRM and CCHC domains was extracted and aligned using MUSCLE, and a neighbor-joining phylogeny was inferred using MEGA 6 with the highest-ranked substitution model (JTT matrix).
CCHC: Cys2HisCys zinc-knuckle domains
CR1: chicken repeat 1
HMM: Hidden Markov model
LINE: long interspersed nuclear element
MHR: major homology region
ORF: open reading frame
Pfam: protein families database
RRM: RNA recognition motif
RCSB: Research Collaboratory for Structural Bioinformatics.
This work was supported by Centre National de la Recherche Scientifique under the program ‘Action Thématique Incitative sur Programme’ awarded to Didier Casane from 2006–2009. We would like to thank Guilhmere Cruz for help with the Clans clustering analysis. We would also like to thank three anonymous reviewers for their comments.
- Lynch M: The origins of eukaryotic gene structure. Mol Biol Evol 2006, 23: 450-468.
- McClure MA: Evolution of retroposons by acquisition or deletion of retrovirus-like genes. Mol Biol Evol 1991, 8: 835-856.
- Lerat E, Brunet F, Bazin C, Capy P: Is the evolution of transposable elements modular? Genetica 1999, 107: 15-25. 10.1023/A:1004026821539
- Malik HS, Eickbush TH: Modular evolution of the integrase domain in the Ty3/Gypsy class of LTR retrotransposons. J Virol 1999, 73: 5186-5190.
- Marco A, Marín I: How Athila retrotransposons survive in the Arabidopsis genome. BMC Genomics 2008, 9: 219. 10.1186/1471-2164-9-219
- Llorens C, Muñoz-Pomer A, Bernad L, Botella H, Moya A: Network dynamics of eukaryotic LTR retroelements beyond phylogenetic trees. Biol Direct 2009, 4: 41. 10.1186/1745-6150-4-41
- Wicker T, Sabot F, Hua-Van A, Bennetzen JL, Capy P, Chalhoub B, Flavell A, Leroy P, Morgante M, Panaud O, Paux E, SanMiguel P, Schulman AH: A unified classification system for eukaryotic transposable elements. Nat Rev Genet 2007, 8: 973-982. 10.1038/nrg2165
- Eickbush TH, Malik HS: Origins and Evolution of Retrotransposons. In Mobile DNA II. 2nd edition. Edited by: Craig NL, Craigie R, Gellert M, Lambowitz AM. Washington, DC, USA: ASM Press; 2002:1111-1143.
- Kapitonov VV, Tempel S, Jurka J: Simple and fast classification of non-LTR retrotransposons based on phylogeny of their RT domain protein sequences. Gene 2009, 448: 207-213. 10.1016/j.gene.2009.07.019
- Malik HS, Burke WD, Eickbush TH: The age and evolution of non-LTR retrotransposable elements. Mol Biol Evol 1999, 16: 793-805. 10.1093/oxfordjournals.molbev.a026164
- Khazina E, Weichenrieder O: Non-LTR retrotransposons encode noncanonical RRM domains in their first open reading frame. Proc Natl Acad Sci U S A 2009, 106: 731-736. 10.1073/pnas.0809964106
- Nakamura M, Okada N, Kajikawa M: Self-interaction, nucleic acid binding, and nucleic acid chaperone activities are unexpectedly retained in the unique ORF1p of zebrafish LINE. Mol Cell Biol 2012, 32: 458-469. 10.1128/MCB.06162-11
- Casacuberta E, Pardue M-L: Transposon telomeres are widely distributed in the Drosophila genus: TART elements in the virilis group. Proc Natl Acad Sci U S A 2003, 100: 3363-3368. 10.1073/pnas.0230353100
- Martin SL, Li J, Weisz JA: Deletion analysis defines distinct functional domains for protein-protein and nucleic acid interactions in the ORF1 protein of mouse LINE-1. J Mol Biol 2000, 304: 11-20. 10.1006/jmbi.2000.4182
- Januszyk K, Li PW-L, Villareal V, Branciforte D, Wu H, Xie Y, Feigon J, Loo JA, Martin SL, Clubb RT: Identification and solution structure of a highly conserved C-terminal domain within ORF1p required for retrotransposition of long interspersed nuclear element-1. J Biol Chem 2007, 282: 24893-24904. 10.1074/jbc.M702023200
- Kapitonov VV, Jurka J: The Esterase and PHD Domains in CR1-Like Non-LTR Retrotransposons. Mol Biol Evol 2003, 20: 38-46. 10.1093/molbev/msg011
- Kapitonov VV, Jurka J: A universal classification of eukaryotic transposable elements implemented in Repbase. Nat Rev Genet 2008, 9: 411-412. 10.1038/nrg2165-c1
- Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J: Repbase update, a database of eukaryotic repetitive elements. Cytogenet Genome Res 2005, 110: 462-467. 10.1159/000084979
- RepeatMasker Open-3.0 [http://www.repeatmasker.org]
- Tamura K, Stecher G, Peterson D, Filipski A, Kumar S: MEGA6: Molecular evolutionary genetics analysis version 6.0. Mol Biol Evol 2013, 30: 2725-2729. 10.1093/molbev/mst197
- FigTree [http://tree.bio.ed.ac.uk/software/figtree/]
- Söding J, Biegert A, Lupas AN: The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res 2005, 33: W244-W248. 10.1093/nar/gki408
- Lupas A, Van Dyke M, Stock J: Predicting coiled coils from protein sequences. Science 1991, 252: 1162-1164. 10.1126/science.252.5009.1162
- Khazina E, Truffault V, Büttner R, Schmidt S, Coles M, Weichenrieder O: Trimeric structure and flexibility of the L1ORF1 protein in human L1 retrotransposition. Nat Struct Mol Biol 2011, 18: 1006-1015. 10.1038/nsmb.2097
- Geneious version 6.0.5 [http://www.geneious.com/web/geneious/home]
- Finn RD, Clements J, Eddy SR: HMMER web server: interactive sequence similarity searching. Nucleic Acids Res 2011, 39: W29-W37. 10.1093/nar/gkr367
- Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 2004, 32: 1792-1797. 10.1093/nar/gkh340
- Frickey T, Lupas A: CLANS: a Java application for visualizing protein families based on pairwise similarity. Bioinformatics 2004, 20: 3702-3704. 10.1093/bioinformatics/bth444
- Fuller AM, Cook EG, Kelley KJ, Pardue M-L: Gag proteins of Drosophila telomeric retrotransposons: collaborative targeting to chromosome ends. Genetics 2010, 184: 629-636. 10.1534/genetics.109.109744
- Rashkova S, Athanasiadis A, Pardue M: Intracellular targeting of gag proteins of the drosophila telomeric retrotransposons. J Virol 2003, 77: 6376-6384. 10.1128/JVI.77.11.6376-6384.2003
- Martin SL: Nucleic acid chaperone properties of ORF1p from the non-LTR retrotransposon, LINE-1. RNA Biol 2010, 7: 706-711. 10.4161/rna.7.6.13766
- Lunde BM, Moore C, Varani G: RNA-binding proteins: modular design for efficient function. Nat Rev Mol Cell Biol 2007, 8: 479-490. 10.1038/nrm2178
- Buckman JS, Bosche WJ, Gorelick RJ: Human immunodeficiency virus type 1 nucleocapsid Zn(2+) fingers are required for efficient reverse transcription, initial integration processes, and protection of newly synthesized viral DNA. J Virol 2003, 77: 1469-1480. 10.1128/JVI.77.2.1469-1480.2003
- Matsumoto T, Hamada M, Osanai M, Fujiwara H: Essential domains for ribonucleoprotein complex formation required for retrotransposition of telomere-specific non-long terminal repeat retrotransposon SART1. Mol Cell Biol 2006, 26: 5168-5179. 10.1128/MCB.00096-06
- Schneider AM, Schmidt S, Jonas S, Vollmer B, Khazina E, Weichenrieder O: Structure and properties of the esterase from non-LTR retrotransposons suggest a role for lipids in retrotransposition. Nucleic Acids Res 2013, 41: 10563-10572. 10.1093/nar/gkt786
- Musselman CA, Kutateladze TG: Handpicking epigenetic marks with PHD fingers. Nucleic Acids Res 2011, 39: 9061-9071. 10.1093/nar/gkr613
- Sanchez R, Zhou M-M: The PHD finger: a versatile epigenome reader. Trends Biochem Sci 2011, 36: 364-372.
- Han JS: Non-long terminal repeat (non-LTR) retrotransposons: mechanisms, recent developments, and unanswered questions. Mob DNA 2010, 1: 15. 10.1186/1759-8753-1-15
- Zingler N, Weichenrieder O, Schumann GG: APE-type non-LTR retrotransposons: determinants involved in target site recognition. Cytogenet Genome Res 2005, 110: 250-268. 10.1159/000084959
- Vogel C, Bashton M, Kerrison ND, Chothia C, Teichmann SA: Structure, function and evolution of multidomain proteins. Curr Opin Struct Biol 2004, 14: 208-216. 10.1016/j.sbi.2004.03.011
- Metcalfe CJ, Filée J, Germon I, Joss J, Casane D: Evolution of the Australian lungfish (Neoceratodus forsteri) genome: a major role for CR1 and L2 LINE elements. Mol Biol Evol 2012, 29: 3529-3539. 10.1093/molbev/mss159
- Hall TA: BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 1999, 41: 95-98.
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res 2000, 28: 235-242. 10.1093/nar/28.1.235
- Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K, Holm L, Sonnhammer ELL, Eddy SR, Bateman A: The Pfam protein families database. Nucleic Acids Res 2010, 38: D211-D222. 10.1093/nar/gkp985
- Mi H, Muruganujan A, Thomas PD: PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees. Nucleic Acids Res 2013, 41: D377-D386. 10.1093/nar/gks1118
- Cuff JA, Barton GJ: Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins 2000, 40: 502-511. 10.1002/1097-0134(20000815)40:3<502::AID-PROT170>3.0.CO;2-Q
- Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ: Jalview Version 2–a multiple sequence alignment editor and analysis workbench. Bioinformatics 2009, 25: 1189-1191. 10.1093/bioinformatics/btp033
- Crooks GE, Hon G, Chandonia J-M, Brenner SE: WebLogo: a sequence logo generator. Genome Res 2004, 14: 1188-1190. 10.1101/gr.849004
- Cole C, Barber JD, Barton GJ: The Jpred 3 secondary structure prediction server. Nucleic Acids Res 2008, 36: W197-W201. 10.1093/nar/gkn238
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.