Homologues of bacterial TnpB_IS605 are widespread in diverse eukaryotic transposable elements
© Bao and Jurka; licensee BioMed Central Ltd. 2013
Received: 13 November 2012
Accepted: 20 February 2013
Published: 1 April 2013
Bacterial insertion sequences (IS) of the IS200/IS605 and IS607 families often encode a transposase (TnpA) and a protein of unknown function, TnpB.
Here we report two groups of TnpB-like proteins (Fanzor1 and Fanzor2) that are widespread in diverse eukaryotic transposable elements (TEs), and in large double-stranded DNA (dsDNA) viruses infecting eukaryotes. Fanzor and TnpB proteins share the same conserved amino acid motif in their C-terminal halves: D-X(125, 275)-[TS]-[TS]-X-X-[C4 zinc finger]-X(5,50)-RD, but are highly variable in their N-terminal regions. Fanzor1 proteins are frequently captured by DNA transposons from different superfamilies including Helitron, Mariner, IS4-like, Sola and MuDr. In contrast, Fanzor2 proteins appear only in some IS607-type elements. We also analyze a new Helitron2 group from the Helitron superfamily, which contains elements with hairpin structures on both ends. Non-autonomous Helitron2 elements (CRe-1, 2, 3) in the genome of the green alga Chlamydomonas reinhardtii are flanked by target site duplications (TSDs) of variable length (approximately 7 to 19 bp).
The phylogeny and distribution of the TnpB/Fanzor proteins indicate that they may be disseminated among eukaryotic species by viruses. We hypothesize that TnpB/Fanzor proteins may act as methyltransferases.
Keywords: DNA transposon, TnpB, Fanzor, Helitron, Helitron2, IS200/605, IS607, Methyltransferase
Transposable elements (TEs) are DNA segments that are duplicated and inserted into genomic DNA by a variety of mechanisms. There are two major groups of TEs: DNA transposons and retrotransposons. Retrotransposons are further divided into those containing long terminal repeats (LTRs), or LTR retrotransposons, and non-LTR retrotransposons, which are not flanked by LTRs. Typically, TEs encode only proteins essential for their reproduction and insertion, including reverse transcriptases and transposases (Tpases). Currently, there are four known types of transposases encoded by TEs. The most common type is the DDE-transposase encoded by most bacterial insertion sequences (IS), eukaryotic DNA transposons, and LTR retrotransposons. The second group is represented by reverse transcriptases (RT), encoded by a variety of non-LTR and LTR retrotransposons. The third group includes tyrosine recombinases (YR) encoded by IS91, Helitron, IS200/IS605, Crypton, and DIRS-retrotransposon families [4, 5]. The last group is represented by serine recombinases (SR), encoded by the IS607 family, Tn4451, and bacteriophage phiC31. The structural features and specific transposition mechanisms differ fundamentally among these TE groups. Most DNA transposons are flanked by terminal inverted repeats (TIRs) and target site duplications (TSDs), and are transposed by the ‘cut-and-paste’ mechanism used by DDE transposases, although some use a replicative mechanism (Tn3), or are able to switch to a replicative mode (for example, MuDr, Tn7 and IS903 [8–11]). LTR retrotransposons use RT and integrase (DDE-transposase) to complete their transposition. Non-LTR retrotransposons need both RT and endonuclease (EN) in their transposition process, termed target site-primed reverse transcription (TPRT). Transposons using YR and SR as Tpase lack TIRs and produce no TSDs upon insertion. However, their terminal hairpin structures (IS200/605 family) or terminal short direct repeats (Crypton) are important for transposition [3, 13, 14].
Elements from the IS200/IS605 and IS607 families usually encode a secondary protein (TnpB) of unknown function, in addition to the transposase (TnpA). Three independent experiments on IS607, ISHp608, and ISDra2 elements (the latter two belong to the IS200/IS605 family) have shown that TnpB is dispensable for transposition in Escherichia coli [14, 15] and Deinococcus radiodurans. Interestingly, numerous IS elements (for example, IS1341, IS809 and IS1136) encode TnpB as the only protein (putative transposase), but supporting evidence for TnpB-mediated transposition is still missing. Like other elements from the IS200/IS605 and IS607 families, these TnpB-only transposons lack TIRs and TSDs. One possibility is that these elements represent non-autonomous derivatives of IS607 or IS200/IS605-like transposons in which TnpA has been deleted. Due to this uncertainty, most of the TnpB-only elements are ambiguously assigned to the IS200/IS605 family in the ISfinder database (http://www-is.biotoul.fr).
In this paper, we report two groups of TnpB-like proteins, named Fanzor1 and Fanzor2 (collectively called Fanzor), from diverse eukaryotic genomes, including metazoans, fungi, and protists (amoeba, chlorophyte, stramenopile, choanoflagellate and rhodophyta), as well as dsDNA viruses that infect eukaryotes. Fanzor and TnpB proteins both contain a constellation of strictly conserved residues stretching from the protein center to the C-terminus, D-X(125, 275)-[TS]-[TS]-X-X-[C4 zinc finger]-X(5,50)-RD. The C4 zinc finger is called OrfB_Zn_ribbon ([CDD:pfam07282]) in the Conserved Domain Database (CDD). Phylogenetically, Fanzor1 proteins form a single separate clade, and Fanzor2 proteins co-cluster with a small set of bacterial TnpB proteins from the IS607 family. Fanzor1 proteins were captured by transposable elements from at least five different superfamilies: Mariner, Sola, IS4, Helitron and MuDr. Fanzor2 proteins are encoded by the IS607-type transposons. While the biological function of the Fanzor/TnpB proteins is not known at present, there are indications that the Fanzor1 protein may function as a methyltransferase. This is based on a comparison of three elements, PGv-1, Mariner-2_PGv and Mariner-1_OLpv, each encoding three proteins: a Mariner Tpase, an endonuclease and either a methyltransferase or a Fanzor1 protein. Our data also suggest that viruses may facilitate the spread of Fanzor proteins in eukaryotes.
The analysis of Fanzor proteins also revealed ‘one-ended transposition’ in three non-autonomous Helitron transposon families (CRe-1, 2, 3) in the green alga Chlamydomonas reinhardtii. Of particular interest is the ‘one-ended’ group of Helitrons flanked by TSDs. One-ended transposition has been previously reported in the IS91 family in bacteria [19, 20], but not in association with the generation of TSDs. Finally, we describe a new Helitron group (Helitron2) that is distinct from the canonical Helitron elements (Helitron1). Helitron1 elements contain only one hairpin structure, at the 3′-subterminal region, and have conserved 5′-TC and CTRR-3′ ends. In contrast, Helitron2 elements carry two hairpin structures and short (8 to 15 bp) asymmetric terminal inverted repeats (ATIRs) at the ends. The 5′-ATIR is close to the 5′-terminus, pairing with its downstream nucleotides to form a 5′-hairpin structure; the 3′-ATIR is subterminally located, immediately upstream from the hairpin structure. Individual Helitron2-like elements were reported to differ from the canonical Helitron1 sequences in terms of their terminal features [21–24]; however, the features were not associated with any separate Helitron group. The characteristic Helitron2 features may help improve the performance of the automatic detection programs that currently use only the Helitron1 features [25, 26].
Identification of the eukaryotic TnpB-like proteins
Table 1. Species harboring Fanzor sequences
Species; number of Fanzor1 families; number of Fanzor2 families
Salpingoeca sp. (ATCC 50818)
Rhizopus oryzae RA 99-880
Allomyces macrogynus ATCC 38327
Phycomyces blakesleeanus NRRL1555
Ashbya gossypii ATCC 10895
Eremothecium cymbalariae DBVPG#7215
Saccharomyces cerevisiae EC1118, Lalvin QA23
Polysphondylium pallidum PN500
Acanthamoeba castellanii strain Neff
Chlorella vulgaris strain NJ-7
Albugo laibachii Nc14
Ectocarpus siliculosus virus ([GenBank:AF204951], 335-kb)
Shrimp white spot syndrome virus ([GenBank:AF332093], 305-kb)
Helicoverpa armigera granulovirus ([GenBank:EU255577], 169-kb)
Helicoverpa armigera multiple nucleopolyhedrovirus ([GenBank:EU730893], 154-kb)
Pseudaletia unipuncta granulovirus ([GenBank:EU678671], 176-kb)
Spodoptera frugiperda ascovirus 1a ([GenBank:AM398843], 157-kb)
Heliothis virescens ascovirus 3e ([GenBank:EF133465], 186-kb)
Mamestra configurata nucleopolyhedrovirus B ([GenBank:AY126275], 158-kb)
Phaeocystis globosa virus 12T ([GenBank:HQ634147], 460-kb)
Emiliania huxleyi virus 88 ([GenBank:JF974310], 397-kb)
Emiliania huxleyi virus 99B1 ([GenBank:FN429076], 377-kb)
Acanthamoeba polyphaga mimivirus ([GenBank:AY653733], 1181-kb)
Acanthamoeba castellanii mamavirus ([GenBank:JF801956], 1192-kb)
Megavirus chiliensis ([GenBank:JN258408], 1259-kb)
Paramecium bursaria Chlorella virus AR158 ([GenBank:DQ491003], 345-kb)
Paramecium bursaria Chlorella virus NY2A ([GenBank:DQ491002], 369-kb)
Cafeteria roenbergensis virus BV-PW1 ([GenBank:GU244497], 617-kb)
Feldmannia species virus ([GenBank:NC_011183], 155-kb)
Sequence feature and phylogeny of Fanzor proteins
The N-terminal halves of the Fanzor and TnpB proteins are highly diverged, but their C-terminal halves are relatively conserved and include a strictly conserved amino acid motif, D-X(125, 275)-[TS]-[TS]-X-X-[C4 zinc finger]-X(5, 50)-RD (Figure 1, see Additional file 4). To date, this long motif has been found only in TnpB and Fanzor proteins, and it includes a short, previously characterized OrfB_Zn_ribbon domain ([CDD:pfam07282]). Given that Fanzor and TnpB are both associated with TEs, the shared motif strongly suggests that they are functional homologues, rather than unrelated proteins accidentally carrying the same domain.
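The motif notation used here maps naturally onto a regular expression. The following is a minimal Python sketch, not the search procedure used in this study: the names `ZINC_FINGER`, `FANZOR_MOTIF` and `find_motif` are illustrative, and because the text does not spell out the cysteine spacing of the C4 zinc finger, the pattern C-X(2)-C-X(5,40)-C-X(2)-C used below is an assumption.

```python
import re

# Assumption: the C4 zinc finger is approximated as C-X(2)-C-X(5,40)-C-X(2)-C.
# The spacing is not given in the text, so these bounds are illustrative only.
ZINC_FINGER = r"C.{2}C.{5,40}C.{2}C"

# D-X(125,275)-[TS]-[TS]-X-X-[C4 zinc finger]-X(5,50)-RD
FANZOR_MOTIF = re.compile(
    r"D.{125,275}[TS][TS].{2}" + ZINC_FINGER + r".{5,50}RD"
)


def find_motif(protein):
    """Return (start, end) of the first motif match in a protein sequence,
    using 0-based, end-exclusive coordinates, or None if there is no match."""
    match = FANZOR_MOTIF.search(protein.upper())
    return (match.start(), match.end()) if match else None
```

A regular expression of this kind only flags candidates; the homology claims in the text rest on PSI-BLAST searches and alignments, not on pattern matching alone.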
A helix-turn-helix (HTH) domain ([CDD:pfam12323]: HTH_OrfB_IS605) is present in the N-terminal regions of some TnpB and Fanzor2 proteins (Figure 1), including those encoded by IS607, IS891, ISArma1 and ISvAR158_1. Given that the alignment in this local area is relatively well conserved (see Additional file 4), this HTH domain is presumably present in other TnpB proteins. However, owing to the high sequence divergence, it could not be determined whether a comparable HTH domain exists in Fanzor1 proteins. Two additional amino acids are also extremely conserved in the Fanzor1 proteins (G500 and E536, Figure 1). However, this may reflect a smaller divergence of the Fanzor1 clade than that of the TnpB clade (Figure 2).
Fanzor1 protein in Tc/mariner elements
Fanzor1 protein in Helitron transposons
The three Fanzor1 families (CRe-1, 2, 3) are frequently 5′-truncated, often in combination with internal deletions (Figure 4A, 4F, see Additional file 8B). However, almost all copies are intact at the 3′-terminal regions (Figure 4F). This biased 3′-overabundance implies that duplication by rolling-circle replication starts from the 3′-end, which is analogous to the previously reported one-ended transposition in the bacterial IS91 element. Data from Helitron-1N1_CRe and Helitron-1N2_CRe indicate that these Helitrons insert specifically downstream from the 5′-TTTT-3′ tetranucleotide, producing no TSDs (Figure 4E). However, this non-TSD feature only appears in CRe-1, 2, 3 insertions that terminate exactly at the consensus 5′-ends, such as loci 2, 3, 8, 9 and 13 in Figure 4A. Strikingly, most other insertions, especially 5′-truncated ones, are flanked by TSDs of variable length (approximately 7 to 19 bp; Figure 4B). In some cases much longer TSDs are observed (44, 50, 93, 242 and 443 bp long). Approximately 70% of CRe-1 (150 loci), 57% of CRe-2 (70 loci), and 10% of CRe-3 (35 loci) are flanked by TSDs. This varying percentage probably reflects different family ages, since CRe-1 is the youngest family with elements approximately 98% identical to the consensus. Interestingly, almost all of these 5′-TSDs are located downstream from the same tetranucleotide as observed in the Helitron-1N1_CRe or Helitron-1N2_CRe insertions (TTTT, or T-rich tetranucleotides: TTTG, TTTC, TCTT, TGTT), suggesting that a common mechanism is involved, at least in the target recognition process, in the Helitron and the three non-autonomous Fanzor1 families. In some individual CRe-1, 2, 3 insertions, short extra sequences are present downstream of the 5′-TSDs (loci 1 and 7, Figure 4A). The captured sequences can occur upstream from the normal consensus 5′-termini (locus 1, Figure 4A).
Intriguingly, TSDs are extremely rare in the non-autonomous Helitron-1N1_CRe and Helitron-1N2_CRe elements. For example, only one out of 200 Helitron-1N1_CRe elements is flanked by TSDs. Elements of the two families are 95 to 98% identical to their consensus sequences. It is not clear whether the difference between the three Fanzor1 elements and the two non-autonomous Helitron elements is caused by the Fanzor1 protein or by the relatively short length of the Helitron-1N1_CRe elements (657 bp) or Helitron-1N2_CRe elements (673 bp).
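Whether an insertion is flanked by a TSD can be checked by comparing the sequence immediately upstream of the element with the sequence immediately downstream of it. The sketch below is only an illustration of that idea and not the procedure used in this study: the function name `find_tsd` and its parameters `min_len` and `max_len` are hypothetical, and it accepts only perfect duplications, whereas older insertions may carry mismatches.

```python
def find_tsd(left_flank, right_flank, min_len=4, max_len=30):
    """Return the longest exact target site duplication (TSD), or None.

    left_flank  -- genomic sequence ending right before the element's 5' end
    right_flank -- genomic sequence starting right after the element's 3' end
    A TSD of length k is present when the last k bases of left_flank equal
    the first k bases of right_flank.
    """
    left = left_flank.upper()
    right = right_flank.upper()
    longest_possible = min(max_len, len(left), len(right))
    # Try the longest candidate first so the longest duplication wins.
    for k in range(longest_possible, min_len - 1, -1):
        if left[-k:] == right[:k]:
            return left[-k:]
    return None


# Toy flanks (made up for illustration) sharing the 6-bp duplication ATTTAT.
tsd = find_tsd("AAACCGATTTAT", "ATTTATGGC")
```

Raising `max_len` would be needed to recover the unusually long duplications mentioned above (up to 443 bp).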
Features of Helitron2 elements
Fanzor1 protein in IS4-type elements
Fanzor1 protein in Sola2 elements
Fanzor1 protein in other transposable elements
Fanzor1 proteins were also found in DNA transposons from other superfamilies. For example, in the genomes of the fungi Rhizopus oryzae, Phycomyces blakesleeanus and Mucor circinelloides, the ROr-4, PBl-3 and MCi-4 elements, respectively, appear to belong to the MuDr superfamily (see Additional file 12). While these elements do not encode MuDR Tpase, all carry TIRs similar to those of confirmed MuDR elements (for example, MuDr-2_PBl) and are flanked by 9-bp TSDs.
In the genomes of five insect-infecting viruses, five closely related Fanzor1 families, HVav-1 (Heliothis virescens ascovirus 3e), SFav-1 (Spodoptera frugiperda ascovirus 1a), PUgv-1 (Pseudaletia unipuncta granulovirus), HAgv-1 (Helicoverpa armigera granulovirus) and HAmn-1 (Helicoverpa armigera multiple nucleopolyhedrovirus), are flanked by 4-bp TSDs (TTAN) and 13-bp TIRs (see Additional file 13). However, they could not be assigned to any particular superfamily due to the lack of Tpase information.
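Terminal inverted repeats such as the 13-bp TIRs mentioned above can be measured by comparing the 5′ end of an element with the reverse complement of its 3′ end. The following is a minimal sketch under stated assumptions, not the method used in this study: `reverse_complement` and `tir_length` are illustrative names, `max_len` is a hypothetical cap, and only perfect, ungapped repeats are counted.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tir_length(element, max_len=50):
    """Length of the perfect terminal inverted repeat of an element.

    The 5' end is compared base by base with the reverse complement of the
    3' end; counting stops at the first mismatch. Assumption: no mismatches
    or gaps are tolerated, so diverged TIRs will be under-estimated.
    """
    seq = element.upper()
    rc = reverse_complement(seq)
    limit = min(max_len, len(seq) // 2)
    n = 0
    while n < limit and seq[n] == rc[n]:
        n += 1
    return n


# Toy element (made up for illustration) with 5-bp TIRs: CAGTG ... CACTG.
toy_tir = tir_length("CAGTGAAAAAATTTGGGCACTG")
```

The same comparison, applied to subterminal rather than terminal windows, could be adapted to look for the asymmetric inverted repeats described for Helitron2 elements, although that would need additional parameters not sketched here.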
As in the case of MCi-2, the classification of the MCi-5 family is also unknown. Five MCi-5 elements (loci) were identified in the M. circinelloides genome, three of which (Locus-1, 2, 3) appear to be complete elements flanked by putative 6-bp TSDs (ATTTAT), while no significant TIRs were detected (Figure 8C). Interestingly, a Harbinger-type Tpase (2 exons) is encoded by three MCi elements (Locus-1, 2, 4; Figure 8C, see Additional file 15). It is unclear whether the Harbinger Tpases are involved in the transposition of MCi-5 elements because, in contrast to typical Harbinger elements, MCi-5 elements lack any obvious TIRs, and the potential TSDs (ATTTAT) are not 2 or 3 bp long as in typical Harbinger elements.
In the genome of the red alga Cyanidioschyzon merolae, approximately 150 copies of CMe-1A elements are found, each approximately 80% identical to the consensus. The complete consensus is around 3 kb long, but the TSDs could not be determined, probably due to high sequence diversity. Interestingly, the 5′ 635 bp of CMe-1A is 95% identical to the entire sequence of another transposable element, TE-N2_CMe, which is represented by approximately 70 copies in the genome (Figure 8D). Both CMe-1A and TE-N2_CMe elements lack TIRs and their TE classification is unknown.
Fanzor2 proteins in IS607-like elements
Apart from the Fanzor2 proteins, only TnpA_IS607-like serine recombinases (SR) could be found in some Fanzor2 elements, such as ACa-1, -2, CRv-1, ISvMimi_1, ISvMimi_2, ISvAR158_1, and ISvNY2A_1 (Figure 8E, see Additional file 1). In the bacterial IS elements that co-cluster with Fanzor2 elements, only TnpA_IS607-like serine recombinases (SR) were found, such as in ISArma1 (Figure 2). None of these elements have TIRs or TSDs, suggesting that Fanzor2 and these IS elements might have a common origin.
The mysterious role of Fanzor/TnpB in transposition
Prokaryotic TnpB proteins are encoded by bacterial transposable elements of the IS200/605 or IS607 families. Here we report two groups of TnpB homologues (Fanzor1 and Fanzor2) encoded by diverse transposable elements from different eukaryotic species, as well as from some large DNA viruses that infect eukaryotes. Fanzor and TnpB proteins are functionally uncharacterized, but they share the same set of extremely conserved motifs in their C-terminal halves: D-X(125,275)-[TS]-[TS]-X-X-[C4 zinc finger]-X(5,50)-RD (Figure 1). While Fanzor2 proteins are closer to prokaryotic TnpB and are likewise encoded by IS607-like elements, Fanzor1 proteins are encoded by diverse TEs and are more distantly related to TnpB than the Fanzor2 proteins (Figure 2).
TnpB/Fanzor proteins are not DDE-type Tpases. Why are they so frequently found in various transposons? Can Fanzor/TnpB represent a novel type of Tpase that could propagate a DNA element alone? This possibility can be ruled out in the IS200/605 and IS607 families, where a tyrosine recombinase or serine recombinase (TnpA) is known to be the functional Tpase, and TnpB proteins appear to be dispensable for transposition [14–16, 32]. Alternatively, could TnpB/Fanzor represent a captured passenger gene with functions irrelevant to the transposition process, such as antibiotic resistance genes? This is also unlikely, because passenger genes would be expected in many different types of IS elements, rather than only in the IS200/605 and IS607 families from bacterial genomes.
In a third scenario, TnpB/Fanzor proteins may function as regulatory proteins in an unknown transposition process in vivo. In fact, the complexity of the transposition process has been studied in the Tn7 transposon, which encodes five proteins, all of which are involved in transposition. The proteins are: TnsA (type II restriction endonuclease), TnsB (DDE Tpase), TnsC (a regulator between TnsAB and TnsD or TnsE), TnsD (directing transposition to attTn7 sites) and TnsE (directing transposition to non-attTn7 sites). Other transposon-encoded non-Tpase proteins potentially involved in transposition were also reported recently by Kapitonov et al. They include the SNF2 helicase in Inton and Enton, DEDDh nuclease in P and piggyBac, and RecQ helicase in Academ. It is worth noting that Fanzor/TnpB proteins contain some DNA-binding domains: a zinc-finger-like domain near the C-terminus, and an N-terminal HTH domain in TnpB and Fanzor2 (probably in the Fanzor1 proteins as well; Figure 1), suggesting their involvement in the transposition process.
The presumed function in transposition is also suggested by an example of an old Fanzor1 family, CMe-1A (Figure 8D). CMe-1A elements are approximately 80% identical to the family consensus, but some individual CMe-1A elements still encode intact Fanzor1 proteins. This long-lasting coding capability would seem unusual for a “non-autonomous” family (CMe-1A) if no function were associated with the Fanzor1 protein. Analogous cases exist in the so-called HAL1 “non-autonomous” families derived from the L1 non-LTR retrotransposons, which encode only the first open reading frame protein (ORF1p), instead of both ORF1p and ORF2p. ORF1p has RNA-binding and nucleic acid chaperone activity, whereas ORF2p is the major Tpase, with endonuclease (EN) and reverse transcriptase (RT) activity. In the guinea pig genome the coding capacity of the ORF1p in the HAL1 retrotransposons has been maintained for a relatively long time (approximately 29 to 44 Myr), implying that both the cis-encoded ORF1p and trans-encoded ORF2p are required for transposition of HAL1 elements.
Comparison of three virus-integrated Mariner transposons, PGv-1, Mariner-2_PGv and Mariner-1_OLpv (Figure 3), may provide some clues regarding the potential function of the TnpB/Fanzor protein. Each Mariner element encodes three proteins showing some functional parallelism: Tpase, endonuclease, and methyltransferase in Mariner-2_PGv and Mariner-1_OLpv, or Tpase, endonuclease and Fanzor in PGv-1. In bacteria, methyltransferases and restriction endonucleases constitute the restriction-modification system important in many cellular processes. Therefore, it is interesting to see that both endonuclease and methyltransferase are encoded by some transposons (Mariner-2_PGv and Mariner-1_OLpv). To our knowledge, the presence of methyltransferase in transposons has not been reported before. The potential role of the transposon-encoded methyltransferases in transposition remains largely unknown. Normally, DNA methylation is essential for inhibiting the expression and transposition of TEs [38, 39]. For example, methylation in the terminal sequence of transposons can prevent binding of transposase [40, 41]. Theoretically, methylation may also protect the DNA in the transposome from cutting by restriction enzymes, especially in bacterial cells. Moreover, it was reported that deoxycytosine methylase (Dcm) and EcoRII methylase could increase the Tn3 transposition frequency in E. coli. There are other circumstantial data consistent with this methyltransferase hypothesis. First, while the vast majority of TnpB proteins are annotated as transposases in the NCBI database, a handful of them are annotated as DNA (cytosine-5-)-methyltransferases (for example, [GenBank:YP_001645687.1]). However, the basis for this annotation is not documented. Second, GipA ([GenBank:AAF98319.1]) is a TnpB-like protein encoded by an IS element carried by the lambdoid phage Gifsy-1. GipA has been shown to be a virulence gene in Salmonella enterica. Analogously, DNA adenine methylase (Dam) is known as an important factor in bacterial virulence [43–45]. The above observations are consistent with the possibility that the Fanzor protein could be a methyltransferase.
Fanzor elements in viruses
In the current dataset, 18 different large dsDNA eukaryotic viruses were found carrying Fanzor elements (Table 1). In contrast, only 24 eukaryotic species are found carrying Fanzor elements. This is unexpected given the relatively small genomes of these viruses. However, this may be partly explained by a possibility that Fanzor protein assumes the same role both in the viral infection and TE transposition. In a sense, both viruses and DNA TEs are selfish or parasitic episomes.
In the phylogenetic tree, the viral Fanzor proteins are intermingled with non-viral eukaryotic Fanzor proteins (Figure 2). This suggests that these large-genome viruses may play an extensive role in spreading Fanzor genes (or other TEs) among eukaryotes. Among currently sequenced metazoan species, only one insect species, the Hessian fly (M. destructor), was found to carry Fanzor elements. The HMa-1 element in H. magnipapillata probably also originally came from a virus genome. All 13 Fanzor families in the M. destructor genome significantly co-cluster with 5 viral Fanzor families: HAgv-1, SFav-1, PUgv-1, HAmn-1 and HVav-1 (PUgv-1, HAmn-1 and HVav-1 are not included in Figure 2). These viruses all infect insects, suggesting that they may participate in spreading Fanzor elements. Interestingly, the genomes of Heliothis virescens ascovirus 3e (HVav, [GenBank:EF133465]) and Helicoverpa armigera multiple nucleopolyhedrovirus (HAmn, [GenBank:EU730893]) share no overall sequence similarity, but each of them contains one copy of a Fanzor element, HVav-1 and HAmn-1, respectively, which are 88% identical to each other over their entire length. Notably, the two viruses infect insect species of the same family, Noctuidae. Finally, the genomes of Phaeocystis globosa virus 12T (PGv, [GenBank:HQ634147]) and Organic Lake phycodnavirus 1 (OLpv-1, [GenBank:HQ704802.1]) share no overall sequence similarity, except for the Mariner-2_PGv and Mariner-1_OLpv elements in their respective genomes, which are 79% identical in their 5′-terminal regions (Figure 3). Both viruses infect phototrophic marine algae: PGv infects Phaeocystis globosa and OLpv-1 probably infects the prasinophyte Pyramimonas.
Fanzor proteins are often found in chimeric elements represented by the following 4 sets of TEs: (1) PGv-1, Mariner-2_PGv, Mariner-1_OLpv and HMa-1 (Figure 3); (2) ESvi-1B and ESv-2 (Figure 6); (3) DFa-1, DFa-2 and DFa-3 (Figure 7A); (4) PPa-1, PPa-4 and PPa-5 (Figure 7B). The first two sets are from the virus genomes. The latter two sets of elements are present in two related slime mold species: D. fasciculatum and P. pallidum. These chimeric Fanzor elements probably also originated with the involvement of viruses.
Fanzor and TnpB are homologous proteins. Hypothetically, they may function as methyltransferases. Eukaryotic Fanzor proteins are associated with many diverse eukaryotic viruses. The relatively small number of Fanzor elements in eukaryotes probably reflects the fact that they were relatively recently transferred by viruses. A more frequent horizontal transfer in bacteria may account for the more common presence of the TnpB proteins in diverse bacteria and phages [47, 48]. The two clades of Fanzor elements (Fanzor1 and Fanzor2) might have originated from two independent transfers from bacteria to eukaryotes.
Transposons were automatically detected using custom-made scripts based on the methods described before. Consensus sequences of each family were constructed whenever possible. Potentially new TE proteins encoded by long ORFs were screened out by TBLASTN against the REBASE database. The PSI-BLAST and TBLASTN screening for homologous proteins was done against all available sequence databases at the National Center for Biotechnology Information (NCBI) and at the Department of Energy Joint Genome Institute (JGI). To detect all distantly related eukaryotic proteins, multiple rounds of PSI-BLAST were performed until no more new significant hits were detected. Each newly detected eukaryotic protein was used as a query to repeat this procedure. In addition to NCBI databases, the following genome sequences were downloaded from the JGI: Phycomyces blakesleeanus NRRL1555 and Mucor circinelloides (http://genome.jgi-psf.org/Phybl2/Phybl2.download.ftp.html, http://genome.jgi-psf.org/Mucci2/Mucci2.download.ftp.html). The TE-encoded multiple-exon genes were predicted by the FGENESH program (http://linux1.softberry.com/berry.phtml?topic=fgenesh&group=programs&subgroup=gfind), and confirmed or refined with expressed sequence tag (EST) information whenever possible. Functional motifs in these proteins were identified by searching against the Conserved Domain Database (CDD) (http://www.ncbi.nlm.nih.gov/cdd/). Multiple protein sequences were aligned by online MAFFT (v6.861b), using the web server (http://mafft.cbrc.jp/alignment/software/). Sequence phylogenies were obtained using PhyML (v3), available at the Phylogeny.fr web server (http://www.phylogeny.fr/), and the phylogenetic tree was rendered by MEGA4. The DNA sequences of the TEs and their encoded protein sequences are listed in Additional file 2 and Additional file 3.
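The exhaustive search strategy described above (iterate until no new hits appear, then reuse every new hit as a query) amounts to computing a closure over the "is a significant hit of" relation. The sketch below illustrates only that control flow under stated assumptions: `exhaustive_homolog_search` is an illustrative name, and `search` is a hypothetical callable standing in for one converged PSI-BLAST run that returns the identifiers of significant hits. It does not call BLAST itself.

```python
from collections import deque


def exhaustive_homolog_search(seeds, search):
    """Collect all proteins reachable by repeatedly re-querying with new hits.

    seeds  -- iterable of starting protein identifiers
    search -- hypothetical callable: query identifier -> iterable of hit
              identifiers (for example, a wrapper around a PSI-BLAST run)
    """
    found = set(seeds)
    queue = deque(found)
    while queue:
        query = queue.popleft()
        for hit in search(query):
            if hit not in found:
                # Each newly detected protein becomes a query in turn.
                found.add(hit)
                queue.append(hit)
    return found


# Toy hit table (made up for illustration) in place of real database searches.
toy_hits = {"TnpB": ["Fanzor2_a"], "Fanzor2_a": ["Fanzor2_b", "TnpB"], "Fanzor2_b": []}
homologs = exhaustive_homolog_search(["TnpB"], lambda q: toy_hits.get(q, []))
```

Because every identifier is queued at most once, the loop terminates as soon as a full pass yields no unseen hits, which mirrors the stopping rule stated in the text.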
ATIR: Asymmetric terminal inverted repeats
CDD: Conserved Domain Database
EST: Expressed sequence tag(s)
LTR: Long terminal repeat
ORF: Open reading frame
TIR: Terminal inverted repeats
TPRT: Target site-primed reverse transcription
TSD: Target site duplications
This work was supported in part by the National Library of Medicine [P41 LM006252]. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Library of Medicine or the National Institutes of Health.
- Chandler M, Mahillon J: Insertion sequences revisited. Mobile DNA II. Edited by: Craig NL, Craigie R, Gellert M, Lambowitz AM. 2002, Washington, DC: American Society for Microbiology Press, 305-366.View ArticleGoogle Scholar
- Kapitonov VV, Jurka J: Rolling-circle transposons in eukaryotes. Proc Natl Acad Sci USA. 2001, 98: 8714-8719. 10.1073/pnas.151269298.PubMed CentralView ArticlePubMedGoogle Scholar
- Goodwin TJ, Butler MI, Poulter RT: Cryptons: a group of tyrosine-recombinase-encoding DNA transposons from pathogenic fungi. Microbiology. 2003, 149: 3099-3109. 10.1099/mic.0.26529-0.View ArticlePubMedGoogle Scholar
- Cappello J, Handelsman K, Lodish HF: Sequence of Dictyostelium DIRS-1: an apparent retrotransposon with inverted terminal repeats and an internal circle junction sequence. Cell. 1985, 43: 105-115. 10.1016/0092-8674(85)90016-9.View ArticlePubMedGoogle Scholar
- Goodwin TJ, Poulter RT: The DIRS1 group of retrotransposons. Mol Biol Evol. 2001, 18: 2067-2082. 10.1093/oxfordjournals.molbev.a003748.View ArticlePubMedGoogle Scholar
- Smith MC, Thorpe HM: Diversity in the serine recombinases. Mol Microbiol. 2002, 44: 299-307. 10.1046/j.1365-2958.2002.02891.x.View ArticlePubMedGoogle Scholar
- Grindley NDF: The movement of Tn3-like elements: transposition and cointegrate resolution. Mobile DNA II. Edited by: Craig NL, Craigie R, Gellert M, Lambowitz AM. 2002, Washington, DC: American Society for Microbiology Press, 272-302.View ArticleGoogle Scholar
- Robertson DS: Mutator activity in maize: timing of its activation in ontogeny. Science. 1981, 213: 1515-1517. 10.1126/science.213.4515.1515.View ArticlePubMedGoogle Scholar
- Robertson D: Differential activity of the maize mutator. Mol Genet Genomics. 1985, 200: 9-13. 10.1007/BF00383305.View ArticleGoogle Scholar
- May EW, Craig NL: Switching from cut-and-paste to replicative Tn7 transposition. Science. 1996, 272: 401-404. 10.1126/science.272.5260.401.View ArticlePubMedGoogle Scholar
- Tavakoli NP, Derbyshire KM: Tipping the balance between replicative and simple transposition. EMBO J. 2001, 20: 2923-2930. 10.1093/emboj/20.11.2923.PubMed CentralView ArticlePubMedGoogle Scholar
- Luan DD, Korman MH, Jakubczak JL, Eickbush TH: Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell. 1993, 72: 595-605. 10.1016/0092-8674(93)90078-5.View ArticlePubMedGoogle Scholar
- Barabas O, Ronning DR, Guynet C, Hickman AB, Ton-Hoang B, Chandler M, Dyda F: Mechanism of IS200/IS605 family DNA transposases: activation and transposon-directed target site selection. Cell. 2008, 132: 208-220. 10.1016/j.cell.2007.12.029.PubMed CentralView ArticlePubMedGoogle Scholar
- Kersulyte D, Mukhopadhyay AK, Shirai M, Nakazawa T, Berg DE: Functional organization and insertion specificity of IS607, a chimeric element of Helicobacter pylori. J Bacteriol. 2000, 182: 5300-5308. 10.1128/JB.182.19.5300-5308.2000.PubMed CentralView ArticlePubMedGoogle Scholar
- Kersulyte D, Velapatino B, Dailide G, Mukhopadhyay AK, Ito Y, Cahuayme L, Parkinson AJ, Gilman RH, Berg DE: Transposable element ISHp608 of helicobacter pylori: nonrandom geographic distribution, functional organization, and insertion specificity. J Bacteriol. 2002, 184: 992-1002. 10.1128/jb.184.4.992-1002.2002.PubMed CentralView ArticlePubMedGoogle Scholar
- Pasternak C, Ton-Hoang B, Coste G, Bailone A, Chandler M, Sommer S: Irradiation-induced deinococcus radiodurans genome fragmentation triggers transposition of a single resident insertion sequence. PLoS Genet. 2010, 6: e1000799-10.1371/journal.pgen.1000799.PubMed CentralView ArticlePubMedGoogle Scholar
- Siguier P, Perochon J, Lestrade L, Mahillon J, Chandler M: ISfinder: the reference centre for bacterial insertion sequences. Nucleic Acids Res. 2006, 34: D32-D36. 10.1093/nar/gkj014.PubMed CentralView ArticlePubMedGoogle Scholar
- Marchler-Bauer A, Lu S, Anderson JB, Chitsaz F, Derbyshire MK, DeWeese-Scott C, Fong JH, Geer LY, Geer RC, Gonzales NR, Gwadz M, Hurwitz DI, Jackson JD, Ke Z, Lanczycki CJ, Lu F, Marchler GH, Mullokandov M, Omelchenko MV, Robertson CL, Song JS, Thanki N, Yamashita RA, Zhang D, Zhang N, Zheng C, Bryant SH: CDD: a conserved domain database for the functional annotation of proteins. Nucleic Acids Res. 2011, 39: D225-D229. 10.1093/nar/gkq1189.PubMed CentralView ArticlePubMedGoogle Scholar
- Bernales I, Mendiola MV, de la Cruz F: Intramolecular transposition of insertion sequence IS91 results in second-site simple insertions. Mol Microbiol. 1999, 33: 223-234. 10.1046/j.1365-2958.1999.01432.x.
- Mendiola MV, Bernales I, de la Cruz F: Differential roles of the transposon termini in IS91 transposition. Proc Natl Acad Sci USA. 1994, 91: 1922-1926. 10.1073/pnas.91.5.1922.
- Kapitonov VV, Jurka J: Helitron-N1_SP, a family of autonomous helitrons in the sea urchin genome. Repbase Reports. 2005, 5: 394-394.
- Kapitonov VV, Jurka J: RPA70-Encoding helitrons in zebrafish. Repbase Reports. 2007, 7: 1179-1179.
- Yang HP, Barbash DA: Abundant and species-specific DINE-1 transposable elements in 12 Drosophila genomes. Genome Biol. 2008, 9: R39. 10.1186/gb-2008-9-2-r39.
- Coates BS, Sumerford DV, Hellmich RL, Lewis LC: A helitron-like transposon superfamily from Lepidoptera disrupts (GAAA)(n) microsatellites and is responsible for flanking sequence similarity within a microsatellite family. J Mol Evol. 2010, 70: 275-288. 10.1007/s00239-010-9330-6.
- Du C, Caronna J, He L, Dooner HK: Computational prediction and molecular confirmation of helitron transposons in the maize genome. BMC Genomics. 2008, 9: 51. 10.1186/1471-2164-9-51.
- Yang L, Bennetzen JL: Structure-based discovery and description of plant and animal helitrons. Proc Natl Acad Sci USA. 2009, 106: 12832-12837. 10.1073/pnas.0905563106.
- Dunin-Horkawicz S, Feder M, Bujnicki JM: Phylogenomic analysis of the GIY-YIG nuclease superfamily. BMC Genomics. 2006, 7: 98. 10.1186/1471-2164-7-98.
- Roberts RJ, Vincze T, Posfai J, Macelis D: REBASE–a database for DNA restriction and modification: enzymes, genes and genomes. Nucleic Acids Res. 2010, 38: D234-D236. 10.1093/nar/gkp874.
- Cock JM, Sterck L, Rouze P, Scornet D, Allen AE, Amoutzias G, Anthouard V, Artiguenave F, Aury JM, Badger JH, Beszteri B, Billiau K, Bonnet E, Bothwell JH, Bowler C, Boyen C, Brownlee C, Carrano CJ, Charrier B, Cho GY, Coelho SM, Collén J, Corre E, Da Silva C, Delage L, Delaroque N, Dittami SM, Doulbeau S, Elias M, Farnham G, Gachon CM, Gschloessl B, Heesch S, Jabbari K, Jubin C: The Ectocarpus genome and the independent evolution of multicellularity in brown algae. Nature. 2010, 465: 617-621. 10.1038/nature09016.
- Bao W, Jurka MG, Kapitonov VV, Jurka J: New superfamilies of eukaryotic DNA transposons and their internal divisions. Mol Biol Evol. 2009, 26: 983-993. 10.1093/molbev/msp013.
- Yuan YW, Wessler SR: The catalytic domain of all eukaryotic cut-and-paste transposase superfamilies. Proc Natl Acad Sci USA. 2011, 108: 7884-7889. 10.1073/pnas.1104208108.
- Stanley TL, Ellermeier CD, Slauch JM: Tissue-specific gene expression identifies a gene in the lysogenic phage Gifsy-1 that affects Salmonella enterica serovar Typhimurium survival in Peyer’s patches. J Bacteriol. 2000, 182: 4406-4413. 10.1128/JB.182.16.4406-4413.2000.
- Peters JE, Craig NL: Tn7: smarter than we thought. Nat Rev Mol Cell Biol. 2001, 2: 806-814. 10.1038/35099006.
- Arkhipova IR, Batzer MA, Brosius J, Feschotte C, Moran JV, Schmitz J, Jurka J: Genomic impact of eukaryotic transposable elements. Mob DNA. 2012, 3: 19.
- Bao W, Jurka J: Origin and evolution of LINE-1 derived “half-L1” retrotransposons (HAL1). Gene. 2010, 465: 9-16. 10.1016/j.gene.2010.06.005.
- Hohjoh H, Singer MF: Cytoplasmic ribonucleoprotein complexes containing human LINE-1 protein and RNA. EMBO J. 1996, 15: 630-639.
- Martin SL, Bushman FD: Nucleic acid chaperone activity of the ORF1 protein from the mouse LINE-1 retrotransposon. Mol Cell Biol. 2001, 21: 467-475. 10.1128/MCB.21.2.467-475.2001.
- Bender J: Cytosine methylation of repeated sequences in eukaryotes: the role of DNA pairing. Trends Biochem Sci. 1998, 23: 252-256. 10.1016/S0968-0004(98)01225-0.
- Zhou Y, Cambareri EB, Kinsey JA: DNA methylation inhibits expression and transposition of the Neurospora Tad retrotransposon. Mol Genet Genomics. 2001, 265: 748-754. 10.1007/s004380100472.
- Roberts D, Hoopes BC, McClure WR, Kleckner N: IS10 transposition is regulated by DNA adenine methylation. Cell. 1985, 43: 117-130. 10.1016/0092-8674(85)90017-0.
- Reznikoff WS: The Tn5 transposon. Annu Rev Microbiol. 1993, 47: 945-963. 10.1146/annurev.mi.47.100193.004501.
- Yang MK, Ser SC, Lee CH: Involvement of E. coli dcm methylase in Tn3 transposition. Proc Natl Sci Counc Repub China B. 1989, 13: 276-283.
- Low DA, Weyand NJ, Mahan MJ: Roles of DNA adenine methylation in regulating bacterial gene expression and virulence. Infect Immun. 2001, 69: 7197-7204. 10.1128/IAI.69.12.7197-7204.2001.
- Heusipp G, Falker S, Schmidt MA: DNA adenine methylation and bacterial pathogenesis. Int J Med Microbiol. 2007, 297: 1-7.
- Giacomodonato MN, Sarnacki SH, Llana MN, Cerquetti MC: Dam and its role in pathogenicity of Salmonella enterica. J Infect Dev Ctries. 2009, 3: 484-490.
- Yau S, Lauro FM, DeMaere MZ, Brown MV, Thomas T, Raftery MJ, Andrews-Pfannkoch C, Lewis M, Hoffman JM, Gibson JA, Cavicchioli R: Virophage control of Antarctic algal host-virus dynamics. Proc Natl Acad Sci USA. 2011, 108: 6163-6168. 10.1073/pnas.1018221108.
- Koonin EV, Makarova KS, Aravind L: Horizontal gene transfer in prokaryotes: quantification and classification. Annu Rev Microbiol. 2001, 55: 709-742. 10.1146/annurev.micro.55.1.709.
- Keeling PJ, Palmer JD: Horizontal gene transfer in eukaryotic evolution. Nat Rev Genet. 2008, 9: 605-618. 10.1038/nrg2386.
- Bao Z, Eddy SR: Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 2002, 12: 1269-1276. 10.1101/gr.88502.
- Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J: Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005, 110: 462-467. 10.1159/000084979.
- Katoh K, Kuma K, Toh H, Miyata T: MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 2005, 33: 511-518. 10.1093/nar/gki198.
- Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O: New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 2010, 59: 307-321. 10.1093/sysbio/syq010.
- Dereeper A, Guignon V, Blanc G, Audic S, Buffet S, Chevenet F, Dufayard JF, Guindon S, Lefort V, Lescot M, Claverie JM, Gascuel O: Phylogeny.fr: robust phylogenetic analysis for the non-specialist. Nucleic Acids Res. 2008, 36: W465-W469. 10.1093/nar/gkn180.
- Tamura K, Dudley J, Nei M, Kumar S: MEGA4: molecular evolutionary genetics analysis (MEGA) software version 4.0. Mol Biol Evol. 2007, 24: 1596-1599. 10.1093/molbev/msm092.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.