The diversification of PHIS transposon superfamily in eukaryotes
© Han et al. 2015
Received: 13 April 2015
Accepted: 17 June 2015
Published: 24 June 2015
The PHIS transposon superfamily belongs to the DNA transposons and includes the PIF/Harbinger, ISL2EU, and Spy groups. These three groups have similar DDE domain-containing transposases; however, their coding capacity, species distribution, and target site duplications (TSDs) differ markedly.
In this study, we systematically identified and analyzed PHIS transposons in 836 sequenced eukaryotic genomes using a transposase homology search combined with a structure-based approach. In total, 380 PHIS families were identified in 112 genomes, and 168 of these families are reported here for the first time. Besides the previously identified PIF/Harbinger, ISL2EU, and Spy groups, three new types of the PHIS superfamily (called Pangu, NuwaI, and NuwaII) were identified; each has its own distinctive characteristics, especially in its TSDs. Pangu and NuwaII transposons are characterized by 5′-ANT-3′ and 5′-C|TNA|G-3′ TSDs, respectively, and both are widely distributed in plants, fungi, and animals. NuwaI transposons are characterized by 5′-CWG-3′ TSDs and are mainly distributed in animals.
In total, 380 PHIS families were identified in eukaryotes, 168 of which are reported here for the first time. Furthermore, three new types of the PHIS superfamily were identified. Our results not only expand the known diversity of transposons but also have broad significance for improving genome sequence assembly and annotation in higher organisms.
Keywords: Transposable elements, PHIS, Diversification, Identification
Transposable elements (TEs) are fragments of DNA that can move from one site to another in a genome [1, 2]. TEs are divided into two classes (class 1 and class 2) according to their mechanism of transposition. Class 1 elements transpose by a copy-and-paste mechanism, whereas class 2 transposons can transpose by a cut-and-paste mechanism. Genome sequencing has increasingly revealed that TEs constitute the largest component of most eukaryotic genomes [2–13]. TEs not only have a significant impact on the evolution and biological complexity of host genomes but also complicate genome sequencing, assembly, and annotation because of their repetitive nature. Thus, knowledge of TE characteristics and categories will promote the development of genomics.
In the past decade, many studies have focused on the identification, annotation, and function of TEs, and a large number of TEs have been identified and annotated. For example, 42 class 1 superfamilies and 19 class 2 superfamilies have been annotated and cataloged in the RepBase database. However, the TEs reported so far may be just the tip of the iceberg, and many more remain to be annotated because of their great diversity. For instance, 658 TE families in the silkworm and 163 TE families in maize are classified as unknown, and about 0.38 % of the mouse genome sequence consists of unknown TEs [12–14]. Thus, the identification and annotation of TEs are far from finished.
Recently, we identified a new group of cut-and-paste transposons designated as Spy. Spy transposons are distinct from all other groups of DNA transposons in their strong insertion preference within the AAATTT motif and their lack of target site duplications (TSDs) upon insertion. In addition, we showed that PIF/Harbinger, ISL2EU, and Spy are evolutionarily related and share a preference for insertion into AT-rich target sequences. For instance, the ISL2EU transposons are characterized by 5′-AT-3′ TSDs and the PIF/Harbinger transposons by 5′-TWA-3′ TSDs [16, 17]. Thus, these three groups were classified into the same superfamily, designated “PHIS”. The PHIS superfamily is highly polymorphic in target sequences, coding capacity, and conserved transposase motifs. It is common to find distinct groups within a given superfamily; TSDs of variable nucleotide composition and length have previously been found in some superfamilies [16–18]. However, the detailed diversification of the PHIS transposon superfamily remains unclear.
Here, we systematically identified and analyzed PHIS transposons in 836 sequenced eukaryotic genomes using a transposase homology search combined with a structure-based approach. In total, 380 PHIS families were identified, comprising 212 previously reported families and 168 unpublished families. The 380 PHIS families are classified into six groups: three previously reported groups (PIF/Harbinger, ISL2EU, and Spy) and three new groups, called Pangu, NuwaI, and NuwaII. Each new group has its own particular characteristics, especially in its TSDs.
The landscape of PHIS transposons in eukaryotic genomes
To investigate the diversification and evolution of the PHIS superfamily in eukaryotes, we systematically identified PHIS transposons in 836 eukaryotic genomes using a transposase homology search combined with a structure-based approach and analyzed their characteristics and distribution. In total, we identified 380 PHIS transposon families. Each PHIS consensus sequence defined in this study was then subjected to a homology search against RepBase (as of October 20, 2014) and the National Center for Biotechnology Information (NCBI) non-redundant (nr) nucleotide database using the Censor and BlastN programs. These searches showed that 168 of the 380 PHIS families had not been reported, whereas the other 212 had been released and cataloged in RepBase, NCBI, or published papers.
Meanwhile, 25 families belong to the ISL2EU group. Eight of these families were first identified in this study; the others had been cataloged in RepBase (Additional file 1: Table S2). These families share the following characteristics. (1) The TSDs are 5′-AT-3′ dinucleotides; however, there is a conserved single A nucleotide flanking the 5′ end of the TSD and a conserved single T nucleotide flanking the 3′ end (Additional file 2: Figure S1). Thus, we speculated that the target site sequence of ISL2EU transposons is A|AT|T (where ‘|’ marks the cut site), and analysis of paralogous empty sites confirmed this target site sequence. Additional file 2: Figure S2 shows a possible mechanism by which these TSDs are generated. (2) Most candidate autonomous ISL2EU transposons contain two ORFs, one encoding a transposase with DDE, HTH, and THAP domains and the other encoding a DNA-binding protein with a YqaJ exonuclease domain. Following the standard mentioned before, TEs with two intact ORFs are defined as potentially active transposons. Accordingly, 12 potentially active families of the ISL2EU group were identified in the eukaryotic genomes (Additional file 1: Table S2 and Fig. 2b). (3) The TIR length ranges from 6 to 259 bp, and the first two nucleotides of the TIRs are usually “GG” (Fig. 1). (4) The average length of the consensus sequences of autonomous elements is ~4840 bp. (5) These families are distributed in 14 species, all of which are animals.
In this study, we found 54 families that belong to the Spy group; however, we did not identify any new Spy family. All of these families were identified in a previous study, in which the characteristics of Spy transposons were also described. Besides the above three known PHIS groups (PIF/Harbinger, ISL2EU, and Spy), we found three new types of PHIS transposons that differ from the known groups in their TSDs; these new types are called Pangu, NuwaI, and NuwaII.
Characterization and distribution of Pangu transposons
These 34 Pangu transposons are distributed in 15 eukaryotic genomes. The species include two coleopterans, one dipteran, one arachnid, one mollusc, one hydrozoan, one anthozoan, two ascomycetes, three basidiomycetes, one heterokontophyte, and two algae (Fig. 2), and thus span plants, fungi, and animals. Pangu transposons could therefore be ancient elements in eukaryotic genomes. To estimate the abundance of Pangu transposons, the consensus sequence of each Pangu family was used as a query in a BlastN search (e value < 10⁻⁵) against the corresponding genome. A copy of a family was defined as a hit with an e value less than 10⁻⁵, a length greater than 50 bp, and a nucleotide identity greater than 80 %. In total, we identified 3270 copies of the Pangu group in the eukaryotic genomes (Additional file 1: Table S3, Additional file 3: Table S4, and Fig. 2c).
Characterization and distribution of NuwaI transposons
Characterization and distribution of NuwaII transposons
The transposase of NuwaII is very similar to that of NuwaI in coding capacity, conserved motifs, and secondary structure. For instance, most autonomous NuwaII elements contain two ORFs, one encoding the DDE motif-containing transposase (Additional file 2: Figure S3) and the other encoding a Myb/SANT domain-containing protein. Twenty-two potentially active NuwaII families with two intact ORFs were identified in the eukaryotic genomes (Additional file 1: Table S7 and Fig. 2b). In the secondary structure of the NuwaII transposase, the first D is located between two beta-sheets, the second D is typically between a beta-sheet and an alpha-helix, and the last E occurs within an alpha-helix (Fig. 5). The average length of the consensus sequences of autonomous candidates is ~4685 bp; the TIR length of each family ranges from 13 to 46 bp, and the first two nucleotides of most TIRs are a conserved GG. The NuwaII transposons are distributed in 12 species, including 1 turtle, 1 amphibian, 3 bony fishes, 1 amphioxus, 1 tunicate, 1 anthozoan, 2 basidiomycetes, 1 monocot, and 1 eudicot (Fig. 2a). These species also span the kingdoms of plants, fungi, and animals; thus, the NuwaII transposons could also be relatively old elements. In total, we found 7564 copies of the NuwaII group. The genomic abundance and copy number of each NuwaII family in each species are shown in Fig. 2c and Additional file 3: Table S8.
Evolutionary relationships of PHIS transposons
Identification and characterization of PHIS transposons
A previous study suggested that PHIS is a DNA transposon superfamily with great diversity in eukaryotic genomes. However, the detailed diversification and evolution of the PHIS superfamily remained unknown. In this study, we systematically identified PHIS transposons in eukaryotic genomes. A total of 380 families of the PHIS superfamily were identified in 112 sequenced eukaryotic genomes. These families were classified into six groups based on the characteristics of each family’s TSDs. Three of these groups (PIF/Harbinger, ISL2EU, and Spy) have been reported in previous studies [15, 20, 21]. Besides these three groups, we found three new transposon groups, called Pangu, NuwaI, and NuwaII.
These types share similar transposases with a DDE motif. However, each group has unique TSDs that distinguish it from the others (Additional file 2: Figure S2). According to the criteria of a previous TE classification, transposases that can be aligned over their entire catalytic regions (e value less than 10⁻⁴) belong to the same superfamily. A group within a superfamily was defined by a shared TSD composition. In addition, previous studies have identified TSDs of variable length or composition within some superfamilies, such as 8–9 bp TSDs in the Merlin superfamily, 5–8 bp in hAT, 2–4 bp in CMC, and 4–5 bp in Ginger [16, 22, 23]. Thus, it may be better to define Spy, PIF/Harbinger, ISL2EU, Pangu, NuwaI, and NuwaII as different groups (at the same level) of the same superfamily (PHIS).
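The TSD-based grouping can be illustrated with a short sketch. It treats each group's TSD as an IUPAC motif (W = A or T, N = any base) and lists every group whose motif matches an observed duplication. The motif table is transcribed from the TSDs reported in this article; reading the NuwaII pattern 5′-C|TNA|G-3′ as a TNA duplication flanked by C and G is an assumption made here by analogy with the ISL2EU target site A|AT|T, and the function names are hypothetical rather than part of the authors' pipeline. Spy is omitted because it produces no TSD.

```python
# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# TSD motifs as reported in the text. The NuwaII entry is an ASSUMPTION:
# only the duplicated core of 5'-C|TNA|G-3' is used here.
TSD_MOTIFS = {
    "PIF/Harbinger": "TWA",
    "ISL2EU": "AT",
    "Pangu": "ANT",
    "NuwaI": "CWG",
    "NuwaII": "TNA",
}


def matches_motif(motif, tsd):
    """Return True if the TSD has the motif's length and fits it base by base."""
    tsd = tsd.upper()
    if len(motif) != len(tsd):
        return False
    return all(base in IUPAC[code] for code, base in zip(motif, tsd))


def candidate_groups(tsd):
    """List every PHIS group whose TSD motif is compatible with this TSD."""
    return [group for group, motif in TSD_MOTIFS.items() if matches_motif(motif, tsd)]


if __name__ == "__main__":
    for tsd in ("CAG", "AGT", "TAA", "AT"):
        print(tsd, candidate_groups(tsd))
```

Because TWA is a special case of TNA, a duplication such as TAA is compatible with both PIF/Harbinger and NuwaII in this sketch, so flanking bases and transposase phylogeny would still be needed to separate them.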
To estimate the abundance of each group in the eukaryotic genomes, the consensus sequence of each family was used as a query in a BlastN search (e value < 10⁻⁵) against the corresponding genome. The abundance of the transposon groups varied: there were 41,385 copies of the PIF/Harbinger group, 3647 of ISL2EU, 13,089 of Spy, 3270 of Pangu, 3845 of NuwaI, and 7562 of NuwaII (Additional files 1 and 3: Tables S1–S8 and Fig. 2c). However, because PHIS transposons were identified by transposase homology search, some nonautonomous PHIS transposons (such as MITEs) might have been missed. The number of potentially active families also varied: 88 for PIF/Harbinger, 12 for ISL2EU, 18 for Spy, 2 for Pangu, 11 for NuwaI, and 22 for NuwaII (Fig. 2b). Furthermore, the abundance of each group was significantly positively correlated with its number of potentially active families (Pearson’s product-moment correlation, r = 0.9816605, P = 0.0005). This is to be expected, since a group with more potentially active families is likely to accumulate more copies.
Most groups of the PHIS superfamily contain two ORFs, one encoding a transposase with a DDE motif and the other encoding a DNA-binding protein. However, Spy transposons encode only the DDE motif-containing transposase. In addition, the additional ORFs of four groups (Pangu, PIF/Harbinger, NuwaI, and NuwaII) encode a protein with a Myb/SANT domain, whereas that of ISL2EU encodes a protein with a YqaJ domain. At present, the functions of the additional ORFs are unknown, and whether they are involved in the transposition mechanism also remains unclear. This question could be addressed by biochemical studies in the future.
The species distribution showed that PHIS elements are completely absent in mammals, birds, sponges, sharks, and coelacanths, which is consistent with a previous study. It is also interesting that some lineages contain only one of the six PHIS groups, whereas others lack only one of the six. We suggest two possible explanations. First, some PHIS transposons were lost or degenerated in some species through drift or selection in their original lineages. Second, some species gained different families from other species through horizontal transfer (HT). Almost all DNA transposons are capable of HT, and an increasing number of HT events involving DNA transposons have been reported in eukaryotic genomes [25–29]. Furthermore, previous studies suggested that PIF/Harbinger experienced HT events between Drosophila species. However, HT of PHIS transposons remains to be studied in the future.
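As a rough check of the correlation reported above, Pearson's r can be recomputed from the six group totals quoted in this section. This is a minimal sketch, not the authors' analysis script (the software they used is not stated); the function name is hypothetical, and the inputs are the copy numbers and potential active family counts as written in the text.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Order: PIF/Harbinger, ISL2EU, Spy, Pangu, NuwaI, NuwaII (values from the text).
active_families = [88, 12, 18, 2, 11, 22]
copies = [41385, 3647, 13089, 3270, 3845, 7562]

r = pearson_r(active_families, copies)

if __name__ == "__main__":
    print(f"Pearson r = {r:.4f}")
```

With these inputs r comes out close to the reported value of about 0.98; the estimate rests on only six data points, one per group.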
Evolutionary relationships of PHIS transposons
The result of phylogenetic analysis showed that Pangu elements formed a single clade and were adjacent to IS5 group in the phylogenetic tree. In addition, both Pangu and IS5 transposons shared the same target site sequence (5′-ANT-3′). Furthermore, Pangu elements were widely distributed in plants, fungi, and animals. Thus, we proposed that Pangu is a relatively old PHIS group in the eukaryotic genomes.
Meanwhile, NuwaI and NuwaII transposons formed a single clade in the phylogenetic tree, and they share the same coding capacity (two ORFs) and conserved domains (DDE motif and Myb/SANT domain). However, the TSDs of NuwaI differ clearly from those of NuwaII, so NuwaI and NuwaII should belong to two different groups of the PHIS superfamily. These two types might have diverged recently, which would explain why they cannot be distinguished from each other in the phylogenetic tree.
HarbingerS-9_PI and Harbinger-4_TV had been released as PIF/Harbinger families cataloged in RepBase. However, our phylogenetic analysis indicated that HarbingerS-9_PI grouped into the Pangu clade, whereas Harbinger-4_TV grouped into the IS5 clade (Fig. 6). We could not find distinct target site duplications (TSDs) flanking the HarbingerS-9_PI and Harbinger-4_TV families. At present, we therefore cannot determine which group of the PHIS superfamily these two families belong to.
In the present study, 380 PHIS transposon families were identified in 112 of 836 sequenced eukaryotic genomes using a transposase homology search combined with a structure-based approach. Among these families, 168 are identified here for the first time. We systematically analyzed their characteristics, including TSDs, TIRs, coding capacity, conserved transposase domains, and species distribution. The phylogenetic analysis based on the core catalytic DDE domain of the identified transposases showed that these PHIS transposon families are divided into five clusters: three previously reported clusters (PIF/Harbinger, ISL2EU, and Spy) and two new clusters (Pangu and Nuwa). The Nuwa cluster includes two groups, called NuwaI and NuwaII. Furthermore, each new group has its own distinctive characteristics, especially in its target site sequences. For instance, the Pangu transposons are characterized by 5′-ANT-3′ TSDs, the NuwaI transposons by 5′-CWG-3′, and the NuwaII transposons by 5′-C|TNA|G-3′. Our results reveal the diversification and evolution of PHIS transposons in eukaryotic genomes and imply that further study of the mechanism that generates the varied target sequences of the PHIS superfamily will promote the development of new transgenic vectors.
Identification of PHIS superfamily
Eukaryotic genomes, including those of animals (295 species), plants (105 species), fungi (315 species), and protists (121 species), were downloaded from NCBI (http://www.ncbi.nlm.nih.gov/) (as of January 16, 2014); information on each species is listed in Additional file 3: Table S9. All published autonomous PHIS elements were downloaded from RepBase (v19.07). PHIS elements in the eukaryotic genomes were identified using a transposase homology search comprising three steps (Additional file 2: Figure S4): (1) the transposase sequences of published PHIS elements were used as queries in TblastN and TESeeker searches against each genome, and a hit with an e value less than 10⁻⁴ was considered a candidate PHIS sequence; (2) each candidate PHIS nucleotide sequence was used as a query in a BlastN search (e value < 10⁻⁵, sequence length >50 bp, and nucleotide identity >80 %) against the corresponding genome; (3) the sequences of each cluster were extended in both directions using a Perl script and aligned using MUltiple Sequence Comparison by Log-Expectation (MUSCLE), and the boundaries of each cluster were then defined manually.
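Step (3) extends each cluster in both directions before alignment. The authors used a Perl script that is not reproduced in the article; the Python sketch below only illustrates the coordinate arithmetic, assuming 0-based half-open intervals. The function name and the 2000 bp default flank are hypothetical, since the flank length used in the study is not stated.

```python
def extend_hit(start, end, contig_len, flank=2000):
    """Extend a hit interval by `flank` bp on each side, clamped to the contig.

    Coordinates are 0-based and half-open: the hit covers [start, end).
    The default flank of 2000 bp is an illustrative assumption.
    """
    if not 0 <= start < end <= contig_len:
        raise ValueError("hit must lie within the contig and have positive length")
    if flank < 0:
        raise ValueError("flank must be non-negative")
    new_start = max(0, start - flank)
    new_end = min(contig_len, end + flank)
    return new_start, new_end


if __name__ == "__main__":
    # A hit near the start of a 10 kb contig is clamped at position 0.
    print(extend_hit(500, 1500, 10_000))
    # A hit near the end is clamped at the contig length.
    print(extend_hit(9000, 9500, 10_000))
```

The extended intervals would then be extracted and aligned so that the point where sequence similarity drops off marks the element boundary, which the authors defined manually.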
Characterization and phylogenetic analysis of PHIS superfamily
To estimate the abundance of each PHIS family in the corresponding genome, the consensus sequence of each family was used as a query in a BlastN search against that genome. Sequences with an e value less than 10⁻⁵, a length greater than 50 bp, and a minimum nucleotide identity of 80 % were classified as members of the same family. Transposase coding sequences, transposase domains, secondary structures of representative transposases, and paralogous empty sites were analyzed as described previously. Sequence logos of TIRs and TSDs were created with WebLogo (http://weblogo.berkeley.edu/logo.cgi). Multiple sequence alignments were performed using MUSCLE with default parameters. The phylogenetic tree was constructed from the DDE domains of the transposases using MrBayes (v3.1.2) with the Blosum model and default values for the other parameters. The Blosum model was selected using ProtTest 3.2. Bayesian inference was run for 3,000,000 generations.
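The copy-counting thresholds above can be expressed as a simple filter over BLAST tabular output. The sketch assumes the standard 12-column tabular format (BLAST+ -outfmt 6), in which columns 3, 4, and 11 hold percent identity, alignment length, and e value; whether the authors used this output format is not stated, and the function names are hypothetical.

```python
def is_family_copy(pident, length, evalue,
                   min_identity=80.0, min_length=50, max_evalue=1e-5):
    """Apply the thresholds from the text: e value < 1e-5, length > 50 bp,
    and nucleotide identity of at least 80 %."""
    return evalue < max_evalue and length > min_length and pident >= min_identity


def count_copies(blast_lines):
    """Count hits passing the thresholds in BLAST tabular (12-column) lines.

    Assumed column order: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    copies = 0
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        pident = float(fields[2])
        length = int(fields[3])
        evalue = float(fields[10])
        if is_family_copy(pident, length, evalue):
            copies += 1
    return copies


if __name__ == "__main__":
    example = [
        "famA\tscaf1\t92.5\t300\t20\t2\t1\t300\t1000\t1299\t2e-80\t350",
        "famA\tscaf2\t75.0\t300\t70\t5\t1\t300\t500\t799\t1e-20\t100",
    ]
    print(count_copies(example))
```

Each qualifying hit is counted once, so overlapping or fragmented hits would need to be merged before the count is taken as a copy number.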
MITEs: Miniature inverted-repeat transposable elements
TIRs: Terminal inverted repeats
TSDs: Target site duplications
This work was supported by the National Natural Science Foundation of China (No. 31471197 to ZZ and No. 31401106 to MJH), Postdoctoral Science Foundation of Chongqing (No. Xm2014080 to MJH), and Chongqing Graduate Student Research Innovation Project (CYB14041).
1. Feschotte C, Jiang N, Wessler SR. Plant transposable elements: where genetics meets genomics. Nat Rev Genet. 2002;3:329–41.
2. Finnegan DJ. Eukaryotic transposable elements and genome evolution. Trends Genet. 1989;5:103–7.
3. de Koning AP, Gu W, Castoe TA, Batzer MA, Pollock DD. Repetitive elements may comprise over two-thirds of the human genome. PLoS Genet. 2011;7:e1002384.
4. Holt RA, Subramanian GM, Halpern A, Sutton GG, Charlab R, Nusskern DR, et al. The genome sequence of the malaria mosquito Anopheles gambiae. Science. 2002;298:129–49.
5. Kapitonov VV, Jurka J. Molecular paleontology of transposable elements in the Drosophila melanogaster genome. Proc Natl Acad Sci U S A. 2003;100:6569–74.
6. Kidwell MG. Transposable elements and the evolution of genome size in eukaryotes. Genetica. 2002;115:49–63.
7. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921.
8. Meyers BC, Tingey SV, Morgante M. Abundance, distribution, and transcriptional activity of repetitive elements in the maize genome. Genome Res. 2001;11:1660–76.
9. Nene V, Wortman JR, Lawson D, Haas B, Kodira C, Tu ZJ, et al. Genome sequence of Aedes aegypti, a major arbovirus vector. Science. 2007;316:1718–23.
10. Sanmiguel P, Bennetzen JL. Evidence that a recent increase in maize genome size was caused by the massive amplification of intergene retrotransposons. Ann Bot. 1998;81:37–44.
11. Vicient CM, Suoniemi A, Anamthawat-Jónsson K, Tanskanen J, Beharav A, Nevo E, et al. Retrotransposon BARE-1 and its role in genome evolution in the genus Hordeum. Plant Cell. 1999;11:1769–84.
12. Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, et al. Initial sequencing and comparative analysis of the mouse genome. Nature. 2002;420:520–62.
13. Xu HE, Zhang HH, Xia T, Han MJ, Shen YH, Zhang Z. BmTEdb: a collective database of transposable elements in the silkworm genome. Database (Oxford). 2013;2013:bat055.
14. Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, et al. The B73 maize genome: complexity, diversity, and dynamics. Science. 2009;326:1112–5.
15. Han MJ, Xu HE, Zhang HH, Feschotte C, Zhang Z. Spy: a new group of eukaryotic DNA transposons without target site duplications. Genome Biol Evol. 2014;6:1748–57.
16. Yuan YW, Wessler SR. The catalytic domain of all eukaryotic cut-and-paste transposase superfamilies. Proc Natl Acad Sci U S A. 2011;108:7884–9.
17. Bao W, Jurka MG, Kapitonov VV, Jurka J. New superfamilies of eukaryotic DNA transposons and their internal divisions. Mol Biol Evol. 2009;26(5):983–93.
18. Kapitonov VV, Jurka J. A universal classification of eukaryotic transposable elements implemented in Repbase. Nat Rev Genet. 2008;9:411–2.
19. Zhang X, Feschotte C, Zhang Q, Jiang N, Eggleston WB, Wessler SR. P instability factor: an active maize transposon system associated with the amplification of Tourist-like MITEs and a new superfamily of transposases. Proc Natl Acad Sci U S A. 2001;98:12572–7.
20. Kapitonov VV, Jurka J. Molecular paleontology of transposable elements from Arabidopsis thaliana. Genetica. 1999;107:27–37.
21. Walker EL, Eggleston WB, Demopulos D, Kermicle J, Dellaporta SL. Insertions of a novel class of transposable elements with a strong target site preference at the r locus of maize. Genetics. 1997;146:681–93.
22. Feschotte C. Merlin, a new superfamily of DNA transposons identified in diverse animal genomes and related to bacterial IS1016 insertion sequences. Mol Biol Evol. 2004;21:1769–80.
23. Bao W, Kapitonov VV, Jurka J. Ginger DNA transposons in eukaryotes and their evolutionary relationships with long terminal repeat retrotransposons. Mob DNA. 2010;1:3.
24. Jiang N, Bao Z, Zhang X, Hirochika H, Eddy SR, McCouch SR, et al. An active DNA transposon family in rice. Nature. 2003;421:163–7.
25. Schaack S, Gilbert C, Feschotte C. Promiscuous DNA: horizontal transfer of transposable elements and why it matters for eukaryotic evolution. Trends Ecol Evol. 2010;25:537–46.
26. Daniels SB, Peterson KR, Strausbaugh LD, Kidwell MG, Chovnick A. Evidence for horizontal transmission of the P transposable element between Drosophila species. Genetics. 1990;124:339–55.
27. Maruyama K, Hartl DL. Evidence for interspecific transfer of the transposable element mariner between Drosophila and Zaprionus. J Mol Evol. 1991;33:514–24.
28. Zhang HH, Xu HE, Shen YH, Han MJ, Zhang Z. The origin and evolution of six miniature inverted-repeat transposable elements in Bombyx mori and Rhodnius prolixus. Genome Biol Evol. 2013;5:2020–31.
29. Gilbert C, Hernandez SS, Flores-Benabib J, Smith EN, Feschotte C. Rampant horizontal transfer of SPIN transposons in squamate reptiles. Mol Biol Evol. 2012;29:503–15.
30. Casola C, Lawing AM, Betrán E, Feschotte C. PIF-like transposons are common in Drosophila and have been repeatedly domesticated to generate new host genes. Mol Biol Evol. 2007;24:1872–88.
31. Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005;110:462–7.
32. Kennedy RC, Unger MF, Christley S, Collins FH, Madey GR. An automated homology-based approach for identifying transposable elements. BMC Bioinformatics. 2011;12:130.
33. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–7.
34. Schneider TD, Stephens RM. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 1990;18:6097–100.
35. Ronquist F, Huelsenbeck JP. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003;19:1572–4.
36. Darriba D, Taboada GL, Doallo R, Posada D. ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics. 2011;27:1164–5.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.