The contribution of transposable elements to size variations between four teleost genomes
© Gao et al. 2016
Received: 20 October 2015
Accepted: 11 January 2016
Published: 9 February 2016
Teleosts are unique among vertebrates, with a wide range of haploid genome sizes in very close lineages, varying from less than 400 mega base pairs (Mb) for pufferfish to over 3000 Mb for salmon. The cause of the difference in genome size remains largely unexplained.
In this study, we reveal that the differential success of transposable elements (TEs) correlates with the variation in genome size across four representative teleost species (zebrafish, medaka, stickleback, and tetraodon). The larger genomes show a higher diversity within each clade (superfamily) and family, and a greater abundance of TEs, than the smaller genomes: zebrafish, with the largest genome, shows the highest diversity and abundance of TEs, followed by medaka and stickleback, while tetraodon, with the most compact genome, displays the lowest diversity and density of TEs. Both Class I TEs (retrotransposons) and Class II TEs (DNA transposons) contribute to the differences in TE accumulation among teleost genomes; however, Class II TEs are the major component of the larger teleost genomes analyzed and the most important contributors to genome size variation across teleost lineages. The hAT and Tc1/Mariner superfamilies are the major DNA transposons in all four investigated teleosts. Divergence distributions revealed contrasting proliferation dynamics both between clades of retrotransposons and between species. The TEs within the larger genomes of zebrafish and medaka show relatively strong activity over an extended period of their evolutionary history, in contrast with the very young activity in the smaller stickleback genome and the very low level of activity in the tetraodon genome.
Overall, our data show that teleosts have contrasting mobilome profiles, with differing density, diversity and activity of TEs. The differences in TE accumulation, dominated by DNA transposons, explain most of the genome size variation across the investigated teleost species, and species differences in both the diversity and the activity of TEs contributed to the variation in TE accumulation across the four species. TEs play major roles in teleost genome evolution.
Keywords: Transposable elements, Teleosts, Genome size evolution, Activity, Diversity
TEs are mobile genetic units and a major constituent of a cell’s “mobilome”. They exhibit a broad range of diversity in structure and transposition mechanism, and are subdivided into two classes depending on their transposition mode: via RNA for class I retrotransposons and via DNA for class II transposons. Class I retrotransposons include long terminal repeat retrotransposons (LTRs), long interspersed nuclear elements (LINEs), and short interspersed nuclear elements (SINEs). Class II transposons can be divided into three major subclasses: cut-and-paste DNA transposons, rolling-circle DNA transposons (Helitrons), and self-synthesising DNA transposons (Polintons/Mavericks). Cut-and-paste transposons, which are very diverse, have been classified into superfamilies (hAT, Tc1/Mariner, etc.) based on the similarity of their transposases and on shared structural features, including the terminal inverted repeat (TIR) sequence and the length of the target site duplication (TSD) that flanks the TIR and is generated during integration. Because of their unique ability to transpose, and because they frequently amplify, TEs are major determinants of genome size [4, 5] and have been highly influential in shaping the structure and evolution of eukaryotic genomes. TEs constitute the largest component of mammalian genomes [6–8]; using the RepeatMasker approach [9, 10], it was predicted that approximately half of the human genome is covered by TEs, while more recent annotation with the P-clouds pipeline suggests that TE coverage of the human genome may be closer to two-thirds. Most mammalian TEs belong to the class I retrotransposons, and the L1 family of LINEs is still active [6–8, 10].
Teleost fish constitute the most diverse vertebrate group, and this diversity is also reflected in the diversity of their genome size and structure. Although the genome sequences available for analysis (over 10 species) represent only a minuscule fraction of the huge species diversity of this clade, they include four representative teleost species of particular experimental and evolutionary interest: zebrafish (Danio rerio, Dr), medaka (Oryzias latipes, Ol), stickleback (Gasterosteus aculeatus, Ga), and tetraodon (Tetraodon nigroviridis, Tn) [13–16]. Medaka, stickleback, and tetraodon belong to the superorder Acanthopterygii, and zebrafish belongs to the superorder Ostariophysi; they all arose in the Triassic period and are relatively closely related compared with other classes of fishes. However, genome sizes vary roughly fourfold across these four teleost species. The zebrafish genome, with a size of approximately 1370 Mb, is the largest, followed by the medaka with 869.00 Mb, then the stickleback with 461.53 Mb, while the tetraodon genome, with 358.62 Mb, is the smallest. The variation in genome size between these close lineages remains largely unexplained. Transposable elements (TEs), as a major component of vertebrate genomes, are a potential key to understanding fish genome evolution.
The initial annotations of the four teleost genomes (zebrafish, medaka, stickleback, and tetraodon) suggested that major differences in TE content exist between lineages [13–16], and comparisons of TE diversity and evolution have revealed that teleost genomes contain the highest diversity of TE superfamilies among vertebrates. However, the TE contents reported for the early assemblies of medaka, stickleback, and tetraodon tend to be underestimated and inaccurate because the repeat databases were far from complete. Information on the distribution of TE diversity and density and on the evolutionary dynamics of TEs within teleost species remains limited and fragmented, as does knowledge of the roles of TEs in teleost genome architecture and evolution. To better understand the differential success of TEs and the evolution of genomes within teleosts, we re-annotated the mobilomes of four representative teleost species (zebrafish, medaka, stickleback, and tetraodon) using multiple de novo repeat prediction pipelines (RepeatModeler, MGEScan-non-LTR, LTRharvest, RetroTector) in combination with known repeat elements from the RepBase database. We identified diverse autonomous families of DNA transposons (hAT and Tc1/Mariner superfamilies) and retrotransposons, investigated the evolutionary patterns of TEs and the phylogenetic relationships among TE clades and superfamilies, and highlighted the differences in TE activity, diversity and abundance among the four species. Integrating these analyses allows a comprehensive comparison of mobilomes across the four species and inferences about the causes of genome size variation among them.
Dramatically different expansion of TEs across the four teleost genomes
Table 1. TE coverage in teleost genomes
The greatest difference in TEs between the teleost species lies in the abundance of class II TEs (DNA transposons; Table 1 and Fig. 1a). This class of repeats shows a striking amplification in zebrafish, the largest genome, where it contributes 41.07 % (562.49 Mb) of the sequenced genome. In the second largest genome, that of the medaka, DNA repeats contribute 11.00 % (77.14 Mb) of the genome (Table 1). However, the proliferation of DNA transposons in the smaller genomes of the stickleback and tetraodon is weak, and this class of TEs represents only 4.47 % (19.96 Mb) and 1.55 % (4.68 Mb) of their sequenced genomes, respectively (Table 1). Retrotransposons (class I transposons), including SINE, LINE and LTR repeats, also display different expansions between teleost species. Retrotransposons overall represent 12.00 % (164.29 Mb) of the zebrafish genome, which is substantially higher than in the medaka (8.37 %/58.71 Mb), stickleback (6.61 %/29.50 Mb), and tetraodon (4.00 %/12.08 Mb) genomes. The zebrafish has the highest abundance of both LTR (5.90 %) and SINE (2.24 %) retrotransposons across the teleost species, while the medaka shows the highest accumulation of LINEs, at 4.97 % of the total sequenced genome (Table 1). Compared with other types of TEs, SINEs show relatively weak proliferation in the teleost species other than zebrafish (Table 1). The proportion of satellites in the zebrafish genome (1.50 %) is higher than that observed in the medaka (0.16 %), stickleback (0.09 %), and tetraodon (0.08 %) genomes. The proportion of simple repeats in the zebrafish genome (0.99 %) is higher than that in the tetraodon (0.74 %), medaka (0.29 %) and stickleback (0.25 %) genomes (Table 1).
Dramatically different accumulation of DNA transposons across the four teleost genomes
Table 2. Abundance of DNA transposons in teleost genomes (TE coverage given as copy number/base pairs masked/%)
DNA transposons dominate the size variation in teleost genomes; the larger genomes have accumulated many more DNA repeats than the smaller ones. Over 100 times more genome content derived from DNA transposon amplification was identified in the zebrafish (562.49 Mb) than in the tetraodon (4.68 Mb), and almost all types of DNA repeats occur more frequently in the larger genomes than in the smaller ones (Tables 1 and 2). The two dominant superfamilies of cut-and-paste DNA transposons in all four teleost species are hAT and Tc1/Mariner (Table 2). Four other cut-and-paste superfamilies (CMC-EnSpm, PIF-Harbinger, Kolobok, and PiggyBac) have also amplified to significant levels (over 1 % of the genome) in the zebrafish. In the medaka genome, in addition to hAT and Tc1/Mariner, the PIF-Harbinger superfamily has amplified to a significant level, comprising 1.34 % (11.66 Mb) of the genomic sequence. The other superfamilies did not show significant expansion (<1 %) in the four teleost genomes (Fig. 2 and Table 2).
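The arithmetic behind these comparisons can be checked directly from the figures quoted in the text. The following is a minimal sketch that uses only the DNA transposon percentages and masked lengths given above; the function names are illustrative, and the "implied length" is simply masked Mb divided by the coverage fraction, not a value taken from the tables.

```python
# DNA transposon coverage quoted in the text:
# species -> (percent of sequenced genome, Mb masked)
DNA_TE = {
    "zebrafish": (41.07, 562.49),
    "medaka": (11.00, 77.14),
    "stickleback": (4.47, 19.96),
    "tetraodon": (1.55, 4.68),
}


def implied_length_mb(percent, masked_mb):
    """Sequenced length (Mb) implied by a coverage percentage and masked Mb."""
    return masked_mb / (percent / 100.0)


def fold_difference(species_a, species_b, table=DNA_TE):
    """Ratio of masked Mb between two species in the table."""
    return table[species_a][1] / table[species_b][1]


if __name__ == "__main__":
    for name, (pct, mb) in DNA_TE.items():
        length = implied_length_mb(pct, mb)
        print(f"{name:12s} {pct:6.2f} %  {mb:8.2f} Mb  implied length ~{length:7.1f} Mb")
    ratio = fold_difference("zebrafish", "tetraodon")
    print(f"zebrafish / tetraodon DNA transposon content: {ratio:.1f}x")
```

The zebrafish to tetraodon ratio computed this way is about 120, consistent with the "over 100 times" statement.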
Different distribution of LINE and LTR family diversity within the four teleost genomes
Table 3. Distribution of LINE families in teleost genomes
Table 4. Distribution of LTR families in teleost genomes
In total, 10, 1, and 16 ERV families were identified in the genomes of the zebrafish, medaka, and stickleback, respectively, and no ERV families were detected in the tetraodon genome (Table 4 and Fig. 7). Phylogenetic analysis classified these ERVs into two clades (Epsilon retrovirus and Spuma retrovirus), belonging to the Class I and Class III ERV groups, respectively. No Class II ERVs were detected in the teleost species (Fig. 7). The majority of teleost ERVs belong to the known Epsilon retrovirus clade of Class I ERVs, which has been reported in fishes and Xenopus [27, 28]. Only one ERV, from the zebrafish genome, clusters with known foamy virus proteins from mammals and is classified in the Spuma clade of Class III ERVs (Fig. 7).
Differential proliferation dynamics of class I TEs across the four teleost genomes
Generally, the retrotransposons within the larger genomes of the zebrafish and medaka have been active over an extended time period, in contrast with the predominantly recent activity in the smaller stickleback genome and the extremely low level of activity in the tetraodon genome (Fig. 8). Both LTRs and LINEs in the zebrafish and stickleback genomes show evidence of very strong recent activity, whereas most other types of retrotransposons in the teleost species show a recent decrease in activity. Compared with other retrotransposons, SINEs show a very low level of activity in most teleost species. The exception is the zebrafish, where SINEs underwent one round of substantial accumulation at between 10 and 15 % divergence, followed by a dramatic decrease in recent activity; their current activity is very limited, as shown by the very few repeats with <5 % divergence from the consensus (Fig. 8).
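A divergence distribution of this kind can be tabulated by summing masked base pairs per divergence bin. The sketch below is illustrative only: the input format (a list of percent-divergence and length pairs), the 5 % bin width, and the function names are assumptions and are not taken from the study's pipeline.

```python
from collections import defaultdict


def divergence_landscape(fragments, bin_width=5.0):
    """Sum masked bp per divergence bin.

    fragments: list of (percent_divergence, length_bp) pairs (hypothetical input).
    Returns a dict mapping the lower edge of each bin to total bp, sorted by bin.
    """
    bins = defaultdict(int)
    for divergence, length in fragments:
        lower_edge = int(divergence // bin_width) * bin_width
        bins[lower_edge] += length
    return dict(sorted(bins.items()))


def recent_fraction(fragments, threshold=5.0):
    """Fraction of masked bp in fragments below the divergence threshold."""
    total = sum(length for _, length in fragments)
    if total == 0:
        return 0.0
    recent = sum(length for divergence, length in fragments if divergence < threshold)
    return recent / total


if __name__ == "__main__":
    # Made-up example fragments, for illustration only.
    example = [(2.0, 100), (4.9, 50), (12.3, 200), (10.0, 25)]
    print(divergence_landscape(example))
    print(recent_fraction(example))
```

A repeat type with a peak in the 10 to 15 % bin and little mass below 5 % would correspond to an older burst with limited current activity.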
The L1, L2, RTE, and Rex-Babar clades are the major LINE repeat types in teleost species and have experienced substantial expansion during their evolutionary histories, while the other clades have not been significantly amplified (Additional file 4: Table S4). The predominant clade of LINEs in most teleost genomes is L2, which contributes 1.61, 1.57, and 1.20 % to the genomes of the zebrafish, medaka, and stickleback, respectively (Additional file 4: Table S4). An in-depth divergence analysis revealed that the L2 clade has been highly active over an extended time period and shows predominantly recent activity in these teleost species (Additional file 5: Figure S1A, B, and C). The second most abundant clade of LINEs in zebrafish is L1, which covers 1.24 % of the genome and shows highly recent activity (Additional file 4: Table S4 and Additional file 5: Figure S1). The second most abundant LINE clades in medaka and stickleback are RTE and Rex-Babar, respectively. Rex-Babar is the major LINE clade in the tetraodon lineage, whereas the activity of all other LINE clades within this lineage is very limited (Additional file 4: Table S4 and Additional file 5: Figure S1). The substantial recent expansion of Rex-Babar within the stickleback and tetraodon genomes contrasts with the weak accumulation of this clade in the zebrafish and medaka lineages (Additional file 4: Table S4 and Additional file 5: Figure S1).
The most abundant group of LTRs in all four teleost species is Gypsy, which comprises 2.42, 1.24, 1.85, and 1.24 % of the zebrafish, medaka, stickleback, and tetraodon genomes, respectively. This group exhibits a distinct mode of evolution, with substantial recent accumulation within the zebrafish and stickleback genomes, in contrast with the relatively old proliferation dynamics within the medaka and tetraodon lineages (Additional file 4: Table S4 and Additional file 6: Figure S2). The DIRS group shows significant proliferation only in the zebrafish lineage (1.06 %), with predominantly recent activity, and is very rare within the other three teleost species. Substantial expansion of ERVs was observed within the zebrafish (0.66 %) and stickleback (0.96 %) lineages, at levels higher than in the medaka (0.08 %) and tetraodon (0.18 %) lineages, while apparent accumulations of Ngaro were observed in the zebrafish (0.89 %) and medaka (0.65 %) lineages, compared with an extremely low abundance in the stickleback (0.11 %) and tetraodon (0.12 %) lineages (Additional file 4: Table S4 and Additional file 6: Figure S2).
TE proliferation and genomic expansion in teleosts
Using species-specific TE libraries, which combine the updated RepBase database with de novo repeats extracted by multiple pipelines, we re-annotated the mobilomes of the four representative teleosts (zebrafish, medaka, stickleback, tetraodon). The estimated fraction of repeats within zebrafish in this study (56.49 %) is similar to the 52.2 % of the previous report, and substantially higher than that of most investigated vertebrates, including carp (31.3 %), lizards (34.4 %), western clawed frog (34.5 %), and birds (7–9 %) [33, 34], but comparable to the 45–52 % density in some mammalian genomes. However, the repeat coverage of the medaka genome in this study (33.70 %) is much higher (by about 16.2 percentage points) than that in the early TE annotation of the medaka genome. This disagreement may reflect a significant original underestimation, since the medaka repeat database was far from complete and dense repeat regions were underrepresented in the previous draft assembly. The density of interspersed repeats in the tetraodon genome (7.13 %) is clearly higher than the 2.7 % observed in its close relative, fugu, although previous size estimations suggested that the tetraodon genome might be more compact than the genome of fugu. The coverage of repeats within the stickleback genome (14.21 %) annotated in the current study is far below the 25.2 % of the previous estimate; the cause of this discrepancy is unclear, since the annotation method used in that report is unavailable.
In this study, we confirmed that teleosts are unique among vertebrates in their overall TE composition: TE content varies extraordinarily (7.13–56.49 %) across the four lineages, far exceeding the variation reported in extant mammals (36–52 %) [8, 35], salamanders (25–48 %), or birds (7–9 %) [33, 34]. Previous comparisons of genome size and TE coverage in different organisms revealed a general positive trend [5, 18, 38, 39]: species with larger genomes have commensurately larger proportions of TE-derived DNA. Our findings confirmed this correlation within the four teleost lineages. The total TE contents estimated for the four species match the predictions based on genome size very well, as illustrated by the smallest genome, that of the tetraodon (7.13 % TEs), and the largest genome, that of the zebrafish (56.49 % TEs). Furthermore, this study uncovered that the difference is largely due to the differential expansion of class II TEs (DNA transposons) across the four teleost species. These results suggest that the differential expansion of TEs, particularly DNA transposons, is a major molecular mechanism contributing to genome size variation in the four teleost species. This is similar to the situation in an amphibian, the western clawed frog, but contrasts with most mammals and reptiles, where the expansion of the genome is dominated by LTR or non-LTR retrotransposons [7, 8, 10, 31, 37].
Comparison of the diversity and activity of TEs between the four teleost genomes
In the current study, we found that teleost fish genomes show an extremely high diversity of TEs compared with other vertebrate genomes, in agreement with previous studies [18, 21, 22, 40]. Furthermore, we performed a systematic comparative analysis of the intra-lineage diversity and activity of TEs across the four teleosts, and our data suggest that the differences in genome content among taxa are not limited to differences in the accumulation of a specific type of TE. Differences in both the diversity and the activity of TEs contribute to the variation in TE content across teleost lineages. The diversity of TEs at the group level is broadly similar across teleost genomes, but the diversity at the clade (superfamily) and family levels differs significantly: the smaller genomes have reduced clade (superfamily) and family diversity compared with the larger genomes, as has also been observed in snake lineages. Species differences in TE activity may also result in changes in TE accumulation. We found that zebrafish, with a fairly high TE content, shows a longer-lasting and higher level of TE activity over its evolutionary history than the other three teleost lineages, and many of its DNA, LTR and LINE families show evidence of recent and ongoing proliferation. In contrast, most types of these transposons in the medaka, stickleback, and tetraodon genomes show a relatively young expansion and/or a rapid decrease in activity, or extremely low activity throughout their evolutionary history. Uncovering the reasons for the variation in diversity and activity across these teleost species is a very difficult task, particularly because TEs can also be introduced into lineages through horizontal transfer.
The mode of fertilization, body temperature, and host defense mechanisms that oppose TE activity (or family competition) have been suggested as biological features that may shape susceptibility to TEs in vertebrates [42, 43]. Internal fertilization may minimize exposure of gametes (and embryos) to horizontal transfer of TEs compared with external fertilization; however, the four teleost lineages share the same mode of fertilization, and body temperature, which in all four investigated teleosts varies with the temperature of their surroundings, is also unlikely to be the principal determinant. Thus family competition, the capacity of a TE to replicate and compete with other TEs, which is determined by the host defense mechanisms and by the TE itself, may be the major determinant of TE differences across the four teleost species. At least two host mechanisms that control TE family competition, (i) cosuppression, usually mediated by small interfering RNA (siRNA), and (ii) methylation, have been demonstrated in C. elegans and mice, and may also play roles in the evolution of TE diversity and activity in teleosts. However, these hypotheses will need to be tested and critically re-evaluated to gain a deeper understanding of the regulation, mobility, and rates of expansion and extinction of TEs in teleosts.
Evolutionary dynamics of TEs in teleost genomes compared with other vertebrates
The evolutionary dynamics of TEs differ drastically between vertebrates. The genomes of mammals and birds contain few types of TE lineages, which are very abundant but relatively inactive [7, 10, 33, 34]. In contrast, our study shows that the level of class I and class II transposon diversity and activity in teleost genomes is much higher than that seen in either bird or mammalian genomes [16, 39, 46, 47], is similar to that observed in coelacanths and cod, and is comparable with that in lizards and western clawed frog [31, 32]. Recently active TEs (with a divergence of less than 5 %) are more common in teleost genomes than in mammals or birds [8, 10, 33, 34].
The estimated fractions of LINEs in teleost genomes (1.97–4.97 %) are substantially lower than in lizards (12.34 %) and mammals (about 20 %) [6, 8, 10, 31], and comparable to those of birds (6 %), coelacanths (6.43 %), cod (3.3 %), and western clawed frog (5.4 %) [32–34, 48, 49]. However, LINEs within teleost genomes show extremely high diversity, with 6 groups. The L1 clade of LINEs contains numerous families and shows signs of recent activity. Some clades of LINEs observed in teleost genomes are absent from western clawed frog, lizards, chickens and humans [10, 31, 32, 34]. Divergence analysis indicates that many LINE clades and families within teleost genomes are recent insertions, which is similar to the proliferation dynamics of LINEs in lizards and western clawed frog [31, 32]. Among these is an unusually high diversity of very young families of L1 retrotransposons in the zebrafish genome; L1 is the most diverse group of LINEs there, containing four branches (Swimmer, Tx1-a, Tx1-b, and Tx1-c). Each branch has yielded highly prolific families, yet the group covers only 1.24 % of the zebrafish genome. This contrasts with observations in mammalian genomes, where only a single active family of L1 LINEs has predominated over 10 Mya, covering about 20 % of the genome. In birds, the predominant TEs are CR1 LINEs (about 6 % of the genome), and these have been shown to be degenerate and nonfunctional [7, 10, 34].
Compared with lizards, western clawed frog, mammals, and birds [7, 10, 31, 32, 34], LTR retrotransposons are also very diverse and active in teleost genomes. Representatives of the six major groups of LTR elements, including endogenous retroviruses (BEL/PAO, Copia, DIRS, ERV, Gypsy, Ngaro), were identified, with diverse clades and numerous families. In particular, an unexpectedly high diversity of Gypsy (7 clades) and BEL/PAO (3 clades) was found in teleost genomes, and each clade contains diverse active families. By contrast, the Ngaro group is absent in western clawed frog and lizards [31, 32], and in birds and mammals only ERVs may still be active, while all other LTR groups (BEL/PAO, Copia, DIRS, Gypsy, and Ngaro) are absent or present only as fossils [7, 9, 33, 34]. This high diversity of LTR retrotransposons within teleost genomes was already noted in previous analyses [14, 40]. The estimated fractions of LTRs vary from 1.95 % of the tetraodon genome to 5.90 % of the zebrafish genome, which are substantially higher than in coelacanths (0.86 %), and comparable to those in cod (4.88 %) and western clawed frog (1.75 %) [32, 48, 49].
Teleosts are unique among vertebrates in the proliferation dynamics of their DNA transposons: DNA transposons vary dramatically in abundance across teleost species, dominate the variation in genome size, and show the highest level of diversity among vertebrates. The coverage of DNA transposons varies across teleost genomes from 1.55 % in the tetraodon to 41.07 % in the zebrafish. The zebrafish genome contains a marked excess of DNA transposons, which is unique among sequenced vertebrate genomes and is substantially higher than in the closely related carp (17.53 %). Indeed, only the western clawed frog genome, which comprises 25 % DNA transposons, is comparable. The estimated fractions of DNA transposons in the medaka (11.00 %) and stickleback (4.47 %) genomes are substantially higher than in coelacanths (0.20 %), lungfish (1.3 %), birds (less than 1 %) [33, 34] and mammals (less than 3 %) [7, 10], but comparable to those in lizards (8.86 %), salamanders (6.37 %), and cod (6.39 %).
The diversity of teleost DNA transposons, which was already noted previously [18, 30], far exceeds that in other examined vertebrates, including mammals, birds, coelacanths, cod, lizards, and western clawed frog [31, 32, 34, 46, 48]. A particularly high abundance and diversity of hAT and Tc1/Mariner elements was found in teleost genomes. Nine superfamilies of DNA transposons (Ginger, Sola, CMC-EnSpm, Crypton, Dada, MULE-MuDR, P, PIF-ISL2EU, and Academ) were observed in teleosts but are absent in lizards, western clawed frog, and coelacanths [31, 32, 48]. In addition, diverse autonomous hAT and Tc1/Mariner subfamilies were identified in teleost genomes, suggesting that DNA transposons are relatively young and active in teleosts, in contrast to the few recently active DNA transposons found in mammals and birds [7, 10, 33, 34]. Overall, teleosts have an extremely wide diversity and high level of activity of TEs, but the success of TEs differs significantly across lineages; in contrast, mammalian genomes have a high degree of TE expansion and are enriched with L1 elements but show a low level of diversity, and bird genomes exhibit low TE density with very little mobile element activity.
In this study, we investigated the diversity, activity, and abundance distribution of TEs among four closely related teleost species. In contrast to other vertebrates, teleosts display contrasting profiles of mobilomes across the four investigated lineages. The larger genomes represent a higher diversity and activity within each family and a greater abundance of TEs compared with the smaller genomes. The differences in TE expansion, dominated by DNA transposons, explain the main size variation in the four teleost genomes, and the species differences in both the diversity and activity of TEs contribute to the variations in TE accumulations. TEs play pivotal roles in teleost genome evolution.
Computational identification of interspersed repeats
The zebrafish (GRCz10), medaka (MEDAKA1), stickleback (BROADS1), and tetraodon (TETRAODON8) genomes were downloaded from the Ensembl database (http://asia.ensembl.org/index.html). The repeat contents of the four genomes were assessed using RepeatMasker (http://www.repeatmasker.org/), RepeatModeler (http://repeatmasker.org/RepeatModeler.html) and ab initio repeat prediction programmes. The RepBase database of consensus repeat sequences (http://www.girinst.org/) was used to identify repeats in the genome derived from known classes of elements. RepeatModeler was used to build de novo repeats. To identify autonomous hAT and Tc1/Mariner DNA transposons, the genomes were queried using TBLASTN to detect coding sequences related to all known DNA transposon superfamilies in RepBase. The top 10–40 non-overlapping hits (generally E-value < 10^-5) were extracted, along with 500 bp of flanking sequence, aligned using a local installation of MUSCLE, and used to construct consensus sequences. For each consensus, coding sequences were predicted using Open Reading Frame (ORF) Finder (http://www.ncbi.nlm.nih.gov/projects/gorf/). The non-LTR retrotransposons were identified by MGEScan-non-LTR, and the LTR retrotransposons, including endogenous retroviruses (ERVs), were identified by LTRharvest and RetroTector. The autonomous LTRs were classified into families based on amino acid sequence similarity (80 %) of the ORF containing the RT domain, while the autonomous LINEs were classified into families based on ORF structure and amino acid sequence similarity (80 %) of ORF2.
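The hit-selection step described above (top non-overlapping hits below an E-value cutoff, extended by 500 bp of flanking sequence) can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the `Hit` record, the function names, and the convention that coordinates are 1-based with start <= end on the subject sequence are all hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One search hit on a genomic sequence (hypothetical record layout)."""
    seq_id: str
    start: int      # 1-based, inclusive; assumed start <= end
    end: int
    evalue: float


def select_hits(hits, max_hits=40, max_evalue=1e-5):
    """Keep the best-scoring hits that do not overlap an already chosen hit."""
    chosen = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if hit.evalue >= max_evalue:
            break  # hits are sorted, so all remaining ones fail the cutoff
        overlaps = any(
            hit.seq_id == kept.seq_id and hit.start <= kept.end and kept.start <= hit.end
            for kept in chosen
        )
        if not overlaps:
            chosen.append(hit)
        if len(chosen) == max_hits:
            break
    return chosen


def add_flanks(hit, seq_length, flank=500):
    """Extend a hit by `flank` bp on each side, clipped to the sequence bounds."""
    return max(1, hit.start - flank), min(seq_length, hit.end + flank)


if __name__ == "__main__":
    # Made-up hits for illustration only.
    demo = [
        Hit("chr1", 1000, 2000, 1e-30),
        Hit("chr1", 1500, 2500, 1e-20),   # overlaps the first hit, dropped
        Hit("chr2", 100, 900, 1e-10),
        Hit("chr2", 5000, 6000, 1e-3),    # fails the E-value cutoff
    ]
    for hit in select_hits(demo):
        print(hit.seq_id, add_flanks(hit, seq_length=10000))
```

The extended intervals would then be extracted from the genome and passed to the multiple alignment step.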
Repeats characterized as putative TEs by the approach above were combined with the RepBase database of TEs (update 20150807), and redundant sequences were filtered out to create a custom library, which was used with RepeatMasker (RepeatMasker -open-4.0.5) to determine the distribution and coverage of TEs in each genome. Redundant repeats were removed based on the 80-80 rule, which considers two sequences to belong to the same TE family if they can be aligned over more than 80 % of their length with over 80 % identity. The new non-redundant repeats of the four teleost species are provided in fasta format in Additional files 7, 8, 9 and 10.
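The 80-80 rule can be expressed as a simple predicate on a pairwise alignment. The sketch below is a minimal illustration; it assumes that "their length" means the aligned region must exceed 80 % of both sequences, and the function name and argument layout are hypothetical.

```python
def same_family(len_a, len_b, aligned_len, identity_pct,
                min_coverage=80.0, min_identity=80.0):
    """Apply the 80-80 rule to one pairwise alignment.

    len_a, len_b  : lengths of the two sequences (bp)
    aligned_len   : length of the aligned region (bp)
    identity_pct  : percent identity over the aligned region
    Assumption: coverage must exceed the threshold for BOTH sequences.
    """
    coverage_a = 100.0 * aligned_len / len_a
    coverage_b = 100.0 * aligned_len / len_b
    return (coverage_a > min_coverage
            and coverage_b > min_coverage
            and identity_pct > min_identity)


if __name__ == "__main__":
    print(same_family(1000, 1100, 950, 85.0))   # both coverages and identity pass
    print(same_family(1000, 2000, 950, 95.0))   # second sequence covered < 80 %
```

In a redundancy filter, a new sequence would be discarded when this predicate holds against any sequence already kept in the library.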
Bootstrapped (1000 replicates) neighbour-joining (NJ) phylogenetic trees were generated using MEGA5, based on MUSCLE multiple protein alignments of the conserved domain of the DNA transposases or the reverse transcriptase (RT) domain of retrotransposons. For the hAT superfamily, we used a conserved 39 aa-long region of the hAT transposase to build the alignment, and then deduced the NJ tree. For the Tc1/Mariner superfamily, the NJ tree was generated from a multiple sequence alignment of the most conserved domain of the Tc1/Mariner transposase (about 150 aa), corresponding to the catalytic “DDE” domain. For retrotransposons (LINEs, LTRs and ERVs), the NJ tree was generated from an amino acid multiple alignment of the conserved RT domain from retrotransposons and reference elements. All of these alignments are available upon request.
Divergence distribution of interspersed repeats
The average number of substitutions per site (K) for each fragment was estimated from the divergence levels reported by RepeatMasker, using the one-parameter Jukes-Cantor formula K = −300/4 × ln(1 − D × 4/300), where D is the percentage of sites that differ between the fragmented repeat and the consensus sequence (so that K is likewise expressed per 100 sites).
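As a worked illustration of this correction, the sketch below evaluates the formula for a divergence value given in percent, as reported by RepeatMasker. The function name is hypothetical, and the guard for D >= 75 % reflects the point at which the Jukes-Cantor logarithm is no longer defined.

```python
import math


def jukes_cantor_percent(d_percent):
    """Jukes-Cantor corrected divergence, in percent.

    d_percent: observed percentage of differing sites (0 <= D < 75).
    Implements K = -300/4 * ln(1 - D * 4/300).
    """
    argument = 1.0 - d_percent * 4.0 / 300.0
    if argument <= 0.0:
        raise ValueError("divergence of 75 % or more is saturated under Jukes-Cantor")
    return -300.0 / 4.0 * math.log(argument)


if __name__ == "__main__":
    for d in (1.0, 5.0, 10.0, 20.0):
        print(f"D = {d:5.1f} %  ->  K = {jukes_cantor_percent(d):6.2f} %")
```

For small D the correction is negligible, while at D = 10 % the corrected value is about 10.73 %, reflecting the multiple substitutions hidden at the same site.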
LINEs: long interspersed nuclear elements
LTRs: long terminal repeats
Mb: mega base pairs
ORF: open reading frame
SINEs: short interspersed nuclear elements
TSD: target site duplication
TIR: terminal inverted repeat
This work was funded by the Natural Science Foundation of China (NSFC) (31200920), by the National Major Transgenic Project of China (2014ZX08006005-008), and by the Priority Academic Program Development of Jiangsu Higher Education Institutions.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.