In a previous work, we uncovered the presence of Galileo elements in 6 of the 12 sequenced Drosophila genomes. Among them, the D. mojavensis genome showed the highest variability in Galileo sequence and structure. A small sample of 16 nearly-complete copies that contained transposase-coding sequences and 20 non-autonomous copies was analyzed. Analysis of the TIR sequence variation showed that the copies clustered into four groups or subfamilies (named C, D, E and F). Two of these subfamilies, C and D, harbored truncated transposase-coding regions, while the other two were composed only of non-autonomous copies (mainly with a 2TIR structure). The existence of different groups in the same genome suggested several amplification bursts in the past. Furthermore, high variability in TIR length was detected. Since TIR length is the most characteristic feature of Galileo elements, the D. mojavensis genome offered the opportunity to study this trait in detail.
Here, we carried out a thorough analysis of Galileo variation and distribution in the D. mojavensis genome sequence and uncovered at least five subfamilies of Galileo elements. Four of them contain nearly complete copies with transposase-coding segments, which implies the putative co-existence of four fully functional subgroups. The co-existence of different subgroups or subfamilies has previously been reported for the D. melanogaster P-element and other transposons [27–30]. Two main hypotheses could explain the co-existence of different subfamilies in the same genome: horizontal transfer (HT) and vertical diversification within the genome. Under the first hypothesis, the Galileo element could have reached D. mojavensis via a species in close spatiotemporal contact with it, such as mites or other intimate parasites [31–34]. If the five subfamilies (C, D, E, F and X) had arrived through this mechanism, this would imply at least five independent events of successful HT and invasion of the D. mojavensis genome. Taking our age estimate for each subfamily into account, these HT events would have happened within a period of approximately 5 myr, an average of one HT event per myr. When the variability of the node ages is taken into account, this time range reaches approximately 9.5 myr (from 0.125 to 0.02 changes/time, 11.36 and 1.81 myr, respectively), which would mean approximately 0.53 HT events per myr. This would imply something like a ‘Galileo bombing’ of the D. mojavensis genome in the past. This HT rate is higher than the 0.04 HT/myr/family obtained by Bartolomé et al.; even if we divide our estimate by the number of Galileo subfamilies, we still get a higher rate of 0.1 HT/myr/subfamily. Such massive HT seems very unlikely.
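The rate arithmetic in the preceding paragraph can be retraced with a short sketch. It assumes the substitution rate of 0.011 changes/position/myr used for the ultrametric tree, and the function and variable names are illustrative rather than part of the original analysis. Because the code keeps unrounded node ages, its results differ slightly from the rounded figures quoted in the text.

```python
# Sketch of the horizontal-transfer (HT) rate arithmetic; all names are illustrative.
RATE_PER_MYR = 0.011   # changes/position/myr, the rate assumed for the ultrametric tree
N_SUBFAMILIES = 5      # C, D, E, F and X


def units_to_myr(tree_units, rate=RATE_PER_MYR):
    """Convert a node depth (changes/position) into an age in myr."""
    return tree_units / rate


def ht_rate(n_events, span_myr):
    """Average number of HT events per myr over a time span."""
    return n_events / span_myr


# Point estimate: five independent HT events within roughly 5 myr.
rate_point = ht_rate(N_SUBFAMILIES, 5.0)

# Wider range once the variability of the node ages is considered.
oldest_myr = units_to_myr(0.125)    # about 11.4 myr
youngest_myr = units_to_myr(0.02)   # about 1.8 myr
span_myr = oldest_myr - youngest_myr
rate_wide = ht_rate(N_SUBFAMILIES, span_myr)

# Dividing by the number of subfamilies gives a per-subfamily rate that can be
# compared with the 0.04 HT/myr/family reported by Bartolomé et al.
rate_per_subfamily = rate_wide / N_SUBFAMILIES

print(f"span: {span_myr:.2f} myr")
print(f"HT rate over the wide range: {rate_wide:.2f} per myr")
print(f"per subfamily: {rate_per_subfamily:.2f} per myr")
```

Even the more conservative per-subfamily figure remains above the 0.04 HT/myr/family reference value, which is the comparison the argument rests on.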
On the other hand, the different Galileo subfamilies could have diverged vertically from an ancestral element resident in the genome. This putative ancestral sequence would have existed approximately 18 myr ago (0.20 units/relative time, considering 0.011 changes/position/myr), as seen in our Bayesian ultrametric tree (BEAST) (Figure 2B). Such functional differentiation could have been driven by specific selective pressures to form several subfamilies producing distinct Galileo transposases that overcome the repression of transposition by the cell. When a new transposase appears along with high-affinity sequences, a transposition burst would happen. After that, truncated copies of the successfully transposed elements would appear, giving rise to deletion derivatives, 2T, 2RT and solo-TIR copies. In each subfamily, all these structural types would appear independently and could spread as long as they conserved their affinity for the enzymes encoded elsewhere in the genome by an autonomous copy [17, 18, 37]. This is the landscape Galileo presents in the D. mojavensis genome.
Another factor that could influence Galileo diversification is genetic drift, which is very sensitive to the host population structure. D. mojavensis is a species with very divergent populations that are considered geographical races or even subspecies. It is possible that a different Galileo subfamily evolved in each isolated population and that secondary contacts between these populations mixed the different groups. However, our age estimates for the subfamilies do not agree with the putative ages of the different D. mojavensis races, which are probably less than one myr [38, 39]. Thus, population structure does not seem to explain the existence of Galileo subfamilies in D. mojavensis.
Nevertheless, the two explanations, horizontal transfer and vertical diversification within the genome, are not mutually exclusive, and a combination of the two phenomena could have happened. However, vertical diversification of Galileo subfamilies seems more parsimonious at this time. Our estimates indicate that the D. mojavensis Galileo subfamilies had a common ancestor approximately 18 myr ago. This shows that Galileo has an old history in D. mojavensis, which is in agreement with the ancient origin of Galileo in the genus. Likewise, recent data have uncovered the existence of Galileo elements in many other members of the Drosophila repleta species group, besides D. buzzatii and D. mojavensis (Andrea Acurio, Deodoro Oliveira and Alfredo Ruiz, in preparation). However, although the last common ancestor of Galileo in the genus could be as old as the origin of the Drosophila genus itself, the subfamilies found in D. mojavensis diversified quite recently (4 to 9 myr ago). Consequently, only species closely related to D. mojavensis are expected to harbor these very same subfamilies, and different subfamilies probably exist in more distantly related species.
The genomic dynamics of transposons help us to understand the variety of Galileo copies in the D. mojavensis genome. The natural cycle of a DNA transposon would begin with the invasion of a new genome by a fully functional transposon, through horizontal transfer [32, 34, 37] or perhaps by remodeling or reactivation of an inactive one. After that, since class II transposition depends entirely on the replication machinery of the cell and on its machinery for repairing double-strand breaks (DSB), truncated copies start to appear due to errors in the repair process. Likewise, the truncated copies that maintain the sequences recognized by the transposase would be able to spread better than the complete copies, probably because they escape the putative length penalty that some transposons suffer. Moreover, even shorter copies would appear, the so-called MITEs, and eventually the transposon would become inactivated and disappear [6, 32].
Galileo element structures clearly show this dynamic. The nearly-complete copies are 5.2 kb long on average, and a gradient of shorter copies with different deletions has appeared. Thus, there is a group of copies in which no transposase sequence is found and which are composed almost entirely of TIRs. These copies could perhaps be considered Galileo MITEs, but this definition has some drawbacks. First, the main trait of a MITE is its length, usually less than 600 bp [4, 6, 41], whereas Galileo 2TIR elements are 1.7 to 2.2 kb long on average, mainly due to the length of the TIR itself. Secondly, although the 2TIR copies outnumber the nearly-complete ones, the number of copies is not as high as the thousands of copies reached by MITEs in some genomes. Finally, since in Galileo the changes from the most complete copies to the 2TIR elements are traceable in virtually all copies, we propose a 2TIR-element tag for this deletion-derivative kind of Galileo copy.
Regarding the Galileo TIR dynamics, we have observed both length expansion and contraction. On the one hand, the contraction could be explained by the genomic deletion rate in TEs, which has already been studied. On the other hand, the expansion of the TIR is somewhat more complex than deletion. The expansion of the TIR in the F group is mainly due to the expansion and contraction of the direct tandem repeats located inside the TIR. A different number of tandem repeats is found when the two TIRs of a Galileo F copy are compared, which indicates that each TIR changes independently. This would be in agreement with the statement that any region generated by duplication can thereafter be duplicated [42, 43]. Furthermore, the tandem repeats in the TIR or in subterminal regions of transposons have been proposed to harbor secondary binding sites for the transposase [30, 44–46]. In our case, Galileo elements also present these tandem repeats (subfamilies G and F [23, 24]), and they contain secondary binding sites at least in Dbuz\GalileoG (Marzo M, Liu D, Ruiz A and Chalmers R, submitted). Multiple binding sites seem to be a convergent trait that appears in different transposable element superfamilies and could be positively selected for an improved transposition reaction, thanks to a higher affinity for the transposition machinery.
Besides the tandem repeat expansion, we have detected another source of TIR extension: the recruitment of internal sequences to extend the TIR. This could be due to the structure of the Galileo sequences, in which two close inverted repeats of at least 600 bp might attract recombination, whether because of the DSB left after transposon excision, because of structural instability, or through ectopic recombination resulting from their being a dispersed genomic repeat. We suggest that Galileo, in addition to its transpositional nature, would behave similarly to segmental duplications. Segmental duplications are repetitive regions of the genome that are able to recombine, exchange and convert sequences. For example, if a Galileo copy suffers a DSB in TIR2 (due to a problem during the replication step, for instance), the break could be repaired through non-allelic homologous recombination (NAHR). If the repair of this TIR2 uses as template the TIR1 of a copy of the same subfamily (the two TIRs of the same Galileo copy show 98% to 100% nucleotide identity), the copied tract could be longer than TIR1 and include other internal regions of the element. In that case, since TIR1 is being copied where TIR2 is located, the region that was downstream of TIR1 would appear upstream of TIR2 as well, becoming a repetitive sequence in inverted orientation and extending the TIR span. The result is TIR1-F1-F1-TIR2. The expansion of inverted repeat sequences has been reported for segmental duplications and for the inverted repeats of Polintons (TEs); thus, the dynamics of inverted repeats seem to be a general trait of genome dynamics [12, 43, 48].
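The TIR extension mechanism described above can be illustrated with a toy string model. This is only a sketch under strong simplifying assumptions: the sequences are short and invented, the repaired TIR2 is simply replaced by an inverted copy of the tract taken from the TIR1 side, and none of the names correspond to software used in the original analysis.

```python
# Toy model of TIR extension by templated repair; sequences and names are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def tir_length(element):
    """Length of the terminal stretch that is mirrored, as reverse complement, at the other end."""
    n = 0
    limit = len(element) // 2
    while n < limit and element[n].translate(COMPLEMENT) == element[-1 - n]:
        n += 1
    return n


def repair_tir2_from_tir1(element, tir_len, extra):
    """Replace TIR2 with an inverted copy of TIR1 plus `extra` bases downstream of TIR1.

    Simplification: the conversion tract replaces TIR2 only; the flanking
    internal sequence is left untouched.
    """
    tract = element[:tir_len + extra]   # TIR1 and the start of the internal region
    body = element[:-tir_len]           # everything except the old TIR2
    return body + revcomp(tract)


# Hypothetical element: TIR1 - F1 - core - TIR2
tir1 = "CACTGTTA"
f1 = "GGATC"
core = "TTTTTT"
original = tir1 + f1 + core + revcomp(tir1)

# The copied tract runs past TIR1 and includes all of F1.
extended = repair_tir2_from_tir1(original, len(tir1), len(f1))

print(tir_length(original), tir_length(extended))
```

In this toy case the TIR grows by exactly the length of F1, because F1 is now present in inverted orientation next to TIR2, which matches the TIR1-F1-F1-TIR2 outcome described in the text.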
We can then imagine that ectopic recombination and gene conversion would be acting among all Galileo copies, and that different products could appear, among them the chimeric elements. In these cases, if one of the exchange breakpoints (of the conversion tract) is located inside the element, it would generate a chimeric element with two well-defined segments from two different subfamilies. These chimeric copies resemble the Galileo copies found at the breakpoints of polymorphic inversions in D. buzzatii, which is in agreement with the generation of inversions by ectopic recombination between Galileo copies [19–21]. Furthermore, if both exchange breakpoints are located inside the element, this would produce, for example, the X-E-X copies, and this could probably be the origin of the whole E subfamily as well.
We would like to propose that long TIRs, although they imply a handicap for the transposition reaction, could be useful for the survival of the transposon: the higher the recombination rate among these sequences owing to the length of the TIRs, the greater the chance that a new Galileo subfamily appears. There would be more raw material for the transposase to choose from, and a new transposition burst would be triggered. The TIR length dynamics, along with the chimeric origin observed among Galileo copies, are in agreement with an active exchange of DNA sequences and recombination [43, 47, 48]. This would explain why different unrelated class II transposons present subfamilies with long TIRs and why TIR length is not a reliable feature for transposon classification [30, 44, 46, 49].
Generally, mutation or inactivation of the transposase sequence leads to the death of a transposon, because without the transposition reaction there is no duplication of the sequences. The fact that we have not found any functional Galileo transposase suggests that Galileo may be an inactive element. However, our lineages-through-time (LTT) plot of Galileo sequences, which depicts the accumulation of nodes in the tree, did not show any decrease or plateau in the rate of Galileo sequence duplication. Thus, if Galileo is no longer active, it stopped working quite recently. In this regard, it is worth mentioning that in genome sequencing projects there are heterochromatic regions that have not been sequenced. Furthermore, there is much variability among the individuals of a species that is not represented by a single genome sequence. We cannot rule out the existence of active Galileo sequences in other individuals or in other genomic regions of D. mojavensis.
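The reasoning behind the LTT (lineages-through-time) argument can be sketched in a few lines. The node ages below are invented for illustration, the tree is assumed to be strictly bifurcating and ultrametric, and the function names are hypothetical; this is not the procedure used to produce the plot discussed above.

```python
# Minimal LTT sketch on hypothetical node ages (myr before present).

def lineages_through_time(node_ages):
    """Return (age, lineage count) pairs, oldest node first.

    In a bifurcating tree the root yields two lineages and every later
    internal node adds one more.
    """
    ages = sorted(node_ages, reverse=True)
    return [(age, i + 2) for i, age in enumerate(ages)]


def splits_per_interval(node_ages, n_bins):
    """Count internal nodes in equal-width age bins, oldest bin first."""
    oldest = max(node_ages)
    width = oldest / n_bins
    counts = [0] * n_bins
    for age in node_ages:
        # Clamp so that an age of zero falls into the most recent bin.
        idx = min(int((oldest - age) / width), n_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical internal-node ages taken from an ultrametric tree.
example_ages = [10.0, 8.0, 6.5, 4.0, 3.0, 2.0, 1.5, 0.5]

ltt = lineages_through_time(example_ages)
counts = splits_per_interval(example_ages, 2)

print(ltt)
print(counts)
```

A recent bin holding fewer nodes than the older ones would point to a slowdown in duplication. No such decrease was seen in the Galileo LTT plot, which is why a recent loss of activity, if any, is inferred.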