Diversity of transposable elements and repeats in a 600 kb region of the fly Calliphora vicina

Background Transposable elements (TEs) are a very dynamic component of eukaryotic genomes with important implications (e.g., in evolution) and applications (e.g., as transgenic tools). They also represent a major challenge for the assembly and annotation of genomic sequences. However, they are still largely unknown in non-model species. Results Here, we have annotated the repeats and transposable elements present in a 600 kb genomic region of the blowfly Calliphora vicina (Diptera: Calliphoridae) which contains most of the achaete-scute gene complex of this species. This is the largest genomic region to be sequenced and analyzed in higher flies outside the Drosophila genus. We find that the repeat content spans at least 24% of the sequence. It includes 318 insertions classified as 3 LTR retrotransposons, 21 LINEs, 14 cut-and-paste DNA transposons, 4 helitrons and 33 unclassified repeats. Conclusions This is the most detailed description of TEs and repeats in the Calliphoridae to date. This contribution not only adds to our knowledge about TE evolution but will also help in the annotation of repeats on Dipteran whole genome sequences.


Background
Transposable elements (TEs) are a common feature in eukaryotic genomes and constitute a major player in many of the processes that shape the genome and control gene expression [1,2]. TEs can occupy a significant but highly variable portion of the genome. For example, at least 46% of the initial sequence of the human genome was recognized as TEs, and this percentage is probably higher than 50% when other repeats are considered [3]. Amongst species of Diptera sequenced to date the repeat content of euchromatic regions varies from only 6% in Drosophila melanogaster [4] to 16% in Anopheles gambiae [5], 28% in Culex quinquefasciatus [6] and 47% in Aedes aegypti [7]. TEs and other repeats pose a big challenge for the assembly and annotation of genomic sequences. Although many programs have been developed for the detection of TEs, most are difficult to use and their performance has not been properly tested [8]. They mostly rely on similarity to annotated elements or on the detection of known structures. The availability of well-annotated elements is thus of great help for their automatic detection and annotation.
Detailed description of TEs is not only important for genome annotation but also essential for understanding genome structure, function and evolution. The presence of TEs can affect gene structure and gene expression in several ways: from local effects on the expression of adjacent genes, to global effects such as the generation of large chromosome rearrangements or transpositions [2,9]. TEs are also important contributors to evolutionary adaptation [10]. Furthermore they contain historical information about the genome, and can be used as a sort of paleontological record. They provide a tool with which to solve evolutionary relationships and classification of species [11][12][13][14]. Moreover, TEs have a direct application for transgenesis where they can be used as insertion vectors. Knowledge of the TE repertoire of a target species has important implications for vector choice, as it will influence the stability of the transgenes. These methods are not only valuable research tools but are also being developed for the control of pest species in the wild [15].
TEs are divided into two main classes according to their structure and mechanism of transposition [16]. Class I elements, also called retrotransposons, transpose by reverse transcription of an RNA intermediate (DNA-RNA-DNA) mediated by a reverse transcriptase, whereas Class II elements transpose directly from DNA to DNA. Within each of these classes, TEs are further subdivided mainly on the basis of the structural features of their sequences [17,18]. Class I elements are divided into two main types: those with Long Terminal Repeats (LTR elements) and those without (non-LTR elements, such as LINEs and SINEs). Class II elements include cut-and-paste DNA transposons, rolling-circle DNA transposons (Helitrons) and self-synthesizing DNA transposons (Polintons). Cut-and-paste DNA transposons are characterized by the presence of Terminal Inverted Repeats (TIRs) flanking a transposase gene whose product catalyses the transposition reaction. Helitrons have been classified as Class II DNA transposons that use a "rolling circle" (RC) mode of transposition [19].
The Calliphoridae is a monophyletic family of calyptrate Muscomorpha (Diptera). These flies are of economic importance as a cause of myiasis in humans and animals, and as vectors of pathogens causing dysentery and other diseases. The larvae of most species are scavengers of carrion and dung, and fulfil an important ecological function in the decomposition of animal remains. They are among the first colonizers of cadavers, making them particularly useful for forensic entomology, predominantly to establish a minimum time since death, or minimum post-mortem interval [20]. This method usually relies on morphological identification of samples collected on corpses. Distinguishing between closely related taxa, such as Calliphora vicina and Calliphora vomitoria, can be a difficult process with major implications for post-mortem interval estimation. Mitochondrial sequences, like COI and COII, have been used for species identification, but in some cases an overlap between intra- and inter-specific variability renders this method unreliable [20]. A simple and efficient TE-based marker system for the identification of forensically important carrion flies is currently being developed [21]. However, the retrotransposon landscape of carrion fly genomes remains largely unknown.
Here we provide an inventory and classification of the TEs and other repeats found in 6 BAC clones covering most of the Achaete-Scute Complex of C. vicina. These sequences include the genes achaete (ac), scute (sc) and lethal of scute (l'sc) which are highly regulated and surrounded by large regulatory regions. It is a 600 kb euchromatic region of the 750 Mb C. vicina genome. We have identified 318 insertions classified as 75 different repeats; 42 of which are TEs and 33 are unclassified repeats. Elements which are complete or present at high copy number are described in some detail. We also discuss probable cases of horizontal transfer.

Results
We have analysed a 613,063 bp genomic region within which we have identified a total of 318 TE insertions and repeats (Table 1, Table 2, Figure 1, Additional file 1, Additional file 2). The repeats have been classified and are described below.
Class I - RNA-mediated TEs
LTR retroposons
LTR elements are characterized by the presence of direct long terminal repeats (LTRs) that range from a few hundred base pairs to more than five kilobases long [17]. Between the LTRs there are generally only one or two open reading frames (ORFs) that encode a polymerase (pol) protein and a protein related to the retroviral group-specific antigen (gag) protein. The pol protein contains reverse transcriptase (RT), ribonuclease H (RNaseH), protease (PR) and integrase (IN) domains that are important for the process of retrotransposition. The gag protein binds nucleic acids or forms a nucleocapsid shell. Some LTR retrotransposons also have an env (envelope)-like domain that encodes a transmembrane receptor-binding protein of the kind that allows the transmission of retroviruses.
We have identified three LTR retrotransposon elements, each with one insertion. These elements are recent insertions; all three are full length, have identical or almost identical LTRs and at least two of the three insertions are polymorphic (see below).
Isis-like
This is the largest identified repeat, at 10,995 bp (Figure 2, Additional file 3: Figure S1). It is closely related to the Isis TE recently described in Drosophila buzzatii [22]. It belongs to the Osvaldo lineage of the Gypsy family. The LTRs of Isis-like are 2577 and 2574 bp long, and there are 4 bp Target Site Duplications (TSD: CGTG) and two ORFs. The first ORF encodes a 531-amino acid (aa) gag protein with 40% identity (and 70% similarity) to that of Isis. It contains a RING finger domain which is absent in Isis but present in Osvaldo (also from the same family). The second ORF encodes a 1,137-aa pol protein, which has 60% identity (and 85% similarity) with the Isis pol protein. However, Isis-like lacks the env domain and the LTRs of the two elements differ greatly in length (742 vs. 2574 bp). This is a recent insertion, less than 25,000 years old, and is polymorphic as it is present in only one of the two sequenced alleles covering this region.
[...] (Additional file 4: Figure S2). It belongs to the CsRn1 lineage of the Gypsy family [23]. This lineage is characterized by the presence of a PBS complementary to tRNA-Trp, a CHCC gag motif and the GPY motif at the 3′ end of the Integrase protein, all of which are present in this element. However, it seems to present a 6 bp TSD (CAAGTG) instead of the 4 bp TSD typical of the group. We have estimated this insertion to be 350,000 years old, which makes it the oldest of the three LTR elements.

Pao_Cv1
The last LTR element identified belongs to the Pao family, and is related to the Ninja-I element.
Pao_Cv1 is 6420 bp long, has 355 bp long LTRs, and one ORF coding for a 1881 aa protein ( Figure 2, Additional file 5: Figure S3). It has 5 bp TSDs (GCGGG). It is inserted inside a mariner element. This insertion is polymorphic and furthermore the two LTRs are completely identical which indicates that it is very young (less than 88,000 years old).

Non-LTR retroposons (LINEs)
A total of 29 insertions have been classified as 21 different LINE elements, most of which are short and degraded fragments. The insertions average 745 bp in size and ten of them are smaller than 500 bp, whereas size typically ranges from 1 to 7 kb for this group [24]. The absence of canonical sequences for comparison makes it difficult to classify them properly. This problem is particularly acute for the LAO elements, of which we have found many very short fragments (for eight out of ten putative elements the longest fragment is smaller than 1 kb, the smallest being only 83 bp) (Table 1). We cannot exclude the possibility that some of the insertions we have defined as separate elements are in reality different regions of the same element. The size and degraded nature of these elements suggest they are all old insertions. Overall the identified LINEs span 18 kb of the sequenced region (2.9%).

Class II - DNA transposons
Cut-and-paste DNA transposons
Cut-and-paste DNA transposons are characterized by 10 to 200 bp terminal inverted repeats (TIRs) flanking one or more ORFs encoding a transposase. We have identified 14 different cut-and-paste DNA elements with a total of 89 insertions spanning 7.86% of the sequenced region.
One element belongs to the MITE family, two to the Chapaev family, one to the hAT family, and the remaining 10 to the IS630-Tc1-mariner (ITm) superfamily. The most common elements belong to the Mariner family of the ITm superfamily.

Cv-mar1
The most frequent transposon is Cv-mar1, with 41 different insertions that together span more than 30 kb. All insertions are partially degraded and range from 320 to 1296 bp; the consensus sequence is 1,275 bp long (Figure 3, Additional file 6: Figure S4). This element shows 78% identity at the nucleotide level with the Desmar1 mariner element from the Hessian fly Mayetiola destructor [25][26][27] (Additional file 7: Figure S5). Its TIRs have been identified by similarity to those of Desmar1 [25], from which they differ by 3 nucleotide (nt) substitutions and 1 nt insertion. However, the 5′ TIR of Cv-mar1 is incomplete and the 3′ TIR is present in only a single copy of the element (the fragment of the consensus sequence derived from a single element is delimited by a blue dash in Additional file 6: Figure S4). Although none of the annotated elements displays a complete transposase, we were able to derive a "complete" copy from the consensus sequence.
At position 993 (shown in red) the consensus sequence has a T that results in a stop codon in the transposase; however, a third of the sequences have an A at this position, which would result in an arginine (R) residue. The next stop codon is in the same position as that of the Desmar1 element (Additional file 8: Figure S6). If we consider this longer reading frame, the transposase is 345 aa long.
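The T/A alternative at position 993 can be checked against the standard genetic code. The short Python sketch below is only an illustration (the function name is hypothetical and the codon itself is not given in the text): it enumerates every stop codon and asks which single T-to-A change yields an arginine codon. Under the standard code, the only such change is TGA to AGA, which suggests, as an inference rather than a reported result, that position 993 is the first base of a TGA codon.

```python
# Standard genetic code: the three stop codons and the six arginine codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}
ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}


def stop_to_arginine_by_t_to_a():
    """List (stop codon, 1-based codon position, variant codon) for every
    single T->A substitution that converts a stop codon into an Arg codon."""
    results = []
    for stop in sorted(STOP_CODONS):
        for pos, base in enumerate(stop):
            if base != "T":
                continue
            variant = stop[:pos] + "A" + stop[pos + 1:]
            if variant in ARG_CODONS:
                results.append((stop, pos + 1, variant))
    return results


if __name__ == "__main__":
    for stop, pos, variant in stop_to_arginine_by_t_to_a():
        print(f"{stop} -> {variant} (codon position {pos})")
```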

Cv-mar2
In the region analysed there are 14 copies of Cv-mar2, which span a total of 6 kb. The average insertion is 440 bp long, with the longest being 989 bp. Although none of the insertions is full length, we were able to derive a full-length consensus sequence which is 1299 bp long (Figure 3, Additional file 9: Figure S7). Individual copies are 77% to 91% identical to the consensus. It has 35 bp TIRs and a 344 aa transposase. However, this consensus element would be non-functional, as the TIRs have five mismatches and the transposase has four stop codons and commences with a leucine instead of a methionine. This element is very similar to Mariner1_DYa from Drosophila yakuba [28]. The consensus obtained has 78% identity at the nucleotide level with Mariner1_DYa and the two transposases show 73% identity at the amino acid level (Additional file 10: Figure S8 and Additional file 11: Figure S9).

DD37E_Cv1
The DD37E_Cv1 element belongs to the ITm-DD37E family [26]. This family was first discovered in mosquitos and is characterized by a unique DD37E catalytic domain. The full-length copy of this element is 1298 bp long with a 354 aa ORF and 27 bp TIRs (Figure 3, Additional file 12: Figure S10). At both ends of the insertion we find the TA sequence, the canonical dinucleotide target site duplication of the family [29]. Three additional copies are fragmented, highly degraded and in two cases enclose other nested repeats. The degraded insertions indicate that this element has been present in the C. vicina genome for a long time, whereas the full-length copy suggests that it has also been active recently in Calliphora.

Rolling circle (RC) transposons -Helitrons
Helitrons have been classified as class II-DNA transposons that use a "rolling circle" mode of transposition [19]. They encode proteins similar to helicases, ssDNA-binding proteins and replication initiation proteins [4,19]. Helitrons lack inverted repeats but are characterized by conserved termini and hairpin structures close to the 3′ end. As with other TEs, the Helitrons include both autonomous and non-autonomous elements. DINE-1 and mini-me elements from Drosophila, which show some unique characteristics, are now classified as non-autonomous Helitrons [30,31]. They lack coding capacity and do not have these characteristic termini, but have subterminal inverted repeats and the hairpin structures at the 3′ region [30].
Four different elements of the Helitron family are present in our sample. Two of them show a high copy number, with 40 and 41 insertions, respectively. Helitrons cover 5.01% of the analysed sequence.
Helitron2_Cv
This element was identified by similarity to the 5′ region of the Arylphorin subunit from C. vicina (X63340). RepeatMasker indicated it is related to Helitron-1N1_Dvir and mini-me elements [32]. We have annotated 41 copies of this element, from 136 to 767 bp long. The consensus sequence is 750 bp long (Figure 3, Additional file 13: Figure S11). Eight copies are full length and show 95% to 97% identity with the consensus. Helitron2_Cv shows the structural features of non-autonomous DINE1-like Helitrons: 11 bp subTIRs, partial inverted repeats next to the 5′ subTIRs, GTCY-rich protosatellites and short hairpin stem-loops (with 9 bp stems) next to the 3′ end of the element. It is closely related to the autonomous and non-autonomous elements Helitron-1-Dvir and Helitron-1N1_Dvir of D. virilis [32]. Helitron2_Cv shows 65% and 70% identity in the 5′ region (up to the protosatellite repeat) and the 3′ end (last 100 bp), respectively, with the D. virilis elements. Copies of this element represent 3% of the sequenced region. Given the level of divergence of the full-length insertions, autonomous copies of this element probably exist in the C. vicina genome.
Helitron3_Cv
This is also a DINE1-like Helitron. We have identified 40 copies that range from 71 to 821 bp. They can be divided into two subtypes, whose consensus sequences are 395 and 396 bp long. The consensus sequences of the two subtypes differ by one nucleotide indel and 54 nucleotide substitutions, half of which are located in the region just after the protosatellite repeat. All features typical of DINE1-like Helitrons are present except the 3′ subTIR (Figure 3, Additional file 14: Figure S12). The protosatellite repeat (GTCT)2 is expanded in 3 of the insertions: one has 4 repeats, another 5 repeats and the third 108 repeats.

Unclassified repeats
These repeats have been identified mainly by similarity within and between BAC sequences and with other published Calliphora sequences (blastn against the non-redundant nucleotide NCBI database). They are mostly short, with no obvious structure or similarity to known elements. Overall these repeats span 5.24% of the analysed region.

Unknown 5
This repeat was first identified by blastn to the non-redundant NCBI database, as it is present in intergenic or intronic regions of two different alleles of the Xdh gene of C. vicina (M30316, M30488). We have annotated 20 insertions of this element in the region we analysed. The consensus sequence is 275 bp long (Additional file 15: Figure S13). The 5′ region of the element is rich in polyA and polyT tracts, whereas the 3′ region of the element is highly conserved between copies (red region in Additional file 15: Figure S13). However, no structural features or internal repeats could be recognized.

Unknown 6
A short fragment of this element was first identified by RepeatMasker as a fragment of a Helitron. However, in this sequence, which is present 12 times in the C. vicina sequences, we could not identify any of the features of a Helitron and thus it remains unclassified. The consensus sequence of this element is 488 bp long (Additional file 16: Figure S14). From nucleotide 1 to 465 the sequence is palindromic (with 92% identity).
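For a DNA sequence, "palindromic" means that the sequence reads the same as its own reverse complement. The Python sketch below illustrates how such a percentage could be computed; it is a simplified, ungapped position-by-position comparison with hypothetical function names, and it is not necessarily how the 92% figure above was obtained (an alignment-based tool that allows gaps may give a different value).

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def palindrome_identity(seq):
    """Percentage of positions at which seq equals its reverse complement.

    Ungapped comparison: a perfect DNA palindrome scores 100.0.
    """
    if not seq:
        raise ValueError("empty sequence")
    rc = reverse_complement(seq)
    matches = sum(a == b for a, b in zip(seq.upper(), rc.upper()))
    return 100.0 * matches / len(seq)


if __name__ == "__main__":
    # GAATTC (the EcoRI site) is a perfect palindrome.
    print(palindrome_identity("GAATTC"))
```

To apply this to a sub-region such as nucleotides 1 to 465 of a consensus, the function would be called on the corresponding slice of the sequence.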

Unknown 20
This element was first identified by blastn with similarity to a Lucilia cuprina intronic sequence (M89990). There are 10 insertions of this sequence present in the region of C. vicina that was analysed. The consensus sequence is 140 bp long (Additional file 17: Figure S15). No structural features or internal repeats were identified which could help classify this repeat.

Candidates of horizontal transfer
Four of the analysed repeats show a remarkable similarity with elements from other species. To assess the possibility of horizontal transfer we have taken a closer look at these elements and checked their distribution in available sequences (NCBI and insect genome sequences; see Methods). These elements are the LTR element Isis, the DNA cut-and-paste elements Cv-mar1 and Cv-mar2, and the Helitron Helitron2_Cv.
The elements Isis from D. buzzatii and Isis-like from C. vicina have 40% and 60% identity in their ORFs; however, they differ in the presence of the RING (present only in Isis-like) and env (present only in Isis) domains. The sequence (and length) of their LTRs is also very different. Of the sequenced genomes, only D. mojavensis presents an Isis element. We have found no evidence of Isis-like in any of these genomes. The limited distribution of these elements suggests that they arrived by horizontal transfer to the D. buzzatii-D. mojavensis ancestor (after the split of D. virilis) and to C. vicina (or its ancestors).
The Mariner element Cv-mar2 is present in D. yakuba (Mariner1_DYa), with which it shows 78% identity over its whole length. We have also found several hits with 80% identity in the ants Camponotus floridanus and Harpegnathos saltator (Hymenoptera), covering 80% and 60% of the length of the element, respectively. We found no evidence of this element in other species. Its high similarity and limited distribution suggest transmission by horizontal transfer between Diptera and Hymenoptera, which diverged approx. 300 Myr ago.
Helitron2_Cv is similar to Helitron-1N1_Dvir from D. virilis. They have 50% identity over the whole element, and 65% to 70% identity at the 5′ and 3′ end, respectively. Multiple hits with 60% to 90% identity around sequenced genes of Lucilia, Musca and other species show that this element is very common within the Muscomorpha. No hits were found in the whole genome sequences with Helitron2_Cv. Using Helitron-1N1_Dvir as query, we find multiple hits in Drosophila species but nothing outside the Drosophila genus. This suggests that the element is vertically transmitted; the absence of hits in other insects is probably due to sequence divergence of the element.

Discussion
We have analysed a small (600 kb) region of the Calliphora genome. It contains most of the Achaete-Scute complex, including the genes ac, sc and l'sc. The low gene density in this region is due to the presence of large regulatory regions (Negre and Simpson, submitted). The region is euchromatic, but we do not know its position in the chromosome or whether it is representative of the genome in terms of TE content and diversity, although we have no reason to think otherwise. The discussion that follows is therefore only a first approximation to the repeat landscape of C. vicina, which has a large genome of 750 Mb (Spencer Johnston, personal communication).

Fraction of genomic DNA occupied by repeats
Repeats span 24% of the region analysed (600 kb). This percentage is relatively high but not unusual for fly genomes. Larger genomes usually show a higher proportion of repeats; however, repeat content is not proportional to genome size and is highly variable between dipteran genomes (Table 3). For example, there are several species whose genome is around 200 Mb with a repeat content ranging from 3% to 25%.
Repeat content is also variable within genomes, being most abundant in heterochromatin and pericentromeric regions. Unfortunately, we have no information about the position within the chromosome of the region we analysed. In D. melanogaster it is close to the tip of the X chromosome; however, chromosomes are very dynamic in terms of gene order, so we do not expect the position to be necessarily conserved.

Abundance of the different classes of repeats
If we look at the distribution of repeats in Dipterans, the relative abundance of the different classes appears to be constant within lineages independently of total repeat content, but very divergent between lineages (Table 3). In D. melanogaster LTRs are the most abundant TEs, followed by non-LTR and then TIR elements [36] (there is no information about Helitrons). The same pattern is observed in the other 11 Drosophila species that have been sequenced [37]. The pattern changes in mosquitos, where TIR elements are the most abundant, followed by non-LTR, LTRs and finally Helitrons with less than 1% (Table 3). As in Drosophilidae, all mosquitos show the same pattern, although in Anopheles and Aedes the quantity of TIR, non-LTR and LTR elements is very similar, whereas in Culex TIR elements represent more than half of the repeat content. In Calliphora we see again a completely different pattern. As in mosquitos, TIR elements are the most frequent, but they are now followed by Helitrons. LTR and non-LTR elements (in this order) are the least frequent in C. vicina (Table 3). It is noteworthy that if we consider the unclassified repeats in Calliphora, this would be the second most frequent class of repeats.

Age of TE insertions
Nested elements
Of the 322 identified repeats 11 (3.4%) are nested within other elements. Two of the three LTR elements are nested within other repeats, whereas none of the LTR elements themselves show insertions of other elements. This is consistent with the fact that they are recent insertions. At the other extreme, the unclassified (unknown) elements, in spite of being the most numerous (37%), show the smallest proportion of nested elements: only one copy is nested and two include insertions of other elements. The fact that one copy of unknown 20 is nested within another TE suggests that this element is mobile, although no structural features have been identified (see Results). On the other hand, the fact that only one of the 119 unknown repeats is nested suggests that some of them might not be mobile. For the other types of elements (LINE, DNA and RC) the frequency of nested copies is proportional to the number of insertions. However, LINEs show a high number of copies serving as landing sites. This, together with the small size and degraded nature of most copies, indicates that most LINE insertions are very old. Of the RC elements, all three nested insertions belong to Helitron2_Cv, two of which are full length. Two of the three are nested inside fragmented copies of the DNA element DD37E_Cv1.

New vs. old insertions
All LTR insertions found in this sample are recent in origin. All three insertions are full length and at least two of them are polymorphic. We have found no fragments or degraded copies. This is a very different picture to that found in all other TE classes where none (non-LTR elements) or only a few (DNA and RC elements) insertions are full-length. In all these classes most insertions are fragmented and highly degraded. A similar trend was found in D. melanogaster. LTR families appear to be transposing in the D. melanogaster genome at higher rates than TEs from other orders leading to the observation that LTR elements, as a group, tend to be younger [38]. Recent analyses suggest that this trend is due to a higher intrinsic rate of transposition of LTR elements and not to a recent increase of transposition [39].

Role of horizontal transfer
The mobile nature of TEs makes them prone to horizontal transfer. Horizontal transfer is thought to be an essential step in the TE life cycle, as it allows elements to escape vertical extinction [40,41].
Four TEs showed a remarkable similarity with elements from other species. Although we could not compare the rates of synonymous mutations between the TEs and orthologous genes, we have checked the distribution of these elements in sequenced insect species to detect possible instances of horizontal transfer.
The broad distribution of the Mariner element Cv-mar1 and the Helitron Helitron2_Cv indicates that they are vertically transmitted. We cannot completely rule out horizontal transfer of Cv-mar1, but its detection would require a much more thorough analysis, which is beyond the scope of this study.
The elements Isis and Cv-mar2 do seem to have undergone horizontal transfer. Isis moved between Calliphoridae and Drosophilidae which diverged approximately 100 Myr ago, and Cv-mar2 between Diptera and Hymenoptera which diverged approximately 300 Myr ago.
Overall two of the 42 identified TEs show evidence of horizontal transfer. One is an LTR element and the other a DNA transposon, the two classes most often involved in transfer events [40].

Conclusions
This is the first detailed description of TEs in carrion flies. Although the analysis includes only a small region of the genome it gives an overview of the classes of TEs present and their abundance. Moreover, the description of these TEs and repeats can help in the annotation of repeat sequences in other Dipteran genomes, e.g., those currently being sequenced.

Sequences analysed
We have analysed the sequences of six overlapping BAC clones, in a region which contains most of the Achaete-Scute Complex (AS-C) of Calliphora vicina (cloning and sequencing of this region is described in Negre and Simpson, submitted). The clones comprise a total of 651,394 base pairs (bp), of which 38,331 bp correspond to identical alleles in two overlapping clones (see Table 2). Thus we have analysed 613,063 bp of unique sequence.
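The length bookkeeping above, and the coverage percentages reported in the Results, follow from simple arithmetic. The Python sketch below reproduces it as a sanity check; the function names are hypothetical, and the 18 kb LINE span is the rounded value quoted in the Results, so the resulting percentage is approximate.

```python
# Values taken from the text.
TOTAL_BAC_BP = 651_394      # summed length of the six BAC clones
SHARED_ALLELE_BP = 38_331   # identical alleles present in two overlapping clones


def unique_length(total_bp, redundant_bp):
    """Unique sequence length after removing redundant overlap."""
    return total_bp - redundant_bp


def percent_of_region(span_bp, region_bp):
    """Percentage of the analysed region covered by a given span."""
    return 100.0 * span_bp / region_bp


REGION_BP = unique_length(TOTAL_BAC_BP, SHARED_ALLELE_BP)

if __name__ == "__main__":
    print(REGION_BP)
    # LINEs are reported to span about 18 kb of the region.
    print(round(percent_of_region(18_000, REGION_BP), 1))
```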

Identification of repetitive elements
Several tools were used for the identification and classification of repeats. RepeatMasker was run against the Drosophila database and all hits were considered; all hits from protein-based RepeatMasker (A.F.A. Smit, R. Hubley and P. Green, RepeatMasker at http://repeatmasker.org) were also considered. blastn and blastp were run against the NCBI non-redundant databases [42], and hits longer than 100 bp with identities over 60% were analysed further. LTR-Finder [43] was used to identify LTR elements and some of their structural features such as PBS and PPT sequences. The online program Palindromes (http://mobyle.pasteur.fr) was used to aid in the identification of TIRs. All hits were compared between methods and manually inspected. Most repeats were identified by more than one method. Non-overlapping hits smaller than 50 bp were discarded. The best match was used for repeat classification. Annotated repeats were added to a local database to help in the identification of further copies of the same repeats. Comparison between Calliphora sequences (with blast2sequences-blastn) allowed the identification of many short unclassified repeats which are found recurrently in the Calliphora genome. Some of the elements we have annotated are also present in GenBank sequences (in intronic and intergenic regions), but these were all unannotated. Consensus sequences were obtained by ClustalW [44,45] or T-Coffee [46,47] alignment and manually corrected with the aid of BioEdit.
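The two numeric filters described above can be sketched as follows. This is a minimal illustration in Python, not the pipeline actually used: the Hit record, its field names and the function names are hypothetical, and "non-overlapping" is interpreted here as "not overlapping any other hit on the same sequence", which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    method: str      # tool that produced the hit (labels are illustrative)
    start: int       # 1-based, inclusive
    end: int         # 1-based, inclusive
    identity: float  # percent identity

    @property
    def length(self):
        return self.end - self.start + 1


def passes_blast_filter(hit, min_len=100, min_identity=60.0):
    """Keep BLAST hits longer than 100 bp with identity over 60%."""
    return hit.length > min_len and hit.identity > min_identity


def overlaps(a, b):
    """True if the two hits share at least one position."""
    return a.start <= b.end and b.start <= a.end


def drop_short_isolated(hits, min_len=50):
    """Discard hits shorter than min_len that overlap no other hit."""
    kept = []
    for h in hits:
        supported = any(overlaps(h, other) for other in hits if other is not h)
        if h.length >= min_len or supported:
            kept.append(h)
    return kept
```

In this sketch a short hit survives only when another hit covers part of the same interval, mirroring the idea that most repeats were identified by more than one method.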

Divergence time of TE insertions
The age of TE insertions (t) has been calculated as in [4]; t = K/v, where K is the average divergence of TE copies from the consensus and v the neutral substitution rate. We have used the neutral substitution rate for Drosophila (v=0.016 substitutions/Myr) [48]. For LTR elements we have used t = K/2v, where K stands for the divergence between the two LTRs of one insertion [4].
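The two formulas above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names are hypothetical, K is taken as a per-site divergence, and v is interpreted as substitutions per site per Myr. The last helper additionally assumes that two identical LTRs correspond to fewer than one substitution over the LTR length; under that assumption, the 355 bp LTRs of Pao_Cv1 give an upper bound close to the 88,000 years quoted in the Results.

```python
# Neutral substitution rate for Drosophila used in the text
# (interpreted here as substitutions per site per Myr).
NEUTRAL_RATE = 0.016


def insertion_age_myr(k, v=NEUTRAL_RATE):
    """t = K / v, with K the average divergence of copies from the consensus."""
    return k / v


def ltr_insertion_age_myr(k_ltr, v=NEUTRAL_RATE):
    """t = K / (2v), with K the divergence between the two LTRs of one insertion."""
    return k_ltr / (2 * v)


def ltr_age_upper_bound_years(ltr_length_bp, v=NEUTRAL_RATE):
    """Assumed bound for identical LTRs: fewer than one substitution
    over the LTR length, converted from Myr to years."""
    return ltr_insertion_age_myr(1 / ltr_length_bp, v) * 1e6


if __name__ == "__main__":
    print(round(ltr_age_upper_bound_years(355)))
```

The factor of 2 in the LTR formula reflects that both LTRs of an insertion accumulate substitutions independently after integration, so their mutual divergence grows twice as fast as the divergence of either one from the original sequence.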