Completion of the high-quality draft genome sequence of a bdelloid rotifer, Adineta vaga, provides an opportunity to investigate the entire TE complement in a long-term asexual species, and to obtain a comprehensive picture of genome-wide TE distribution and evolutionary history. This study focuses on PLEs, an enigmatic class of retroelements that includes EN-containing retrotransposons from numerous animal genomes, as well as telomere-associated EN-deficient retroelements from rotifers, fungi, protists and plants [10, 16, 20]. While we observe co-existence, within the genome of the same species, of the conventional Penelope retrotransposons with the GIY-YIG EN domain and the EN-deficient PLEs, as was recently reported in the kuruma shrimp, there is no indication of cross-mobilization of EN-deficient Athena elements by the Penelope-encoded EN. For each A. vaga Penelope family, mobility in the genome apparently relies on the presence of element-specific terminal structures required for retrotransposition, termed pLTRs, which do not exhibit any association with Athena elements. It should be noted that fungal genomes contain only EN-deficient PLEs and no EN-containing ones, again indicating that the maintenance of the former does not depend on the latter.
The present analysis of Penelope retrotransposons in A. vaga illustrates their overall similarity to Penelope elements in other species, including the extreme structural variability. It also highlights peculiar features that may contribute to the evolutionary plasticity of the bdelloid genome, which is characterized by high levels of gene conversion, by relatively low but highly diversified TE content, and by the presence of numerous genes of foreign origin and substantial lineage-specific expansions of various multigene families. The expanded gene families include NRPS and other foreign genes, as well as 7-transmembrane receptors and proteins containing repeated motifs, such as LRR, TPR, PPR, Kelch, NHL and FG-GAP. Paradoxically, many gene families are amplified to a much higher copy number than TE families. These multigene families are likely involved in processes that depend on diversification of gene function, such as host defense and immunity, production of secondary metabolites, chemosensory perception, extracellular signaling and cell-cell communication.
We find that A. vaga Penelope elements can mobilize host genes surrounded by terminal pLTR structures and, therefore, can contribute to observed lineage-specific expansions of certain gene families, shedding light on some of the mechanisms that multiply host genes to copy numbers higher than most TEs. While other TE classes also have the potential to contribute to amplification of gene families, which could then be followed by their diversification, Penelope elements have a distinct advantage over other retrotransposons in this respect, as their retrotransposition mechanism apparently allows intron retention. Our analysis reveals no strong evidence that any intact A. vaga Penelope ORFs were exapted as domesticated genes, as none of them are present on two collinear allelic chromosome segments. Those Penelope fragments that we do find in collinear pairs are badly damaged, and their function, if any, would not involve Penelope-encoded products. The most likely agents involved in gene amplification are the Penelope families with the capacity to incorporate relatively long stretches of host DNA between pLTRs, such as Pen1_Av and Pen3a_Av. Four out of six apparently intact Penelope ORFs belong to these families. While the propensity of A. vaga for DNA deletion could rapidly erase one or both pLTRs from the genome, making it difficult to detect additional cases of pLTR-mediated gene amplification, the example described here leaves little doubt that such events can indeed contribute to lineage-specific expansion of multigene families.
Even though the overall TE content in A. vaga is quite low by metazoan standards, the particularly low Penelope copy number in comparison to other retrotransposons is striking. While some TEs could remain undetected in a de novo assembly consisting of over 30,000 scaffolds with N50 of 260 kb, there is little reason to believe that most Penelope copies would be preferentially undetectable. Two circularly permuted copies located on isolated contigs with little or no flanking sequences may represent active elements which could not be properly assembled due to the fusion of several identical copies into a single contig. Although studies of PLE distribution along the chromosome length will have to await chromosome-sized scaffolds, the majority of Penelope copies are present on relatively small scaffolds (Additional file 1: Table S1), and inspection of their genomic environment shows that they are largely compartmentalized in TE-rich regions, which may represent non-essential genomic islands consisting of various TEs, genes of foreign origin and members of diverse multigene families.
Like all other TEs in the genome, Penelopes are subject to generalized host defense responses, such as RNA-mediated silencing. Indeed, we find that most of the A. vaga Penelope copies give rise to small RNAs with preferential antisense polarity. The Penelope element in Drosophila was previously shown to elicit a small RNA response after invasion [22, 23]. We also observed that many Penelope copies were disabled by microhomology-mediated deletions, a mechanism of TE inactivation that is applicable to most other TEs and likely operates during DNA repair following frequent cycles of desiccation and rehydration [14, 24]. However, Penelope elements constitute only about 2% of A. vaga TEs, and only 4% of its retroelements. Thus, additional family-specific mechanisms should be invoked to explain their much lower relative abundance in comparison with other TEs. Most likely, their low proliferation capacity is associated with peculiarities of their replication mechanism in this species.
A previously undescribed phenomenon is the appearance of very long inserts in the coding regions of Pen2-Pen5 elements, which do not necessarily disrupt ORF integrity and are highly enriched in asparagine residues. It appears that copies with such inserts would still be capable of retrotransposition, although their ORFs would be increased in size from the usual 800-900 aa to 1,300-1,500 aa, and the domain structure perturbed. For Pen2a, the inserted segment could serve as a long linker between the RT and EN domains, while in Pen3-4 such a linker would connect the core RT with its thumb domain. Analogous inserts have not been previously observed in other TEs, and it is reasonable to suggest that they arise as a consequence of the complicated molecular gymnastics that PLEs perform during their replication. In particular, the existence of an autonomous highly transcribed N-rich segment in the vicinity of Pen3a pLTR indicates that it could have been captured in trans and internalized. We also noticed that in the candidate precursor elements, such as Pen2 and Pen3a, regions roughly corresponding to the linker insertion sites in Pen2a and Pen3-4 contain several short stretches of asparagine residues. In addition, a secondary insertion of Pen2a into Pen3 on scaffold 671 also occurred into the N-rich segment, indicating that this sequence may serve as an attractive target for Penelope insertions. Since Penelope elements were previously reported to favor simple AT-rich sequences as preferred targets, we propose that such inserts may arise as a result of spurious self-priming by read-through transcripts containing the adjacent flanks enriched in simple trinucleotide repeats, followed by template jumps. In such cases, the short internal stretches of the N-rich coding sequence (AAT)n or (AAC)n could help in keeping the reading frame of the inserted segment properly aligned, and a chimeric ORF could persist in the genome if it codes for an uninterrupted polypeptide.
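The reading-frame argument can be illustrated with a minimal Python sketch. It uses the standard genetic code, in which both AAT and AAC encode asparagine (N). The toy ORF, the insertion position and the helper names (`translate`, `insert_tract`) are hypothetical and chosen only for illustration; they are not A. vaga sequences or part of the analysis reported here.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, frame=0):
    """Translate a DNA string in the given frame; '*' marks a stop codon."""
    return "".join(
        CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3)
    )


def insert_tract(orf, pos, tract):
    """Insert a tract into an ORF at nucleotide position pos."""
    return orf[:pos] + tract + orf[pos:]


# Hypothetical toy ORF (not a real Penelope sequence): M A E T W stop.
toy_orf = "ATGGCTGAAACCTGGTAA"

# An (AAT)5 tract inserted at a codon boundary adds five Asn residues
# and leaves the downstream codons in frame.
with_insert = insert_tract(toy_orf, 6, "AAT" * 5)

print(translate(toy_orf))      # MAETW*
print(translate(with_insert))  # MANNNNNETW*

# The same repeats read in the other two frames no longer encode Asn;
# (AAT)n in frame 2 reads as TAA stop codons.
for repeat in ("AAT", "AAC"):
    tract = repeat * 4
    print(repeat, [translate(tract, f) for f in range(3)])
```

In this toy case the ORF remains uninterrupted only when the repeat is read in the asparagine frame, which is consistent with the idea that in-frame (AAT)n or (AAC)n stretches could allow a chimeric ORF to persist.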
The propensity of Penelope elements for self-priming may be inferred from the abundance of inverted-repeat structures containing palindromes at the inverted junction, as shown in Figure 2. Consistent occurrence of such palindromes is best interpreted in terms of self-priming, which, however, would have to occur on an antisense template (if a sense template is used, self-priming at the 3′ end would result in a tail-to-tail inverted repeat arrangement, as opposed to the most frequently observed head-to-head arrangement). Moreover, utilization of an antisense template is highly compatible with intron retention, since introns would not be recognized in an antisense orientation by the splicing machinery. The presence of oppositely oriented promoter motifs in pLTRs also argues in favor of bidirectional PLE transcription. However, we cannot currently exclude the possibility of utilization of an unspliced sense transcript as a template, and further experiments will be required to discriminate between these possibilities. Direct demonstration of antisense promoter activity in Penelope elements and full elucidation of their replication cycle constitute a promising subject for future studies.