Pokey diversity in the Daphnia genome
Analysis of over 160 Pokey and Pokey-like sequences from the D. pulex genome revealed four well-supported clusters. Two clusters of larger elements with an average size of 5,100 bp were designated Pokey A and Pokey B. The clusters have diverged in sequence by about 40%, have different ITR structures and include members that possess an intact transposase ORF. The two other clusters are MITEs, designated mPok 1 and mPok 2, because each mPok element contains an ITR and other non-coding sequences corresponding to one of the full-length Pokey elements. Annotated Pokey B and mPok 2 elements outnumber Pokey A and mPok 1 elements by over 4:1.
Available evidence suggests that both Pokey A and Pokey B occur in D. obtusa and thus the two lineages have likely persisted across multiple speciation events. Vertical diversification of TEs within the same genome can be driven by drift, selection, or more likely a combination of the two. Two models have been proposed. Lampe and colleagues  observed a loss of interaction between the ITRs and transposases of Tc1/mariner elements from different subfamilies with sequence divergence greater than 16%. They postulated that silencing mechanisms based on sequence similarity might create intragenomic selection that favors divergence of the transposase and ITR sequences of related TEs to escape silencing. A second possibility is that the presence of numerous non-autonomous elements drives the divergence of transposase and ITR sequences because the non-autonomous copies titrate the transposase from autonomous copies and decrease their fitness . In that case, intragenomic selection might favor divergent elements whose transposases can only recognize their own ITRs.
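The divergence figures quoted above can be made concrete with a minimal sketch. It computes the uncorrected proportion of differing sites (p-distance) between two pre-aligned sequences and compares it with the 16% level at which Lampe and colleagues observed loss of ITR-transposase interaction in Tc1/mariner elements. The function names, the toy sequences, and the use of that figure as a cutoff for other elements are assumptions made for illustration only; this is not the method used in this analysis.

```python
# Hypothetical illustration: uncorrected p-distance between two aligned sequences.
# The 0.16 cutoff is the Tc1/mariner observation cited in the text; treating it
# as a general threshold for Pokey is an assumption, not a result.

TC1_MARINER_DIVERGENCE_CUTOFF = 0.16


def p_distance(seq_a: str, seq_b: str) -> float:
    """Return the proportion of differing sites, skipping columns with a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns carry no substitution information
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared


def exceeds_cutoff(seq_a: str, seq_b: str,
                   cutoff: float = TC1_MARINER_DIVERGENCE_CUTOFF) -> bool:
    """True if divergence is above the cutoff, i.e. interaction may be lost."""
    return p_distance(seq_a, seq_b) > cutoff


if __name__ == "__main__":
    # Toy aligned sequences (made up), differing at 2 of 10 sites.
    a = "ACGTACGTAC"
    b = "ACGTTCGTAA"
    print(p_distance(a, b), exceeds_cutoff(a, b))
```

By this measure, the roughly 40% divergence between Pokey A and Pokey B lies well above the level at which interaction was lost in Tc1/mariner elements, although whether the same relationship holds for Pokey would need to be tested experimentally.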
The ability of Pokey A and Pokey B elements to cross-mobilize could be investigated using yeast excision, yeast one-hybrid and/or electrophoretic mobility shift assays to determine the strength of interaction between the transposases and ITRs of each group. Although the differences in sequence between the two ITR structures appear minor (Figure 1), Casteret and colleagues  demonstrated that a small number of single nucleotide changes to the ITR of the drosophilid DNA transposon Mos1 produced significant changes in transposition rate.
The mPok elements appear atypically large (approximately 750 bp) compared with other MITEs, which can be as small as around 130 bp. However, MITEs that are even larger than mPok have now been discovered in phylogenetically diverse eukaryotes (reviewed in ), suggesting that large MITEs are more common than once thought. One mechanism to explain the origin of large MITEs is progressive internal deletion of autonomous DNA TEs and subsequent selection for increasing transposition rate among the resultant elements over time. Thus, the larger size of mPok elements could be a consequence of their recent evolution. While this could be true for the mPok 1 elements, which show little sequence diversity, the occurrence of highly divergent mPok 2b copies is not consistent with a recent origin (Figure 4). Indeed, Deprá and colleagues suggested that the Mar MITEs in Drosophila willistoni, which are similar in size to the mPok elements, may have originated prior to the diversification of the willistoni subgroup 5.7 MYA, suggesting that large size does not necessarily indicate recent origin.
Repeated sequences in Pokey
An unusual aspect of the Pokey A and B lineages in Daphnia is the presence of sequences derived from NCRs of the rDNA unit. This includes an approximately 200 bp sequence from a non-repetitive region of the IGS (A repeats) and an approximately 50 bp sequence from ITS2 (C repeats) (Figures 1 and 5). Pokey elements contain from 2 to 5 copies of these rDNA sequences within their 5′ NCR (Figure 5). The highly recombinogenic nature of these repeats within the Pokey elements was first suggested by their differential spacing in pcPokey S and pcPokey L  and is strongly supported by this analysis in which particular combinations of A and C repeats are unique to only one or a few Pokey elements.
The acquisition of DNA to the 5′ NCR of Pokey does not appear to be limited to rDNA. For example, Pokey 62 contains a unique, approximately 3,600 bp sequence of which approximately 1,100 bp is derived from sequence on a non-rDNA scaffold in the Daphnia genome. Thus, Pokey elements often acquire sequences from their host’s genome. Langer and colleagues  proposed that Ds elements could acquire host sequence if the transposase slides after binding but before cutting, or if cryptic ITR-like sequences exist downstream of an element. However, the acquisition of sequences well within the 5′ NCR of the Pokey elements argues against such a simple explanation (Figure 5).
What is the significance of the A and C repeats? It is possible that they have no function and that they originated through chance recombination events that had no fitness impact on Pokey. However, the finding that all but one copy of Pokey from both lineages contain these repeats suggests that they do play some role in Pokey activity. Possible functions of these sequences include transcription enhancers, transcription terminators to prevent the formation of aberrant rRNA read-through transcripts, or binding sequences to recruit epigenetic modifiers [34–36]. We consider a role in transcription most likely because mPok elements, which do not need to be transcribed to be mobilized by a Pokey transposase, do not have the rDNA repeats.
The most remarkable property of the A repeats is that the same sequence was retained in both the Pokey A and B lineages. Not only do the A repeats correspond to the highest level of sequence conservation between the two lineages, but the A repeats within the two Pokey lineages are as well conserved as IGS sequences undergoing concerted evolution within the rDNA unit (Figure 6). This high level of sequence identity suggests that recombination between the Pokey repeats and the rDNA repeats occurs on a regular basis, thus strengthening the argument that Pokey elements have become highly specialized for their insertion into the rDNA locus.
Target site selection and the rDNA niche
While it is not possible to assemble the sequences of individual Pokey elements inserted in rDNA, it should be noted that the consensus Pokey sequence from rDNA is similar to an assembled non-rDNA copy that could putatively encode a functional transposase (Figure 3). Given the rapid turnover of rDNA units, Pokey elements within the locus should be among the newest insertions, while those outside of rDNA are a combination of new and old insertions.
Pokey is the only DNA-mediated TE that is known to have evolved insertion specificity for the rDNA unit. Remarkably, the Pokey insertion site is in the same region of the 28S gene that is also the target site for a number of non-LTR retrotransposons (reviewed in ). Two of these elements, R2 and R5, which insert within a few base pairs of the Pokey site, encode related endonucleases that have an active site similar to class IIS restriction enzymes [37, 38]. The R2 endonuclease has been shown to have exceptional specificity for the 30 to 40 nucleotides surrounding its insertion site [39, 40]. Two other non-LTR retrotransposons, R1 and R4, insert 75 and 28 bp, respectively, downstream of the Pokey site. These elements encode an endonuclease with similarity to the apurinic endonuclease involved in DNA repair. The endonuclease encoded by R1 has also been shown to have sequence specificity for the insertion site [41, 42]. In all four cases, most copies of the element are inserted in rDNA, and most copies outside rDNA are inserted into sites with sequence similarity to the 28S gene target site [13, 43].
The transposase of Pokey elements represents a third protein that has evolved specificity for this region of the 28S gene. Some of the best-studied examples of integrases that have evolved insertion specificity involve the LTR retrotransposons of yeast [44–47]. In these cases, the integrases have evolved protein-protein specificity for association with specific transcription factors or chromatin structural components rather than actual DNA sequence specificity. Such protein-chromatin interactions could also be involved in the insertion specificity of Pokey elements, but we are not aware of any specific chromatin components that are bound to the central region of 28S genes. Alternatively, the A repeat associated with Pokey elements may contain a recognition site for a nucleolar protein that helps guide excised Pokey elements into the nucleolus, where they can insert.
It seems a remarkable coincidence that three different lineages of TEs have evolved specificity for the same small region of the rDNA unit. The 28S target region is highly conserved, but there are many regions of the 18S and 28S genes that are conserved across eukaryotes. We suggest three possible explanations: the DNA in this region is highly exposed and thus accessible to the TE machinery; an as yet unknown chromatin component can be utilized by the TE in its evolution of specificity; or this is one of only a few areas of the rDNA where a TE can insert without being quickly eliminated by recombination or selected against because it causes the synthesis of disrupted rRNA.
Based on the concordance between phylogenies of rDNA Pokey elements and their hosts, Penton and Crease  concluded that Pokey has undergone stable, vertical inheritance in the rDNA of species in the subgenus Daphnia since its origin. Thus, unlike most Class II TEs, Pokey elements appear to have evaded complete silencing by the host for millions of years. The unique breeding system of Daphnia, involving extended periods of apomictic reproduction, and the complete loss of sexuality in some lineages may have created strong selection pressure on ancestral Pokey elements to avoid causing deleterious mutations in their host, while still maintaining a transposition rate high enough to survive. The theory describing the interaction between TEs and asexual or partially asexual hosts predicts three possible outcomes: (1) active elements are lost, (2) the host goes extinct due to TE-induced mutation, or (3) the elements become domesticated and the threat is neutralized . However, Pokey’s invasion of rDNA suggests a fourth outcome, the long-term persistence of active elements.
Zhou and colleagues  have argued that rDNA is an ideal TE niche, because it is difficult for the host to completely silence elements that have inserted into genes that must be expressed. In addition, TEs inserted in the locus are continually removed by recombination events so old copies that could interfere with the elements are eliminated. Finally, each insertion has a predictable, small effect on the fitness of the host. This effect is small because all organisms contain more than enough rDNA for the production of rRNA, and those rDNA units with insertions are usually not transcribed . R2 and R1 elements, which are abundant in the rDNA of arthropods including crustaceans, have not been found in Daphnia. Perhaps Pokey elements are even better adapted for this niche in that they can be lost from the rDNA locus, but copies located outside the rDNA can on occasion be active and re-establish insertions in the locus. Indeed, individual D. pulex that lack Pokey A in rDNA have been observed, but no individuals have been observed that completely lack Pokey elements [11, 19, 50–52].