De-novo emergence and template switching of SINE retroposons during the early evolution of passerine birds

Passeriformes (“perching birds” or passerines) make up more than half of all extant bird species. Here, we resolve their deep phylogenetic relationships using presence/absence patterns of short interspersed elements (SINEs), a group of retroposons that is abundant in mammalian genomes but considered largely inactive in avian genomes. The resultant retroposon-based phylogeny provides a powerful and independent corroboration of previous indications derived from sequence-based analyses. Notably, SINE activity began in the common ancestor of Eupasseres (passerines excl. the New Zealand wrens Acanthisittidae) and ceased before the rapid diversification of oscine passerines (songbirds). Furthermore, we find evidence for very recent SINE activity within suboscine passerines, following the emergence of a SINE via acquisition of a different tRNA head, which we suggest occurred through template switching. We propose that the early evolution of passerines was unusual among birds in that it was accompanied by activity of SINEs. Their genomic and transcriptomic impact warrants further study in the light of the massive diversification of passerines.


Introduction
Short interspersed elements (SINEs) are the most abundant group of the reverse-transcribed retroposons in mammalian genomes (Sotero-Caio, et al. in revision). They rely on transmobilization by the enzymatic machinery of long interspersed elements (LINEs) (Ohshima, et al. 1996), a parasitic interaction so successful that the human genome contains >1,500,000 SINEs compared to <900,000 LINEs (Lander, et al. 2001). On the other hand, SINEs are scarce in avian genomes, which has been noted as one of the most peculiar genomic features of birds (Hillier, et al. 2004;Warren, et al. 2010;Zhang, et al. 2014). While LINEs exhibit up to 700,000 copies in avian genomes, there are only 6,000-17,000 SINEs per avian genome (Zhang, et al. 2014), most of these ancient and heavily degraded (Kapusta and Suh 2016).
Presence/absence patterns of SINEs in orthologous genomic loci are rare genomic changes appreciated widely as virtually homoplasy-free phylogenetic markers (Shedlock, et al. 2004;Ray, et al. 2006). Given the aforementioned scarcity of SINEs, it is not surprising that the emergence and activity of SINEs have never been studied in birds. On the other hand, other types of retroposed elements (REs; LINEs from the chicken repeat 1 superfamily, CR1, and long terminal repeat elements, LTRs) have helped resolve the relationships of various groups of birds, such as Galliformes (Kaiser, et al. 2007;Kriegs, et al. 2007;Liu, et al. 2012), Neoaves (Suh, Paus, et al. 2011;Matzke, et al. 2012;Suh, Smeds, et al. 2015), Palaeognathae (Haddrath and Baker 2012;Baker, et al. 2014), and others (St. John, et al. 2005;Watanabe, et al. 2006;Suh, et al. 2012;Kuramoto, et al. 2015). In the meantime, the sequencing of dozens of avian genomes has revealed SINEs with putative lineage specificity (Warren, et al. 2010;Kapusta and Suh 2016) and thus the potential for conducting presence/absence analyses in specific groups of birds.
Here we conduct, to our knowledge, the first study of the emergence and activity of SINEs in birds. We focus on the deep phylogenetic relationships of passerines, the largest radiation of birds with nearly 6,000 extant species (Barker, et al. 2004), using 44 presence/absence markers of SINEs and other REs. In contrast to the only previous study of retroposons in passerines with a single RE marker (Treplin and Tiedemann 2007), our multilocus dataset permits the reassessment of sequence-based phylogenies [e.g., (Barker, et al. 2004;Selvatti, et al. 2015;Moyle, et al. 2016)] and, simultaneously, the reconstruction of the temporal activity of SINEs and other REs during early passerine evolution.

Results and Discussion
We initially chose RE marker candidates from selected retroposon families of zebra finch [including TguSINE1 (Warren, et al. 2010)] in October 2009, a time when genome assemblies were available only for chicken and zebra finch (Hillier, et al. 2004;Warren, et al. 2010). Candidates for presence/absence loci were therefore identified via pairwise alignment of RE-flanking sequences from zebra finch to orthologous regions in chicken (Materials and Methods). This was followed by in-vitro presence/absence screening of RE marker candidates as detailed elsewhere (Suh, Paus, et al. 2011) using a representative taxon sampling of all major groups of passerines sensu Barker, et al. (2004) (Supplementary Table S1). We complemented this with a screening of GenBank (http://www.ncbi.nlm.nih.gov/genbank/) for additional SINEs, which identified a TguSINE1-like insertion in myoglobin intron 2 of Pitta anerythra (accession number DQ785977) that is absent in the orthologous position of other Pitta species (Irestedt, et al. 2006). We termed this element "PittSINE" and identified PittSINE marker candidates in a DNA sample of Pitta sordida via inter-SINE PCR [(Kaukinen and Varvio 1992); Materials and Methods]. This was followed by cloning and sequencing of the 500-bp to 1,000-bp fraction of PCR amplicons, alignment to chicken and zebra finch genomes to reconstruct the left and right SINE-flanking regions, and then in-vitro presence/absence screening of PittSINE marker candidates.
Next, we characterized the structural organization of passerine SINEs (Fig. 1) using the available TguSINE1 consensus sequence (Warren, et al. 2010) and a majority-rule consensus that we generated from the PittSINE insertions in our sequenced presence/absence markers (Supplementary Data S1). Both SINEs have highly similar, CR1-derived tails (Fig. 1) which exhibit the typical hairpin for putative binding by the CR1 reverse transcriptase and an 8-bp microsatellite at their very end for target-primed reverse transcription (Suh 2015). However, the heads of these SINEs are derived from different tRNA genes, namely tRNA-Ile in TguSINE1 and tRNA-Asp in PittSINE (Fig. 1). Sequence alignment suggests that the tRNA-derived SINE heads are more similar to the respective tRNA genes than they are to each other (Fig. 1C). However, the opposite is the case for the CR1-derived SINE tails, which exhibit four diagnostic nucleotides distinguishing them from the highly similar 3' end of CR1-X1_Pass (Fig. 1C).
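The majority-rule consensus step mentioned above can be illustrated with a minimal Python sketch. It is not the procedure actually used for PittSINE: the function name, the strict ">50%" threshold, the use of "N" for columns without a majority, and the toy sequences are all assumptions made for illustration only.

```python
from collections import Counter


def majority_rule_consensus(aligned, gap="-", ambiguous="N"):
    """Return a majority-rule consensus of equal-length aligned sequences.

    Hypothetical helper: a column contributes its most frequent state only
    if that state occurs in more than half of the copies; otherwise the
    column is written as `ambiguous`. Columns whose majority state is a gap
    are dropped from the consensus.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")

    consensus = []
    for column in zip(*aligned):
        counts = Counter(base.upper() for base in column)
        base, count = counts.most_common(1)[0]
        # Strict majority: more than half of all copies share this state.
        consensus.append(base if count * 2 > len(column) else ambiguous)
    return "".join(b for b in consensus if b != gap)


# Toy alignment of three invented SINE copies (not real PittSINE data).
toy_copies = ["ACGT-A", "ACGTTA", "ACCT-A"]
toy_consensus = majority_rule_consensus(toy_copies)
```

In this sketch a single-copy insertion (the extra T in the second toy sequence) is removed from the consensus because the gap is the majority state in that column.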
We further investigated this peculiar pattern using phylogenetic analyses of the CR1-derived SINE tails and avian CR1 subfamilies sensu Suh, Churakov, et al. (2015), which again suggest that TguSINE1 and PittSINE have a single SINE ancestor which derived its tail from CR1-X1_Pass (Fig. 2A). Assuming that SINEs are trans-mobilized by LINE reverse transcriptase enzymes due to high sequence similarity between SINE tails and LINE 3' ends (Ohshima, et al. 1996) and thus depend on LINE activity, the most likely candidate for SINE mobilization is the CR1-X1_Pass subfamily. This is further supported by temporal overlap of TguSINE1 and CR1-X activity in RE landscapes of the zebra finch genome (Fig. 2B).
Additionally, we detected direct evidence for temporal overlap of TguSINE1 and CR1-X1_Pass activity through our presence/absence analyses (Fig. 3A, Supplementary Table S1).
Our extensive RE presence/absence analyses yielded 19 TguSINE1 markers, 6 PittSINE markers, 13 CR1 markers, and 6 LTR markers which we could trace across a representative taxon sampling of the major groups of passerines sensu Barker, et al. (2004). Careful inspection of presence/absence alignments using strict criteria (see Materials and Methods) yielded a conflict-free set of RE markers, which we mapped on a maximum likelihood tree constructed from concatenated RE-flanking sequences from the same data set (Fig. 3A). For three of the deepest passerine branching events, we found a multitude of RE markers and thus statistically significant support in available RE marker tests (Waddell, et al. 2001;Kuritzin, et al. 2016). These relationships are the respective monophyly of passerines and oscines, as well as the monophyly of Eupasseres, a group comprising all passerines except the New Zealand wrens Acanthisittidae. The Eupasseres/Acanthisittidae split was first observed in sequence analyses of few nuclear genes (Barker, et al. 2002;Ericson, et al. 2002) and has since been recovered in ever-growing nuclear sequence analyses [e.g., (Barker, et al. 2004;Ericson, et al. 2014;Selvatti, et al. 2015;Moyle, et al. 2016)]. Our analysis of rare genomic changes thus provides the first assessment of this group using an independent marker type and phylogenetic method. None of our RE markers inserted during the rapid radiation of oscine passerines; however, sequence analysis of the RE-flanking regions yielded a topology identical to the aforementioned previous studies. Of particular interest are the four deep-branching oscine lineages Menuridae (e.g., Menura novaehollandiae), Climacteridae (e.g., Climacteris picumnus), Maluridae/Meliphagidae (e.g., Malurus cyaneus and Myzomela eques), and Pomatostomidae (e.g., Pomatostomus superciliosus) because these four lineages have rarely been included together in passerine phylogenetic studies. We find a branching order (Fig. 3A) which recapitulates previous phylogenetic estimates based on few nuclear genes (Barker, et al. 2004) or ultraconserved elements (Moyle, et al. 2016). This suggests that the rapid radiation of oscines can be congruently resolved even with non-genome-scale data. We note that this is in contrast to the neoavian radiation, which appears to be partially irresolvable even with retroposon markers [reviewed by Suh (2016)]. Within passerines, we further note that the conflict between single-RE support for a Picathartidae/Corvidae clade (Treplin and Tiedemann 2007) and sequence-based phylogenies (Han, et al. 2011) results from incorrect placing of this RE marker on the passerine Tree of Life (see legend of Supplementary Fig. S2 for more information).
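As a rough illustration of why a multitude of conflict-free RE markers translates into statistical significance, the following Python sketch computes the probability commonly quoted for the test of Waddell, et al. (2001) when n markers support one branch and none support either alternative, i.e. an [n 0 0] pattern. It assumes the simplified reading in which, under the null hypothesis of an unresolved trichotomy, each marker independently supports any one of the three alternative resolutions with probability 1/3. The function names are hypothetical, and the KKSC test of Kuritzin, et al. (2016) is not reproduced here.

```python
def conflict_free_p_value(n_markers):
    """Probability of an [n 0 0] marker pattern under the simplified null.

    Assumption: each marker independently supports one of three alternative
    resolutions with probability 1/3, so n conflict-free markers for one
    resolution have probability (1/3)**n.
    """
    if n_markers < 0:
        raise ValueError("number of markers must be non-negative")
    return (1.0 / 3.0) ** n_markers


def min_markers_for_significance(alpha=0.05):
    """Smallest number of conflict-free markers with p below alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    n = 0
    while conflict_free_p_value(n) >= alpha:
        n += 1
    return n


p_three_markers = conflict_free_p_value(3)      # 1/27
needed_markers = min_markers_for_significance()  # markers needed at alpha = 0.05
```

Under these assumptions three conflict-free markers already fall below the 0.05 threshold, which is why branches carrying many markers are considered significantly supported.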
We then traced the emergence and activity of SINEs across the passerine Tree of Life. Given that RE marker candidates were initially chosen on chicken/zebra finch alignments, we expect no bias in the distribution of RE markers on the lineage leading to zebra finch.
TguSINE1 was mostly active in the ancestor of oscines and, to a lesser extent, in the ancestor of Eupasseres. Interestingly, we find no evidence for TguSINE1 activity in the common ancestor of passerines or during/after the radiation of oscines and therefore hypothesize that TguSINE1 emerged in Eupasseres and became extinct in the oscine ancestor (Fig. 3A). The emergence of TguSINE1 is thus the first "genome morphology" character for the monophyly of Eupasseres and supplements support from skeletal morphology, which is limited to the presence of a 'six-canal pattern' in the hypotarsus.
In contrast to the situation in oscines, the activity of TguSINE1 appears to have been longer in suboscines, postdating the divergence between Old World and New World suboscines (i.e., pitta and phoebe in Fig. 3A). This recent, potentially lineage-specific activity coincides with the putative restriction of PittSINEs to Old World suboscines (e.g., Pitta spp.). The aforementioned indication for a common SINE ancestor of TguSINE1 and PittSINE evidenced by four diagnostic nucleotides in their CR1-derived SINE tails (cf. Fig. 1C and Fig. 2A) suggests that the younger PittSINE emerged from the older TguSINE1 after acquisition of a new tRNA-derived head. Assuming that TguSINE1 and PittSINE were both active on the pitta lineage, we propose that the most plausible mechanism for PittSINE emergence was template switching from TguSINE1 to a nearby tRNA during reverse transcription (Fig. 3B).
Template switching has been previously proposed in a wide range of chimeric retroposons [e.g., (Brosius 1999;Gilbert and Labuda 2000;Buzdin, et al. 2002;Nishihara, et al. 2016)] and appears to be a particularly common opportunity for SINEs to parasitize different LINEs via acquisition of new SINE tails (Ohshima and Okada 2005;Nishihara, et al. 2016). Our data show that template switching may also happen for SINE heads and we speculate that the acquisition of a new SINE head from a different tRNA gene may provide intact and active promoter components for efficient transcription by RNA polymerase III.

Conclusions
Here, we reconstructed the deep phylogenetic relationships of passerines using presence/absence patterns of unusual SINE insertions and other REs. This permitted us to follow the emergence, activity, and extinction of TguSINE1 and PittSINE across the evolution of the most species-rich group of birds. While this SINE activity was considerably lower than, for example, that in mammals, it nevertheless exemplifies that at least some birds have a more diverse repetitive element landscape than previously anticipated. Furthermore, we note that the activity of TguSINE1 appears to coincide with the evolution of vocal learning during early passerine evolution (Suh, Paus, et al. 2011). Previous evidence suggests that ~4% of birdsong-associated transcripts in the zebra finch brain contain retroposons (Warren, et al. 2010) and it thus remains to be seen whether SINE activity influenced the evolution of, for example, vocal learning in oscine passerines.

Materials and Methods
We identified candidates for presence/absence loci for TguSINE1 and other selected zebra finch retroposons via pairwise alignment of RE loci from zebra finch to orthologous regions in chicken. This was done by comparing and extracting the respective RE-flanking sequences in the UCSC Genome Browser (Fujita, et al. 2011 (Suh, Paus, et al. 2011). This led to a total of 44 high-quality RE presence/absence markers (Supplementary Table S1, Supplementary Data S2). All newly generated sequences were deposited in GenBank (accession numbers XXXXXXXX-XXXXXXXX).
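The notion of a conflict-free marker set can be made concrete with a small Python sketch of a pairwise compatibility check. This is not the inspection procedure used in this study. It assumes that RE absence is the ancestral state, that only '+' and '-' are informative while 'd' and '?' are treated as missing, and that two markers conflict when the patterns (+,+), (+,-) and (-,+) all occur among the taxa scored for both. All function, marker, and taxon names are hypothetical.

```python
from itertools import combinations

# Only clear presence/absence states are informative in this sketch;
# 'd' (unspecific deletion) and '?' (missing data) are ignored.
SCORED = {"+", "-"}
CONFLICT_PATTERNS = {("+", "+"), ("+", "-"), ("-", "+")}


def markers_conflict(marker_a, marker_b):
    """Return True if two markers cannot both fit on any single tree.

    Each marker is a dict mapping taxon name to a character state.
    With absence as the ancestral state, two insertions are incompatible
    when some taxa share both, some carry only the first, and some carry
    only the second.
    """
    seen = set()
    for taxon in marker_a.keys() & marker_b.keys():
        state_a, state_b = marker_a[taxon], marker_b[taxon]
        if state_a in SCORED and state_b in SCORED:
            seen.add((state_a, state_b))
    return CONFLICT_PATTERNS <= seen


def conflicting_pairs(matrix):
    """List all pairs of marker names in `matrix` that conflict."""
    return [
        (name_a, name_b)
        for name_a, name_b in combinations(sorted(matrix), 2)
        if markers_conflict(matrix[name_a], matrix[name_b])
    ]


# Invented toy matrix (not data from Supplementary Table S1).
toy_matrix = {
    "m1": {"taxonA": "+", "taxonB": "+", "taxonC": "-", "taxonD": "-"},
    "m2": {"taxonA": "+", "taxonB": "+", "taxonC": "+", "taxonD": "-"},
    "m3": {"taxonA": "-", "taxonB": "+", "taxonC": "+", "taxonD": "?"},
}
toy_conflicts = conflicting_pairs(toy_matrix)
```

In the toy matrix, m1 and m2 are nested and therefore compatible, whereas m1 and m3 overlap only partially and are reported as conflicting; a conflict-free data set in this sense would return an empty list.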
All maximum likelihood sequence analyses were conducted using RAxML 8.1.11 (Stamatakis, et al. 2008) on the CIPRES Science Gateway (Miller, et al. 2010

Figure Legends
Maximum likelihood phylogeny of passerine SINE tails and avian CR1 subfamilies in Repbase (Jurka, et al. 2005) (GTRCAT model, 1,000 bootstrap replicates) suggests that TguSINE1 and PittSINE arose from the same CR1-X subfamily (CR1-X1_Pass) and share a common SINE ancestor. Note that the topology of the CR1 phylogeny is identical to that of previous studies (Suh, et al. 2012;Suh, Churakov, et al. 2015). (B) Comparison of the TguSINE1 landscape with landscapes of CR1 families (merged subfamilies from panel A) suggests temporal overlap of SINE and CR1-X activity in the zebra finch genome. RE landscapes were generated using the zebra finch assembly taeGut2 following methods detailed elsewhere (Suh, Churakov, et al. 2015).
Our sampling consists of the major deep passerine lineages sensu Barker et al. (2004). Red and green asterisks indicate emergence of TguSINE1 and PittSINE, respectively. The black asterisk indicates that for some loci (Supplementary Table S1), Malurus cyaneus was sampled instead of Myzomela eques to represent the Maluridae/Meliphagidae clade (Barker, et al. 2004
The character states are '+' (RE presence), '-' (RE absence), or 'd' (unspecific deletion) for each genomic locus. Missing data are indicated by '?'. The TguLTR5d insertion of marker L-4 was first described in Suh, Paus, et al. (2011)