A large amount of genomic data has been generated by high-throughput sequencing technologies, enabling comparative analyses of retrotransposons within or across genomes. Short-read assemblies offer new possibilities for the detection of TEs; however, TEs are often excluded from assemblies because of their repetitiveness, genome-wide distribution, truncation, and nested organization [1, 8, 41, 42].
We carried out a comparative analysis of chromoviruses in the beet genome using data from a preliminary B. vulgaris genome draft sequence. This analysis provides insight into the genomic and chromosomal organization, distribution, and evolution of these retrotransposons.
Chromoviruses are widely distributed within genera of the Amaranthaceae family
Although the coding region of chromoviruses is relatively conserved within a clade, the sequence identity between the LTRs has been used for the grouping of the 22 chromoviruses into separate families. In total, 16 different chromoviral families of the CRM, Reina, Tekay, and Galadriel clades have been identified, representing all known chromoviral clades in higher plants. Nevertheless, our analysis of genome-wide RT sequences clearly shows that more chromoviruses may be identified from the genome draft sequence, which is also evident from the signal strength in the blotting and in situ hybridization experiments. Only one similar, but less exhaustive, analysis of chromoviruses has been conducted previously, which was for the Musa acuminata genome.
Our analysis shows that families from the same clade have conserved characteristics throughout the genera Beta and Patellifolia. In cultivated and wild beets, the Bongo families of the Tekay clade were the most abundant, followed by the Beetle families of the CRM clade. In comparison, Beon 1 of the Galadriel clade most probably comprises a single family harboring only a few elements. Similarly, all M. acuminata Galadriel chromoviruses are members of a single family, designated Monkey; however, Monkey members constitute about 0.2 to 0.5% of the genome. In M. acuminata, the Reina clade constitutes more than half of all chromoviruses and makes up about 4% of the genome, followed by the Tekay clade, constituting about 2%.
Chromosomal localization of Beta chromoviruses
The localization of plant chromoviruses on chromosomes has been investigated in several plants, focusing on members of the CRM clade that show a specific accumulation in centromeric regions. In banana, the chromoviruses of the Reina and Tekay clade were physically mapped in centromeric and peri-centromeric associated heterochromatin, whereas Monkey elements belonging to the Galadriel clade preferentially inserted into the NOR and co-localized with the rRNA genes. This was also observed in this study for Beon 1 in B. vulgaris, where the physical mapping indicated exclusive localization of Beon 1 copies in the NOR. The results suggest that the localization might have been established by a single integration event of a Beon 1 copy into the 18S rRNA gene of a common ancestor, as supported by the presence of Beon 1 elements within the 18S rRNA genes of wild beets from the sections Beta and Corolliflora. Such integrations in the NOR region are not unusual for TEs, and have been reported for the long interspersed element (LINE) R2Bm of Bombyx mori. Furthermore, B. vulgaris ribosomal RNA genes seem to tolerate TE integrations, as an insertion of a single BNR1 LINE has previously been reported. Nevertheless, our FISH analysis clearly showed multiple Beon 1 copies interrupting the 18S rRNA genes. The integration of several members of the tomato rDNA-related retrotransposon (TRRT) family within 18S rRNA genes was recently shown, and the existence of segmental duplication events rather than targeted integration has been proposed to explain retrotransposon amplification. Our studies assigned TRRT elements to the Galadriel clade, forming a branch together with Beon 1. The sequence conservation of Beon 1 copies might result from strong purifying selection and homogenization of the coding sequences of the ribosomal DNA.
Integrations into genic regions were shown for families of the Reina and Tekay clade. A Bongo 3 copy was identified in the vicinity of a disease resistance-activating factor, and the recently described chromovirus Bert (in this paper assigned to Bingo elements) was found within an intron of the callose synthase gene. It is possible that these plant retrotransposons have the ability to alter the expression of nearby genes, as the importance of TEs for the epigenetic regulation of plant genomes has been stated previously [50–52].
The functional role of the integrase
Several studies have confirmed the functional role of integrase for targeted integrations [13, 14, 53]. During the integration process, the interaction of DNA and integrase is crucial, and the relevance of the chromodomain in modulating the interaction with diverse chromatin components has already been shown [19, 55]. An enrichment of positively charged amino acid residues was found within the integrase of all Beta chromoviruses. Differences in the extent of this enrichment possibly regulate the degree to which the integrase is capable of establishing electrostatic protein–protein interactions. This might enable the retrotransposon to sense target-specific chromatin states, as described by Roudier et al. Furthermore, we detected a potential NLS in all Beetle families of the CRM clade. Such NLS signals are part of the integrase of several retroviruses and LTR retrotransposons, and are responsible for the transfer of the pre-integration complex to the nucleus [58–60].
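The enrichment of positively charged residues mentioned above can be quantified in a simple way. The following is a minimal sketch, not the method used in this study: it assumes that "positively charged" means lysine (K) and arginine (R) only, and the function names and the window width of 20 residues are hypothetical choices made for illustration.

```python
# Hypothetical helper functions; counting only K and R as positively charged
# is an assumption (histidine is deliberately left out).
POSITIVE = frozenset("KR")


def positive_fraction(protein: str) -> float:
    """Fraction of residues in a protein sequence that are K or R."""
    residues = [aa for aa in protein.upper() if aa.isalpha()]
    if not residues:
        raise ValueError("empty sequence")
    return sum(aa in POSITIVE for aa in residues) / len(residues)


def richest_window(protein: str, width: int = 20) -> tuple[int, float]:
    """Return (start, fraction) of the first window with the most K/R residues.

    The default width of 20 is an arbitrary illustrative value.
    """
    seq = protein.upper()
    if width <= 0 or width > len(seq):
        raise ValueError("width must be between 1 and the sequence length")
    # Count the first window, then slide it one residue at a time.
    count = sum(aa in POSITIVE for aa in seq[:width])
    best_start, best_count = 0, count
    for start in range(1, len(seq) - width + 1):
        count += (seq[start + width - 1] in POSITIVE) - (seq[start - 1] in POSITIVE)
        if count > best_count:
            best_start, best_count = start, count
    return best_start, best_count / width
```

Comparing `positive_fraction` across the integrase sequences of different families, or locating the densest stretch with `richest_window`, would give one rough measure of the differences discussed here; a charge calculation that accounts for acidic residues and pH would be a natural refinement.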
The chromodomains of the plant chromoviral clades CRM, Reina, Tekay, and Galadriel in B. vulgaris are easily distinguishable by the presence or absence of conserved amino acid residues compared with HP1, and their position in relation to the gag-pol polyprotein. Although the chromodomain sequences have been classified into three groups [19, 38], only group II chromoviruses have been identified in plants, with all of them belonging to the clades Tekay, Reina, and Galadriel.
The chromodomain encoded by CRM clade members such as Beetle extends into the 3′ LTR [21, 35]. Because of this substantial difference from group II chromodomains, the Beetle chromodomains are referred to as CR motifs. A recent survey of CRM clade elements across diverse plant genomes assigned these chromoviruses to three different groups. CRM chromoviruses of group A carry a CR motif, and are genuine centromeric retrotransposons, which probably transpose actively into centromeric regions. By contrast, group B members are not localized at the centromere, whereas group C representatives, despite a lack of the CR motif, were also found in centromeric regions. Interestingly, the P. procumbens chromovirus Beetle 1 and the Beetle 7 chromovirus from B. vulgaris described in the current study share considerable amino acid identity (41%) within their chromodomain, and also have similar LTR lengths (1089 and 1086 bp, respectively), indicating a role for both Beetle 1 and Beetle 7 in the formation of functional centromeres. Amino acid conservations were also found within the chromodomains of CR elements from different grass species [40, 61], supporting the assumption that the CR elements of grasses were derived from a single ancient family, and that conservation of the chromodomain is also crucial for centromere stability and thus host genome integrity.
Transcriptional activity of Beta chromoviruses
Centromeres are thought to be determined epigenetically, including via a transcription-mediated mechanism [51, 64]. Several RT-PCR studies have detected transcriptional activity of chromoviruses, in particular of CRM clade members [21, 40, 64–66]. The analysis of B. vulgaris EST datasets indicates the capability of Beetle chromoviruses for autonomous transposition. Thus, the chromodomain as a key component of genuine CRs facilitates the targeting process into centromeric regions, and might therefore be responsible for the generation of centromeric transcripts, which are involved in RNA interference-mediated centromere identity and function.
The rRNA genes have high transcriptional activity; thus, it is possible that read-through transcripts of Beon 1 might be generated. Alternatively, as the Beon 1 copies harbor intact coding sequences, their reverse transcription and the integration of new copies into the genome are conceivable. However, corresponding transcripts were not detected in the EST database. Hence, epigenetic silencing mechanisms might prevent the reverse transcription and spreading of Beon 1 copies into other chromosomal regions. This could be caused by the insertion of Beon 1 in two orientations, as was shown in wild beet species, whereby transcription would result in double-stranded RNA, which would immediately trigger the RNA interference machinery. Subsequently, Dicer-generated small interfering RNAs would serve as substrates in RNA-induced transcriptional silencing (RITS) complexes or the RNA-induced silencing complex (RISC). RITS would initiate the transcriptional silencing of Beon 1 copies by RNA-directed DNA methylation. Most likely is the post-transcriptional silencing of Beon 1 copies by degradation of mRNA mediated by RISC. In plants, it has been shown that rDNA transcription is subject to dosage control, with only a subset of rDNA genes being transcribed. It is therefore possible that rDNA genes containing Beon 1 copies are not transcribed.
Based on the accumulation of mutations within their LTRs, we calculated the age of the transposition events and concluded that members of the 16 families transposed less than two million years ago. Therefore, these transpositions are evolutionarily recent events. However, families of the four clades are likely to be much older, as deduced from their widespread distributions within the genera Beta and Patellifolia. Furthermore, nearly 70% of the analyzed contigs of the draft sequence of the beet genome contain incomplete or recombined copies, which over time have lost the typical retrotransposon hallmarks. Ma et al. found that LTR retrotransposons are subject to a genome-specific recombination rate that results in a half-life of less than 6 million years in rice or 3 million years in Arabidopsis.
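The reasoning behind LTR-based dating and the half-life figures can be made concrete with a minimal sketch. It assumes that the two LTRs of an element were identical at insertion, that their divergence is measured as a Kimura two-parameter distance K over an existing pairwise alignment, and that the insertion age is T = K / (2r). The substitution rate r is not given in this section, so the default of 1.3e-8 substitutions per site per year is only a placeholder assumption, as are the function names; the half-life calculation assumes simple exponential decay.

```python
import math


def k2p_distance(ltr5: str, ltr3: str) -> float:
    """Kimura two-parameter distance between two aligned LTR sequences.

    Columns containing gaps or ambiguous bases are skipped.
    """
    purines, pyrimidines = {"A", "G"}, {"C", "T"}
    sites = transitions = transversions = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= purines or {a, b} <= pyrimidines:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites in the alignment")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def insertion_age_years(distance: float, rate: float = 1.3e-8) -> float:
    """Insertion age T = K / (2r); the default rate is an assumed placeholder."""
    return distance / (2 * rate)


def surviving_fraction(age_years: float, half_life_years: float) -> float:
    """Fraction of elements expected to remain intact, assuming exponential decay."""
    return 0.5 ** (age_years / half_life_years)
```

Under the placeholder rate, an LTR divergence of 0.026 would correspond to an insertion about one million years ago, and with a half-life of 6 million years half of the elements inserted 6 million years ago would be expected to remain intact. The actual estimates depend on the rate and distance model used in the study.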