Genomic analysis of mouse VL30 retrotransposons
© Markopoulos et al. 2016
Received: 24 February 2016
Accepted: 25 April 2016
Published: 6 May 2016
Retrotransposons are mobile elements that have a high impact on shaping the mammalian genomes. Since the availability of whole genomes, genomic analyses have provided novel insights into retrotransposon biology. However, many retrotransposon families and their possible genomic impact have not yet been analysed.
Here, we analysed the structural features, the genomic distribution and the evolutionary history of mouse VL30 LTR-retrotransposons. In total, we identified 372 VL30 sequences categorized as 86 full-length and 49 truncated copies as well as 237 solo LTRs, with non-random chromosomal distribution. Full-length VL30s were highly conserved elements with intact retroviral replication signals, but with no protein-coding capacity. Analysis of LTRs revealed a high number of common transcription factor binding sites, possibly explaining the known inducible and tissue-specific expression of individual elements. The overwhelming majority of full-length and truncated elements (82/86 and 40/49, respectively) contained one or two specific motifs required for binding of the VL30 RNA to the poly-pyrimidine tract-binding protein-associated splicing factor (PSF). Phylogenetic analysis revealed three VL30 groups with the oldest emerging ~17.5 Myrs ago, while the other two were characterized mostly by new genomic integrations. Most VL30 sequences were found integrated either near, adjacent or inside transcription start sites, or into introns or at the 3′ end of genes. In addition, a significant number of VL30s were found near Krueppel-associated box (KRAB) genes functioning as potent transcriptional repressors.
Collectively, our study provides data on VL30s related to their: (a) number and structural features involved in their transcription that play a role in steroidogenesis and oncogenesis; (b) evolutionary history and potential for retrotransposition; and (c) unique genomic distribution and impact on gene expression.
KeywordsVL30s Retrotransposon PSF Mouse genome Exaptation
Mobile DNA sequences occupy almost half of many mammalian genomes [1, 2], and in the past they have been considered as solely selfish or junk DNA sequences. Recent studies, however, have provided evidence that mobile DNA can be harnessed for host functions, and it is now accepted that mobile DNA-host interactions play a major role in organism physiology, pathology and evolution . Mobile DNA elements are divided into two major groups, DNA transposons and retrotransposons. While DNA transposons mobilize through a conservative mechanism, comprising their excision and subsequent integration into a new site, retrotransposon numbers increase in the genome through an RNA intermediate via a mechanism known as retrotransposition  and, based on their genomic structure, they are further subdivided into Long Terminal Repeat (LTR)- and non-LTR retrotransposons.
Viral-like 30 elements (VL30s) are a family of 5–6 Kb retrovirus-like DNA sequences present in mouse and rat genomes, and are classified as LTR retrotransposons . VL30 LTRs contain a large repertoire of transcription factor binding sites that provide a remarkable plasticity for VL30 RNA expression [4, 5]. VL30 transcription is tissue-specific , up-regulated by histone phosphorylation, acetylation and DNA demethylation , and can be induced by several different factors including C2 ceramide, a dominant negative p53 oncoprotein  or in cerebral ischemia . VL30s are considered as immediate early response genes as their RNA expression is rapidly induced following transient inhibition of protein synthesis or mitogenic stimulation [10, 11]. Another important feature of VL30s is that their RNA is packaged into C-type retroviral particles of murine leukemia viruses and can be transmitted to heterologous cells [12, 13], making them an important potential tool for gene transfer. VL30 transcripts play functional roles as inducible VL30 transcription regulates expression of neighboring genes, creating a transcription network . Induced VL30 RNA seems to be a critical factor involved in oncogenesis and steroidogenesis [15, 16], as it contains specific motifs for binding to spliceosome factor (PSF), leading to RNA induction of PSF-repressed genes.
Sequence analysis of two VL30 members, NVL-3  and BVL-1 , has revealed no coding open reading frames due to multiple stop codons for gag and pol, and no evidence for env retroviral genes, classifying them as non-autonomous LTR retrotransposons. Specifically, the NVL-3 member is retrotransposition-competent [19, 20] and its retrotransposition is induced by the large T antigen of Simian virus 40  or oxidative stress agents such as hydrogen peroxide , vanadium  and arsenic . Finally, induced VL30 retrotransposition activates a caspase-independent and p53-dependent cell death pathway associated with mitochondrial and lysosomal damage .
The availability of bioinformatics tools allows the analysis of genomic DNA sequences related to their genomic distribution, evolution and prediction of functional properties. Such analyses have revealed that more than 2/3 of mammal genomes may be comprised of mobile DNA , a much higher percentage than the ~50 % initially calculated . They have also provided novel insights on the features and biology of LTR retrotransposons such as human HERV-Ks  and many mouse LTR retrotransposons [27, 28].
Data on mouse VL30s comes from studies based on conventional methods, such as hybridization analysis and sequencing [4, 5], and gross estimations , prior to publication of the mouse genome, but a detailed genomic analysis of mouse VL30s is not yet available. In this study, we systematically annotated VL30 sequences and analyzed their number, structure, phylogeny and genomic distribution providing new insights for their potential role in the mouse genome.
Copy number, structural features and variation of VL30 sequences
As the only available genomic data for VL30s come from sequencing of a few individual elements , prior to publication of the mouse genome, we attempted to measure their total number, analyse their structural features and predict functional traits of particular VL30 sequences.
To search for VL30 sequences in the UCSC mouse Database [30, 31], we assumed that: (i) an intact VL30 element, independent of its length, should contain near identical 5′ and 3′ LTRs, an intact retroviral Primer Binding Site (PBS) required for binding of a primer tRNA to initiate reverse transcription, and a polypurine Tract (PPT) responsible for initiating (+) strand synthesis during reverse transcription; (ii) a truncated VL30 should contain an internal sequence but not necessarily two intact LTRs; and (iii) solo LTRs, of varying sequence length, should only contain LTR sequences as defined by the Repeat masker tool . Using a combination of Repeat Masker analysis (http://www.repeatmasker.org) for VL30 internal sequences , BLAT querying  for known VL30 consensus sequences from Repbase  and in-silico PCR for specific VL30 primers, corresponding to a conserved internal region , our genomic search initially revealed 372 total VL30 sequences. All sequence data were extracted along with 2000 bp of 5′ and 3′ flanking genomic regions.
Using the LTR-finder tool  in the above extracted VL30 sequence data we distinguished full-length from truncated and solo LTR sequences and found that the mouse genome contains 86 full-length elements in a sequence length ranging from 4075 to 6321 bp. In addition, manual annotation of the remaining non-intact VL30 sequence data revealed 49 truncated elements and 237 relative solo LTRs. The particular genomic coordinates of full-length, truncated and solo LTR VL30s as well as their sequence features are provided in Additional file 1.
Early studies have shown that mouse and rat VL30 RNAs can be efficiently packaged into MoMLV virions [37, 38] through a retroviral Ψ packaging-sequence signal. Given that a mouse VL30 Ψ consensus sequence is not known, we attempted to identify such a sequence comparing the full-length sequences with the known 68-bp purine-rich sequence GGCAAGCCGGCCGGCG , downstream of the rat 5′ LTR VL30, which is critical for RNA dimerization and packaging. Examining the entire internal sequence of all 86 full-length sequences by the Blast algorithm, between 5′ and 3′ LTRs as well as downstream of U5 and upstream of U3 regions, neither a conserved nor a rat-related Ψ signal motif was found.
Finally, based on the property of VL30 RNA to bind PSF , we analyzed, using Blast alignment, all full-length VL30 sequences for the two known specific motifs of PSF binding: CAGCTGCCCTGCCTCCCACTCC and CAGCTTCTCATTCCCCTGTCCCTCCCATCC . We found that in contrast to three PSF-negative elements, five contained only one while all the rest of the 78 sequences contained both motifs (Fig. 1 and Additional file 1). Next, we asked whether truncated elements also had PSF binding motifs. A similar analysis showed that 9 out of 49 truncated elements had no PSF signal motifs. In contrast, 6 elements contained one while 34 contained both PSF-binding motifs (Additional file 1).
VL30s transcriptional and protein coding potential
While both LTR sequences participate in the integration of a retrovirus into the host genome, the 5′ LTR serves as a promoter regulating the entire retroviral RNA expression accomplished through cellular transcription factors which interact with specific DNA motifs located in this region.
We attempted to identify transcription factor binding sites (TFBS) over the entire 5′ LTR sequences of full-length VL30s using the Matinspector tool  in the Genomatix database (see Methods). Specifically, we analyzed the group of 5′ LTRs of all 86 full-length VL30s found along with a group of 5′ LTRs of 7 known VL30 elements (such as NVL1/2, NVL3, BVL1, VL3, VM1, VL11 and B10) for common or unique TFBS. Our analysis revealed 10,371 TFBS of 53 distinct transcription factors (TFs) to exist in 70 % of both element groups and a total number of 15,039 (13,912 in 86 full-length elements, denoted in parenthesis thereof) TFBS of 197 TFs in at least one element (Additional file 2). The number of TF binding sites largely varied depending on the particular VL30 LTR and chromosome. For example, the binding sites of HOMF (homeobox-domain factor) and VTBP (Vertebrate TATA Binding Protein) were present in all 93 (86) (Additional file 2) analyzed LTRs and in all chromosomes with a high number of 550 and 370 sites (516 and 346) ranging between 1–12 and 1–7 per chromosome, respectively. To a lesser extent, the glucocorticoid responsive/related elements (GREF) and nuclear factor kappa-light chain enhancer of activated B cells (NF-κB) TF binding sites were present in 87 and 85 LTRs, with 265 and 131 binding sites, respectively, ranging 1–7 and 1–4 sites per chromosome. Finally, some TF binding sites were extremely infrequent with 1–2 sites restricted either to only one or two LTRs in a particular chromosome, such as those of HASF and MEF3 with two binding sites on 2 different chromosomes, or CABL and HNFP with 1 binding site in different chromosome, respectively (Additional file 2).
In addition, we analysed the truncated and solo LTR sequences since they can also act as promoters or enhancers, driving the expression of adjacent genomic loci. We scored 8558 and 44,428 total TFBS in truncated and solo LTR sequences, respectively. In analogy to full-length elements, HOMF and VTBP TFBSs we present in almost all truncated sequences with a high number of 238 (in 47 out of 49 sequences) and 122 (in 48 out of 49 sequences) sites, respectively. As regards solo LTRs the respective TFBSs for HOMF and VTBP were 1266 and 643 in 234 out of 237 sequences, respectively (Additional file 2).
Next, we extended our analysis searching for VL30-associated ESTs in the mouse section of dbEST. We retrieved 592 total ESTs starting or terminating within 99 distinct VL30 sequences. In particular, 414 different ESTs were identified in two-thirds of the full-length VL30 sequences (57 out of 86). Moreover, we found 85 and 93 ESTs in 14 truncated and 28 solo LTR sequences, respectively (Additional file 3).
In a final step, we searched whether VL30s have protein coding potential, using the ORF-finder  in the sequences of full-length elements. The analysis of their internal sequences between 5′ and 3′ LTRs, revealed multiple stop codons and no intact full-open reading frames corresponding to typical gag, pol and env retroviral genes (data not shown).
The genomic distribution of VL30s
Taking into account the total number of 372 VL30 sequences found, we asked whether there are random or “hot spot” integrations of VL30s in the mouse genome, as their genomic distribution is unknown.
VL30 elements closely located to mouse genes
Associated gene, (distance to TSS)
Adjacent to TSS
Alternative promoter, Enhancer Or Effects in gene expression
TSS within VL30
Alternative splicing region or disruption of transcription
TCR-beta chain (+20882)
End of gene
Phylogeny of VL30 and estimation of their integration time in the mouse genome
There is a relatively limited data on VL30 phylogeny concerning only a small subset of 18 sequences, which was obtained prior to mouse genome sequencing .
To confirm the evolutionary relationships analyzed, we combined the data of the two aforementioned analyses by highlighting with rhombus symbols the elements with identical 5′ and 3′ LTRs in the evolutionary tree obtained (Fig. 6). Indeed, Group I was the oldest one as its members shared no identical LTRs, and 7qA1, having the most divergent 5′ and 3′ LTRs, was the “oldest” VL30 element. Group II was the immediately younger subgroup comparing to Group I and contained 13 sequences (out of 21 total) highly similar to VL3 that shared identical 5′ and 3′ LTRs. Finally, Group III elements, although consisting of three highly related and one divergent subgroup, were the overall youngest group as their high majority, 34 out of 46 elements, had identical LTRs (Fig. 6). Collectively, based on the above dating analysis, Group I contains elements that are 17.5-3.1 Myrs old (from 7qA1 to 2qF1), Group II of 9.3 Myrs old up to date (from 13qB3.2 to 8 elements with identical LTRs) and the youngest Group III of 6.2 Myrs old up to date (from 6qA2 to 34 elements with identical LTRs).
The present study, based on the complete sequencing of the mouse genome and utilizing advanced bioinformatic tools, presents an in silico analysis of VL30s providing information about their structural characteristics, chromosomal distribution and evolution.
Features of VL30 elements
Prior to this study, it was generally assumed that the mouse genome contained ~150–200 VL30 full-length elements , while our analysis revealed 372 total VL30 sequences. Specifically, we found 86 full-length elements, 49 truncated elements, and 237 solo LTR sequences. Accordingly, the total number of truncated and solo LTR sequences is 3.3-times more than those of full-length elements. This implies that both types of non-intact VL30 sequences rather derived after several mutation/recombination events during evolution. Hence, in contrast to previous assumptions, the mouse genome contains 86 full-length elements comprising the 23.1 % of the total VL30 sequences. 76 out of 86 full-length elements analysed had a hallmark of their genome integration, a conserved 4-bp TSD [4, 20]. LTRs were found to be quite heterogeneous, ranging from 436 to 681 bp. Thus, as VL30s are retrotransposition-competent , this finding might be explained by the occurrence of mutations during transcription and/or reverse transcription during retrotransposition, or by recombination events during evolution. Regarding the PBS signals, the large majority had Gly tRNA species specificity, while 17 and 10 elements had a Pro and Gln tRNA species (Fig. 1), respectively. Based on this, we may categorize full-length elements into three distinct groups according to the replication tRNA species: the prevalent group of Gly tRNA-positive elements and those of less frequent groups Pro- and Gln tRNA-positive elements. In addition, the different groups of tRNA species in association to conserved PPTs found (Fig. 1), shows the versatile nature of VL30s replication. Importantly, all internal sequences were characterized by the absence of an env gene while gag and pol genes had multiple stop codons, documenting that VL30s have no intact ORFs or code for very small peptides. Although VL30s are retrotransposition-competent these data emphasize that none of the full-length VL30s is an autonomous-LTR retrotransposon and their retrotransposition is mediated in trans-complementation, probably, by an endogenous MoMLV reverse transcriptase . Strikingly, out of 135 total VL30 sequences, 86 full-length and 49 truncated, only 12 sequences - 3 full length and 9 truncated ones - had no PSF-binding motifs (Fig. 1 & Additional file 1). This large majority of PSF-positive sequences, irrespective of their full length or truncated state, document an almost universal property of VL30s, unique among retroelements, involved in mouse steroidogenesis and oncogenesis . Finally, we were unable to locate or determine a specific or consensus Ψ signal for mouse VL30s, following comparison with that of rat VL30s. Though mouse VL30 elements undoubtedly have a strong but unknown yet Ψ signal, as their RNA can be encapsidated in C-type retroviruses . Thus, we believe that the different Ψ signal in rat and mouse VL30 elements distinguishes evolutionarily these murine species.
VL30s-related transcription factors and ESTs
It is well known that cellular transcription factors (TFs) interacting with specific DNA motifs located in 5′ LTRs regulate retroviral transcription . We found that although the number of TF binding sites (TFBS) largely varied depending on the particular VL30 LTR, we identified 10,371 TFBS allocated to 53 distinct TFs, common to 70 % of 86 new full-length elements, and a total number of 14,880 TFBS from 197 TFs in all 5′ LTRs analyzed (Additional file 2). Notably, while some TFBS were present in almost all full-length elements such as those of HOMF and VTBP, others were extremely infrequent with 1–2 TFBS in only one or both LTRs (Additional file 2). In relation to the glucocorticoid responsive/related TFBS of GREF, they were present in all LTRs of full-length elements with 264 total binding sites, ranging from 1 to 7 sites per element. This particular finding justifies our previous observation on VL30 RNA induction through nuclear steroid/receptor complexes binding on GREF following estradiol, diethylstilbestrol, progesterone or dexamethasone treatment . Significantly, we identified 52,986 total TFBSs in both truncated as well as solo LTR sequences that may be similarly involved in gene expression, as full-length VL30 LTRs. In support of this comes the finding that 99 distinct VL30 sequences had 592 VL30-associated ESTs (Additional file 3), implying their involvement in initiation or termination of RNA expression. In conclusion, VL30s: (a) bear a large number of common or unique TFBS, explaining the versatile VL30 RNA expression, which can be tissue-specific or up-regulated by various or pleiotropic stimuli, and (b) provide transcriptional initiation or termination sites, which may affect cellular gene expression.
VL30s genomic distribution
The analysis of chromosomal distribution of VL30s shows that they are ubiquitous in mouse autosomal and sex chromosomes (Fig. 2) and integrated in regions of ~42 % GC content (Additional file 4), which is the exact GC average of the mouse genome . Nevertheless, two lines of evidence support their preferential chromosomal integration documented by: (a) chromosomes 3, 7, 12, 13, 17 and X where the number of integrations is higher that than those expected (Fig. 3), and (b) a possible integration “hot-spot” mapped in 1.5 Mb of XqF2-F3, where 8 truncated VL30 elements were inserted (Fig. 4) probably as a result of serial unequal recombination events between loci of the X chromosome.
As concerns VL30 genomic distribution with respect to gene regions, we found that the large majority of 372 VL30 sequences were preferentially integrated near or in gene regions. In particular, 285 or 272 integrations were located upstream or downstream of TSSs, respectively, in a distance less than 500 Kb (Fig. 5). The significance of such integrations is that they may play an important role in the epigenetic regulation of gene expression. Specifically, first, VL30s affect nearby gene expression acting in a distance of up to 220 Kb , and contain many TFBS. Based on this, VL30 integrations at about of <20 Kb distance upstream of 29 known genes (Table 1) may play a cis-acting regulatory role in modulating their expression. Thus, VL30 LTRs integrated adjacent, upstream or downstream of gene Transcriptional Start Sites (TSS) (Fig. 5) may act as promoters or enhancers for a large number of genes (Table 1). Notably, VL30 members are retrotransposition-competent and their retrotransposition is induced up to ~90,000- or 420,000-times higher than the natural retrotransposition frequency by arsenic  or H2O2 , respectively. Conceivably, in these cases, the number of new integrations might lead to a dramatic change of the whole genome expression profile. Secondly, we found three full-length and one truncated VL30 element containing a TSS (Table 1) and their LTRs may serve as primary promoters for these genes. Thirdly, 22 integrations found inside introns and 2 ones at the 3′ end of genes (Table 1). Intronic integrations may affect mRNA splicing, as the presence of splicing donor sequences in some VL30s  is consistent with this concept. In reference to integrations at the 3′ end of genes (Table 1) we believe that they possibly act as transcription termination signals through VL30 polyA sequences since retrotransposon integrations lead to premature transcriptional termination even at a >12.5 Kb distance .
Finally, we identified a significant number of integrations near genes of the KRAB (Krueppel-associated Box containing) family (Additional file 6), known to participate in epigenetic silencing of endogenous retroviruses [49, 50]. Given that: (i) retrotransposons participate in or control regulatory networks in embryonic stem cells  or more premature developmental stages ; (ii) VL30s contain a large repertoire of TFBS (Additional file 2) and are competent of inducible [5, 8] and cell type-specific expression ; and (iii) Mouse VL30s inserted near genes affect their expression ; we suggest that the unique genomic distribution of VL30s near genes, such as those of the KRAB gene family, might ultimately lead to their selection as functional elements influencing epigenetic gene regulation and participating in regulatory networks.
Based on the evolutionary tree created in the present study (Fig. 6) and on the calculation of the integration time for individual VL30s phylogeny (Fig. 7), we found that VL30s are divided into three evolutionary groups.
Our dating analysis is based on nucleotide substitution rate that creates divergence between 5′ and 3′ LTRs of individual elements. Even though we cannot exclude possible biases in date estimates (deamination biases for instance), we followed that method since it is the most widely accepted for LTR elements, has been used for HERV elements  and our data from intra-element data analysis come in agreement with their position in the phylogenetic tree. We document that Group I is the oldest one emerged ~17.5 Myrs ago, after the calculated rodent-primate divergence of ~41 Myrs ago  and this may explain why VL30s are rodent-specific, not existing in primates. Moreover, the element 7qA1 seems to be the archetype or “the VL30 eve” (Fig. 7). Group II and Group III elements comprised mostly of full-length elements, emerged 9.3 and 6.2 Myrs ago, respectively and are characterized by highly conserved LTRs (Fig. 6) and replication signals (Fig. 1). The elements of both groups are active or retrotransposition-competent, as 42 out of 86 full-length elements share identical 5′ and 3′ LTRs representing present-day integrations, and being possible candidates for the evolutionary youngest elements. VL30 grouping is further strengthened by the fact that while Group I elements are characterized by heterogeneity using 3 different tRNA species, Group II and Group III contained elements using Gly or Pro and Gly tRNA species, respectively (Fig. 6 and Additional file 1). We believe that the oldest elements probably diverged due to recombination/mutation events and upon evolutionary pressure underwent selection and fixation in the mouse genome. We cannot also exclude that these different PBS signals may indicate independently derived mutations from cross-species transmission events. Our data are also supported by previous findings that: (a) VL30s are highly polymorphic (at ~40 % of integrations) across 18 mouse strains ; (b) VL30 RNA is the only one (among LINEs, SINEs and IAP retroelements and DNA transposons) up-regulated by both histone hyperacetylation and DNA demethylation agents ; (c) overexpression of VL30 elements following DNA demethylation is associated with abnormal placental formation of Mus musculus × Muscaroli hybrids ; and (d) hydrogen peroxide induces VL30 retrotransposition of NVL3 element at ~42 % , the highest frequency ever measured in cultured mammalian cells. Taken together, our evolutionary data support that VL30 elements are highly active and rapidly expanding in the mouse genome.
Moreover, taking into account: (i) the recent expansion of VL30s in the mouse lineage (Fig. 2); (ii) the universal feature of VL30 RNAs to bind PSF (Fig. 1), leading to oncogenesis and steroidogenesis [15, 16]; and (iii) their possible role as insertion mutagens and inducers of cell death , we believe that PSF binding may act in some cases as a positive selection mechanism, leading to “tolerance” of VL30 integrations, since the deleterious effect of new retrotranspositions may be compensated for by the physiological action of VL30 RNA in steroidogenesis.
This is the first extensive study on the LTR retrotransposon VL30s, which reveals that the mouse genome harbors precisely 372 total VL30 sequences characterized as 86 full-length, 49 truncated and 237 solo LTRs. The full-length VL30s are categorized into three distinct evolutionary groups. The finding that 42 out of 86 ones share identical LTRs and are marked by various genomic TSDs implies their potential to mobilize as non-autonomous retrotransposons, leading thus to a significant genome reorganization. A hallmark of VL30 genomic distribution is their preferential integration in 6 out 21 mouse chromosomes. At the transcriptional level, they can be diversely induced by various stimuli as they bear a very large number of TFBS also related to tissue-specificity. The most striking functional mark of VL30s, among other retroelements, is their involvement in steroidogenesis and oncogenesis as 91.2 % of both full length and truncated ones are PSF-positive. Finally, the large majority of the 372 sequences bearing LTR integrated near or in gene regions denotes their role in gene regulation, serving either as promoters/enhancers for many cellular genes or transcription termination signals through their polyA-signals. Retrotransposons, resembling the Janus double-faced Roman god, have a dual impact on the genome. Specifically, while new integrations of the retrotransposon copies might be mutagenic or deleterious leading to cell death their LTRs could be exploited by the cell for gene regulation, adaptation and general homeostasis upon environmental changes. Conclusively, VL30s, fulfilling both effects of retrotransposon action, might have been evolved as important mouse “LTR controlling elements” in parallel to those that Barbara McClintock stated for maize transposons .
Bioinformatics mining tools for identification of VL30 sequences
The UCSC genome browser (http://genome.ucsc.edu/) [30, 31] bioinformatic tools were used to screen manually the mouse genome (version mm9) for VL30 sequences. A combination of Repeat Masker analysis (http://www.repeatmasker.org) , BLAT querying  for VL30 consensus sequences from Repbase  and in-silico PCR for specific VL30s primers  was used to annotate VL30 sequences in their respective genomic regions. The LTR-finder tool (http://tlife.fudan.edu.cn/ltr_finder/)  was utilized to identify full-length VL30s. Sequences not corresponding to full-length VL30s were manually annotated as truncated VL30 elements or solo LTRs. The transfer of VL30 genomic coordinates between mouse genome version mm9 and the current one of mm10 was performed by the LiftOver tool of UCSC database.
The BLAST algorithm , in the NCBI database (http://blast.ncbi.nlm.nih.gov/Blast.cgi), was used to locate PSF binding motifs, (based on the two consensus sequences described in ) and the open reading frame (ORF)-finder  to search for potential ORFs in full-length VL30s. The analysis of full-length VL30 LTRs for transcription factor binding sites was performed using the Matinspector tool  in the Genomatix database (www.genomatix.de/matinspector.html). Retroviral Poly-Purine Tract (PPT) and PSF-binding motif consensus sequences for full-length elements were generated using the Weblogo software . The GC-content of genomic regions , adjacent to full-length VL30s, was calculated in the Galaxy platform .
Phylogenetic and nucleotide substitution analysis
The multiple alignment of VL30 5′ LTR regions was performed with the ClustalW2 algorithm  (http://www.ebi.ac.uk/Tools/msa/clustalw2/). The phylogeny of VL30s was inferred using the Maximum Likelihood method, based on the Tamura-Nei nucleotide substitution model , including in our analysis the LTRs of seven known elements (NVL1/2, NVL3, BVL1, VL3, VM1, VL11 and B10). The bootstrap test (500 replicates) was applied to validate phylogeny  and all evolutionary analyses were conducted by the molecular evolutionary genetics analysis 5 (MEGA5) software .
Nucleotide substitution analysis was performed according to a previous study  to calculate the divergence between paired 5′ and 3′ LTRs, as described for HERV retrotransposons . Paired 5′ and 3′ LTR sequences were compared by BLAST alignment  to find differences in nucleotide sequence, as product of spontaneous mutation events during evolution. Based on the number of nucleotide substitutions between paired LTRs we calculated VL30s integration time according to an estimate of the mean mutation rate in the mouse genome, 4.6 × 10−9 mutations per base per year .
Analysis of VL30s genomic distribution
The distribution of VL30s was depicted using the Ensembl Genome Browser  uploading their genomic coordinates. To study VL30 chromosomal distribution, chromosome sizes were obtained from summary tables in UCSC website and the fraction of each chromosome was calculated (by dividing the chromosome size to the total genome size). The expected number of element integrations per chromosome was calculated by multiplying the total number of elements with chromosome fraction, as done in a previous study for SVA elements . The chi-squared test was used to analyze the chromosomal distribution of VL30 elements by comparing the observed with the expected number of VL30 elements, assuming a random insertion model.
The relative distribution of VL30s to Transcription Start Sites (TSS) of mouse genes was analyzed by the Genomic Regions Enrichment of Annotations Tool (GREAT) (great.stanford.edu) , with VL30 coordinates uploaded to UCSC Table Browser . Briefly, GREAT algorithm determines the distance of a given genomic region to the nearest TSS found in a distance less than 1000 Kb. Then it categorizes elements based on the respective distance in groups (0–5 Kb, 5–50 Kb, 50–500 Kb and >500 Kb). Finally, it calculates of association between elements and previously annotated gene groups. The obtained association data were manually confirmed and further examined in the UCSC Genome Browser.
STATISTICA (version 13) was used for statistical analysis. The chi-squared test was used to analyze chromosomal distribution of VL30 elements, based on the expected values from a random distribution model. VL30s and gene TSS association statistics were calculated in the GREAT platform . GREAT performs a binomial test over genomic regions and the hypergeometric test over genes, so that a possible bias from the one test is compensated by the other test. Only the output regions with significant associations from both independent statistical tests were obtained.
expressed sequence tag
Long Terminal Repeat
poly-pyrimidine tract-binding protein-associated splicing factor
transcription factor binding site
Target Site Duplication
transcription start site
viral like 30 elements
This work was supported by a fund from Empirikeion Foundation (6351/2004), Athens, Greece and small institutional funds from University of Ioannina.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921.View ArticlePubMedGoogle Scholar
- Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, et al. Initial sequencing and comparative analysis of the mouse genome. Nature. 2002;420:520–62.View ArticlePubMedGoogle Scholar
- Goodier JL, Kazazian Jr HH. Retrotransposons revisited: the restraint and rehabilitation of parasites. Cell. 2008;135:23–35.View ArticlePubMedGoogle Scholar
- French NS, Norton JD. Structure and functional properties of mouse VL30 retrotransposons. Biochim Biophys Acta. 1997;1352:33–47.View ArticlePubMedGoogle Scholar
- Nilsson M, Bohm S. Inducible and cell type-specific expression of VL30 U3 subgroups correlate with their enhancer design. J Virol. 1994;68:276–88.PubMedPubMed CentralGoogle Scholar
- Faulkner GJ, Kimura Y, Daub CO, Wani S, Plessy C, Irvine KM, et al. The regulated retrotransposon transcriptome of mammalian cells. Nat Genet. 2009;41:563–71.View ArticlePubMedGoogle Scholar
- Brunmeir R, Lagger S, Simboeck E, Sawicka A, Egger G, Hagelkruys A, et al. Epigenetic regulation of a murine retrotransposon by a dual histone modification mark. PLoS Genet. 2010;6:e1000927.View ArticlePubMedPubMed CentralGoogle Scholar
- Tzavaras T, Eftaxia S, Tavoulari S, Hatzi P, Angelidis C. Factors influencing the expression of endogenous reverse transcriptases and viral-like 30 elements in mouse NIH3T3 cells. Int J Oncol. 2003;23:1237–43.PubMedGoogle Scholar
- Costain WJ, Rasquinha I, Graber T, Luebbert C, Preston E, Slinn J, et al. Cerebral ischemia induces neuronal expression of novel VL30 mouse retrotransposons bound to polyribosomes. Brain Res. 2006;1094:24–37.View ArticlePubMedGoogle Scholar
- Magun BE, Rodland KD. Transient inhibition of protein synthesis induces the immediate early gene VL30: alternative mechanism for thapsigargin-induced gene expression. Cell Growth Differ. 1995;6:891–7.PubMedGoogle Scholar
- Singh K, Saragosti S, Botchan M. Isolation of cellular genes differentially expressed in mouse NIH 3 T3 cells and a simian virus 40-transformed derivative: growth-specific expression of VL30 genes. Mol Cell Biol. 1985;5:2590–8.View ArticlePubMedPubMed CentralGoogle Scholar
- Scolnick EM, Vass WC, Howk RS, Duesberg PH. Defective retrovirus-like 30S RNA species of rat and mouse cells are infectious if packaged by type C helper virus. J Virol. 1979;29:964–72.PubMedPubMed CentralGoogle Scholar
- Dolberg D, Fan H. Further characterization of virus-like 30S (VL30) RNA of mice: initiation of reverse transcription and intracellular synthesis. J Gen Virol. 1981;54:281–91.View ArticlePubMedGoogle Scholar
- Herquel B, Ouararhni K, Martianov I, Le Gras S, Ye T, Keime C, et al. Trim24-repressed VL30 retrotransposons regulate gene expression by producing noncoding RNA. Nat Struct Mol Biol. 2013;20:339–46.View ArticlePubMedGoogle Scholar
- Song X, Sui A, Garen A. Binding of mouse VL30 retrotransposon RNA to PSF protein induces genes repressed by PSF: effects on steroidogenesis and oncogenesis. Proc Natl Acad Sci U S A. 2004;101:621–6.View ArticlePubMedPubMed CentralGoogle Scholar
- Wang G, Cui Y, Zhang G, Garen A, Song X. Regulation of proto-oncogene transcription, cell proliferation, and tumorigenesis in mice by PSF protein and a VL30 noncoding RNA. Proc Natl Acad Sci U S A. 2009;106:16794–8.View ArticlePubMedPubMed CentralGoogle Scholar
- Adams SE, Rathjen PD, Stanway CA, Fulton SM, Malim MH, Wilson W, et al. Complete nucleotide sequence of a mouse VL30 retro-element. Mol Cell Biol. 1988;8:2989–98.View ArticlePubMedPubMed CentralGoogle Scholar
- Hodgson CP, Fisk RZ, Arora P, Chotani M. Nucleotide sequence of mouse virus-like (VL30) retrotransposon BVL-1. Nucleic Acids Res. 1990;18:673.View ArticlePubMedPubMed CentralGoogle Scholar
- Noutsopoulos D, Vartholomatos G, Kolaitis N, Tzavaras T. SV40 large T antigen up-regulates the retrotransposition frequency of viral-like 30 elements. J Mol Biol. 2006;361:450–61.View ArticlePubMedGoogle Scholar
- Tzavaras T, Kalogera C, Eftaxia S, Saragosti S, Pagoulatos GN. Clone-specific high-frequency retrotransposition of a recombinant virus containing a VL30 promoter in SV40-transformed NIH3T3 cells. Biochim Biophys Acta. 1998;1442:186–98.View ArticlePubMedGoogle Scholar
- Konisti S, Mantziou S, Markopoulos G, Thrasyvoulou S, Vartholomatos G, Sainis I, et al. H (2) O (2) signals via iron induction of VL30 retrotransposition correlated with cytotoxicity. Free Radic Biol Med. 2012;52:2072–81.View ArticlePubMedGoogle Scholar
- Noutsopoulos D, Markopoulos G, Koliou M, Dova L, Vartholomatos G, Kolettas E, et al. Vanadium induces VL30 retrotransposition at an unusually high level: a possible carcinogenesis mechanism. J Mol Biol. 2007;374:80–90.View ArticlePubMedGoogle Scholar
- Markopoulos G, Noutsopoulos D, Mantziou S, Vartholomatos G, Monokrousos N, Angelidis C, et al. Arsenic induces VL30 retrotransposition: the involvement of oxidative stress and heat-shock protein 70. Toxicol Sci. 2013;134:312–22.View ArticlePubMedGoogle Scholar
- Noutsopoulos D, Markopoulos G, Vartholomatos G, Kolettas E, Kolaitis N, Tzavaras T. VL30 retrotransposition signals activation of a caspase-independent and p53-dependent death pathway associated with mitochondrial and lysosomal damage. Cell Res. 2010;20:553–62.View ArticlePubMedGoogle Scholar
- de Koning AP, Gu W, Castoe TA, Batzer MA, Pollock DD. Repetitive elements may comprise over two-thirds of the human genome. PLoS Genet. 2011;7:e1002384.View ArticlePubMedPubMed CentralGoogle Scholar
- Belshaw R, Dawson AL, Woolven-Allen J, Redding J, Burt A, Tristem M. Genomewide screening reveals high levels of insertional polymorphism in the human endogenous retrovirus family HERV-K (HML2): implications for present-day activity. J Virol. 2005;79:12507–14.View ArticlePubMedPubMed CentralGoogle Scholar
- DeBarry JD, Ganko EW, McCarthy EM, McDonald JF. The contribution of LTR retrotransposon sequences to gene evolution in Mus musculus. Mol Biol Evol. 2006;23:479–81.View ArticlePubMedGoogle Scholar
- McCarthy EM, McDonald JF. Long terminal repeat retrotransposons of Mus musculus. Genome Biol. 2004;5:R14.View ArticlePubMedPubMed CentralGoogle Scholar
- French NS, Norton JD. Analysis of retrotransposon families in genomic DNA by two-dimensional restriction mapping: detection of VL30 insertions in mouse thymic lymphoma. Biochim Biophys Acta. 1994;1219:484–92.View ArticlePubMedGoogle Scholar
- Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, et al. The human genome browser at UCSC. Genome Res. 2002;12:996–1006.View ArticlePubMedPubMed CentralGoogle Scholar
- Fujita PA, Rhead B, Zweig AS, Hinrichs AS, Karolchik D, Cline MS, et al. The UCSC genome browser database: update 2011. Nucleic Acids Res. 2011;39:D876–82.View ArticlePubMedPubMed CentralGoogle Scholar
- Huda A, Jordan IK. Analysis of Transposable Element Sequences Using CENSOR and RepeatMasker Bioinformatics for DNA Sequence Analysis. In: Posada D, editor. Humana Press; 2009. p. 323–36. doi:10.1007/978-1-59745-251-9_16.
- Kent WJ. BLAT--the BLAST-like alignment tool. Genome Res. 2002;12:656–64.View ArticlePubMedPubMed CentralGoogle Scholar
- Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. Repbase update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005;110:462–7.View ArticlePubMedGoogle Scholar
- Puschendorf M, Stein P, Oakeley EJ, Schultz RM, Peters AH, Svoboda P. Abundant transcripts from retrotransposons are unstable in fully grown mouse oocytes. Biochem Biophys Res Commun. 2006;347:36–43.View ArticlePubMedGoogle Scholar
- Xu Z, Wang H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 2007;35:W265–8.View ArticlePubMedPubMed CentralGoogle Scholar
- Besmer P, Olshevsky U, Baltimore D, Dolberg D, Fan H. Virus-like 30S RNA in mouse cells. J Virol. 1979;29:1168–76.PubMedPubMed CentralGoogle Scholar
- Torrent C, Gabus C, Darlix JL. A small and efficient dimerization/packaging signal of rat VL30 RNA and its use in murine leukemia virus-VL30-derived vectors for gene transfer. J Virol. 1994;68:661–7.PubMedPubMed CentralGoogle Scholar
- Song X, Sun Y, Garen A. Roles of PSF protein and VL30 RNA in reversible gene regulation. Proc Natl Acad Sci U S A. 2005;102:12189–93.View ArticlePubMedPubMed CentralGoogle Scholar
- Cartharius K, Frech K, Grote K, Klocke B, Haltmeier M, Klingenhoff A, et al. MatInspector and beyond: promoter analysis based on transcription factor binding sites. Bioinformatics. 2005;21:2933–42.View ArticlePubMedGoogle Scholar
- Asch BB, Asch HL. Expression of the retrotransposons, intracisternal A-particles, during neoplastic progression of mouse mammary epithelium analyzed with a monoclonal antibody. Cancer Res. 1990;50:2404–10.PubMedGoogle Scholar
- Heberlein C, Kawai M, Franz MJ, Beck-Engeser G, Daniel CP, Ostertag W, et al. Retrotransposons as mutagens in the induction of growth autonomy in hematopoietic cells. Oncogene. 1990;5:1799–807.PubMedGoogle Scholar
- McLean CY, Bristor D, Hiller M, Clarke SL, Schaar BT, Lowe CB, et al. GREAT improves functional interpretation of cis-regulatory regions. Nat Biotech. 2010;28:495–501.View ArticleGoogle Scholar
- Davis CM, Constantinides PG, van der Riet F, van Schalkwyk L, Gevers W, Parker MI. Activation and demethylation of the intracisternal A particle genes by 5-azacytidine. Cell Differ Dev. 1989;27:83–93.View ArticlePubMedGoogle Scholar
- Sverdlov ED. Retroviruses and primate genome evolution. Georgetown: Landes Bioscience; 2005.Google Scholar
- Takahata N, Kimura M. A model of evolutionary base substitutions and its application with special reference to rapid change of pseudogenes. Genetics. 1981;98:641–57.PubMedPubMed CentralGoogle Scholar
- Robbez-Masson L, Rowe HM. Retrotransposons shape species-specific embryonic stem cell gene expression. Retrovirology. 2015;12:45.View ArticlePubMedPubMed CentralGoogle Scholar
- Li J, Akagi K, Hu Y, Trivett AL, Hlynialuk CJ, Swing DA, et al. Mouse endogenous retroviruses can trigger premature transcriptional termination at a distance. Genome Res. 2012;22:870–84.View ArticlePubMedPubMed CentralGoogle Scholar
- Matsui T, Leung D, Miyashita H, Maksakova IA, Miyachi H, Kimura H, et al. Proviral silencing in embryonic stem cells requires the histone methyltransferase ESET. Nature. 2010;464:927–31.View ArticlePubMedGoogle Scholar
- Rowe HM, Jakobsson J, Mesnard D, Rougemont J, Reynard S, Aktas T, et al. KAP1 controls endogenous retroviruses in embryonic stem cells. Nature. 2010;463:237–40.View ArticlePubMedGoogle Scholar
- Kunarso G, Chia N-Y, Jeyakani J, Hwang C, Lu X, Chan Y-S, et al. Transposable elements have rewired the core regulatory network of human embryonic stem cells. Nat Genet. 2010;42:631–4.View ArticlePubMedGoogle Scholar
- Smith ZD, Chan MM, Mikkelsen TS, Gu H, Gnirke A, Regev A, et al. A unique regulatory phase of DNA methylation in the early mammalian embryo. Nature. 2012;484:339–44.View ArticlePubMedPubMed CentralGoogle Scholar
- Kumar S, Hedges SB. A molecular timescale for vertebrate evolution. Nature. 1998;392:917–20.View ArticlePubMedGoogle Scholar
- Nellaker C, Keane TM, Yalcin B, Wong K, Agam A, Belgard TG, et al. The genomic landscape shaped by selection on transposable elements across 18 mouse strains. Genome Biol. 2012;13:R45.View ArticlePubMedPubMed CentralGoogle Scholar
- Brown JD, Piccuillo V, O’Neill RJ. Retroelement demethylation associated with abnormal placentation in Mus musculus x Mus caroli hybrids. Biol Reprod. 2012;86:88.View ArticlePubMedGoogle Scholar
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–10.View ArticlePubMedGoogle Scholar
- Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: a sequence logo generator. Genome Res. 2004;14:1188–90.View ArticlePubMedPubMed CentralGoogle Scholar
- Rice P, Longden I, Bleasby A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000;16:276–7.View ArticlePubMedGoogle Scholar
- Goecks J, Nekrutenko A, Taylor J, Team TG. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010;11:R86.View ArticlePubMedPubMed CentralGoogle Scholar
- Tamura K, Nei M. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol Biol Evol. 1993;10:512–26.PubMedGoogle Scholar
- Felsenstein J. Confidence-limits on phylogenies - an approach using the bootstrap. Evolution. 1985;39:783–91.View ArticleGoogle Scholar
- Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. MEGA5: Molecular Evolutionary Genetics Analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. 2011;28:2731–9.View ArticlePubMedPubMed CentralGoogle Scholar
- Tristem M. Identification and characterization of novel human endogenous retrovirus families by phylogenetic screening of the human genome mapping project database. J Virol. 2000;74:3715–30.View ArticlePubMedPubMed CentralGoogle Scholar
- Wang H, Xing J, Grover D, Hedges DJ, Han K, Walker JA, et al. SVA elements: a hominid-specific retroposon family. J Mol Biol. 2005;354:994–1007.View ArticlePubMedGoogle Scholar
- Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, et al. The UCSC table browser data retrieval tool. Nucleic Acids Res. 2004;32:D493–6.View ArticlePubMedPubMed CentralGoogle Scholar