Virus-like attachment sites as structural landmarks of plants retrotransposons
© The Author(s). 2016
Received: 29 March 2016
Accepted: 7 July 2016
Published: 28 July 2016
Currently available genomic data have enabled the study of repetitive sequences and their relationship to viruses. Among these sequences, long terminal repeat retrotransposons (LTR-RTs) are the largest component of most plant genomes, with the Gypsy and Copia superfamilies being the most common. It was recently found that the Del lineage, an LTR-RT lineage of the Gypsy superfamily, has putative virus-like attachment (vl-att) sites. This signature, originally described for retroviruses, is recognized by the retroviral integrase and confers specificity to the integration process.
Here we retrieved 26,092 putative complete LTR-RTs from 10 lineages found in 10 fully sequenced angiosperm genomes and found putative vl-att sites that are a conserved structural landmark across these genomes. Furthermore, we reveal that each plant genome has a distinguishable LTR-RT lineage amplification pattern that could be related to the diversity of vl-att sites. We used these patterns to generate a specific quick-response (QR) code for each genome that could be used in the future as an identification barcode for plants.
The universal distribution of vl-att sites represents a new structural feature common to plant LTR-RTs and retroviruses. This is an important finding that expands what is known about the structural similarity between LTR-RTs and retroviruses. We speculate that the sequence diversity of vl-att sites could be important for the life cycle of retrotransposons, as has been shown for retroviruses. All the structural vl-att site signatures are strong candidates for further functional studies. Moreover, this is the first identification of specific LTR-RT content and amplification patterns in a large dataset of LTR-RT lineages and angiosperm genomes. These distribution patterns could be used in the future for biotechnological identification purposes.
Keywords: LTR-RTs; Angiosperm genomes; vl-att site; Retrotransposons
Since the genome of Arabidopsis thaliana was sequenced in 2000, 55 other plant genomes have been released and published [1, 2]. This has advanced our understanding of genome composition, for example by revealing that repetitive sequences are major constituents of most genomes. Among these repetitive sequences are the transposable elements (TEs), mobile genetic sequences present in plants and in all eukaryotes. TEs comprise approximately 45 % of the human genome and form the vast majority of the total DNA content of most plant genomes, in some cases approaching 80 % [4–6].
The predominant TEs in plant genomes are the long terminal repeat retrotransposons (LTR-RTs). For example, they represent ~79 % of the maize (~2.3 Gb total) and ~55 % of the sorghum (~730 Mb total) genomes [7–11]. Based on sequence similarities and on structural and domain organization, LTR-RTs are divided into two major superfamilies: Gypsy and Copia. Phylogenetic analysis of the reverse transcriptase domain revealed that the Gypsy superfamily is divided into five lineages, namely Athila, CRM, Del, Galadriel, and Reina, while the Copia superfamily is divided into six lineages (Ale, Angela, Bianca, Ivana, Maximus, and Tar) [12–14]. Similarities in coding sequence and structure show that LTR-RTs are related to retroviruses, and it has been suggested that retroviruses evolved from the Gypsy superfamily after acquiring the envelope gene.
Our research on the relationship between retroviruses and LTR-RTs recently revealed that Del has putative virus-like attachment (vl-att) sites in its LTRs [17–19]. The LTRs are direct repeat sequences located at the 5′ and 3′ ends of LTR-RT elements that contain the element's regulatory information, such as promoters, enhancers and termination signals. The att sites were originally described in retroviruses as sequences recognized by the retroviral integrase that confer specificity to the integration process [17, 18, 21]. We asked whether vl-att sites are specific to the Del lineage or are conserved structural landmarks across plant LTR-RTs and, therefore, a new structural feature common to plant LTR-RTs and retroviruses. To test this hypothesis, we retrieved all the putative complete elements (26,092 in total) of the other LTR-RT lineages present in the 10 angiosperm genomes previously used to study the Del lineage.
The present study supports the existence of structural vl-att sites in nine out of 10 LTR-RT lineages of 10 angiosperm genomes. We also propose a multivariable, genome-specific LTR-RT "barcode" signature that summarizes the putative complete LTR-RT content of each genome and its differential amplification pattern, allowing each analyzed genome to be identified. The differential amplification patterns found could be related to the vl-att site diversity we discovered. To our knowledge, no previous study has examined such a wide range of LTR-RT lineages and angiosperm genomes to reveal, simultaneously, the structural vl-att site signatures and the genome-lineage amplification patterns described herein.
Results and discussion
Establishing a conserved structural retrovirus landmark on plant retrotransposons: the virus-like attachment sites (vl-att)
Table 1. Total copy-number of putative complete LTR-retrotransposons identified in each genome and classified according to lineage. Columns: putative complete element copy-number by lineage; total copy number per genome; genome size from database (Mb); GC content per genome (%). Final row: total copy number per lineage.
Figure 1 displays the conserved regions and the similarities identified along the putative vl-att sites. Four of the studied lineages presented a clear segment of high similarity that established the length of the structural vl-att sites described herein: Ale (7 bp-6 bp), Bianca (13 bp-13 bp), Ivana (5 bp-6 bp) and Reina (5 bp-7 bp). The Tar and Athila lineages exhibited a conserved stretch of five nucleotides plus an additional conserved nucleotide outside this region. These results are compatible with the length reported for the structural vl-att sites of the Del lineage (10 bp-11 bp). Long segments of high similarity were detected in Angela (18 bp-10 bp), Maximus (16 bp-5 bp), and CRM (12 bp-10 bp), making it more difficult to establish the correct length of the structural vl-att sites of these lineages. To delimit these long structural vl-att sites, we allowed at most two gaps in the high-similarity region, each no longer than two nucleotides.
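The delimitation rule above can be expressed as a small filter over an alignment's conservation profile. The sketch below is illustrative and rests on assumptions not stated in the text: each alignment column has already been labeled high- or low-similarity (for instance by thresholding a per-column conservation score, with a threshold the text does not give), a "gap" is a run of consecutive low-similarity columns, and the site is the longest stretch that starts and ends on high-similarity columns. The function names `delimit_site` and `_runs` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def _runs(mask: Sequence[bool]) -> List[Tuple[bool, int, int]]:
    """Collapse a boolean mask into (value, start, length) runs."""
    runs = []
    i = 0
    while i < len(mask):
        j = i
        while j < len(mask) and mask[j] == mask[i]:
            j += 1
        runs.append((mask[i], i, j - i))
        i = j
    return runs


def delimit_site(high: Sequence[bool], max_gaps: int = 2,
                 max_gap_len: int = 2) -> Optional[Tuple[int, int]]:
    """Return the longest half-open (start, end) span that begins and ends on
    high-similarity columns and contains at most `max_gaps` internal gaps,
    each no longer than `max_gap_len` columns; None if no column is high."""
    runs = _runs(high)
    best = None
    for a, (val_a, start_a, _) in enumerate(runs):
        if not val_a:
            continue  # a site must start on a high-similarity column
        gaps = 0
        for val, start, length in runs[a:]:
            if not val:
                # a low-similarity run inside the candidate site counts as one gap
                if length > max_gap_len:
                    break
                gaps += 1
                if gaps > max_gaps:
                    break
                continue
            end = start + length
            if best is None or end - start_a > best[1] - best[0]:
                best = (start_a, end)
    return best


# Hypothetical profile: '#' = high-similarity column, '.' = low-similarity column
mask = [c == "#" for c in "..#####.##..###...##"]
print(delimit_site(mask))              # (2, 15)
print(delimit_site(mask, max_gaps=1))  # (2, 10)
```

The two keyword parameters map directly onto the "two gaps, two nucleotides" rule, so tightening or relaxing the criterion only changes the arguments.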
The structural vl-att sites are conserved across all the angiosperm genomes and all 10 retrotransposon lineages analyzed (Fig. 1 and Additional file 1: Figure S1). The Ale, Bianca, Ivana and Reina structural vl-att sites are highly conserved across the analyzed genomes, with only minor differences in nucleotides and size (from 1 bp to 3 bp), except in the Zea mays genome (Fig. 1 and Additional file 1: Figure S1). In Zea mays, the Bianca and Ivana lineages display putative vl-att sites with a longer similarity region (40 bp) than the average length described herein for the other lineages (Additional file 1: Figure S1). These structural vl-att sites are supported by 24 Bianca copies and 138 Ivana copies (Table 1).
The Athila and Tar lineages showed less homogeneous lengths, with differences greater than 3 bp between the general signature of their structural vl-att sites (Fig. 1) and the structural vl-att sites of particular genomes and plant groups (Additional file 1: Figure S1). Finally, although the long high-similarity regions detected in the Angela, Maximus and CRM lineages varied in length among genomes and plant groups, most of the nucleotides within these regions were conserved (Additional file 1: Figure S1). These results are interesting because they indicate that some structural vl-att sites are not only lineage specific but also lineage-genome specific. All the putative vl-att site signatures presented herein are strong candidates for further functional studies. Genome-specific analysis was not possible for genomes in which a lineage had a low copy number of complete LTR-RT elements (≤8 copies; see Table 1 for details).
To our knowledge, this is the first report showing that structural vl-att landmarks are not a particularity of the Del lineage, since nine out of 10 LTR-RT lineages studied also display them. The Galadriel lineage was not considered in our study because of its low copy number (43 copies) and restricted distribution. The number of putative complete elements used varied from 156 to 12,049 per lineage (Table 1). The validation of the sampling of these genomes, discussed in the next section, and the significant similarity of the alignments shown by the PlotCon analyses support the notion that structural vl-att sites are landmarks. Six of the structural vl-att sites are clearly short, like the previously described Del structural vl-att sites, whereas the other three may be longer. Because the structural vl-att sites described herein are specific in length and nucleotide composition for each lineage, they may play a role in retrotransposon speciation and life cycle. Moreover, they may be responsible for the differential amplification patterns of these lineages in the studied genomes, which are shown in the next section.
Our study highlights the presence in plant LTR-RTs of putative vl-att sites that are specific to each lineage and, in some cases, to each genome. This warrants further research on the importance of vl-att sites for lineage-specific integrase recognition in the LTR-RT replication cycle. Indeed, the specificity that the recognition of att sites by the retroviral integrase confers on the integration process has been reported for retroviruses [18, 21] and remains to be clarified for retrotransposons. It would also be interesting to investigate the presence of vl-att sites in non-plant genomes.
Exploring LTR-RT amplification patterns that might be linked to the diversity of structural virus-like attachment sites (vl-att)
We postulated that lineage-specific vl-att site signatures could have functional implications for the amplification of LTR-RT elements, since the att sequences of retroviruses are recognized by the retroviral integrase to confer specificity to the integration process [17, 18, 21]. To test this hypothesis, we analyzed the amplification pattern of the 28,622 putative complete LTR-RT elements used in the vl-att site analyses (the 26,092 elements retrieved here plus the 2530 Del elements). The newly retrieved elements were classified into one of the six Copia lineages or one of the four other Gypsy lineages (Table 1) using HMMER searches against previously described hidden Markov model (HMM) profiles, which were built from amino acid alignments of the reverse transcriptase of each lineage. Table 1 also includes the 2530 elements of the Del lineage (Gypsy) for comparison.
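The lineage assignment described above reduces to a best-hit rule. The following sketch assumes that the HMM search results have already been parsed into (element, lineage profile, E-value) tuples pooled over all six translation frames; parsing HMMER output is not shown, and treating the 1e−10 cut-off given in the Methods as inclusive is an assumption. The function name `classify_by_best_hit` is hypothetical.

```python
from typing import Dict, Iterable, Tuple

E_VALUE_CUTOFF = 1e-10  # cut-off reported in Methods; treated as inclusive here


def classify_by_best_hit(hits: Iterable[Tuple[str, str, float]],
                         cutoff: float = E_VALUE_CUTOFF) -> Dict[str, str]:
    """Assign each element to the lineage of its lowest-E-value hit.

    `hits` holds (element_id, lineage, e_value) tuples pooled over all six
    translation frames. Elements with no hit at or below `cutoff` are left
    unclassified (absent from the result)."""
    best: Dict[str, Tuple[float, str]] = {}
    for element, lineage, e_value in hits:
        if e_value > cutoff:
            continue  # hit too weak to support a lineage assignment
        if element not in best or e_value < best[element][0]:
            best[element] = (e_value, lineage)
    return {element: lineage for element, (_, lineage) in best.items()}


# Hypothetical hits for three LTR_STRUC candidates
hits = [
    ("el1", "Ale", 1e-40), ("el1", "Tar", 1e-25),
    ("el2", "Athila", 1e-5),
    ("el3", "CRM", 1e-12), ("el3", "Reina", 1e-30),
]
print(classify_by_best_hit(hits))  # {'el1': 'Ale', 'el3': 'Reina'}
```

Ranking by E-value is one reasonable reading of "best hit"; ranking by bit score would give the same answer whenever all hits are searched against the same profile database.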
The Bianca (Copia) and Galadriel (Gypsy) lineages are poorly represented in the analyzed genomes, totaling 199 copies. The monocot Brachypodium distachyon and the eudicot Arabidopsis thaliana are the genomes with the lowest copy-numbers of putative complete LTR-RT elements (Fig. 2 and Table 1).
The high copy-numbers of LTR-RTs found in some grass genomes (e.g., Zea mays and Sorghum bicolor) and the low copy-numbers observed in both monocot and eudicot genomes (e.g., Brachypodium distachyon and Arabidopsis thaliana) agree with previous studies that employed complete and non-complete LTR-RT elements, although those studies covered only some of the genomes or lineages analyzed herein [8, 9, 13, 31]. Furthermore, the copy-numbers reported here for lineages of both superfamilies (ordered from the most to the least frequently represented: Athila, Maximus, Del, Ale) corroborate recent studies [12, 32], one of which used fluorescent in situ hybridization to analyze lineages from both the Copia and Gypsy superfamilies using complete and non-complete LTR-RT elements. We therefore believe that the LTR-RT sampling performed here with the LTR_STRUC software was effective and has allowed us to expand current understanding of the amplification of LTR-RT lineages in the studied genomes, even though the software's structure-based search enriches the sample for recent amplification events.
Table 2. Normalized number of putative complete LTR-retrotransposons identified in each genome and classified by lineage. Values: genome contribution of putative complete elements by lineage (%).
In other cases, the normalized and non-normalized data (Tables 2 and 1, respectively) agreed, as for the three Copia lineages that proved to be important size contributors to some of the genomes (Ale, 43 % in Vitis vinifera; Angela, 28.2 % in Setaria italica; Maximus, 36 % in Zea mays). Of the three Gypsy lineages that proved to be important size contributors (Athila, CRM and Del), however, only CRM in Glycine max showed the same profile after normalization. Thus, for these four cases the lineage genome-contribution signature holds both as "total copy-number" and as a contribution to the LTR-RT content of the genome (Tables 1 and 2).
Furthermore, the Gypsy superfamily is more represented than the Copia superfamily in the studied plant genomes, both in terms of "total copy-number" and as the major contributor to the LTR-RT content (normalized data not shown). This agrees with previous studies that used complete and non-complete LTR-RT elements but analyzed at most three plant genomes, never the complete set of angiosperm genomes and lineages explored herein [8, 11, 12, 14]. Once again, these data validate the sampling of LTR-RTs in the studied genomes with the LTR_STRUC software. The copy-number ratios of these superfamilies were also shown for the apple (Malus domestica) genome using dot blot hybridizations. However, our normalized data showed that Copia lineages contribute most to the LTR-RT content of the eudicot species, whereas Gypsy lineages contribute most to the LTR-RT content of the studied monocot species (Fig. 3a and Table 2).
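Superfamily-level comparisons such as the Gypsy versus Copia contrast above are sums over lineages. A minimal sketch, using the lineage-to-superfamily assignment given in the Background; the input values are hypothetical and may be raw copy-numbers or normalized contributions, and the helper names `superfamily_totals` and `gypsy_copia_ratio` are illustrative.

```python
from typing import Dict

# Lineage-to-superfamily assignment as listed in the Background section
SUPERFAMILY = {
    "Athila": "Gypsy", "CRM": "Gypsy", "Del": "Gypsy",
    "Galadriel": "Gypsy", "Reina": "Gypsy",
    "Ale": "Copia", "Angela": "Copia", "Bianca": "Copia",
    "Ivana": "Copia", "Maximus": "Copia", "Tar": "Copia",
}


def superfamily_totals(by_lineage: Dict[str, float]) -> Dict[str, float]:
    """Sum per-lineage values (copy-numbers or % contributions) by superfamily."""
    totals = {"Gypsy": 0.0, "Copia": 0.0}
    for lineage, value in by_lineage.items():
        totals[SUPERFAMILY[lineage]] += value  # KeyError flags an unknown lineage
    return totals


def gypsy_copia_ratio(by_lineage: Dict[str, float]) -> float:
    """Gypsy:Copia ratio; infinite when no Copia content is present."""
    totals = superfamily_totals(by_lineage)
    if totals["Copia"] == 0:
        return float("inf")
    return totals["Gypsy"] / totals["Copia"]


# Hypothetical copy-numbers for one genome
counts = {"Athila": 10, "CRM": 1, "Ale": 4, "Tar": 3}
print(superfamily_totals(counts))     # {'Gypsy': 11.0, 'Copia': 7.0}
print(gypsy_copia_ratio(counts) > 1)  # True: Gypsy dominates in this toy genome
```

Applying the same functions to raw counts and to normalized contributions is one way to see how an overall Gypsy predominance can coexist with Copia-dominated profiles in individual genomes.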
LTR-RT elements are widely and abundantly present in plant genomes and have been implicated in their evolution [7–9, 30]. Here we describe LTR-RT amplification in terms of "total copy-number" and quantify the relative contribution of each lineage to the LTR-RT content of each genome through data normalization (Table 2 and Fig. 3a). We focused on putative complete LTR-RT insertions and did not consider copies affected by recombination and decay, which are common events in the elements' life cycle. Nevertheless, our "total copy-number" ratios (Gypsy vs. Copia) matched those of previous studies that considered complete and incomplete LTR-RT copies, which represent different stages of the elements' life cycle [27–30].
The data presented above suggest that the studied LTR-RT lineages have a particular amplification pattern in each genome, which may be linked to the diversity of the putative vl-att sites found. The normalized data simplified the comparison of LTR-RT amplification patterns because they reflect each lineage's contribution to the LTR-RT content of each genome rather than the raw "total copy-number" (Fig. 3a and Table 2). They allowed us to propose a multivariable, genome-specific LTR-RT "barcode" signature that gives an overview of the putative complete LTR-RT content and its differential amplification in the studied genomes (Fig. 3a, b and Table 2). For instance, the barcode readily shows the importance of the Ale lineage to the LTR-RT content of Populus trichocarpa and Vitis vinifera, the only woody perennial species in our study. It also shows that Athila is an important component of the LTR-RT content of most of the studied genomes (Fig. 3a). To our knowledge, this is the first comparative analysis of specific LTR-RT content and amplification patterns in a large dataset of fully sequenced angiosperm genomes, and it allows a deeper understanding of the relationship between these lineages and these genomes.
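The exact normalization behind Table 2 is not spelled out, so the sketch below shows only one plausible form, stated as an assumption: each lineage's value for a genome (a copy-number, or summed element length in bp if element size is to be taken into account) is divided by the genome's total over all lineages and expressed as a percentage. The result is a fixed-order vector that can serve as the genome's barcode profile. The lineage order and the function name `barcode_profile` are illustrative choices.

```python
from typing import Dict, Tuple

# Fixed lineage order so that every genome's profile is directly comparable
LINEAGES = ("Ale", "Angela", "Bianca", "Ivana", "Maximus", "Tar",
            "Athila", "CRM", "Del", "Galadriel", "Reina")


def barcode_profile(by_lineage: Dict[str, float]) -> Tuple[float, ...]:
    """Percentage contribution of each lineage to a genome's LTR-RT content.

    Assumes `by_lineage` maps lineage names to a non-negative quantity
    (copy-number or summed bp); missing lineages count as zero."""
    unknown = set(by_lineage) - set(LINEAGES)
    if unknown:
        raise ValueError(f"unknown lineage(s): {sorted(unknown)}")
    total = sum(by_lineage.values())
    if total <= 0:
        raise ValueError("no LTR-RT content to normalize")
    return tuple(100.0 * by_lineage.get(name, 0) / total for name in LINEAGES)


# Hypothetical copy-numbers for one genome
profile = barcode_profile({"Ale": 50, "Maximus": 20, "Athila": 30})
print(dict(zip(LINEAGES, profile))["Ale"])  # 50.0
```

Because every profile sums to 100, genomes of very different sizes and LTR-RT loads become directly comparable, which is the property the barcode relies on.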
Based on our normalized data, we generated a specific identification QR code for each genome that can be read with a common cell-phone QR-code scanner (Fig. 3b). The usefulness of the proposed LTR-RT barcode depends on its capacity to distinguish between even more closely related plant species. The most closely related species in this study, in terms of evolutionary distance, are Zea mays and Sorghum bicolor, which diverged 11.9 million years ago (Mya), and the LTR-RT barcode differences between them were readily detected. Further research will be needed to confirm the effectiveness of the proposed barcode system using genomes separated by smaller evolutionary distances. This seems likely to succeed, because studies of closely related plant species have shown differential amplification of genomic LTR-RTs [27, 35–37]. Our LTR-RT barcode system is based on data not explored before: the diversity of putative vl-att site signatures and the differential amplification pattern of 11 LTR-RT lineages in 10 fully sequenced genomes. The QR code proposed here illustrates how this concept could be used in the future as a biotechnological tool for identifying commercially valuable cultivars, especially given that the cost of genome sequencing is falling faster than predicted by Moore's Law.
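To make the barcode machine-readable, the normalized profile has to be serialized into the text a QR code stores, and the claim that related genomes remain distinguishable can be checked with a distance between profiles. The text names neither a payload format nor a distance measure. The sketch below assumes a simple `genome;lineage=value` string and uses Bray-Curtis dissimilarity, a common compositional distance swapped in here for illustration. Function names and profile values are hypothetical. The resulting string could then be passed to any QR encoder, such as the online tool cited in the Methods.

```python
from typing import Sequence


def barcode_payload(genome: str, lineages: Sequence[str],
                    profile: Sequence[float], decimals: int = 1) -> str:
    """Serialize a lineage-contribution profile into a compact QR payload."""
    if len(lineages) != len(profile):
        raise ValueError("lineages and profile must have the same length")
    fields = [f"{name}={value:.{decimals}f}" for name, value in zip(lineages, profile)]
    return ";".join([genome] + fields)


def bray_curtis(p: Sequence[float], q: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for disjoint ones."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    denominator = sum(a + b for a, b in zip(p, q))
    if denominator == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(p, q)) / denominator


# Hypothetical three-lineage profiles (%) for two related genomes
lineages = ("Ale", "Athila", "Maximus")
genome_a = (50.0, 30.0, 20.0)
genome_b = (40.0, 40.0, 20.0)
print(barcode_payload("GenomeA", lineages, genome_a))  # GenomeA;Ale=50.0;Athila=30.0;Maximus=20.0
print(bray_curtis(genome_a, genome_b))                  # 0.1
```

Rounding to one decimal keeps the payload short enough for a small QR symbol while preserving the differences that the distance measure picks up.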
Analysis of 26,092 putative complete elements representing 10 LTR-RT lineages of 10 different angiosperm genomes allowed us to find putative vl-att sites in nine out of 10 lineages. The present study is the first to show that vl-att sites are conserved structural landmarks in LTR-RTs across distantly related angiosperms. This is an important finding that expands what is known about the structural similarity between LTR-RTs and retroviruses. We speculate that the sequence diversity of vl-att sites may be important for the life cycle of retrotransposons and for the amplification patterns of these lineages in the angiosperm genomes analyzed herein. Future functional studies of these sequences are needed to test this hypothesis. Here we reveal three distinct patterns in the structural vl-att sites: (i) four lineages (Ale, Reina, Bianca and Ivana) show minor nucleotide differences among their sequences regardless of the angiosperm genome considered; (ii) two lineages (Athila and Tar) display marked differences; and (iii) three lineages (Angela, Maximus and CRM) have long structural vl-att sites that vary widely in size but little in nucleotide sequence.
The current study also describes the amplification patterns of the 10 LTR-RT lineages across these plant genomes using a methodology that allows novel observations, such as that the grass genomes carry more putative complete LTR-RTs than the other studied genomes. Comparing "total" and "relative" abundance also illustrates the distinctive LTR-RT amplification pattern of each genome. Finally, from our data we derived a specific QR-code identification system for each of the angiosperm genomes that can be read with a common cell-phone QR-code reader. The proposed QR code may have biotechnological applications in the identification of commercially valuable cultivars.
Element extraction and classification
Ten fully sequenced genomes were downloaded (11/25/2011) from the PlantGDB FTP website. The genome versions used, with the current GenBank version in parentheses, were:
- A. thaliana (At): AtGDB171/TAIR9 (TAIR10, GCA_000001735.1)
- M. truncatula (Mt): Mt3.5 (MedtrA17_4.0, GCA_000219495.2)
- P. trichocarpa (Pt): Ptr v2.2 (Poptr2_0, GCA_000002775.2)
- V. vinifera (Vv): Genoscope 12X (unchanged, GCA_000003745.2)
- G. max (Gm): Glyma1 (Glycine_max_v2.0, GCA_000004515.3)
- B. distachyon (Bd): Version1 (Brachypodium_distachyon_v2.0, GCA_000005505.2)
- O. sativa (Os): Release 7 (Build 4.0, GCA_000005425.2)
- S. italica (Si): JGI 8x v2 Sitalica_164 (Setaria_italica_v2.0, GCA_000263155.2)
- S. bicolor (Sb): JGI Sbi1 (Sorbi1, GCA_000003195.1)
- Z. mays (Zm): B73_RefGen_v2 (B73 RefGen_v3, GCA_000005005.5)

The complete genome sequences were split into individual chromosome sequences and screened using LTR_STRUC with default parameters. Hidden Markov Model (HMM) profiles were built with the HMMER package (version 2.3.2) from reverse transcriptase amino acid alignments, as previously described. Extracted sequences were conceptually translated in all six frames and searched with HMMscan (HMMER 2.3.2 package) against the HMM profiles, with an e-value cut-off of 1e−10. Each sequence was classified into a lineage according to its best hit. Further analyses were performed only on putative complete elements, defined as elements with two intact LTRs identified by LTR_STRUC. Using the normalized data, we generated a specific QR identification code for each genome with the Barcode generator online tool (http://www.barcode-generator.org/).
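The conceptual six-frame translation that precedes the HMM search can be sketched with the standard genetic code (NCBI translation table 1; this is an assumption, since the text does not name the code used). Codons containing ambiguous bases are translated as `X`, and the frame labels `+1` to `-3` are an illustrative convention. The function names `translate` and `six_frames` are hypothetical.

```python
from typing import Dict

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq: str) -> str:
    """Translate complete codons; '*' marks stops, 'X' ambiguous codons."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> Dict[str, str]:
    """Conceptual translation of a DNA sequence in all six reading frames."""
    seq = seq.upper()
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(reverse_complement[offset:])
    return frames


print(six_frames("ATGGCCTAA"))
# {'+1': 'MA*', '-1': 'LGH', '+2': 'WP', '-2': '*A', '+3': 'GL', '-3': 'RP'}
```

Each of the six peptide strings would then be searched against the lineage HMM profiles, and the hits pooled per element before the best-hit assignment.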
A local database was built at GaTE lab (https://gate.ib.usp.br/GateWeb/) and sequences are available upon request.
Identifying structural virus-like attachment (vl-att) sites
Two conserved regions were identified in most LTR-RT lineages by examining alignments of all sequences in Jalview (version 2.4.0.b2) with the "color per conserved sites" option: one at the 5′ end of the LTR and a second at the 3′ end of the LTR. The first and last 40 bases of the LTRs were submitted to WebLogo and to PlotCon, part of the EMBOSS Molecular Biology software analysis package (6.3.1), to examine and plot sequence conservation. PlotCon quantifies alignment quality along the sequence, which helped to determine the relevant extent of each putative vl-att site. When conservation extended beyond 40 bp, the first and last 150 bp were used instead. Nevertheless, alignment-quality gaps were found within the structural vl-att sites. To select the strongest candidates, we retained structural vl-att sites with at most two quality gaps per sequence, each no longer than two nucleotides.
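The per-column conservation examined with WebLogo can be approximated as follows. This is a simplified sketch, not the WebLogo or PlotCon implementation: it computes the information content of each column (2 bits minus the Shannon entropy of the A/C/G/T frequencies, without WebLogo's small-sample correction) for the first and last 40 bases of each LTR, and it assumes those windows are already column-aligned. The function names `terminal_windows` and `information_content` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

WINDOW = 40  # first/last 40 bases of each LTR, as in the Methods


def terminal_windows(ltrs: Sequence[str],
                     window: int = WINDOW) -> Tuple[List[str], List[str]]:
    """Return the 5' and 3' terminal windows of every LTR at least `window` long."""
    kept = [s.upper() for s in ltrs if len(s) >= window]
    return [s[:window] for s in kept], [s[-window:] for s in kept]


def information_content(aligned: Sequence[str]) -> List[float]:
    """Per-column information content in bits (0 = random, 2 = fully conserved).

    Gaps and ambiguous bases are ignored when counting each column."""
    if not aligned:
        raise ValueError("no sequences supplied")
    length = min(len(s) for s in aligned)
    profile = []
    for i in range(length):
        counts = Counter(s[i] for s in aligned if s[i] in "ACGT")
        total = sum(counts.values())
        if total == 0:
            profile.append(0.0)  # column made only of gaps/ambiguous bases
            continue
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        profile.append(2.0 - entropy)
    return profile


five_prime, three_prime = terminal_windows(["A" * 10 + "C" * 30 + "G" * 10, "ACGT"])
print(len(five_prime), three_prime[0][-3:])                   # 1 GGG
print(information_content(["ACGT", "ACGA", "ACGC", "ACGG"]))  # [2.0, 2.0, 2.0, 0.0]
```

Thresholding such a profile yields the high/low-similarity labels on which a gap rule can operate; PlotCon itself computes its own windowed similarity score, which this sketch does not reproduce.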
Abbreviations: HMM, hidden Markov model; vl-att, virus-like attachment site
This manuscript was reviewed by a professional science editor and by a native English-speaking copy editor to improve readability.
We gratefully acknowledge funding from FAPESP-BIOEN (08/52074-0) and CNPq (308197/2010-0) to MAVS. EAOC (12/21064-4), GMQC (08/58243-8) and AVP (12/02671-7) are supported by FAPESP fellowships. EAOC, GMQC and AVP designed and performed the experiments. EAOC and MAVS analyzed the data and wrote the manuscript.
EAOC carried out the bioinformatic studies, participated in the sequence alignment and drafted the manuscript. EAOC and GMQC participated in the design of the study and retrieved the sequences used in this study. EAOC, GMQC and APV participated in the sequence alignment and performed the sequence analysis. EAOC, GMQC and MAVS conceived the study, participated in its design and coordination, and helped to draft the manuscript. All authors read and approved the final manuscript.
The authors have no competing interests, or other interests that might be perceived to influence the results and/or discussion reported in this manuscript.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Kaul S, Koo HL, Jenkins J, Rizzo M, Rooney T, Tallon LJ, Feldblyum T, Nierman W, Benito MI, Lin XY, et al. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000;408:796–815.
- Michael TP, Jackson S. The first 50 plant genomes. Plant Genome. 2013;6. https://dl.sciencesocieties.org/publications/tpg/articles/6/2/plantgenome2013.03.0001in.
- Wicker T, Sabot F, Hua-Van A, Bennetzen JL, Capy P, Chalhoub B, Flavell A, Leroy P, Morgante M, Panaud O, et al. A unified classification system for eukaryotic transposable elements. Nat Rev Genet. 2007;8:973–82.
- Ravindran S. Barbara McClintock and the discovery of jumping genes. Proc Natl Acad Sci U S A. 2012;109:20198–9.
- Muotri AR, Marchetto MCN, Coufal NG, Gage FH. The necessary junk: new functions for transposable elements. Hum Mol Genet. 2007;16:R159–67.
- Mills RE, Bennett EA, Iskow RC, Devine SE. Which transposable elements are active in the human genome? Trends Genet. 2007;23:183–91.
- Paterson AH, Bowers JE, Feltus FA, Tang H, Lin L, Wang X. Comparative genomics of grasses promises a bountiful harvest. Plant Physiol. 2009;149:125–31.
- Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A, et al. The Sorghum bicolor genome and the diversification of grasses. Nature. 2009;457:551–6.
- International Rice Genome Sequencing Project. The map-based sequence of the rice genome. Nature. 2005;436:793–800.
- Hansey CN, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Buell CR. Maize (Zea mays L.) genome diversity as revealed by RNA-sequencing. PLoS One. 2012;7:e33071.
- Baucom RS, Estill JC, Chaparro C, Upshaw N, Jogi A, Deragon JM, Westerman RP, Sanmiguel PJ, Bennetzen JL. Exceptional diversity, non-random distribution, and rapid evolution of retroelements in the B73 maize genome. PLoS Genet. 2009;5:e1000732.
- Domingues DS, Cruz GM, Metcalfe CJ, Nogueira FT, Vicentini R, Alves Cde S, Van Sluys MA. Analysis of plant LTR-retrotransposons at the fine-scale family level reveals individual molecular patterns. BMC Genomics. 2012;13:137.
- Wicker T, Keller B. Genome-wide comparative analysis of copia retrotransposons in Triticeae, rice, and Arabidopsis reveals conserved ancient evolutionary lineages and distinct dynamics of individual copia families. Genome Res. 2007;17:1072–81.
- Llorens C, Futami R, Covelli L, Dominguez-Escriba L, Viu JM, Tamarit D, Aguilar-Rodriguez J, Vicente-Ripolles M, Fuster G, Bernet GP, et al. The Gypsy Database (GyDB) of mobile genetic elements: release 2.0. Nucleic Acids Res. 2011;39:D70–4.
- Bousios A, Darzentas N. Sirevirus LTR retrotransposons: phylogenetic misconceptions in the plant world. Mob DNA. 2013;4:9.
- Pelisson A, Teysset L, Chalvet F, Kim A, Prud'homme N, Terzian C, Bucheton A. About the origin of retroviruses and the co-evolution of the gypsy retrovirus with the Drosophila flamenco host gene. Genetica. 1997;100:29–37.
- Chiu R, Grandgenett DP. Avian retrovirus DNA internal attachment site requirements for full-site integration in vitro. J Virol. 2000;74:8292–8.
- Brown HE, Chen H, Engelman A. Structure-based mutagenesis of the human immunodeficiency virus type 1 DNA attachment site: effects on integration and cDNA synthesis. J Virol. 1999;73:9011–20.
- Cruz GM, Metcalfe CJ, de Setta N, Cruz EA, Vieira AP, Medina R, Van Sluys MA. Virus-like attachment sites and plastic CpG islands: landmarks of diversity in plant Del retrotransposons. PLoS One. 2014;9:e97099.
- Lisch D. How important are transposons for plant evolution? Nat Rev Genet. 2013;14:49–61.
- Masuda T, Kuroda MJ, Harada S. Specific and independent recognition of U3 and U5 att sites by human immunodeficiency virus type 1 integrase in vivo. J Virol. 1998;72:8396–402.
- McCarthy EM, McDonald JF. LTR_STRUC: a novel search and identification program for LTR retrotransposons. Bioinformatics. 2003;19:362–7.
- Kumar A, Bennetzen JL. Plant retrotransposons. Annu Rev Genet. 1999;33:479–532.
- Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: a sequence logo generator. Genome Res. 2004;14:1188–90.
- Olson SA. EMBOSS opens up sequence analysis. European Molecular Biology Open Software Suite. Brief Bioinform. 2002;3:87–91.
- Kang SY, Ahn DG, Lee C, Lee YS, Shin CG. Functional nucleotides of U5 LTR determining substrate specificity of prototype foamy virus integrase. J Microbiol Biotechnol. 2008;18:1044–9.
- Estep MC, DeBarry JD, Bennetzen JL. The dynamics of LTR retrotransposon accumulation across 25 million years of panicoid grass evolution. Heredity (Edinb). 2013;110:194–204.
- Petrov DA. Evolution of genome size: new approaches to an old problem. Trends Genet. 2001;17:23–8.
- Sun C, Shepard DB, Chong RA, Lopez Arriaza J, Hall K, Castoe TA, Feschotte C, Pollock DD, Mueller RL. LTR retrotransposons contribute to genomic gigantism in plethodontid salamanders. Genome Biol Evol. 2012;4:168–83.
- Vitte C, Panaud O. LTR retrotransposons and flowering plant genome size: emergence of the increase/decrease model. Cytogenet Genome Res. 2005;110:91–107.
- Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, Liang C, Zhang J, Fulton L, Graves TA, et al. The B73 maize genome: complexity, diversity, and dynamics. Science. 2009;326:1112–5.
- Du J, Tian Z, Hans CS, Laten HM, Cannon SB, Jackson SA, Shoemaker RC, Ma J. Evolutionary conservation, diversity and specificity of LTR-retrotransposons in flowering plants: insights from genome-wide analysis and multi-specific comparison. Plant J. 2010;63:584–98.
- Sun HY, Dai HY, Zhao GL, Ma Y, Ou CQ, Li H, Li LG, Zhang ZH. Genome-wide characterization of long terminal repeat-retrotransposons in apple reveals the differences in heterogeneity and copy number between Ty1-copia and Ty3-gypsy retrotransposons. J Integr Plant Biol. 2008;50:1130–9.
- Swigonova Z, Lai J, Ma J, Ramakrishna W, Llaca V, Bennetzen JL, Messing J. Close split of sorghum and maize genome progenitors. Genome Res. 2004;14:1916–23.
- Piednoel M, Carrete-Vega G, Renner SS. Characterization of the LTR retrotransposon repertoire of a plant clade of six diploid and one tetraploid species. Plant J. 2013;75:699–709.
- Piednoel M, Aberer AJ, Schneeweiss GM, Macas J, Novak P, Gundlach H, Temsch EM, Renner SS. Next-generation sequencing reveals the impact of repetitive DNA across phylogenetically closely related genomes of Orobanchaceae. Mol Biol Evol. 2012;29:3601–11.
- Hosid E, Brodsky L, Kalendar R, Raskina O, Belyayev A. Diversity of long terminal repeat retrotransposon genome distribution in natural populations of the wild diploid wheat Aegilops speltoides. Genetics. 2012;190:263–74.
- Hayden EC. Technology: The $1,000 genome. Nature. 2014;507:294–5.
- Duvick J, Fu A, Muppirala U, Sabharwal M, Wilkerson MD, Lawrence CJ, Lushbough C, Brendel V. PlantGDB: a resource for comparative plant genomics. Nucleic Acids Res. 2008;36:D959–65.
- Waterhouse AM, Procter JB, Martin DM, Clamp M, Barton GJ. Jalview Version 2--a multiple sequence alignment editor and analysis workbench. Bioinformatics. 2009;25:1189–91.