It is widely accepted that Alu elements have expanded in primate genomes by using the enzymatic machinery of L1 elements for retrotransposition [1, 5]. Identifying retrotranspositionally competent L1 elements is relatively straightforward, because only full-length elements with both open reading frames completely intact are capable of propagation via TPRT. Only a limited number of L1 elements meet these criteria, as the vast majority of L1s in primate genomes are truncated or carry other disabling mutations. Identifying potentially active Alu source elements is far more complicated because the majority of Alu elements are full-length and they do not contain a coding sequence. Recent research has investigated several structural features that influence the ability of Alu elements to replicate. These include the upstream flanking sequence, the integrity of the left monomer, the sequence identity to a known polymorphic subfamily, the distance of the Pol III termination signal from the 3' end of the element, and the length and integrity of the poly(A) tail. A discussion of these factors supports the candidacy of our Chr7 Alu insertion as an ancestral source element.
The upstream flanking sequence of an Alu element has been reported to influence its transcription [26–28]. The Chr7 Alu element reported here has what appears to be an intact TATA box (5'TATAAAAA3'), a cis-regulatory transcription promoter, immediately upstream of the 5' TSD, and this sequence is conserved in all species (Additional file 1: Figure S1). Although a TATA box is typically about 25 bp upstream of a transcription start site and is usually the binding site for RNA polymerase II, TATA-box-like promoter sequences have been linked to the efficient transcription of the Alu-like human 7SL RNA gene by RNA polymerase III in vitro. In addition, the presence of a 7SL sequence upstream has been shown to increase Alu transcription. However, RepeatMasker analysis indicates that the upstream flanking sequence of this Alu element is not a 7SL sequence but rather an ancient DNA transposon classified as a hAT-Charlie. Therefore, an alternative explanation is that the 5'TATAAAAA3' sequence is not a functional TATA box but rather a simple variant of the classical TTTTAAAA or TTAAAA endonuclease cleavage site of L1 that is considered the preferred insertion site for Alu elements [32, 33]. The potential role of this upstream sequence in the retrotransposition ability of this Alu element is not clear. However, the rhesus macaque genome (rheMac2) has a different sequence at the homologous position, 5'TATCAAAA3', and also does not have the Alu insertion.
Another factor determined to be critical for Alu replication is the structural integrity of the internal RNA Pol III promoter A and B boxes [4, 7, 34]. Two protein components of the signal recognition particle (SRP 9p and 14p) are believed to bind to specific Alu sequences during L1-mediated TPRT, and these SRP9/14 binding sites in the left monomer are required for Alu activity [4, 7, 34]. Bennett and colleagues demonstrated experimentally that mutating the SRP9/14 binding site in the left monomer reduced Alu mobilization efficiency to only 12% of normal, whereas a similar mutation in the right monomer, while also decreasing SRP9/14 binding, produced only a moderate decrease in retrotransposition efficiency, suggesting that an intact left monomer is more important for Alu mobilization. The Chr7 Alu reported here has a completely conserved left monomer in orangutans, even though it is relatively old.
The degree of sequence variation between a candidate Alu 'master' element and a known polymorphic subfamily has also been reported to impact mobilization efficiency. The O:Chr7 progenitor Alu element in the orangutan appears to have only two random substitutions that are not evident in its proposed progeny, and both variants are located in the right monomer. The first single nucleotide substitution is a CpG mutation at position 154 that is present in the ponAbe2 genome assembly (Figure 2) but does not completely segregate in all the orangutans we sequenced at this locus. Bornean orangutan KB5405 exhibited the original cytosine nucleotide at this position in all the clones we sequenced (Additional file 1: Figure S1). It is known that about 30% of all CpG sites reside within Alu elements and that CpG sites mutate six to ten times faster than non-CpG sites [37–39], increasing the potential for independently occurring random mutation events. The second single nucleotide substitution in O:Chr7 is a relatively recent C to T transition at position 247 that also does not completely segregate in all the orangutans we tested. It is completely absent from the Bornean orangutans (they all have the ancestral cytosine nucleotide) and remains polymorphic with an allele frequency of 50% in the tested Sumatran orangutans (Additional file 1: Figure S1). The overall lack of sequence divergence (< 1%) between the ancestral O:Chr7 Alu element and the consensus sequence of the young polymorphic Alu Ye5b5_Pongo subfamily in orangutan strongly supports its candidacy as the founder element from which the young subfamily derived.
The human Chr7 Alu element appears to have three substitutions that are not present in the H:Chr3 Alu insertion: a CpG mutation at position 239 and two transversions, A to T at position 94 and T to G at position 173 (Figure 2). However, it is entirely possible, even probable, that all three substitutions occurred after the insertion in the H:Chr3 locus. The presence of a guanine residue at position 173 coincides with the consensus sequence of the human Alu Yf5 subfamily and represents a single difference from the Alu Ye5 subfamily consensus sequence [22, 23]. Although the Alu Yf5 subfamily was likely mobilizing in primate genomes around the same time, the sequence structure of the locus makes it unlikely that the human Chr7 Alu insertion contributed to the proliferation of this subfamily.
Another factor influencing Alu activity is the distance of the Pol III TTTT termination signal from the 3' end of the element. Comeaux and colleagues used Alu A tail constructs to experimentally determine the effect of various 3' end lengths on Alu mobilization. They reported a strong decrease in Alu retrotransposition ability even with little sequence between the end of the A tail and the Pol III terminator. The Chr7 Alu element reported here has the Pol III transcription terminator (TTTT) in the 3' TSD immediately following the A tail, a characteristic associated with mobilization ability.
The length of the poly(A) tail has also been reported to influence Alu retrotransposition activity, with longer A-tails free of nucleotide substitutions being more characteristic of young active source elements [8, 40]. Mobilization ability in an ex vivo assay is reportedly very limited with a poly(A) tail less than 15 bp (base pairs) and increases thereafter to plateau at about 50 bp. Under endogenous conditions, there appears to be only a modest benefit to Alu retrotransposition efficiency once the poly(A) tail exceeds about 20 bp. The human Chr7 Alu element has a poly(A) tail length of 26 bp with two nucleotide substitutions and the O:Chr7 Alu element has a poly(A) tail length of 27 bp with three nucleotide substitutions (Additional file 1: Figure S1). These poly(A) tail lengths are consistent with possible activity. In addition, the youngest orangutan Alu progeny element in this study (O:Chr17:56932716) displays a perfect 30 bp poly(A) tail (Additional file 3), consistent with the literature. Because older Alu elements tend to have less pristine poly(A) tails compared to younger elements, Comeaux and colleagues used Alu A tail constructs to experimentally determine the impact of A tail disruptions on retrotransposition efficiency. They demonstrated that nucleotide disruptions within the poly(A) tail are not created equal, in that adenine to thymine disruptions were relatively well tolerated with regard to maintaining the integrity of Alu mobilization, whereas nucleotide disruptions by cytosine or guanine resulted in greater impairment to retrotransposition efficiency. The human Chr7 Alu element has a cytosine A tail disruption after 14 A-residues and a second one after 20 A-residues, perhaps impairing its current ability to propagate new copies. The O:Chr7 A tail has acquired a double cytosine (CC) mutation after only 10 A-residues and a third after only 16 A-residues (Additional file 1: Figure S1). These mutations may have rendered this ancestral Alu source element currently inactive.
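The tail description above can be made concrete with a short sketch. It is illustrative only and is not the procedure used in this study: the function name `scan_poly_a_tail` and the toy tail are hypothetical. The toy tail was built solely to match the counts quoted for the human Chr7 element (26 bp, cytosines after 14 and 20 A-residues), assuming the second count is cumulative; it is not the sequence from Additional file 1. The tolerance labels follow the qualitative result attributed to Comeaux and colleagues.

```python
# Qualitative labels taken from the text: A-to-T disruptions were relatively
# well tolerated, whereas C or G disruptions were more impairing.
TOLERANCE = {"T": "relatively tolerated", "C": "more impairing", "G": "more impairing"}

def scan_poly_a_tail(tail):
    """Summarize a poly(A) tail: total length, number of A residues, and for
    each non-A base the number of A residues that precede it."""
    disruptions = []
    a_seen = 0
    for base in tail.upper():
        if base == "A":
            a_seen += 1
        else:
            disruptions.append((a_seen, base, TOLERANCE.get(base, "unknown")))
    return {"length": len(tail), "a_count": a_seen, "disruptions": disruptions}

# Hypothetical tail, constructed only to reproduce the counts in the text.
toy_tail = "A" * 14 + "C" + "A" * 6 + "C" + "A" * 4
result = scan_poly_a_tail(toy_tail)
print(result["length"], result["disruptions"])
```

Reporting the number of A residues before each disruption, rather than the raw position, mirrors how the tails are described in the text.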
With the exception of poly(A) tail disruptions, which may have occurred relatively recently, the ancestral Chr7 Alu insertion reported here possesses many of the classical hallmarks of being retrotranspositionally competent. Alu elements, like other retrotransposons, typically acquire nucleotide substitutions at a neutral rate after insertion. Consequently, older elements tend to have a greater number of mutations (on average) than younger insertions. These acquired nucleotide substitutions often alter their ability to mobilize. The Chr7 Alu reported here has remained highly conserved, especially in orangutans, even though it is approximately 16 million years old. This prompted us to ask whether this element avoided the typical accumulation of random mutations because of its location in the DGKB gene or simply by chance.
According to the UCSC Genome Browser [18, 19] Gene Sorter function, the human DGKB gene is about 693 kb long (693,643 bp), of which only 2,415 bp is coding sequence (< 0.35%), comprising 804 amino acids distributed among 25 exons. Twenty-four introns make up the vast majority of the gene sequence. Zhang and colleagues recently reported that, although Alu density is quite low in exons of genes (selected against), the Alu density in introns of genes is similar to the Alu density in intergenic regions of the genome, suggesting a similar selective pressure (essentially neutral). The DGKB gene has no Alu element insertions within its exons or promoter regions, but has 113 Alu element insertions within introns as identified by the TranspoGene database [43, 44]. Of the 113 Alu elements located within the gene, 19 were identified as Alu Y or younger, including the Alu element from this study, which is located in intron 20 of 24. We screened the other 18 Alu Y elements to find those with the same species distribution as the Alu element in this study, which are therefore expected to be of similar age. We selected seven full-length (> 275 bp) Alu Y insertions from the DGKB gene that are shared in human, chimpanzee and orangutan, while absent from the rhesus macaque genome. We constructed a sequence alignment of these Alu elements from hg18 and ponAbe2 to compare their percent divergence from the consensus sequence with that of our element. The percent divergence of the human and orangutan Alu Y insertions was 6.1 ± 1.4 and 8.6 ± 1.4, respectively, compared to 3.6 and 3.3, respectively, for the Alu element in this study (Additional file 4). Although this does not conclusively prove that the location of the Alu element in the DGKB gene has no effect, it does suggest that merely being present within a gene as opposed to within an intergenic sequence does not necessarily offer an Alu element protection against age-associated degradation. It also confirms that the Alu element in this study is unusually pristine for its age, a characteristic associated with mobilization ability. The reason for this, if not simply chance, is not clear. The structure of this Alu element and its sequence evolution in multiple species are not consistent with a gene conversion event, nor is there any evidence of differential selection. It is possible that the Alu element is located in a more protected hypomethylated environment that differs from that of the other Alu element insertions we examined in the same gene, one of which was in the same intron. Determining this, however, would require a more comprehensive study of the DGKB gene and its evolution.
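A percent-divergence comparison of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used for Additional file 4: divergence is taken as the percentage of mismatched columns in a pairwise alignment with gap columns skipped, the function name `percent_divergence` is hypothetical, and the short sequences are toy data rather than real Alu elements.

```python
from statistics import mean, stdev

def percent_divergence(seq, consensus):
    """Percentage of aligned, non-gap columns where seq differs from consensus.
    Both inputs must come from the same alignment (equal length, '-' for gaps)."""
    if len(seq) != len(consensus):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq.upper(), consensus.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gap columns are ignored, not counted as changes
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * mismatches / compared

# Toy alignment (hypothetical sequences, 20 columns each).
consensus = "GGCCGGGCGCGGTGGCTCAC"
elements = [
    "GGCCGGGCGTGGTGGCTCAC",  # one substitution
    "GGCCAGGCGCGGTGGTTCAC",  # two substitutions
    "GGCCGGGCGCGG-GGCTCAC",  # one gap, no substitutions
]
values = [percent_divergence(e, consensus) for e in elements]
print(values, round(mean(values), 1), round(stdev(values), 1))
```

Whether gaps should count as changes, and whether CpG positions should be excluded, are choices that would change the numbers; the sketch makes the simplest choice and states it in the comments.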
We have estimated the Chr7 Alu insertion in this study to be about 16 million years old and concluded that this insertion was most likely a member of the Alu Ye lineage upon its insertion. To determine whether this subfamily was actively mobilizing during the estimated time period, we examined data from a previous analysis of the Alu Ye lineage in which Salem and colleagues used PCR to determine the species distribution of 118 Alu Ye5 subfamily members. Of these, about 32% (38 of 118) exhibited the same species distribution as the Alu element in this study, while another 21% of the subfamily members (25 of 118) represented even older insertions that were also shared with siamang (present in human, chimpanzee, gorilla, orangutan and siamang). The remaining Alu Ye5 elements represented younger insertions: present in human, chimpanzee and gorilla but absent from orangutan (33%), present in human and chimpanzee only (7%), or human-specific (7%). The findings of this previous study demonstrate that the Alu subfamily from which the Chr7 Alu insertion in this study is derived was actively propagating during the estimated time of its insertion. Moreover, in the orangutan lineage, the Chr7 ancestral Alu element underwent a hierarchical accumulation of multiple post-insertion diagnostic substitutions in the right arm, while also failing to accumulate the more likely random variants over the same evolutionary time period. It is inconceivable that, by chance alone, these post-insertion diagnostic substitutions would match the young polymorphic Alu Ye5b5_Pongo elements in the orangutan, and this is the only element identified in the orangutan genome in which they do so.
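The proportions quoted above can be checked with simple arithmetic. The sketch below uses only the counts stated in the text (118 elements in total, 38 with the same species distribution, 25 also shared with siamang); the variable names are hypothetical, and counts for the three younger categories are not stated, so only their combined share is derived.

```python
total = 118          # Alu Ye5 subfamily members examined
same_age = 38        # same species distribution as the Chr7 element
older = 25           # also shared with siamang

def pct(count, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * count / whole, 1)

younger = total - same_age - older   # remaining, younger insertions
shares = {
    "same distribution": pct(same_age, total),
    "older (siamang)": pct(older, total),
    "younger (combined)": pct(younger, total),
}
print(younger, shares)
```

The combined share of younger insertions comes to about 47%, which agrees with the sum of the three reported categories (33% + 7% + 7%).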
Our findings are consistent with a modified 'master gene' model of Alu amplification, or 'stealth model' for the expansion of lineage-specific Alu subfamilies. It has been well established that Alu subfamilies > 20 million years old still have active members in primate genomes [45, 47]. Studies of human Alu subfamilies have demonstrated that about 15% of subfamily members are active as secondary source elements, leading to a complex bush-like expansion of lineage-specific Alu subfamilies [48, 49]. Under the stealth-driver model, an Alu lineage can remain quiescent for millions of years while maintaining low levels of retrotransposition activity to allow the lineage to persist over time [46, 50]. In the case of the orangutan genome, the relative quiescence of Alu retrotransposition in the last several million years may have resulted from a population bottleneck or other demographic factors impacting their genomic architecture, and effectively disrupting the primary master-driver elements. In this scenario, the survival of an Alu lineage would require the persistence of a few very old active copies that fortuitously avoided mutational decay, slowly giving rise to more recent active daughter elements. The recent expansion of the Alu Ye5b5_Pongo subfamily in orangutan is consistent with the existence of such a backseat driver. Conversely, in the human genome, the expansion of Alu from the Chr7 locus has remained quite limited, possibly due to the overall abundance of more robust Alu systems over the same evolutionary time period.