Analysis of western lowland gorilla (Gorilla gorilla gorilla) specific Alu repeats
© McLain et al.; licensee BioMed Central Ltd. 2013
Received: 19 September 2013
Accepted: 23 October 2013
Published: 22 November 2013
Research into great ape genomes has revealed widely divergent activity levels over time for Alu elements. However, the diversity of this mobile element family in the genome of the western lowland gorilla has not previously been characterized. Alu elements are primate-specific short interspersed elements that have been used as phylogenetic and population genetic markers for more than two decades. Alu elements are present at high copy number in the genomes of all primates surveyed thus far. The Alu Y subfamily and its derivatives have been recognized as the evolutionarily youngest Alu subfamily in the Old World primate lineage.
Here we use a combination of computational and wet-bench laboratory methods to assess and catalog Alu Y subfamily activity level and composition in the western lowland gorilla genome (gorGor3.1). A total of 1,075 independent Alu Y insertions were identified and computationally divided into 10 subfamilies, with the largest number of gorilla-specific elements assigned to the canonical Alu Y subfamily.
The retrotransposition activity level appears to be significantly lower than that seen in the human and chimpanzee lineages, while higher than that seen in orangutan genomes, indicative of differential Alu amplification in the western lowland gorilla lineage as compared to other Homininae.
Alu elements are a family of primate-specific SINEs (Short INterspersed Elements), approximately 300 base pairs (bp) long, that are present in the genomes of all living primates [1–3]. Alu elements were derived from 7SL RNA, the RNA component of the signal recognition particle, in the common ancestor of all living primates. In the past approximately 65 million years Alu elements have become widely distributed in primate genomes [1, 5]. Alu elements are now present at copy numbers of >1,000,000 in all surveyed great ape genomes (Additional file 1). Despite their high copy number, the majority of Alu elements are genomic fossils, non-propagating relics passed down over millions of years after earlier periods of replicative activity [1, 6]. It is hypothesized that a relatively small number of ‘master’ elements are responsible for the continued spread of all active subfamilies [7, 8].
As non-autonomous retrotransposons, Alu elements do not encode the enzymatic machinery necessary for self-propagation [1, 2]. Instead, they propagate by appropriating the replication machinery [2, 9] of a much larger, autonomous retrotransposon called LINE1 (L1), via a process termed target-primed reverse transcription (TPRT) [10–13].
The effective use of SINEs as phylogenetic markers was first demonstrated in 1993 in a study seeking to resolve relationships between Pacific salmonid species. Subsequent to this study, SINE-based phylogenetic methods have been applied across a wide range of species to determine evolutionary relationships [15, 16]. In particular, Alu elements have proven to be extremely useful tools for elucidating evolutionary relationships between primate species [1, 17]. Because Alu insertions are essentially homoplasy-free, the presence of an Alu element of the same subfamily at a given locus in two or more primate species is almost always definitive evidence of shared ancestry. The possibility of confounding events is very small, and such events are easily resolved by sequencing and examining the element in question [1, 18]. In the past 15 years Alu-based phylogenetic methods have been used with great success to resolve evolutionary relationships among the tarsiers [19, 20], New World and Old World monkeys [22–24], gibbons, lemurs [26, 27], and great apes.
In addition to phylogenetic applications, Alu elements also function as effective markers for the study of population genetics via examination of polymorphic elements between members of the same species [2, 29, 30]. Alu elements are also linked to numerous genetic diseases, and the insertion of an element at an inopportune genomic location can have grave consequences for the individual involved [3, 31, 32]. Additionally, Alu elements are thought to be a causal factor in genomic instability [33–36].
Alu elements are classified into multiple major subfamilies and numerous smaller, derivative subfamilies based on specific sequence mutations [37–40]. All extant primates share older elements, while all primate lineages examined also have younger, lineage-specific subfamilies. Alu subfamily evolution is parallel, not linear, and various subfamilies have been found to be actively retrotransposing at the same time in all primate genomes surveyed; each primate lineage thus possesses its own Alu subfamilies [1, 42, 43].
The Alu J subfamily is the most ancient Alu lineage, and was largely active from approximately 65 million years ago to approximately 55 million years ago, at which point Alu S evolved and supplanted Alu J as the predominant active subfamily [37, 41]. Due to the antiquity of the lineage, Alu J subfamilies are present in all extant primates, including Strepsirrhines [27, 44]. Alu S, on the other hand, evolved from Alu J after the Strepsirrhine-Haplorrhine divergence, and so is only found in New World and Old World primates [2, 37, 45]. The Alu Y subfamily subsequently evolved from Alu S in the Old World primate lineage, and remains the predominant active subfamily in catarrhines [1, 41, 45].
A number of Alu Y-derived subfamilies continue to be active in great apes, and polymorphic lineage-specific Alu elements have been well documented between existing human populations, indicating continued activity of these mobile elements. A rate of one new element in every approximately 20 live births has been proposed as the current rate of Alu element activity in the extant human population, but the large size of this population coupled with human generation time would make it very difficult for new elements to come to fixation outside of small population groups [46, 47]. Research into Alu element activity in Sumatran and Bornean orangutans has indicated a comparatively low level of continued retrotransposition activity in these apes, suggesting some alteration of the propagation of Alu within this lineage.
The western lowland gorilla (Gorilla gorilla gorilla), a subspecies of the western gorilla (Gorilla gorilla), is a critically endangered great ape endemic to the forests and lowland swamps of central Africa [50, 51]. Western lowland gorillas are gregarious, living in family groups composed of a dominant male, multiple females, subadult males, and juvenile offspring. Western lowland gorillas are in danger of extinction due to human activity. Their wild population size is shrinking in the face of anthropogenic pressure and diseases such as Ebola. Gorillas are a close evolutionary relative of humans and the Pan lineage of chimpanzees and bonobos, with the most widely accepted date for a common ancestor 6 to 9 million years ago [28, 53–55], though a date as early as 10 million years ago has been recently proposed.
The genome of ‘Kamilah’, a female western lowland gorilla living at the San Diego Zoo, was initially assembled from 5.4 Gbp of capillary sequence and 166.8 Gbp of Illumina read pairs, and further refined using bacterial artificial chromosome (BAC) and fosmid end pair capillary technology. This sequence is available from the Wellcome Trust-Sanger Institute.
Previous analyses of Alu elements in gorillas have been limited to analysis in the context of wider research projects [28, 58–61] and have not focused specifically on subfamily analysis. Here we examine the western lowland gorilla genome (build gorGor3.1) to identify gorilla-specific Alu Y subfamilies and assess the activity levels, copy number, and age of these subfamilies. Our final analysis resulted in the identification of 1,075 gorilla-specific Alu element insertions.
A total of 1,085,174 Alu elements were identified in the genome of the western lowland gorilla (Additional file 1). Of these, 286,801 were identified as belonging to the ancient Alu J subfamily, and 599,237 were identified as members of the Alu S subfamily. A total of 57,427 elements were too degraded or incompletely sequenced to be assigned a subfamily designation by RepeatMasker, and were simply identified as ‘Alu’. We identified 141,709 members of the Alu Y subfamily. This subfamily is of particular interest due to its relatively young age and known continued mobility in other great ape genomes [1, 62]. Approximately 40% (57,458) of these putative Alu Y elements were >250 bp in length. Gorilla-specific elements were subsequently identified by comparison of orthologous loci in the genomes of human, common chimpanzee, and orangutan. Putative unique, gorilla-specific Alu Y insertions were estimated at 4,127 copies. This number is similar (96.5%) to the 4,274 gorilla-specific Alu elements identified using other approaches. These 4,127 loci were manually examined for gorilla specificity using BLAT, which demonstrated that the majority were in fact shared insertions. This manual examination excluded 2,858 loci from further analysis due to the presence of shared insertions missed by Lift Over (2,626 insertions) or the lack of orthologous flanking regions in the genomes of other species, which precludes PCR verification (232 insertions). This resulted in a total of 1,269 likely gorilla-specific Alu insertion loci for inclusion in subfamily structure analysis.
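The filtering steps described above reduce to simple bookkeeping. The following Python sketch re-derives the totals from the counts quoted in the text; the variable names are illustrative only and are not taken from the original analysis pipeline.

```python
# Counts quoted in the text; all variable names are illustrative assumptions.
alu_by_class = {
    "Alu J": 286_801,
    "Alu S": 599_237,
    "Alu Y": 141_709,
    "Alu (unassigned)": 57_427,
}
total_alu = sum(alu_by_class.values())

# Alu Y elements longer than the 250 bp cutoff, as a fraction of all Alu Y.
full_length_alu_y = 57_458
fraction_full_length = full_length_alu_y / alu_by_class["Alu Y"]

# Manual BLAT examination of the putative gorilla-specific loci.
putative_specific = 4_127
shared_missed = 2_626        # shared insertions missed in the first pass
no_orthologous_flank = 232   # no usable flanking sequence in other species
excluded = shared_missed + no_orthologous_flank
retained = putative_specific - excluded

print(f"total Alu elements:        {total_alu:,}")
print(f"Alu Y >250 bp (fraction):  {fraction_full_length:.3f}")
print(f"excluded after BLAT check: {excluded:,}")
print(f"retained for COSEG:        {retained:,}")
```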
Computational and PCR analysis of the western lowland gorilla genome has identified 1,075 independent, gorilla-specific Alu Y insertion loci. Computational analysis of this dataset indicates the presence of 10 distinct subfamilies identifiable by the presence of diagnostic mutations specific to each lineage. The 1,075 elements identified in our study almost certainly do not represent the total number of Alu Y elements specific to the western lowland gorilla genome. Any loci shorter than our arbitrary length cutoff of 250 bp were excluded from our dataset. It is also likely that a number of Alu Y loci are located in portions of the genome where sequence data are incomplete, for example within repeat regions. Additionally, some Alu Y loci were excluded when no orthologous genomic region was present in the species being used for comparison.
The largest newly identified gorilla-specific Alu subfamily was designated as Alu Y_Gorilla. This designation was established via computational evaluation and manual alignment of the 759 elements assigned to this subfamily. The consensus sequence for these elements was found to be 100% identical to the canonical Alu Y human consensus sequence (Figure 2). This subset of classic Alu Y elements continued to propagate in the Gorilla lineage after the divergence from the shared common ancestor with the Homo-Pan lineage. We assayed and verified a total of 135 loci from this subfamily via PCR (18%). The 43 elements belonging to the Alu Ya1_Gorilla subfamily differ from the Alu Y consensus sequence by one diagnostic mutation at nucleotide position 133. We assayed and verified via PCR 21 elements in this subfamily (49%). This sequence should not be confused with the Homo-Pan Alu Ya subfamily.
The Alu Ya1b4 subfamily is derived from Alu Ya1_Gorilla and is a small and very likely young subfamily of 13 elements that shares the diagnostic mutation at position 133 of Ya1 but has also accrued four additional diagnostic mutations. We assayed and verified via PCR seven elements in this subfamily (54%). A second identified Alu Y lineage in gorilla is the Alu Yc3_Gorilla subfamily. We assayed and verified via PCR 20 of the 69 elements in this subfamily (29%). The consensus sequence for the 69 members identified in this subfamily is a 100% match to the human Alu Yc3 subfamily consensus sequence (Figure 2).
Two additional gorilla-specific Alu Yc-derived subfamilies share the characteristic 12 bp deletion at positions 87–98 that is a hallmark of human Alu Yc5. These two subfamilies possess independent diagnostic mutations that make them distinct from the Alu Yc5 consensus sequence, and are designated Alu Yc5a3_Gorilla (55 elements identified) and Alu Yc5b2_Gorilla (46 elements identified). Alu Yc5a3_Gorilla has three additional diagnostic mutations differentiating it from the Alu Yc5 consensus. In keeping with Alu subfamily naming convention, this subfamily has thus been named ‘Yc5a3’: ‘a’ as the first Yc5-like subfamily identified in the gorilla genome and ‘3’ for the three diagnostic mutations differentiating it from the canonical Yc5 consensus. We assayed and verified 27 members of this subfamily via PCR (49%). Alu Yc5b2_Gorilla also shares the characteristic 12 bp deletion of the human Alu Yc5, but has two independent diagnostic mutations (Figure 2). We assayed and verified via PCR 19 members of this subfamily (41%). It is probable that Alu Yc5a3_Gorilla and Alu Yc5b2_Gorilla derived from Alu Yc5 around the time of the Homo/Pan-Gorilla speciation event.
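The naming pattern described above (parent subfamily, a lineage letter, and the number of additional diagnostic mutations) can be sketched as a small helper. The function name `derived_subfamily_name` and its arguments are hypothetical; the letter is supplied by the caller, because the text does not fully specify how letters are assigned across all subfamilies.

```python
def derived_subfamily_name(parent: str, letter: str, n_mutations: int,
                           lineage: str = "Gorilla") -> str:
    """Compose a name such as 'Alu Yc5a3_Gorilla' from its parts.

    parent      -- name of the parent subfamily without the 'Alu ' prefix, e.g. 'Yc5'
    letter      -- single lowercase letter distinguishing the derived lineage
    n_mutations -- number of diagnostic mutations separating it from the parent
    """
    if len(letter) != 1 or not letter.isalpha() or not letter.islower():
        raise ValueError("letter must be a single lowercase letter")
    if n_mutations < 1:
        raise ValueError("a derived subfamily needs at least one diagnostic mutation")
    return f"Alu {parent}{letter}{n_mutations}_{lineage}"


# Names as they appear in the text.
print(derived_subfamily_name("Yc5", "a", 3))
print(derived_subfamily_name("Yc5", "b", 2))
```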
A third lineage, nearly identical to human Alu Yb3a2 but containing two additional diagnostic mutations, was identified and termed Alu Yb3a2b2_Gorilla (25 elements identified). This lineage is an independent evolution in the Gorilla gorilla gorilla genome and not a derivative of the human-specific Alu Yb3a2. The Alu Yb lineage is human-specific, meaning any identical or apparently derived Alu lineages in other primate genomes must be examples of independent evolution. This is confirmed by the lack of orthologs at the same location in the human genome. We assayed and verified 14 members of this subfamily via PCR (56%). An additional subfamily present at only 17 copies and derived from Alu Yb3a2b2_Gorilla was identified and termed Alu Yb3a2b2a2_Gorilla, due to two diagnostic mutations separating these otherwise identical subfamilies. We assayed and verified via PCR nine elements in this subfamily (53%). The low copy number of these subfamilies coupled with their lack of impairing point mutations, even with the caveat that some subfamily members may have been overlooked, leads us to posit that they are among the youngest and potentially still active subfamilies in the western lowland gorilla genome.
Two additional subfamilies were identified that, while clearly Alu Y derived, do not follow the consensus sequences of established subfamilies available via RepBase. The first, termed Alu Y16_Gorilla, is identified clearly by the presence of an A-rich insert at position 219 followed by a 16 bp deletion, and is present in 30 copies. We assayed and verified via PCR 10 members of this subfamily (33%). The second subfamily, apparently derived from the first and designated Alu Y16a4_Gorilla, is present in 18 copies and can be distinguished from Alu Y16_Gorilla by a 20 bp deletion occurring after the A-rich region at position 219. Seventeen elements from this subfamily were assayed via PCR (94%), and all 17 were verified as gorilla-specific. One locus (gorGor3.1 chrX:74544052–74544324) lacked sufficient orthologous 5′ sequence in non-gorilla outgroups to design a working primer, but was included in the total count based on computational verification. The accumulation of non-diagnostic mutations in these two subfamilies may indicate that they are more ancient.
Approximately 25% of the 1,075 gorilla-specific Alu Y elements computationally identified in this study were verified by PCR, with the remaining approximately 75% verified by manual examination of computational data. It is important to note that we had no false positives in this study, and 100% of the elements computationally identified as gorilla-specific that were subsequently assayed via PCR were confirmed to be, in fact, gorilla-specific.
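The per-subfamily figures reported above can be cross-checked with a short tally. The Python sketch below uses only the copy numbers and PCR-verified counts quoted in the text; the dictionary layout and the helper `pct` are illustrative assumptions.

```python
# (elements identified, loci assayed and verified by PCR), as quoted in the text.
subfamilies = {
    "Y_Gorilla": (759, 135),
    "Ya1_Gorilla": (43, 21),
    "Ya1b4": (13, 7),
    "Yc3_Gorilla": (69, 20),
    "Yc5a3_Gorilla": (55, 27),
    "Yc5b2_Gorilla": (46, 19),
    "Yb3a2b2_Gorilla": (25, 14),
    "Yb3a2b2a2_Gorilla": (17, 9),
    "Y16_Gorilla": (30, 10),
    "Y16a4_Gorilla": (18, 17),
}


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


total_elements = sum(n for n, _ in subfamilies.values())
total_verified = sum(v for _, v in subfamilies.values())

for name, (n, v) in subfamilies.items():
    print(f"{name:<20} {n:>4} elements  {v:>3} verified  {pct(v, n):5.1f}%")
print(f"{'all subfamilies':<20} {total_elements:>4} elements  "
      f"{total_verified:>3} verified  {pct(total_verified, total_elements):5.1f}%")
```

The totals come to 1,075 elements and 279 PCR-verified loci, in line with the counts given elsewhere in the text.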
One means of identifying potential master elements is to look for subfamily members with mutation-free polyA-tails. A possible source element for the Alu Y_Gorilla subfamily, for instance, was identified on chrX:5135584–5135921, with a mutation-free 30 bp polyA-tail and intact promoter region. A posited source element for the Alu Yc5b2 subfamily was identified on chr9:17925753–17926051, also with a mutation-free 30 bp polyA-tail and intact promoter region.
Alu Y retrotransposition rates appear to be lower in the western lowland gorilla genome than in the human or chimpanzee genomes, while higher than that seen in the orangutan genome [48, 49]. Factors influencing rates of retrotransposition are myriad [1, 46]. Active retrotransposons are frequently polymorphic within a population, and are easily lost during events like speciation or population bottlenecks [70, 71]. The number of active elements, and the amplification rate of elements surviving such an event, can vary greatly and impact overall retrotransposition activity in the host genome.
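A screen for mutation-free polyA-tails can be sketched as follows. This is a minimal illustration, not the procedure used in the study: it assumes each element is supplied in its own orientation with the tail at the 3′ end, treats "mutation-free" as an uninterrupted run of A, and uses a 30 bp threshold only because that is the tail length quoted above. The function names are hypothetical, and the promoter region is not checked.

```python
import re


def trailing_polya_length(seq: str) -> int:
    """Length of the uninterrupted run of A at the 3' end of `seq`."""
    match = re.search(r"A+$", seq.upper())
    return len(match.group(0)) if match else 0


def has_clean_polya_tail(seq: str, min_len: int = 30) -> bool:
    """True if the element ends in at least `min_len` consecutive A bases."""
    return trailing_polya_length(seq) >= min_len


# Toy sequences (not real loci): a clean 30 bp tail and a tail interrupted by a G.
clean = "GGCCGGGCGCGGTGGCTCA" + "A" * 30
interrupted = "GGCCGGGCGCGGTGGCTCA" + "A" * 15 + "G" + "A" * 14

print(has_clean_polya_tail(clean))
print(has_clean_polya_tail(interrupted))
```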
Possible explanations for this lower activity level include inhibition of retrotransposition in the Gorilla lineage by the interaction of host factors, such as members of the APOBEC family of proteins, with the enzymatic machinery of L1 [1, 72]. Interference with L1 and Alu retrotransposition by APOBEC has been documented [72–74]. Analysis of the activity level of gorilla-specific L1 elements could elucidate this, but has not yet been done. Additionally, environmental stress factors may impact retrotransposition rates. It is possible that one or a combination of these retrotransposition-inhibiting factors could be responsible for the lower level of Alu Y activity in the western lowland gorilla genome.
A median joining tree of relationships between gorilla-specific Alu Y subfamilies was generated from a stepwise alignment using the Network program (Figure 1) [42, 77]. The tree generated supports the divergence of all gorilla-specific subfamilies from the Alu Y_Gorilla subfamily, and analysis of subfamily ages using BEAST places the date for this subfamily divergence at the stem of the Gorilla lineage. Initial divergence of gorilla-specific subfamilies occurred shortly after the speciation event separating the Gorilla lineage from the Homo-Pan lineage 6 to 9 million years ago [28, 53–55], and master elements have continued to produce copies of each subfamily at varying rates since.
BEAST analysis of individual subfamily ages suggests no delay or change in transposon activity in western lowland gorilla following the divergence of the Gorilla and Homo-Pan lineages. The ages of the gorilla-specific lineages range from 6.5 to 6.71 million years, based on a baseline divergence of 7 million years ago for the most recent common ancestor of Gorilla and Homo-Pan. This indicates that all of the identified subfamilies originated around the time of the speciation event that separated these two lineages. This result is consistent with the ongoing propagation of these subfamilies before, during, and after the speciation event at a relatively constant rate. It also indicates that the ‘master genes’ from which these subfamilies are derived already existed and were retrotranspositionally active prior to the aforementioned speciation event, and have remained active subsequently. Examination of Alu elements indicates retrotranspositionally active elements are relatively rare, and that most Alu activity is the result of a small number of ‘master’ copies engaging in retrotranspositional activity over time. Our results suggest that the 10 gorilla-specific Alu Y subfamilies identified in this study diverged and are still diverging from master elements already present in the genome of the common ancestor of the Gorilla and Homo-Pan lineages. A table listing each subfamily, the ‘master gene’ or ancestral Alu subfamily from which it was likely derived, the % divergence from the consensus sequence of the master element, copy number, and suggested age of the most recent common ancestral element is available in the Additional files section of this paper as Additional file 3.
Alu Y subfamily activity appears to be greatly reduced in the western lowland gorilla genome when compared to the human and chimpanzee genomes. The level of activity seen, while not as low as that observed in the genome of the orangutan, is consistent with a change in host surveillance or intrinsic retrotransposition capacity. Alu subfamily age estimates provide further support for the master gene model of Alu retrotransposition, with a relatively low number of ancient lineages responsible for ongoing retrotranspositional activity. The 1,075 lineage-specific Alu Y insertion loci and the 10 subfamilies identified should provide future researchers with a rich source of genetic systems for conservation biology and evolutionary genetics.
Table 1. DNA sample data of all species examined in this study
Loci selected for verification were examined for further evidence of gorilla-specificity using the BLAST-Like Alignment Tool (BLAT) available at the UCSC Genome Browser website. Putative gorilla-specific loci were compared to the available genomes of three other primate species, human (hg19), chimpanzee (panTro2), and orangutan (ponAbe2) [64, 83]. Elements found to be absent in these species and with sufficient orthologous flanking across species were marked for PCR primer design and experimental validation. Loci determined to be shared insertions, as well as those lacking sufficient orthologous flanking for effective primer design, were removed from further consideration.
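The triage logic described in this paragraph can be sketched as a small classifier. This is an illustration under stated assumptions rather than the workflow actually used: the `Locus` record, its field names, and the category labels are hypothetical, and the sketch assumes that flanking sequence must be found in all three comparison genomes for a locus to proceed to primer design.

```python
from dataclasses import dataclass

# Comparison genomes named in the text.
OUTGROUPS = ("human", "chimpanzee", "orangutan")


@dataclass
class Locus:
    name: str
    insertion_present: dict  # outgroup -> True if the Alu is present at the orthologous site
    flank_available: dict    # outgroup -> True if orthologous flanking sequence was found


def classify(locus: Locus) -> str:
    """Assign a locus to one of three categories following the text's criteria."""
    if any(locus.insertion_present.get(sp, False) for sp in OUTGROUPS):
        return "shared_insertion"        # removed from further consideration
    if not all(locus.flank_available.get(sp, False) for sp in OUTGROUPS):
        return "insufficient_flank"      # removed: primers cannot be designed
    return "candidate_for_pcr"           # marked for primer design and validation


absent = {sp: False for sp in OUTGROUPS}
flanked = {sp: True for sp in OUTGROUPS}

example = Locus("locus_A", insertion_present=absent, flank_available=flanked)
print(classify(example))
```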
The COSEG program, designed to identify repeat subfamilies using significant co-segregating mutations, was then run on the remaining putative gorilla-specific insertions to identify and group specific subfamilies together. COSEG ignores non-diagnostic mutations during analysis, providing an accurate representation of relationships between subfamilies of elements by ignoring potentially misleading mutational events. COSEG uses a minimum subfamily size of 50 elements as the default setting. We arbitrarily defined subfamilies as groups of >10 elements to increase the detail of subfamily structure resolved. A subset of a minimum of 10% of each identified subfamily was then chosen for verification using locus-specific PCR, with a total of 279 loci assayed and verified (Figure 1).
A multi-species alignment composed of the species listed above was created for each locus using BioEdit. Oligonucleotide primers for the PCR assays were designed in shared regions flanking each putative gorilla-specific element chosen for experimental verification using the Primer3Plus program. These primers were then tested computationally against available primate genomes using the in-silico PCR tool on the UCSC Genome Bioinformatics website.
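Choosing a validation subset of at least 10% of each subfamily can be sketched as below. The text does not state how loci were chosen within a subfamily, so random sampling with a fixed seed is an assumption made here for reproducibility; the function name `pick_validation_subset` and its parameters are hypothetical.

```python
import random


def pick_validation_subset(loci, min_percent=10, seed=0):
    """Randomly pick at least `min_percent`% of `loci` (rounded up, minimum one locus)."""
    loci = list(loci)
    if not loci:
        return []
    # Integer ceiling of len(loci) * min_percent / 100, avoiding float rounding.
    k = max(1, (len(loci) * min_percent + 99) // 100)
    return random.Random(seed).sample(loci, k)


# Toy locus identifiers sized like three of the subfamilies in the text.
for size in (759, 30, 13):
    loci = [f"locus_{i}" for i in range(size)]
    subset = pick_validation_subset(loci)
    print(size, len(subset))
```

In practice more loci than this minimum were assayed for most subfamilies, so the function gives only the lower bound implied by the 10% rule.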
Subfamily age estimates were calculated using the BEAST program [66, 87]. BEAST has previously been used to estimate dates of divergence using transposon data. For each subclade, the consensus sequence for each subfamily was determined from the COSEG output. The progenitor element was determined by RepeatMasker analysis of each consensus sequence. Elements were aligned using the SeaView software program and MUSCLE algorithm [76, 89]. The progenitor element was then used as an out-group to root the tree of each subclade. BEAST was calibrated with a baseline divergence date of 7 million years ago for the split between the Gorilla and Homo-Pan lineages. A divergence date of 7 million years ago is within the generally accepted 6 to 9 million years ago range for this divergence [28, 53–55]. BEAST was run with the following parameters: Site Heterogeneity = ‘gamma’; Clock = ‘strict clock’; Species Tree Prior = ‘birth death process’; Prior for Time of Most Recent Common Ancestor (tmrca) = ‘Normal distribution’ with a mean of 7.0 million years and a standard deviation of 1.0; ucld.mean = ‘uniform model’ with initial rate set at 0.033; Length of Chain = ‘10,000,000’; all other parameters were left at default settings.
The Network program was run on gorilla-specific Alu Y subfamily consensus sequences to generate a stepwise tree of relationships between identified subfamilies [42, 77]. The consensus sequences for the gorilla-specific Alu Y subfamilies were aligned using the MUSCLE algorithm and converted to the .rdf file format using the DnaSP program. The .rdf file was then imported to Network, and a median-joining analysis was run. The resulting output file demonstrating evolutionary relationships between subfamilies is presented in Figure 1C.
To verify gorilla-specificity, locus-specific PCR was performed with a nine-species primate panel composed of DNA samples from the following species: Western lowland gorilla (Gorilla gorilla gorilla); Human HeLa (Homo sapiens); Common chimpanzee (Pan troglodytes); Bonobo (Pan paniscus); Bornean orangutan (Pongo pygmaeus); Sumatran orangutan (Pongo abelii); Northern white-cheeked gibbon (Nomascus leucogenys); Rhesus macaque (Macaca mulatta); African green monkey (Chlorocebus aethiops). Information on all DNA samples used in PCR analysis is listed in Table 1.
BAC: Bacterial artificial chromosome
BEAST: Bayesian evolutionary analysis sampling trees
BLAST: Basic local alignment search tool
BLAT: BLAST-like alignment tool
LINE: Long interspersed element
PCR: Polymerase chain reaction
SINE: Short interspersed element
TPRT: Target-primed reverse transcription
UCSC: University of California Santa Cruz.
The authors wish to thank G. Cook, J.A. Walker, S. Herke, and M.K. Konkel for all of their helpful advice during the course of this project. Special thanks go to Sydney Szot for the primate illustrations. We thank the American Type Culture Collection, The Coriell Institute for Medical Research, the Integrated Primate Biomaterials and Information Resource, and Dr. Lucia Carbone (http://carbonelab.com) for providing the DNA samples used in this study. This research was supported by National Institutes of Health Grant RO1 GM59290 (MAB). ATM was supported in part by a Louisiana Board of Regents Graduate Fellowship and the Louisiana State University Graduate School Dissertation Fellowship. MLF was supported by the Louisiana Biomedical Research Network with funding from the National Center for Research Resources (Grant Number P20GM103424), and by the Louisiana Board of Regents Support Fund.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.