This study shows that retrotransposition activity values obtained from cell-culture based assays are roughly proportional to the estimated rates at which new L1s emerge in the human population. L1s with higher retrotransposition activities branch more frequently on the phylogenetic tree of human L1s. There appears to be an asymmetry in the evolution of retrotransposition activity, where L1s change more readily from high to low retrotransposition activity than from low to high. This asymmetry was shown by two different analysis methods. These two methods also showed that while low-activity L1s rarely turn into high-activity L1s, the rate for this transition is not zero. In addition, there was evidence for a recent decrease in the rate at which L1s emerge in the human population. When combined, the estimated insertion rates and rates of L1 retrotransposition activity evolution suggest that L1s continue to grow in the human population, albeit at a rate that decreased recently.
Comparative methods, such as the ones used in this analysis, have several limitations. They can provide misleading results when applied to un-replicated evolutionary events [21]. The lack of replication should not be a major concern in our dataset since the L1 tree indicates that there were several phylogenetically independent transitions between high and low retrotransposition activity (Fig. 1). A specific caveat of the BiSSE model is that unaccounted variation in the speciation rate can lead to a spurious correlation between specific character states and the speciation rate [22]. However, this is mainly a problem for analyzing speciation rates of complex organisms where myriads of traits could potentially affect the speciation rate. The context of our analysis is different. For one, L1s are not organisms and therefore harbor fewer traits that could be associated with speciation. Furthermore, cell-culture based retrotransposition estimates directly quantify insertion events. The most parsimonious expectation should therefore be that the speciation rate observed on the L1 phylogenetic tree is proportional to cell-culture based estimates. Our BiSSE results indicate that this expectation is consistent with the data.
There is an additional caveat for applying the BiSSE model to L1 retrotransposition. The BiSSE model interprets each internal node of the phylogenetic tree as a speciation (or in our case retrotransposition) event. The 155 different L1 loci studied in our analysis require 154 retrotransposition events. However, these 154 retrotransposition events do not have to exactly coincide with the 154 internal nodes of the L1 tree, because strictly speaking, the internal nodes correspond to coalescent rather than retrotransposition events and the coalescent process within the human population might be on a comparable time scale as the time between different retrotransposition events. More accurate parameter estimation might therefore require a model that considers the coalescent and retrotransposition process simultaneously. Nevertheless, the ratio of speciation rates estimated via the BiSSE model for high and low retrotransposition L1 is very close to the ratio of retrotransposition rates for these L1 classes obtained from cell cultures, suggesting that the results obtained by ignoring the coalescent process might still be reasonably accurate.
The results of the BiSSE and BayesTraits models provide also information on how retrotransposition activity evolves after insertion. Both approaches show clear statistical support for a model in which the evolutionary change from high to low retrotransposition activity is much more likely than for the reverse. This is consistent with a priori expectations since random mutations of L1 sequences are more likely to disrupt the retrotransposition machinery than improve it. Both approaches indicate that, nevertheless, L1s occasionally change from low activity to high activity. Each model has its own strength and weakness. The BiSSE model requires ultrametric trees, and hence a more restrictive phylogenetic estimation procedure, but it allows incorporating the effects of activity on branching. The BayesTraits model poses no restrictions on the tree branch lengths but does not incorporate the effects of activity on branching. The fact that both models arrive at qualitatively similar conclusions about the evolution of retrotransposition activity underscores the robustness of these results.
Interpretation of the BiSSE parameters requires a careful consideration of the data. Both studies whose data were used in this analysis [9, 10] searched for full-length L1s in a limited set of sample sequences. The first study performed a BLAST search of a full-length L1 sequence against human genomic databases available in 2003 [9]. 44% of the 90 L1s analyzed in this study are polymorphic with an average allele frequency of 44%. The second study searched for non-reference L1s in fosmid clones constructed from genomic DNA of six individuals and only analyzed L1s that occur in at least two fosmid clones [10]. 100% of the 69 L1s identified in the second study are polymorphic with an average allele frequency of 16% [10]. The average allele frequency of all L1s from both studies combined is 62%. By comparison, the average allele frequency of polymorphic full-length L1s in the 1000 genome data is 3% [23]. (Data were obtained from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/, L1s were identified using the tag ““INS:ME:LINE1“ and the tag “SVLEN” was used to select L1s over 6000 bp length). Hence, the majority of L1s included in this analysis are polymorphic and occur at a higher population frequency than the average L1, suggesting that methods to detect L1 in the two studies were biased against low-frequency L1s.
The speciation rates of the BiSSE model estimates the rate at which new L1 insertions occur and reach high enough population frequencies to be detected. As such, the speciation rate in the BiSSE model combines the effects of mutation, selection and drift. Similarly, the rate at which an L1 generates new L1 insertions depends on its retrotransposition activity and population frequency. The estimated transition rates between high and low retrotransposition activity are therefore the result of the combination of two processes, the evolution of retrotransposition activity and changes in allele frequency. On average, full-length L1s have a negative selection coefficient [24] and most likely individual L1 insertions vary widely in their selective effect. The speciation rates estimated here subsume this variation into population-level averages. While these averages ignore a lot of biological complexity, they are sufficient for analyzing population-level dynamics. Since full-length L1s have a negative selection coefficient [24], they depend on drift to increase in population frequency. The smaller the effective population size the more important the relative contribution of drift, and therefore the more likely L1s are to emerge. The human effective population size has changed over time with a bottleneck about 4700 generations ago and population expansion in the last 140 generations [18]. It is therefore likely that the rate at which new L1s reach higher population frequency was higher during the bottleneck and slowed down recently. The change point model confirms a recent decline in the apparent retrotransposition rate that is most likely due to the recent increase of the effective population size.
Since the majority of L1s included in this analysis are polymorphic, the estimated speciation rates are likely to be higher than the allele substitution rate, i.e. the rate at which new alleles arise and become fixed in the population, and lower than the de-novo insertion rate. Estimating the substitution rate would require restricting the model to fixed L1s. However, restricting the analysis to L1s that are fixed in the population would miss the high-activity alleles that tend to be polymorphic and contribute significantly to the overall retrotransposition [9].
The best-fitting BiSSE model restricts the extinction rates to zero (μ0 = μ1 = 0). There are two possible explanation for these zero extinction rates. Non-zero extinction rates lead to an uptick of the apparent speciation rate in the very recent past, since these are branches that have not yet gone extinct [25]. The zero extinction rate could therefore be an artefact of a decline in drift due to recent population expansion that masked an uptick in apparent speciation rate. Alternatively, the zero extinction could be because low frequency L1s have a low probability to be included in the two studies whose data were used in this analysis. L1s that reached a sufficient population frequency to be detected, might get lost from the population at a rate that is low, relative to the other rates in the BiSSE model. Either way, an extinction rate of zero in the fitted BiSSE model does not contradict a frequent loss of L1s shortly after insertion, because most of these low frequency L1s would not be detected in the studies analyzed here.
According to the BiSSE model, an average full-length L1 generates 3.2 *10− 6 new L1 insertions per generation. The model furthermore estimates that at a steady state, 75% of L1s are low activity, leading to an average retrotransposition activity of 27%. Ewing & Kazazian estimated the L1 retrotransposition in humans to be between 1/95 and 1/270 births [26]. Our population-level estimates of insertion rates would be equivalent to the insertion rate per individual if L1 insertions were selectively neutral [27]. In that case, each individual would have to carry on the order of 103 average retrotransposition competent full-length L1s for our estimate to be compatible with the estimate by Ewing & Kazazian. However, the published estimates of the number of L1s with intact ORFs in a human genome range from 90 to 266 [28, 29]. There are several possible reasons for this mismatch in number of active L1s. For one, the ratio of high and low retrotransposition L1s might not yet be in steady state. Furthermore, full-length L1s are under negative selection [24]. Negative selection weeds out many L1s shortly after insertion, which could explain why the insertion rate on the individual level is much higher than a population-level substitution rate. This effect can be even more pronounced when there is a variation in selective effects, so that a certain proportion of L1s are selected out immediately after insertion.
It is unknown whether L1s are growing in the human population or are at a stable equilibrium. Linear models, such as the BiSSE model, only allow for exponential growth or decline. According to our parameter estimates, L1s grow currently exponentially with a doubling time in human genomes of 5*105 generations. It is not clear what mechanism would lead to a negative feedback of L1 density on average retrotransposition rate that is required for a stable equilibrium. It has been suggested that a stable equilibrium for retrotransposition is obtained when the number of available genomic positions becomes limiting and L1s repeatedly insert into pre-existing L1s [7, 8]. However, the low density of active L1s in human genomes makes it unlikely that such a feedback is the driving force for an equilibrium. Alternatively, there might be no equilibrium for the number of L1s but instead co-evolutionary cycles where phases of high L1 retrotransposition lead to evolutionary adaptations in the host that suppress retrotransposition, which in turn increases selection for L1s that can escape the host suppression. There is some empirical evidence for such cycles [30]. A more complete understanding of the L1 dynamics in human genomes will require a model that combines the effects of L1 retrotransposition rate on L1 growth, the evolution of this rate and the fitness effects on the host. The results presented here are a first step in that direction by providing parameter estimates for the first two components.