Transposable elements (TEs) are abundant in eukaryotic genomes, particularly mammalian genomes. Indeed, at least 45% of the human genome is made up of TE-derived sequences[1, 2], which are non-randomly distributed across the genome. For example, human Alu short interspersed elements (SINEs) are predominantly found in GC- and gene-rich regions, whereas L1 long interspersed elements (LINEs) are most prevalent in low-GC and gene-poor regions[1, 3]. Transposable elements have also been shown to affect the expression of host genes via the provisioning of a variety of regulatory sequences. The non-random genomic distribution of human TEs, considered together with their regulatory potential, initially suggested the possibility that the TE environment of human genes might affect the way that they are expressed.
In fact, a number of associations between the TE environment in and around human genes and their expression levels and functional patterns have subsequently been observed. Weakly expressed genes generally have low SINE and high LINE densities, while the most highly expressed human genes are enriched for SINEs (Alu) and depleted in L1 elements. Additionally, Alu elements are significantly associated with the breadth of gene expression across tissues[7, 8]. Thus, highly and broadly expressed housekeeping genes are identifiable by their TE content, which is rich in Alus and poor in L1s. Functionally, TEs have recently been shown to have been exapted during the evolution of novel phenotypic characteristics, such as mammalian pregnancy[10, 11]. Mammalian-wide interspersed repeats (MIRs) are the only TEs that show a positive association between their prevalence in and around genes and tissue-specific gene expression[8, 12].
MIR elements are an ancient family of tRNA-derived SINEs[13, 14], whose anomalous sequence-conservation levels among mammalian genomes were initially taken as evidence that they encode some unknown regulatory function. Subsequent studies demonstrated that, in a number of individual cases, MIRs do in fact donate transcription-factor binding sites[16–20], enhancers[18, 21, 22], microRNAs[23, 24] and cis natural antisense transcripts to the human genome. The association of MIRs with tissue-specific expression, along with their propensity to be exapted as regulatory sequences, suggested to us that they might provide numerous tissue-specific regulatory sequences across the human genome.
Enhancers are the regulatory elements most highly associated with tissue-specific expression[26, 27]. They are also characterized by a unique chromatin environment made up of a specific combination of histone modifications[26–29]. Consistent with their role as tissue-specific regulatory elements, the enhancer chromatin environment is highly variable across cell types compared to other classes of regulatory sequences[26, 27, 29]. We hypothesized that the global coincident association of both MIRs and enhancers with tissue-specific gene expression is at least in part a consequence of MIR sequences frequently acting as enhancers and/or constituting fragments of enhancer sequences. This would be consistent with previously reported individual cases of TE-derived enhancers[21, 30–32]. We also reasoned that the enhancer-characteristic chromatin environment could serve as a useful proxy to identify putative MIR-derived enhancers.
To test our hypothesis, we performed a genome-wide assessment of the relative prevalence of MIRs within enhancer sequences and explored the potential mechanistic bases and functional consequences of this relationship. We found that not only are MIRs highly concentrated in predicted enhancers, but they also constitute a significant part of the core of genic enhancers; this analysis identified many more putative MIR-derived enhancers than previously reported[22, 33]. These MIR-derived enhancers have cell-type-specific chromatin profiles that are highly similar to those seen for canonical enhancers. Furthermore, we report MIRs to be major donors of transcription-factor binding sites (TFBSs) within enhancers, and show that MIR-derived enhancers affect both the level and tissue-specificity of gene expression. Using the erythroid K562 cell line as an example, we show that MIR-enhancers are involved in the modulation of several developmentally specific biological processes related to erythropoiesis.
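As a purely illustrative aid, the notion of "relative prevalence" of MIRs within enhancers can be expressed as a fold enrichment: the fraction of enhancer bases covered by MIRs divided by the fraction of the whole genome covered by MIRs. The Python sketch below is a minimal example under stated assumptions and is not the procedure used in this study. The function names (`merge`, `total_bases`, `overlap_bases`, `fold_enrichment`), the single-chromosome half-open coordinates, and the toy intervals are all hypothetical.

```python
from typing import List, Tuple

# Assumption: half-open [start, end) intervals on a single chromosome.
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_bases(intervals: List[Interval]) -> int:
    """Number of distinct bases covered by a set of intervals."""
    return sum(end - start for start, end in merge(intervals))


def overlap_bases(a: List[Interval], b: List[Interval]) -> int:
    """Number of bases covered by both interval sets (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            shared += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


def fold_enrichment(mirs: List[Interval],
                    enhancers: List[Interval],
                    genome_length: int) -> float:
    """MIR density inside enhancers relative to MIR density genome-wide."""
    mir_in_enhancers = overlap_bases(mirs, enhancers) / total_bases(enhancers)
    mir_in_genome = total_bases(mirs) / genome_length
    return mir_in_enhancers / mir_in_genome


# Toy coordinates invented for illustration only (not real annotations).
toy_mirs = [(100, 200), (500, 600), (900, 950)]
toy_enhancers = [(150, 250), (550, 650)]
toy_enrichment = fold_enrichment(toy_mirs, toy_enhancers, genome_length=1000)
```

With these toy intervals, MIRs cover half of the enhancer bases but only a quarter of the 1,000-base toy genome, so `toy_enrichment` is 2.0. A value above 1 indicates that MIRs are denser inside enhancers than in the genome overall; a real analysis would additionally need per-chromosome handling and a statistical test against a suitable background.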