Transcriptional dynamics of transposable elements in the type I IFN response in Myotis lucifugus cells

Background Bats are a major reservoir of zoonotic viruses, and there has been growing interest in characterizing bat-specific features of innate immunity and inflammation. Recent studies have revealed bat-specific adaptations affecting interferon (IFN) signaling and IFN-stimulated genes (ISGs), but we still have a limited understanding of the genetic mechanisms that have shaped the evolution of bat immunity. Here we investigated the transcriptional and epigenetic dynamics of transposable elements (TEs) during the type I IFN response in little brown bat (Myotis lucifugus) primary embryonic fibroblast cells, using RNA-seq and CUT&RUN. Results We found multiple bat-specific TEs that undergo both locus-specific and family-level transcriptional induction in response to IFN. Our transcriptome reassembly identified multiple ISGs that have acquired novel exons from bat-specific TEs, including NLRC5, SLNF5 and a previously unannotated isoform of the IFITM2 gene. We also identified examples of TE-derived regulatory elements, but did not find strong evidence supporting genome-wide epigenetic activation of TEs in response to IFN. Conclusion Collectively, our study uncovers numerous TE-derived transcripts, proteins, and alternative isoforms that are induced by IFN in Myotis lucifugus cells, highlighting candidate loci that may contribute to bat-specific immune function. Supplementary Information The online version contains supplementary material available at 10.1186/s13100-022-00277-z.


Background
Bats are increasingly recognized to be an important reservoir of zoonotic viruses, including Rabies viruses, Dengue viruses, Ebolaviruses, and Coronaviruses [1,2]. Remarkably, viral infection in bats is associated with minimal lethality and reduced inflammatory phenotypes, which has led to extensive research aimed at uncovering bat-specific features of immunity [3][4][5][6].
Recent genomic and functional studies in bats have begun to reveal species-specific adaptations affecting innate immune responses. For example, the interferon (IFN) genes have been subject to evolutionary expansions and contractions in different bat species, and several species exhibit constitutive expression of IFNs at low levels [1,7]. Bats also exhibit unique subsets of ISGs [8]. A time-course analysis comparing the type I IFN response in Pteropus alecto and humans revealed distinct kinetics of IFN-stimulated gene (ISG) regulation, where bats exhibit more rapid downregulation of ISGs compared to humans [9]. In addition to adaptations affecting IFN signaling, other pro-inflammatory genes are also frequently mutated or lost, including TLR genes [10], components of the inflammasome [11,12], the cGAS/STING pathway [13], and the OAS/ RNASEL pathway [9,14]. These studies have begun to reveal the genetic basis for bat-specific features of immunity which could help us understand their propensity to act as viral reservoirs.
While it is clear that bats have evolved numerous unique adaptations affecting innate immune pathways, we still have a poor understanding of the genetic mechanisms responsible for these changes. Our study focuses on TEs as a potentially important yet understudied source of mutations that shape bat immunity.
TEs are widely speculated to have been important contributors to the evolution of bats [15][16][17][18][19], including bat-specific immune functions [20,21]. While the genomes of all mammalian species contain numerous lineage-specific transposons, bat genomes are distinguished by recently active DNA transposons, which are extinct in most other mammalian lineages [15][16][17][18]. In addition to DNA transposons, bat genomes have been extensively shaped by other TEs typically found in other mammals, including LTR retrotransposons like endogenous retroviruses (ERVs), LINEs, and SINEs. Studies in other mammalian lineages have repeatedly shown that lineage-specific TEs contribute to innate immune functions through a variety of mechanisms. For example, TE and virus-derived proteins have been repeatedly co-opted as immune proteins that often restrict viruses through dominant negative activity [22] in ruminants [23], rodents [24], and primates [25]. TE-derived non-coding transcripts can readily form immunostimulatory double-strand RNAs or DNAs [26][27][28]. Finally, TEs can regulate interferon-inducible gene expression by acting as regulatory elements [29][30][31][32][33]. Notably, a recent report identified a Rhinolophusspecific LTR insertion within an exon of OAS1, which disrupts horseshoe bat antiviral activity to SARS-CoV2 with significant implications for the OAS1-mediated response to SARS-CoV2 in humans [14]. This example further supports that the recurrent independent cooption of TEs for immune functions may reflect their capacity to fuel adaptation by increasing genomic variation, especially in the context of host-pathogen coevolutionary arms races [27,28].
However, aside from this example, the potential impact of TEs on bat immunity remains largely unstudied, due to the lack of experimental and functional genomic resources available for studying bat immunology.
To conduct a comprehensive genome-wide study of TEs in bat innate immunity, we conducted transcriptomic and epigenomic profiling of the type I IFN response in Myotis lucifugus primary fibroblast cells. We used RNA-seq and CUT&RUN to characterize the IFNinducible transcriptomes and regulatory elements, which allowed us to systematically examine the contribution of TEs to loci that define the bat IFN-inducible response.

Results
To characterize the contribution of TEs to the IFN response in bats, we conducted transcriptomic and epigenomic profiling of the type I IFN response in M. lucifugus primary embryonic fibroblast cells (Fig. 1). We stimulated cells using recombinant universal IFN alpha (IFNa), and profiled the transcriptome at 0, 4 and 24 h time points using RNA-seq. We confirmed cellular response to universal IFN treatment using qPCR to measure the expression of canonical ISGs (Fig. S1), as shown previously for M. lucifugus dermal fibroblasts [8]. We also profiled 0 and 4 h time points using CUT&RUN to map genome-wide localization of H3K27ac, POLR2A, and STAT1. We aligned these data to a chromosomescale HiC assembly of the little brown bat genome (myoLuc2.0_HiC) [34,35], which was the most contiguous assembly available (Scaffold N50 of ~ 95.5 Mb).
Prior to analyzing our functional genomic data, we performed de novo repeat identification on the myoLuc2.0_HiC assembly using RepeatModeler2 [36] and HelitronScanner [37], followed by repeat annotation using RepeatMasker [38,39]. We annotated 42.7% of the genome as derived from TEs, compared to 35.5% annotated in the myoLuc2 non-HiC assembly (https:// www. repea tmask er. org/ speci es/ myoLuc. html; Fig. S2). L1 LINEs represent the most abundant TE group (14.1%), followed by virus-derived elements (ERVs; 5.6%) and DNA hAT and Helitron elements (3.7 and 3.2% of the genome, respectively) ( Fig. 2A). As previously identified [15], our analyses show the lineage-specific expansion of DNA transposons, first led by Helitron elements, and more recently by multiple subfamilies of hAT elements that may have been introduced by horizontal transfer ( Fig. 2A) [40]. Compared to the previous annotation, our annotation included more species-specific Helitron and LINE1 TEs (2.3 and 1.2% more, respectively), and assigned an additional 5.3% of the genome to "Other Repeats", a classification that groups together satellite and low complexity repeat sequences. This increase in masked repeats is congruent with the better quality of the genome assembly used in our study.

Gene and TE expression profiles upon IFN stimulation
To analyze transcriptional activity at the TE family level, we mapped RNA-seq reads to both genes and TE families using TETranscripts [42] (Fig. 2B, Fig. S3). On average, 6.38% of RNA-seq reads mapped to TEs in unstimulated cells, while 7.26% of reads mapped to TEs after 4 h IFN treatment, and 7.15% after 24 h IFN treatment. These percentages are slightly higher than what is observed in other mammals with similar TE genomic content (i.e., mouse and human, [43]). However, TE expression is highly variable between tissues and cell types for the same Fig. 1 Experimental design. A Myotis lucifugus embryonic fibroblast primary cells were treated for 24 hours (24 h) and for 4 hours (4 h) with 1000 U/ ml of universal IFNa (+IFNa) or matched volume of DPBS (−IFNa). Total RNA at both time points was extracted and used as input for RNA library preparation and sequencing to identify differentially expressed genes and transposable elements (TEs). To characterize changes in chromatin accessibility upon IFN treatment, cells were similarly treated for 4 h, and subjected to the CUT&RUN protocol on H3K27ac, POLR2A, and STAT1. B The chromosome-level genome assembly for Myotis lucifugus was used as reference to perform de novo repeat element identification and annotation. Combined with genome guided transcriptome assembly of our RNA-seq datasets, the custom repeat element annotation was used to identify TE-derived and virus-derived isoforms and transcription start sites (TSS) Fig. 2 Repeat element composition, evolutionary dynamics and transcriptional profiles. Repeat elements were annotated by combining de novo identification and homology-based searches. A Pie chart shows the relative abundance of main TE families (42.7% total) as percentage of the genome; histogram shows the composition as percentage of the genome of major TE superfamilies as a function of the divergence (Kimura2Distance) from the reference consensus sequence of each TE. Given the correlation between divergence from the consensus and time of transposition, the lower the K2D (left) and the younger a TE is. As previously identified in Myotis and other bat species, we recovered recent expansion and activity of multiple DNA elements, in particular Helitrons and more recently hAT elements. B Histograms show expression levels of major TE families as a fraction (%) of normalized read counts from RNA sequencing data of IFNa-stimulated and unstimulated cells at 4 h and 24 h post treatment. C MAplots of apeglm [41]  organism, and what we observed in primary cultured fibroblast may not reflect physiological TE transcription profiles, as TE transcription levels are affected by chromatin accessibility and TE regulatory mechanisms [43]. The most abundant TE-derived transcripts we identified included L1 LINEs, DNA/hAT elements, ERVs, SINEs and L2 LINEs. Forty-five TEs showed significant familylevel transcriptional induction at 4 h (adj. p-value < 0.05; log2FC > 1.5), and 8 families induced at 24 h according to the same cutoff thresholds (Fig. 2C). These included multiple ERV families (21), L1 LINEs (6) and DNA transposons (6) at 4 h post treatment, and ERVs (5) at 24 h post IFN treatment. These findings indicate that multiple batspecific TEs show family-level transcriptional induction in response to IFN treatment, peaking at 4 h and diminishing but still present at 24 h post induction. As previously noticed [43], the TE transcriptional landscape recapitulates genomic TE content. When only clade specific TEs were analyzed (Fig. S3, see Methods), we found similar trends where transcriptome compositions broadly summarized the genomic landscape of TEs of similar age, with Helitrons, other SINEs and L1 LINEs being the most abundantly transcribed TEs. It is noteworthy that recent TEs (defined as TEs with a divergence from the consensus sequence less or equal 20; see Methods) make up on average 50% of the transcribed TEs, with the relationship holding true for ERVs, L1 LINEs and most of DNA elements. Since Helitrons expanded in bat genomes at the beginning of the vesper bat radiation [18], we found no changes in the transcription levels of Helitrons when analyzing only recent TEs. Similarly, SINEs are dominant in the recent transcriptome having expanded in the genome more recently. An exception to this is represented by MIR SINEs, that are mostly ancient TEs and therefore highly represented in the total TE transcriptome. Compellingly, the "other SINEs" represent almost 30% of the recent TE transcriptome (Fig. S3), but they only make up a more marginal fraction of the genome (Fig. 2). This might be indicative of recent expansion under strong purifying selection (i.e, [44]), or support the hypothesis of a stochastic transcription model of the genome [43,45]. Additionally, we found only minimal induction of recent TEs following IFN treatment. This result is in sharp contrast with what was observed when the entire TE transcriptome was analyzed. It is possible that the patterns here detected underlie different constraints acting specifically to restrain excessive transcription of more recent and potentially transposition-competent TEs in the context of highly regulated physiological processes.
We next analyzed our RNA-seq dataset to identify IFN-stimulated genes (ISGs), as determined by DESeq2 comparing gene counts from treated to untreated samples in pairwise comparisons at 4 h and at 24 h post treatment (Tables S1 and S2). We first used the homology-based annotation provided by DNAzoo as a reference transcriptome. Using a cutoff of adj. p-value < 0.05 and log2FC > 1.5, we identified 213 upregulated transcripts (corresponding to 138 unique genes) and 1 downregulated gene at 4 h, and 91 upregulated transcripts  3B). Similarly, an enrichment for genes involved in response to cytokine stimuli was detected at 4 h but not at 24 h post treatment. These observations of a strong early response followed by a decline by 24 h upon IFN stimulation are consistent with observations in the black flying fox (Pteropus alecto) [9].

TE contribution to ISG transcript structure
To improve detection of potentially unannotated IFNinduced TE-derived transcripts and isoforms, we conducted genome-guided transcriptome reassembly on our combined RNA-seq dataset using StringTie2 [46] (Supplementary Data 1), and annotated assembled genes based on homology using the SwissProt database. This yielded an expanded transcriptome with 68,110 StringTie-assembled transcripts, with 32,137 transcript   isoforms matching a likely protein-coding gene with an ortholog in Swissprot, and the rest most likely corresponding to long non-coding RNAs and bat-specific genes. After performing pairwise differential expression analyses (Table S3)  We used our improved transcriptome reassembly to investigate the contribution of TEs to both constitutive and IFN-inducible transcript structures. First, we identified transcripts that contained exonized TEs based on the overlap of exon (> 50% sequence length) and annotated TE features. Considering all expressed (TPM ≥ 0.5) multi-exon transcripts, we found a subset of 1039 transcripts corresponding to 749 StringTie genes (648 transcripts and 470 genes matching Swissprot) that contained at least one TE-derived exon (Table S4).
Focusing first on transcripts reconstructed from constitutively expressed genes, we identified EEF1A1 as an example where 271 bp (representing part of the third and fourth exons; coding exons 2 and 3) is annotated as derived from a Zisupton DNA transposon. Zisupton DNA transposons are not abundant in the M. lucifugus genome (our masking recovered only one family with 174 copies). It is possible that they were horizontally transferred into bat genomes as has happened in multiple fish species [47]. Age analysis of Zisupton genomic copies suggests that they were likely recently introduced (Kimura 2 distance from the consensus sequence of ~ 18), but were able to expand only for a short period of time (peak of divergence from the consensus sequence at 4-9), when other DNA elements have been active. We further verified this finding through multiple BLAST searches (Supplementary Data 2). First, we aligned the EEF1A1 StringTie transcript (STRG.21546.2, Scaffold 10: 12654609-12,761,039) to the transcript sequence deposited on the NCBI database (XM_006089332.3). After confirming the match between the two transcripts, we used the deposited mRNA sequence as a query against the Repbase transposable element database [48] and against bats and mammalian mRNA collections. All Microchiroptera, Vespertilionidae in particular, share highly homologous coding exons 3 and 4 (Supplementary Data 3), suggesting that the exonization event occurred before or during the Microchiroptera and Pteropodinae radiation. Despite the shared homology, no Zisupton matching features were found in the human or other euarchonta EEF1A1 homologues. Therefore, our analysis successfully identified bat EEF1A1 as an example of a gene that has likely been altered by a bat-specific TE.
We next filtered for IFN-inducible transcripts containing TE-derived exon(s), and found 44 transcripts from 34 genes (16 of which annotated by SwissProt homology) induced at 4 h, and 31 transcripts from 24 genes (11 annotated) induced at 24 h (Table S4). These included genes with established roles in immune responses, like PARP9, SLFN5 (Fig. S4A) and a candidate novel isoform of IFITM2 that has not been previously identified (Fig.  S4B). This analysis reveals that numerous constitutively expressed and IFN-inducible genes have acquired exons from bat-specific TEs either in coding regions or UTRs (Table 1).
We next aimed to identify potential examples of coopted TE-derived proteins (such as syncytins [49];) which may not be masked by RepeatMasker due to their age or low copy number. We used tblastx to query expressed StringTie transcripts against reference TE protein sequence libraries specific for retrotransposons and DNA elements (see Methods). We found a total of 7 StringTie genes with a known annotated ortholog based on the concatenated StringTie transcriptome assembly, and 156 StringTie genes based on intact ORF prediction (Table S5; Methods). By using the intact ORF prediction approach we identified 9 ISGs (i.e., GBP1, DDX58 and PARP14) with at least one retrotransposon-derived feature, mostly from ERVK and L1 LINEs. Most of these TE-derived sequences reside in the last exon, where they provide both the stop codon and the 3'UTR, or novel coding and/or regulatory sequences. Finally we followed the same approach to identify novel protein-coding sequences matching known viral proteins that could represent domesticated viral proteins [50]. By leveraging the gEVE and a custom syncytin protein database we found a total of 3 constitutively expressed StringTie genes with homology to syncytin proteins and 9 with homology to either a pol, gag, retrotransposase or AP viral proteins (Table S6). Only one pol (RVT1)-derived gene, UBP18, was differentially expressed upon IFN treatment.
In addition to examining the TE contribution to protein-coding sequences, we also searched for examples of TE-derived promoters. We collapsed StringTie transcript coordinates to their transcription start site (TSS), and intersected transcript TSSs with the generated TE annotation. We identified a total of 11 transcripts with a TSS deriving from a TE that are IFN-inducible at 4 h, 9 of which are shared with the 24 h subset (Table S7). Most of these belong to genes known to be involved in immune function and regulation, like NLRC5 (Fig. 4), EIF2AK2, GBP1, MX1, MX2, PHF11, SAMD9 and XAF1, while others like PARP14 and CS012 are non-canonical ISGs. Notably, we also observed recurrent usage of TE-derived promoters for histocompatibility loci located in Scaffolds 13 and 20, where 78 transcripts (corresponding to 21 annotated genes) at 4 h and 7 transcripts (corresponding to 3 annotated genes) at 24 h have TE-derived TSSs or coding exons. While the histocompatibility locus may be prone to sequence assembly artifacts related to its highly polymorphic and complex structure, our analysis suggests that TEs may have influenced the evolution of the bat histocompatibility locus, which has been proposed to underlie batspecific immune function [51,52].

Epigenetic profiling upon IFN stimulation
Having analyzed the contribution of TEs to ISGs, we next asked how TEs contribute to inducible regulatory elements defined by H3K27ac, POLR2A, or STAT1 activity in response to IFN. We observed an increase in STAT1 signal at the predicted promoters for IRF9 and PSME2 in response to type I IFN (Fig. 5A). Using spike-in normalized CUT&RUN data at 0 and 4 h, we used DESeq2 to define IFN-inducible regulatory elements, as performed previously [29]. Unexpectedly, we did not observe robust IFN-inducible chromatin changes that are characteristic of IFN-stimulated cells from other species [29,30]. We did not identify any H3K27ac inducible elements with a false discovery rate (FDR) < 0.05 ( Fig. 5B; Table S8), and instead defined a set of 1113 elements showing an increase of H3K27ac signal with a relaxed significance threshold of unadjusted p-value < 0.1 (Fig. S5A). This set of IFN-inducible elements was enriched for interferonstimulated response element (ISRE) motifs (E-value 1.43 × 10 − 10 ) (Table S9), consistent with their activation by IFN stimulation. Thus, while our CUT&RUN analysis successfully identified some elements showing IFN-inducible activity, our analysis reveals surprisingly modest chromatin-level changes, despite robust ISG induction according to RT-qPCR from RNA taken from the same fraction of cells (Fig. S1).
Of these regions, we found that 466 out of 1113 fully overlapped at least one TE (Table S10). Additionally, we identified 766 inducible, STAT1-bound TEs that fall within 100 kb of an ISG (Table S11). This includes an LTR14_ML element that may be functioning as the promoter for the NLRC5 locus in addition to an intronic Ves2_ML SINE element (Fig. 4). However, in contrast to previous studies in other species [29][30][31][32][33], we did not observe any overrepresented TE families within this set ( Fig. S5B; Table S12). The only subfamilies that overlapped more than 10 IFN-inducible H3K27ac regions correspond to the ancient mammalian MIR and L2 families that predate the evolution of bats ( Fig. 5C; Fig. S5C). Querying H3K27ac regions from either untreated or IFN-stimulated conditions independently, we observed only very modest enrichment of the MIR3, AmnSINE1, and LTR13C_ML families ( Fig. 5D; Fig. S6). Taken together, our analysis indicates that TEs have contributed hundreds of regulatory elements involved in IFN signaling, but in contrast to studies in other species, we did not identify genome-wide enrichment of lineage-specific TE families within IFN-inducible regulatory elements. However, given that our CUT&RUN analysis revealed a relatively minimal set of inducible regulatory elements at a genome-wide level, we were limited in our ability to identify enriched TE families.

Discussion
Our study characterizes the transcriptional and epigenetic dynamics of bat TEs in the IFN response in M. lucifugus cells. To facilitate our ability to map TEs in our functional genomic data, we conducted both RNAseq and CUT&RUN using 150 bp paired end reads, and generated an improved repeat annotation using a chromosome-scale assembly. Our analyses revealed that TEs have shaped the IFN-inducible transcriptome, but we did not find strong evidence for a global role for TEs in shaping associated epigenetic changes. Functional studies will be necessary to validate whether any of the elements identified in this study have significance for bat immunity, but given the growing number of validated examples in human and mouse, it is likely that some TEs have been co-opted for innate immune function in bats.
For our study, we generated matched transcriptomic and epigenomic datasets profiling the type I IFN response in M. lucifugus primary cells. Our transcriptomic analysis of the IFN response in M. lucifugus embryonic fibroblasts confirms previously reported features of bat innate immunity. We found that these cells respond to IFN stimulation at the transcriptional level, with a stronger and broad induction of ISGs at earlier time points (4 h post treatment). We also found that only a small subset of genes that were overexpressed at 4 h maintain high expression levels at 24 h, whereas most genes show a reduction in expression to lower levels or levels similar to those recorded in unstimulated cells. This is in agreement with gene expression profiling in Pteropus alecto [9]. In parallel to gene expression, we characterized expression profiles of TE-derived transcripts, and found similar trends. Total TE expression was higher at 4 h post IFN treatment, with more and more diverse TE families being differentially expressed both in comparison to unstimulated and 24 h cells. While the expression of these transcripts is not directly indicative of function, IFN-inducible expression of bat-specific TE families may act as a source of non-coding transcripts that can further activate innate immune pathways, akin to the "viral mimicry" pathway characterized in human cancers [26,27]. Indirect support for the hypothesis of a potential involvement of TEs in innate immunity comes from the observation that whereas the total TE transcriptome follows in composition the TE genomic content, this is not true when it comes to differentially expressed TE families. Since our analyses cannot discriminate TE expression at the level of individual loci, it is possible that only few members of a given TE family are overexpressed, but the family as a whole is not. Altogether, it is likely that IFN-responsive sequence-specific features (i.e.,

IFN-responsive promoters) exist in Myotis.
We also explored whether specific TEs may have affected the transcript structure of host genes, by screening for gene transcripts that share homology with transposable elements or viral proteins in coding regions and transcription start sites (TSS). Our genome-guided transcriptome assembly identified multiple instances of TEderived and viral protein-derived exonization events, both as alternative (IFITM2) or conserved (PARP9) exons and TSS (NLRC5). Some of these transcripts represent canonical (PARP9, DDX60) or non-canonical (PLAAT3, SP140L) ISGs. These analyses provide strong evidence that TEs have been co-opted into the exons of bat ISGs, and some of these exonization events may have significant functional consequences. For example, our analysis identified multiple Myotis-specific TE insertion and exonization events affecting the NLRC5 gene. NLRC5 has been identified as a key regulator of MHC class I-dependent immune responses [53], and may be involved in the regulation of inflammasome activation and type I IFN responses [54]. Further studies are needed to validate the potential effects of these TE-derived sequences, but it is possible that Myotis-specific TEs have altered NLRC5 function and/or regulation.
We also identified instances of TE-derived constitutively expressed genes. We verified through multiple BLAST and sequence alignments that the ~ 100 amino acids of the EEF1A1 protein of Microchiroptera and likely Pteropodinae bats derived from the exonization of a Zisupton DNA transposons. This example uncovered by our analysis highlights the possibility that TEs have shaped other aspects of bat biology in addition to genes involved in immune function.
While RNA-seq profiling has been applied by an increasing number of studies to profile bat immunity at the transcriptomic level, no study to date has characterized bat immunity at the epigenomic level. Unexpectedly, while our RNA-seq analysis of M. lucifugus cells coincided with strong transcriptional response to IFN treatment, we observed relatively modest chromatin changes based on CUT&RUN epigenomic profiling. This observation contrasts with robust chromatin changes typically observed in IFN-treated cells from other mammalian species [29,55,56]. As a result, we did not observe strong evidence supporting family-level TE regulatory induction as observed previously in other species such as human [30] and cow [29], partly due to the lack of clearly inducible elements at a genome-wide level.
There are multiple potential explanations for our observation of a relatively modest IFN-inducible epigenetic response in bat cells. First, our study is one of the first to conduct CUT&RUN on bat cells, and the antibodies used in this study have not been fully validated in M. lucifugus. However, while the antibody used for STAT1 may exhibit poor recognition of bat ortholog, histones and POLR2A are highly conserved and expected to be targeted effectively by standard antibodies. Second, our study involved stimulating derived embryonic fibroblast cell culture with recombinant universal type I IFN. While these conditions nonetheless showed strong transcriptional induction of ISGs, it is possible that chromatin dynamics are different during endogenous activation of IFN responses in vivo.
Finally, our observations may reflect a unique attribute of bat immunity, consistent with the idea that some bat species exhibit constitutive IFN expression [7,57]. Although the type I IFN locus of M. lucifugus is poorly characterized, we were able to annotate at least 11 uncharacterized genes that likely reflect the expansion by gene duplication of the IFNw cluster (Fig. S7) [58], whereas no IFNa genes were identified. Of the 11 IFNw paralogues, 5 showed evidence of constitutive low expression in unstimulated cells, and induction at 4 h post treatment (Fig. S7). These observations suggest a scenario where Myotis epigenomes are "primed" due to constitutive expression of IFN, and may be capable of driving robust transcriptional activation without exhibiting epigenetic changes typically associated with inducible chromatin activity, such as increased H3K27ac or POLR2A levels.

Conclusions
Our study provides a first systematic investigation of the contribution by TEs to the bat type I IFN response. We uncover numerous examples of TE-derived transcripts, alternative exons, and regulatory elements that shape the genomic response to IFN in M. lucifugus. Our study suggests that TEs in other bat lineages such as Pteropus and Rhinolophus may also shape IFN-inducible transcriptomes, which may motivate functional studies to determine their biological significance in the context of bat immunity. Our findings lend additional support for a widespread role for TE co-option in shaping the evolution of species-specific immune responses.

Transposable element identification and analysis
Myotis lucifugus genomic repeat elements were annotated according to homology-based and de novo identification approaches. We performed de novo TE identification using RepeatModeler2 (included in dfamtetools-1.1) [36] and HelitronScanner v1.1 [37] to match the newly released, highly contiguous chromosome-level assembly of the little brown bat genome (myoLuc2_HiC) [34,35,59,60]. Genome assemblies that rely on long read assembly strategies (i.e., HiC [34,61]) are better suited for capturing full length elements over contiguous chromosome-level scaffolds.
Briefly, we performed de novo TE identification using RepeatModeler and HelitronScanner, then combined the two libraries as a single little brown bat de novo library that was used for homology-based TE annotation using RepeatMasker v4.1.0 [39]. To maximize element identification we followed a custom multi-step mapping strategy [62] using multiple libraries as reference for the masking process in the following order: (i) bat specific repeats included in the Repbase library provided with Repeat-Masker; (ii) a bat specific library provided by Dr. Cosby [19]; (iii) our de novo little brown bat library; (iv) the entire tetrapoda Repbase library provided with Repeat-Masker [48].

Cell lines and treatment
Myotis lucifugus primary embryonic fibroblast cells were a gift from Mario Capecchi (University of Utah). Cells were grown at 37 °C and 5% CO 2 and passaged in DMEM (ThermoFisher #10566016) supplemented with 10% FBS, 5% MEM nonessential amino acids, 100 U/mL Penicillin-Streptomycin, and 1 mM sodium pyruvate. Cells were seeded into six-well plates at an optimized density of 2 × 10 5 cells per well in 2 ml of culturing media (or 2 × 10 6 cells per 15 cm dish for CUT&RUN). The following day (or 48 h for CUT&RUN) cells were treated with 1000 U/ml of Universal Type I IFNa resuspended in DPBS (PBL Assay Science #11200) in 2 ml of culturing media; control cells were treated with equivalent volume of DPBS in 2 ml of media. At 4 and 24 h post treatment cells were harvested for RNA extraction (4 h for CUT&RUN).

RNA isolation and library preparation for RNA-seq
Following media removal, cells were washed with 1 ml of DPBS and detached by adding 400ul of 0.25% trypsin per well. Following a 10 minutes incubation at 37 °C, trypsin was neutralized with 1.6 ml of culturing media. Cell suspensions were transferred into 1.7 ml tubes and pelleted by centrifugation at 300×g for 5 minutes. Cells were then lysed in 300ul of RNA lysis buffer (Zymo Research #R1060- , and stored at − 80 °C until RNA extraction was performed using the Quick-RNA MiniPrep kit (Zymo Research #R1054), following manufacturer's instructions.
Total RNA samples for each time point and condition were prepared in three biological replicates as described above. A NanoDrop One spectrophotometer (Thermo Fisher Scientific) was used to determine RNA concentration and quality; all samples passed quality assessment. PolyA enrichment and library preparation was performed using the KAPA mRNA HyperPrep Kit (Kapa Biosystems #8098115702) according to the manufacturer's protocols. Briefly, 500 ng of RNA was used as input, and single-index adapters (Kapa Biosystems #08005699001) were added at a final concentration of 10 nM. Purified, adapter-ligated library was amplified for a total of 11 cycles following the manufacturer's protocol. The final libraries were pooled and sequenced on an Illumina NovaSeq 6000 (University of Colorado Genomics Core) as 150 bp paired-end reads.
Libraries were generated using a modified protocol for use with the KAPA HyperPrep Kit. Briefly, the full volume of each pulldown (50 uL)  Total RNA samples for each time point and condition were prepared in three biological replicates as described above. A NanoDrop One spectrophotometer (Thermo Fisher Scientific) was used to determine RNA concentration and quality; all samples passed quality assessment. RNA expression levels for CTCF, STAT1, and IFIH1 were quantified using the Luna Universal One-Step RT-qPCR Kit (New England Biolabs #E3005L) according to the manufacturer's instructions. In brief, for each reaction 25 ng of RNA was combined with 5ul 2× Luna Universal One-Step Reaction Mix, 0.5ul 20× Luna WarmStart RT Enzyme Mix, 0.4ul 10uM forward primer, and 0.4ul 10uM reverse primer. Reactions were amplified using a CFX384 Touch Real-Time PCR Detection System (Bio-Rad) with the following PCR cycling conditions: 10 min at 55 °C, 1 min at 95 °C, 40x(10 s at 95 °C, 30 s at 60 °C). Ontarget amplification was assessed by melt curve analysis. Two biological replicates were included for each treatment condition, and each biological replicate was run in technical duplicate. Statistical significance was assessed using a two-tailed paired Student's t-test with a threshold of p-value < 0.05.

Transcriptome analyses
Paired-end 150 bp read length FASTQ files were quality and adapter trimmed using BBDuk v.38.05 [65]; quality check was performed using FastQC v0.11.8 [66] and inspected through MultiQC v1.7 [67]. Filtered FASTQ files were then mapped to the myoLuc2_HiC genome using a 2-pass approach in STAR v2.7.3a [68]. STAR was run following default parameters and allowing for multi-mapping reads (with options '-outAn-chorMultimapNmax 100 -winAnchorMultimapNmax 100 -outFilterMultimapNmax 100'), a requisite for the inclusion of TE-mapping reads in the output files. The annotation file available on DNAzoo, here referred to as "functional annotation" (www. dropb ox. com/ sh/ xt300 ht42m ihjov/ AADoE NW7RT vR3jTh 1a8q UOmRa) was used as reference for the mapping process. For the second pass of mapping we filtered out novel junctions that mapped to the mitochondrial genome, HiC_scaffold_93 based on the most likely alignment hit to the reference mitochondrial genome performed using LASTZ v1.02.00 [69]. Resulting alignment files in sorted .bam format were then provided as input for TE and gene expression quantification in TETranscripts v2.1.4 [42] using the same gene annotation and our custom TE annotation derived from RepeatMasker. To analyze clade-specific TE expression we filtered our custom TE annotation file to include only TEs with a Kimura2D < = 20. Pairwise differential expression analyses at 4 h and 24 h post IFN treatment were performed in DESeq2 v1.32 [70]. Functional enrichment analyses of differentially expressed genes (adj. p-value < 0.05, log2FC > 1.5) were performed using the WebGestalt web tool [71].

Genome guided transcriptome assembly and analysis
Short-read RNA-seq alignment files generated by running STAR (see previous paragraph) were merged, sorted and indexed using SAMtools v1.10 [72], and the resulting .bam file was used as input for genome guided transcriptome assembly in StringTie v1.3.3b [46]. StringTie was run following default parameters, except that the minimum number of spliced reads required to align across a junction was increased from 1 to 5 using option '-j 5' . The resulting StringTie gtf output (Supplementary Data 1) was then converted to FASTA format using the gffread utility using options '-M, -F, and -Z' included in Cufflinks v.2.2.1 [73]. We followed an homology-based approach to annotate assembled StringTie genes and isoforms. StringTie gene sequences were queried against the Swis-sProt database [74] through the blastx search algorithm in BLAST 2.12.0+ [75] using options '-max_target_seqs 1 -evalue 10′. Matches were then filtered for shared sequence identity equal to or greater than 50%.
To find transcripts with coding regions that intersect TEs, we applied BEDTools v2.28.0 [76] to filter for events where at least 50% of the sequence of one exon derived from a TE with option '-f 0.5' . Briefly, exon coordinates were extracted from the StringTie annotation and intersected with our custom TE annotation for Myotis lucifugus. To narrow down the list of candidate StringTie transcripts and limit redundant of false positive matches, the output was filtered for multi-exon transcripts with a transcripts per million (TPM) value equal to or higher than 0.5. TPM values were quantified at the isoform level by RSEM v1.3.0 [77]. Filtered StringTie transcript candidates were cross-referenced against the results of differential expression analysis in DESeq2 v1.32 [70] (Table  S3) to identify TE exonization events in ISGs. Candidates ISGs were verified by visually inspecting candidates against the DNAzoo and NCBI annotations for Myotis lucifugus and RNA-seq coverage tracks. To identify TEs that might be contributing to alternative transcriptional start sites (TSS), we used a custom python script to extract the TSS coordinates from the StringTie annotation and intersected this collapsed file with the TE annotation file as previously described.
Finally, we queried StringTie-derived assembled transcripts against databases of DNA transposons and retrotransposons extracted from the Repbase repository of TE reference sequences [48]. tblastx was used according to previously specified parameters and resulting hits were further filtered for alignments greater than or equal to 300 bp with a sequence identity greater than or equal to 90%. To identify any transcripts with intact protein-coding sequences of viral origin, we ran blastx against i) the gEVE repository of retroviral proteins [78] and ii) a custom database of syncytin proteins collected from the ncbi repository [79]. The output of gEVE blast was filtered for alignments greater than or equal to 200 bp with a shared sequence identity greater than or equal to 50%. The output of syncytin blast was filtered for alignments greater than 100 bp with shared sequence identity of 50% and above. The same blast analysis for TE sequences and protein databases was carried out on identified open reading frames (ORFs) larger than 50aa found by running the function usearch [80] (usearch -fastx_findorfs -orfstyle 7 -mincodons 16) on the StringTie assembled transcripts file.
Peak calling was performed using complete and size subsetted alignment files with MACS2 v2.1.1 [84] in a two-step process where separate sets of peaks were called with 1) single-end options '--format BAM --shift=-75 --extsize=150' and 2) paired-end option '--format BAMPE' . For both modes only peaks with an unadjusted p-value < 0.01 were retained. Peaks from each mode were subsequently merged. IgG peaks were subtracted from each pulldown peak set to minimize background. Only the top 20,000 peaks by descending MACS2 peak score were retained for further analysis.
To assess the contributions of TEs in regulating the IFN response, we intersected IFN-inducible H3K27ac peaks with all annotated TEs, requiring that all reported TEs are fully contained within the peak. We further characterized the regulatory landscape by identifying STAT1marked TEs as a function of distance to the nearest ISG (FDR < 0.05, log2FC > 1.5) transcriptional start site using BEDTools v2.28.0 [76]. To assess family-level enrichment, GIGGLE v0.6.3 [87] was used to create a database of TEs in the myoLuc2_HiC genome using the custom TE database as described above. IFN-inducible H3K27ac peaks were then queried against the TE database. Results were ranked by descending Giggle enrichment score, and enriched TE families were identified according to the odds ratio, Fisher's two-tailed p-value, and number of overlaps. The TE heatmaps were prepared by selecting elements within various families that overlapped either IFN-inducible H3K27ac regions or any H3K27ac regions from untreated or IFN conditions. Signal from S. cerevisiae spike-in, CPM normalized bigwigs was plotted as heatmaps using deepTools v3.0.1 [82,83].