- Research
- Open access
- Published:
Long noncoding RNAs emerge from transposon-derived antisense sequences and may contribute to infection stage-specific transposon regulation in a fungal phytopathogen
Mobile DNA volume 14, Article number: 17 (2023)
Abstract
Background
The genome of the obligate biotrophic phytopathogenic barley powdery mildew fungus Blumeria hordei is inflated due to highly abundant and possibly active transposable elements (TEs). In the absence of the otherwise common repeat-induced point mutation transposon defense mechanism, noncoding RNAs could be key for regulating the activity of TEs and coding genes during the pathogenic life cycle.
Results
We performed time-course whole-transcriptome shotgun sequencing (RNA-seq) of total RNA derived from infected barley leaf epidermis at various stages of fungal pathogenesis and observed significant transcript accumulation and time point-dependent regulation of TEs in B. hordei. Using a manually curated consensus database of 344 TEs, we discovered phased small RNAs mapping to 104 consensus transposons, suggesting that RNA interference contributes significantly to their regulation. Further, we identified 5,127 long noncoding RNAs (lncRNAs) genome-wide in B. hordei, of which 823 originated from the antisense strand of a TE. Co-expression network analysis of lncRNAs, TEs, and coding genes throughout the asexual life cycle of B. hordei points at extensive positive and negative co-regulation of lncRNAs, subsets of TEs and coding genes.
Conclusions
Our work suggests that similar to mammals and plants, fungal lncRNAs support the dynamic modulation of transcript levels, including TEs, during pivotal stages of host infection. The lncRNAs may support transcriptional diversity and plasticity amid loss of coding genes in powdery mildew fungi and may give rise to novel regulatory elements and virulence peptides, thus representing key drivers of rapid evolutionary adaptation to promote pathogenicity and overcome host defense.
Background
Transposable elements (TEs) are selfish genetic and typically repetitive elements in genomes. Autonomous TEs carry genes for self-replication, transposition, and DNA integration, allowing them to multiply and spread in host genomes, often leading to high copy numbers. They are classified into DNA transposons, spreading by excision from one and integration into another genomic location (“cut-and-paste” mechanism), and retrotransposons, which transcribe their sequence into an RNA transcript, which in turn is reverse-transcribed and then integrated as a new copy into the genome, frequently at a distant site [1, 2]. TEs can dramatically influence genome size in eukaryotes [3], destabilize genome integrity [4], and significantly affect genome structure and gene expression patterns [5].
Powdery mildew fungi are ubiquitous phytopathogens of many plant species in temperate climates [6, 7]. The genomes of powdery mildew fungi often consist to more than 50% of TEs, resulting in largely inflated genomes and highly variable genome sizes between species [8]. The barley (Hordeum vulgare) powdery mildew Blumeria hordei possesses a genome of approximately 125 million base pairs (Mbp) in length with a TE content of more than 74% [9, 10]. The TE space of B. hordei is dominated by long interspersed nuclear element (LINE) elements, predominantly Tad1, and long terminal repeat (LTR) elements, especially Ty3/mdg4 (syn. “Gypsy”) and Ty1/Copia [9, 10]. While such a high TE content likely poses a stress on genome integrity in B. hordei, TEs can also be a source of variability and adaptation [4] and contribute to the evolutionary invention of novel virulence factors [11, 12]. However, it is presently unknown how powdery mildew fungi regulate their TEs to prevent uncontrolled spread and, hence, genome disintegration.
Filamentous fungi evolved the repeat-induced point mutation (RIP) mechanism that serves to mutagenize repetitive sequences and helps to disable TEs [13]. However, the RIP mechanism is missing in all powdery mildew fungi sequenced to date [8,9,10, 14], suggesting that it was lost in the course of evolution in a common ancestor of powdery mildew fungi [14]. Instead, we previously found that small RNAs (sRNAs) can originate from B. hordei TEs and that phased small RNAs (phasiRNAs) align to fungal TEs, suggesting frequent RNA interference (RNAi) to control TE activity [15, 16].
Long noncoding RNAs (lncRNAs) are transcripts longer than 200 nucleotides with no apparent coding potential [17, 18]. Like mRNAs, lncRNAs are transcribed by RNA polymerase II and often processed by splicing, 7-methylguanosine capping, and polyadenylation [19]. lncRNAs can be distinguished from sRNAs and microRNAs, which are smaller than 200 nucleotides [20], RNA polymerase III-generated noncoding RNAs, including ribosomal and transfer RNAs, and the RNA polymerase II-derived components of the spliceosome, i.e. small nuclear RNAs (snRNAs) and small nucleolar RNAs (snoRNAs) [21]. Compared to coding genes, lncRNAs frequently exhibit lower expression levels and sequence conservation on average [21]. lncRNAs can be classified as intergenic, intronic, sense, and antisense lncRNAs depending on their relative position to coding genes [22], although the majority of lncRNAs described to date reside in intergenic regions [23,24,25]. lncRNAs engage in biological processes such as transcriptional and post-transcriptional gene regulation, epigenetic regulation, regulation of translation, and regulation of protein activity [19, 26]. A number of lncRNAs have been functionally analyzed in mammals, such as HOTAIR, roX1, roX2, and Meg3, which form triplexes with purine-rich DNA regions to exert regulatory functions [21, 27], and in plants, where lncRNAs are involved in developmental processes and stress responses [28]. In fungi, lncRNAs function in regulating development, stress and starvation responses, and pathogenic traits [29, 30]. For instance, the human-pathogenic fungus Cryptococcus neoformans exhibits dynamic infection stress-related lncRNA expression patterns [31]. The rice blast fungus Magnaporthe oryzae expresses more than 500 lncRNAs specifically during infection [23], suggesting that lncRNAs may be important for plant-pathogenic fungi throughout host colonization. Interestingly, in maize, TEs can give rise to lncRNAs in response to abiotic stresses [32], suggestive of regulatory roles exerted on TE activity by lncRNAs. We hypothesize that lncRNAs could also occur in powdery mildew fungi, be expressed during different infection stages, and fulfill regulatory functions toward controlling TE activity.
In this work, we aimed to discover signs of co-regulation between TEs, coding genes, and lncRNAs in the barley powdery mildew fungus B. hordei. We generated a time-course whole-transcriptome shotgun sequencing (RNA-seq) dataset to measure transcript levels at all relevant stages of the asexual infection cycle of B. hordei, which allowed us to provide manually curated high-quality genome-wide annotations of coding genes, TEs, and lncRNAs. TEs exhibited signs of time point-specific regulation and co-regulation with mRNAs and lncRNAs, and we found evidence for phasiRNAs linked to retrotransposons, suggestive of RNAi-directed regulation of TEs in B. hordei. Importantly, we discovered 823 lncRNAs on the antisense strand of TEs, which may provide precursors for RNAi with TEs.
Results
Stranded total RNA sequencing of B. hordei and barley transcripts
We hypothesize that B. hordei TEs are differentially regulated throughout the pathogenic life cycle of the fungus. We performed stranded RNA-seq at 0, 6, 18, 24, 72, and 120 h post inoculation (hpi) to obtain transcriptomes of all major developmental stages of the asexual pathogenic life cycle of B. hordei and its host, barley (Fig. 1A). The RNA sequencing yielded on average 84,528,021 raw reads/sample and 81,708,016 reads/sample after quality trimming (Supplementary Table 1). FastQC quality control revealed high per-base quality (per-base quality score > 30) across all samples, but also 7.7% sequence duplication on average, which originated from barley chloroplast RNA and had no similarity to the H. vulgare and B. hordei nucleic genomes. Of the remaining reads, 0.2–1.5% and 4.6–24.8% mapped to the reference genome of B. hordei isolate DH14 [10] at the early (0, 6, 18 and 24 hpi) and late (72 and 120 hpi) time points, respectively. The relatively low fungal mapping percentage can be explained by the B. hordei tissue only representing a small fraction of the biological sample material (leaf epidermal peels), especially at early infection stages before 24 hpi. Despite a rather low proportion of fungal reads, RNA-seq analysis of powdery mildew-infected plant samples can retrieve biologically meaningful insights [33]. Further, between 65 and 93% of the reads mapped to the H. vulgare cv. ‘Morex’ reference genome version Morex3 [34], while between 6.9% and 13.2% of the reads could neither be assigned to the host nor the pathogen. Using Pearson correlation coefficient-based hierarchical clustering, principal component (PC) analysis, and non-metric multidimensional scaling (NMDS), we found that the sequencing samples from the same time points were more similar to each other than to samples from the other time points (Fig. 1B-C). However, the principal components PC1 and PC2 did not separate B. hordei time points 6 hpi from 18 hpi, nor 72 hpi from 120 hpi, while hierarchical clustering and NMDS indicated these to be different. Overall, both B. hordei and H. vulgare RNA-seq data clustered according to infection time point, suggesting that the dataset is informative with respect to the respective infection stage.
TEs are transcriptionally active in a time point-dependent manner in B. hordei
Since the genome of B. hordei is largely occupied by TEs [9, 10], we generated a manually curated [35] database of non-redundant TEs for B. hordei (Supplementary Fig. 1). In total, we established 353 consensus repeats including a TE consensus database of 344 TE families, of which 116 were LTR-Ty3 (aka LTR-Gypsy), 122 were LTR-Ty1 (aka LTR-Copia), 80 were LINE-Tad1 elements, 3 were unknown LTR-type TEs, 3 were NonLTR retroposons, 2 were SINE (Eg-R1 and Egh24), and 18 were DNA transposons (Table 1). We then re-annotated the TEs in the B. hordei genome with the custom TE consensus database using RepeatMasker. We found that 47,581,730 bp (38.22%) of the B. hordei isolate DH14 genome assembly were occupied by LTR retrotransposons, 29,472,397 bp (23.67%) by LINE retrotransposons, 10,208,381 bp (8.20%) by DNA transposons, 4,028,256 bp (3.24%) by SINE retrotransposons, and 2,013,553 bp (1.62%) by other or unclassified elements (Supplementary Table 2). Overall, we discovered 13,907 full-length (9.4%) and 133,617 truncated TEs in the B. hordei genome; similarly low full-length conservation was observed in the fungus Laccaria bicolor with 6.3% [36]. In total, according to the updated calculation, the B. hordei reference genome contained 93,304,317 bp (74.95%) interspersed repetitive sequence, which is close to previous estimates [8, 10].
Next, we analyzed the RNA-seq read coverage of fungal consensus TEs in the course of pathogenesis. On average, both DNA and LINE and LTR retrotransposons exhibited high (> 500 transcripts per million (TPM)) levels of read mapping (Fig. 2A-B). Further, using time-resolved analysis of RNA-seq data via TCseq [37] and quantitative reverse transcriptase polymerase chain reaction (qRT-PCR) analysis for selected elements, we found that transposon families can exhibit time point-specific enhanced transcript levels (Fig. 2C-D and Supplementary Fig. 2), suggestive of dynamic regulation of TE expression during plant infection.
Non-coding RNAs are a common means to reversibly silence gene expression [20, 38]. Previously, we found that sRNAs often originate from TEs [16] and that phasiRNAs preferentially align to TEs, indicative of RNAi activity on transcripts originating from TEs [15]. We therefore used publicly available sRNA-seq datasets for B. hordei [15, 39] to assess phasing of consensus TEs via the tool unitas [40]. Overall, we detected the accumulation of phasiRNA mapping in 104 of the 344 consensus TEs (Table 1), of which 91 were LTR elements (51 Ty3/mdg4 and 40 Ty1/Copia elements) and 13 were Tad1 LINE elements (Fig. 3A). These 104 elements corresponded to 1,333 full-length and 39,499 truncated TEs in the genome of B. hordei. Of these, 102 exhibited expression levels of more than 250 TPM at a minimum of one time point. We compared these apparently phased TEs with TEs showing time point-specific expression patterns (Fig. 2C) and found 22 to be both phased and undergoing dynamic expression (12 Ty3/mdg4, 7 Ty1/Copia, and 3 Tad1 elements; Fig. 3). Thus, phasing of TE transcripts may contribute to the dynamic regulation of TEs throughout the pathogenic fungal life cycle.
Further, we observed cases of RNA-seq read-mapping to TE loci that exhibited signs of splicing, with splice sites located on the antisense strand of the TE-encoded genes (Fig. 3D). To preclude multi-mapping artifacts because of short-read mapping to highly repetitive loci, we performed long-read transcript sequencing with Oxford Nanopore Technology (ONT) MinION technology at 144 hpi and observed reads corresponding to spliced antisense lncRNA transcripts in TEs (Fig. 3D). These transcripts did not appear to encode any meaningful peptides and thus were TE-derived antisense long noncoding RNAs (lncRNAs). We selected eight TE antisense lncRNAs that exhibited expression levels above 10 TPM at a minimum of one time point, which we successfully amplified, cloned and sequenced (Sanger technology). In two of these (BLGHnc_000866-RA and BLGHnc_004496-RA), we verified the occurrence of the predicted introns (Figs. 3E and F). Collectively, this approach independently validated the existence of, in part intron-containing, TE antisense lncRNAs in B. hordei.
Genome-wide lncRNA annotation in B. hordei
We next identified and manually annotated all putative lncRNAs in the genome of B. hordei using our extensive time-course RNA-seq dataset. We assembled 45,797 transcripts using StringTie [43] and designed a pipeline to identify lncRNAs (> 200 bp and no apparent coding potential) from the assembled transcripts (Fig. 4A). We used a Web Apollo instance [44] to manually inspect and correct coding and noncoding gene models, resulting in 5,127 lncRNA loci across the B. hordei genome, accounting for 8,970 transcripts in total. Of the 5,127 lncRNA loci, 2,401 (46%) were recovered using ONT long read transcript sequencing, including 293 TE antisense lncRNAs. We consider this a satisfactory recovery rate, since we obtained long read transcriptomic data only from the late infection stage at 6 dpi and recovered polyadenylated rather than total transcripts. Due to the extensive transcriptome data in our study, which was not available for coding gene annotation before [10], we further detected 43 new coding genes in the genome of B. hordei, removed 12 models that were located within TEs and encoded predicted transposon replication genes, and corrected another 142 models, which changed the coding gene number from 7,118 to 7,149. Of these, 5,457 (76%) were recovered by ONT long read sequences The most common functional assignments of the 43 newly annotated encoded proteins are predicted candidate secreted effectors, additional Sgk2 kinase-like paralogs, and fungal zinc finger DNA binding domains (Supplementary Table 3).
We found that lncRNAs were shorter than coding mRNAs on average (1,494.8 bp and 2,110.8 bp, respectively) and contained fewer exons (2.3 and 3.0 exons on average; Fig. 4). More than 50% of lncRNAs consisted of two exons, while lncRNAs with more than five exons were rare (< 1.9%). Overall, mRNAs exhibited 15.5-fold higher median expression levels than lncRNAs in our dataset, and there was no significant difference in this respect between intergenic and antisense lncRNAs. In total, 914 intergenic lncRNAs derived from TEs, of which 823 originated from the antisense strand of the respective TE. Overall, lncRNAs, like coding genes and TEs, were ubiquitously distributed throughout the genome and we did not find any lncRNA-rich genome compartments (Supplementary Fig. 3).
Furthermore, we found that 2,378 of the 5,127 lncRNA genes encoded more than one transcript variant, suggesting alternative splicing of a substantial portion of the lncRNAs in B. hordei (Fig. 4F). Two lncRNAs gave rise to eleven separate isoforms, another three lncRNAs contained up to nine isoforms, while most lncRNA genes had two (1,488; 62%) or three (555; 23%) transcript variants (Fig. 4F). The most common alternative splicing event was a retained intron (2,016 events), followed by an alternative 3’ splice site (1,163 events; Fig. 4G). We detected some cases in which alternative lncRNA isoforms showed differential accumulation at different time points during fungal pathogenesis (Fig. 4H), suggesting that alternative splicing of lncRNAs occurs in a developmental stage-specific manner in B. hordei.
Gene expression patterns in B. hordei reflect its pathogenic development
Transcripts are differentially regulated in B. hordei during its pathogenic development (Fig. 1B and C). To gain detailed insights into (co-)expression profiles, we assigned all reads mapping to the B. hordei reference genome to the annotated 7,149 coding genes and 5,127 lncRNAs (Supplementary Table 4 for TPM values). We then clustered lncRNA and mRNA gene expression patterns across all time points using TCseq [37] and identified nine distinct co-expression clusters featuring 2,264 lncRNAs and 2,762 coding genes in total (Fig. 5A). The clusters exhibited distinct expression patterns, with peaks of transcript accumulation at different time points. Seven clusters showed transcript peaks at one or two specific time points. Cluster 4, comprising transcripts whose levels peaked at 18 hpi, contained the highest number of transcripts, namely 336 mRNAs and 344 lncRNAs (Table 2). We further analyzed the expression pattern of all genes encoding putative secreted proteins in B. hordei and found distinct sets of these genes with peaked transcript levels at 0, 6, 18–24, or 72–120 hpi, indicative of “waves” of genes coding for secreted proteins expressed during fungal pathogenesis (Fig. 5B). Clusters 4, 5, 6, and 8 contained an elevated number of putative secreted proteins (> 20% of coding transcripts each), coinciding with peaked transcript accumulation at 18, 24, or 72 hpi (Fig. 5C and Table 2). By contrast, genes encoding Sgk2-like serine/threonine kinases, which are abundant in the genome of B. hordei [50], almost exclusively occurred in cluster 9, which represents genes expressed at 120 hpi (Fig. 5C and Table 2). Since ROPIP1 of B. hordei is an Eg-R1-derived peptide [11], we also assessed the time course expression pattern of the consensus ROPIP1 gene (Supplementary Fig. 4), and found consistently high ROPIP1 expression levels above 1,000 TPM with a peak at around 4,000 TPM at 6 hpi. Furthermore, we analyzed the expression patterns of proteins from the RNAi machinery in B. hordei, which we identified previously [16]. Genes encoding components of the dsRNA synthesis machinery, i.e., RNA-dependent RNA polymerase (BLGH_05945), helicase QDE-3 orthologs, and replication factor A1 (BLGH_03661) were mostly expressed early (0–6 hpi; Supplementary Fig. 5). The DCL-1 ortholog BLGH_05892 showed both early and late expression (0 and 120 hpi) while DCL-2 (BLGH_05549) was only expressed at late time points. AGO1 (BLGH_02442) expression peaked during haustorium formation (18–24 hpi) and the two other AGO-encoding genes (BLGH_02817, BLGH_06130) exhibited high expression of up to 1,226.0 TPM late in the pathogenic cycle (72–120 hpi). The only DNA methylase of B. hordei, DNMT-2 (BLGH_06833), which could be involved in RNAi-mediated DNA methylation, displayed the highest expression in conidia at 0 and 6 hpi.
We conducted gene ontology (GO) enrichment analysis for the coding genes found in the nine clusters described above (Fig. 5A). While no enrichment was detected in case of 0 hpi (clusters 1 and 2), we found GO processes related to cell division, cell cycle, microtubule activity processes, oligosaccharide metabolism, response to stimulus, and DNA repair to be enriched at 6 hpi (cluster 3; Fig. 5D; Supplementary Table 5). At 18 and 24 hpi, the process “ribonuclease activity” was enriched, likely reflecting an abundance of RNase-like candidate secreted effector proteins [53, 54] expressed at this time point (cluster 5). During late infection (72 hpi and thereafter; clusters 7 and 8), processes related to protein, nucleotide, and fatty acid biosynthesis and gene expression were over-represented. Finally, at 120 hpi (cluster 9), processes related to protein phosphorylation and phosphorus metabolism were enriched. Protein phosphorylation was likely enriched due to the high number of Sgk2-like serine/threonine kinases ([50]; 52 in total; 19.4% of mRNAs in cluster 9; Table 2) expressed at 120 hpi (Fig. 5C and D).
We conducted the time-course expression and GO enrichment analysis for the plant side as well (Supplementary Table 6 and Supplementary Fig. 6). In the host barley, we found eight co-expression clusters, with peaks at 0, 6, 18, 24, or 120 hpi, and a cluster of 646 genes exhibiting distinct down-regulation at 6 and 18 hpi (cluster 6; Supplementary Fig. 6A). Processes related to stress or defense response, carbohydrate biosynthesis, and cell wall biogenesis were associated with early-responding genes (6 hpi to 24 hpi; clusters 2–5), but these processes were also over-represented in cluster 6, i.e., the set of 646 genes that were down-regulated at 18 hpi. At 72 hpi and 120 hpi, sulfur, nitrogen, organic acid metabolism-related, and catabolism terms were highly represented, in addition to biotic stress response (cluster 7 and 8). Overall, we found that early developmental processes and stress response genes dominated the early infection stage of B. hordei, while processes related to growth and metabolic activity were enriched at later stages (Fig. 5A and D). Effector candidates were expressed in at least four waves (Fig. 5B), while Sgk2-like serine/threonine kinases appeared to be associated mostly with the asexual late infection stages in B. hordei (Fig. 5C and Table 2).
B. hordei lncRNAs may co-regulate transcripts in cis and trans
We noted that lncRNAs, mRNAs, and TEs all exhibited time point-specific expression peaks (Fig. 6A). Thus, we conducted a weighted gene co-expression network analysis (WGCNA) including all B. hordei mRNAs, lncRNAs, and TEs to identify possible co-expression networks and infer putative regulatory networks (Supplementary Fig. 7). We detected 14 co-expression networks consisting of at least ten transcripts each (Fig. 6B). Two of these networks included more than 100 genes encoding putative secreted proteins, one of which corresponded to a network of genes induced at 18–24 hpi (blue, 179 secreted) and the other to a network of genes induced at 72–120 hpi (salmon, 136 secreted). Then, we determined the genomic physical distance between the genes encoding these transcripts and calculated the Pearson’s correlation coefficient (PCC) R value with a cut-off of |R|≥ 0.7 (cis) or |R|≥ 0.9 (trans) to discover positive and negative co-regulation of lncRNAs with mRNAs (Supplementary Table 7), or a cut-off of |R|≥ 0.7 for lncRNA comparison with consensus TEs (Supplementary Table 8). We regarded cis co-regulated lncRNAs and mRNA if their maximal genomic distance was 10 kb, otherwise the respective pair was regarded co-regulated in trans. Overall, positive co-regulation was more common than negative co-regulation, and co-regulation was more common in trans than in cis. In case of lncRNA-mRNA regulons, we detected 6,018 positively and 339 negatively co-regulated trans pairs, and 109 positively and 13 negatively co-regulated cis pairs. For lncRNA-consensus TE correlations, we found 6,169 positive and 334 negative interactions in trans. Further, we detected seven negative correlation cases of TE antisense lncRNAs and their respective TE, such as BLGHnc_000540-RA located antisense of Ty1/Copia-23 and BLGHnc_002596-RA, an antisense lncRNA of Ty3/mdg4-9 (Fig. 6C and D). Meanwhile, six TE antisense lncRNAs were positively correlated with the respective TE, such as BLGHnc_004551-RA and Ty3/mdg4-48 and (Fig. 6E). Our findings suggest that some transposon antisense lncRNAs may directly regulate the TE they derive from, although the majority of these are co-regulated with distal TEs or mRNAs.
Conclusions
TEs are highly abundant in the genomes of powdery mildew fungi, including B. hordei [8, 9], but their transcript levels have not been measured so far. High-throughput datasets using next-generation sequencing technologies can be powerful tools to understand powdery mildew biology and evolution [56]. Here, we generated a comprehensive stranded RNA-seq dataset that contains information about all fungal and host transcripts based on random rather than oligo-dT priming during cDNA synthesis. Our dataset covers all major stages of B. hordei during the asexual infection cycle on barley, from early conidiospore germination to sporulation (Fig. 1A). Limitations of the dataset are that (1) the reads are derived from only one isolate (B. hordei K1AC; [57]) during the compatible interaction with its host barley, (2) we did not include the sexual reproduction stages, and (3) the short reads preclude precise assignment to individual TE members. We partially overcame the latter limitation by including long read-based transcriptome sequencing via ONT at a late stage of infection to recover many fungal reads and recovered 46% of lncRNA and 76% of mRNA transcripts annotated initially with the help of short reads. A higher transcript recovery rate would require additional long read-based transcriptomes from all infection stages. Despite the limitations, our data allowed us to provide a genome-wide curated annotation of lncRNAs and their alternatively spliced isoforms (Fig. 4) and to measure stage-specific transcript accumulation of coding genes, noncoding RNAs, and TEs simultaneously (Figs. 2, 5 and 6). Altogether, our findings suggest a TE regulatory network involving both (phased) sRNAs (Fig. 3) and lncRNAs (Fig. 6). Despite applying a more stringent significance cut-off for trans-regulated pairs, our analysis recovered > 10-times more trans- than cis-regulated pairs. Examples of trans-co-regulation of lncRNAs with other transcripts in fungi are rare [58]. The per-chance occurrence of co-regulated transcripts increases exponentially when removing spatial constraints; thus, global co-expression networks should be interpreted carefully.
Active TEs can dramatically change the architecture of a genome, including gene disruption or dysregulation, gene duplication, and structural variation [4, 5]. While they threaten genome integrity and thus survival, TE-induced variations also increase adaptability under stressful conditions like host immunity. This high risk-high reward relationship has been referred to as a devil’s bargain between plant pathogens and TEs [59]. Indeed, the cereal powdery mildews B. hordei and B. graminis f.sp. tritici exhibit extensive copy number variation of effector-coding genes, possibly as a consequence of the escape of the recognition of the respective gene products by host resistance proteins [10, 60, 61]. Considering the missing RIP mechanism [8, 9], powdery mildew fungi are likely to possess powerful alternative TE regulating mechanisms to limit the risk of their activity. TEs are often regulated epigenetically or by RNAi [5]. The arbuscular mycorrhiza fungus Rhizophagus irregularis, which is a broad-range mutualist of predominantly vascular plants, harbors a genome of around 150 Mb that consists to 47% of repetitive elements [62]. Interestingly, R. irregularis exhibits transcription of TEs during spore development, while TE expression is regulated by DNA methylation and RNAi [63]. Similarly, we previously found signs of sRNA phasing linked to B. hordei TEs [8, 16]. Here, we confirmed that phasiRNAs are abundant in 104 of 344 consensus TEs (Fig. 3) and found signs for time point-dependent expression of different Dicer and Argonaute genes (Supplementary Fig. 5), suggesting that RNAi is a major mechanism to regulate TE activity in fungi, including B. hordei.
We found TE antisense lncRNAs, in part spliced, in B. hordei (Fig. 3), which showed both positive and negative co-regulation with the TE they derived from (Fig. 6). TEs can give rise to novel lncRNAs in vertebrates [64, 65] and plants [32, 66]. According to the repeat insertion of domains of lncRNAs (RIDL) hypothesis, TEs can insert into and then become functional domains of lncRNAs, thus driving the emergence, evolution, and functional diversity of lncRNAs [67, 68]. TEs can also give rise to lncRNAs that contribute to regulatory networks under stressful conditions in plants [32, 69]. TE-derived lncRNAs respond differently to phosphate stress between two ecotypes in Arabidopsis thaliana [69], while mutation of the abiotic stress-responsive TE-derived lncRNA lincRNA11195 impairs the response to abscisic acid and root development in A. thaliana [70]. Interestingly, lncRNAs associated with TEs in A. thaliana exhibit high variability between accessions and developmental stages, possibly driven by the TE silencing machinery [71]. Likewise, we observed highly variable expression patterns throughout the pathogenic life cycle of B. hordei; it is conceivable that TE-derived lncRNA expression varies significantly between B. hordei isolates or Blumeria species. TE-derived lncRNAs in plants may also regulate gene expression as competing endogenous RNAs acting, for instance, as miRNA sponges, or through interaction with chromatin remodeling complexes [66]. Fungi are also able to activate TEs in response to biotic or abiotic stress [23, 31], like the Septoria leaf blotch pathogen Zymoseptoria tritici inducing expression of distinct sets of TEs under conditions of starvation or during wheat infection [72]. Likewise, the rice bast pathogen M. oryzae exhibits lncRNA expression specifically during host infection [23], which is comparable to our observations of infection stage-specific induction of lncRNAs in B. hordei throughout barley infection. Our data lends support to two scenarios: (1) TE antisense lncRNAs arise due to active TE expression and permit their own down-regulation, for example, by inducing RNAi, and (2) lncRNAs are expressed independently of the activity of the TE they derive from and fulfil distinct regulatory functions.
Appropriate transcriptional programs are key for organisms to respond effectively to environmental cues. Transcriptional programs appear to be responsive to cues from the respective host in broad-spectrum plant pathogens [73] and drive lifestyle transitions in hemi-biotrophic plant pathogens, such as Fusarium and Colletotrichum species [74, 75]. Likewise, obligate biotrophic and host-specialized pathogens, such as B. hordei, are likely responsive to various host-derived signals to activate suitable transcriptional programs throughout its pathogenic life cycle. Similar to previous findings in B. hordei infecting the hypersusceptible non-host mutant plant Arabidopsis thaliana pen2 pad4 sag101 [33], we observed distinct sets of effector candidate-encoding genes to be transcriptionally induced at specific stages of barley infection (Fig. 5B). A previous microarray study also found one class of effectors to be transcribed at pre-penetration but not post-penetration stages in B. hordei [76]. The different pathogenic stages are likely to require different sets of effectors conducting functions appropriate for the respective phase of infection [77, 78]. For instance, early infection stages prior to successful host cell penetration probably require the aggressive suppression of host immunity, while subsequent steps also depend on reprogramming of the host plants’ secretory system [79, 80], metabolism, and source-sink status [81, 82]. Likewise, we observed that stress response genes are induced in B. hordei during early infection, when the fungus faces the strongest resistance in the form of the generation of reactive oxygen species and the biosynthesis and delivery of antimicrobial compounds (Fig. 5D). Indeed, barley showed up-regulation of defense response and cell wall biogenesis-related genes, but also down-regulation of such genes at 18 hpi. Contrary to previous microarray studies [76, 83], we also observed cell cycle, cell division, and microtubule processes at the early infection stage. However, these processes may have been missed in the previous analysis as genome information was lacking and, in contrast to microarrays, which are limited to known probes, RNA-seq recovers all expressed genes in an organism. Meanwhile, genes encoding proteins functioning in sulfur, nitrogen, and organic molecule biosynthesis showed elevated transcript levels at 72 hpi, after successful colonization of the plant by B. hordei (Supplementary Fig. 6). In line with previous observations [76, 83], protein and nucleic acid metabolic processes and protein biosynthesis including translation and ribosome biogenesis were enriched after 72 hpi. At late infection stages, processes related to vegetative growth, gene expression, and metabolism dominated in B. hordei, suggestive of an undermined plant immune system and a supply of metabolites to fuel biosynthetic processes.
It has not escaped our attention that TE-derived lncRNAs might give rise to novel coding genes in the course of evolution. These genes could encode novel effectors that help establish and maintain the biotrophic interaction with the host plant. In one such case, the B. hordei effector ROP-interactive peptide 1 (ROPIP1) originates from a SINE/Eg-R1 TE and transmits to barley host cells in the course of infection, where it interacts with barley RACB and disrupts cortical microtubules [11]. Thus, ROPIP1 promotes the infection success of B. hordei on barley. The large number of TE antisense lncRNAs in B. hordei implies that B. hordei and possibly fungal plant pathogens with TE-enriched genomes in general repurpose TEs for the invention of novel genes to encode effector proteins.
Methods
Plant and pathogen cultivation
Plants of barley (H. vulgare cv. ‘Margret’) were cultivated in SoMi513 soil (Hawita, Vechta, Germany) in a long-day cycle (16 h light period at 23 °C, 8 h dark at 20 °C) at 60–65% relative humidity and a light intensity of 105–120 μmol s−1 m−2. Plants were inoculated with B. hordei strain K1AC at seven days after germination and then transferred to isolated growth chambers with a long day cycle (12 h light at 20 °C, 12 h dark at 19 °C), approximately 60% relative humidity, and 100 μmol s−1 m−2 light intensity.
RNA sampling and RNA sequencing
We inoculated the susceptible barley (H. vulgare) cultivar (cv.) ‘Margret’ with B. hordei isolate K1AC [57] and procured total RNA from leaf epidermal peelings to enrich for fungal RNAs and infected epidermal cells [84] in a time-course experiment. In three independent experimental replicates per time-point, we sampled and performed stranded RNA-seq at 0, 6, 18, 24, 72, and 120 h post inoculation (hpi) to obtain transcriptomes of all major developmental stages of the asexual pathogenic life cycle of B. hordei and its host, barley (Fig. 1A). The epidermal peels were ground to a fine powder in liquid nitrogen using mortar and pestle and RNA isolated via TRIzol (Thermo Scientific, Karlsruhe, Germany) according to the manufacturer’s instructions. Genomic DNA was removed using DNase I (RNase-free, Thermo Scientific). RNA integrity was determined by microcapillary electrophoresis (2100 BioAnalyzier system; Agilent, Santa Clara, CA, USA) and RNA quantity by spectrophotometry (NanoDrop; Thermo Scientific) and spectrofluorimetry (Qubit; Thermo Scientific). We sent 18 high-quality total RNA samples (RIN > 6.0, c[RNA] > 100 ng µL−1, m[RNA] > 2 µg) for stranded RNA sequencing at 150-bp paired-end reads, random oligomer priming to recover total RNA, and plant/animal ribosomal RNA depletion, provided by Novogene Europe, Cambridge Science Park, UK. The raw data (available at NCBI/ENA/DDBJ under project accession PRJNA835302) was inspected using FastQC v0.11.5 (Babraham Bioinformatics, UK) and overrepresented sequences identified by nucleotide BLAST in April 2021 (https://blast.ncbi.nlm.nih.gov/Blast.cgi). Reads were adapter- and quality-trimmed with Trimmomatic v0.36 [85] before further usage.
Whole-transcript sequencing with Oxford Nanopore Technologies
We sampled B. hordei-infected barley abaxial leaf epidermis at 144 hpi and used TRIzol (Thermo Scientific, Karlsruhe, Germany) for RNA isolation according to the manufacturer’s instructions, as above [84]. Then, a polyA-enriched sequencing library was generated using the Nanopore direct RNA sequencing kit (SQK-RNA002; Oxford Nanopore Technologies, Oxford, United Kingdom). The library was sequenced using MinION (ONT) technology with a R9.4.1 flow cell. Base-calling was performed via the MinKNOW app v5.5.10. Long reads were mapped using Minimap2 v2.15 [86] and the mapping quality assessed with pycoQC v2.5.0.3 [87].
Quantitative reverse transcriptase polymerase chain reaction (qRT-PCR)
We collected barley leaf epidermal peelings (see above) at 0, 6, 12, 24, 48, 72, and 96 hpi in three biological replicates. RNA isolation and quality control were done as described above. Complementary DNA (cDNA) was synthesized using the High-Capacity RNA-to-cDNA Kit (Applied Biosystems-Thermo Fisher, Schwerte, Germany) and stored at -20 °C. The cDNA was diluted 1:10 prior to further experiments. We conducted qRT-PCRs with the Takyon No ROX SYBR MasterMix blue dTTP Kit (Eurogentec, Seraing, Belgium) in a LightCycler 480 II (Roche, Basel, Switzerland). The PCR efficiency of all primers used in this study (Supplementary Table 9) was between 1.8 and 2.0 and the annealing temperature was set to 58 °C. Melting curve analysis validated in each case the synthesis of single amplicons. Expression levels of target TEs was evaluated relative to the housekeeping gene GAPDH (BLGH_00691) of B. hordei [88]. We calculated relative transcript abundance with 2−(CT(target) −CT(GAPDH)) according to the ΔΔCT method [89].
Annotation of TEs
We used the EDTA TE annotation pipeline, which integrates homology-based and de novo structure-based TE identification [90], to discover TE elements in the genome of B. hordei isolate DH14 [10]. Next, we renamed unclassified TE elements from the initial EDTA-generated library using RepeatMasker [91] with the Blumeria TE consensus sequences from RepBase v20181026 as database. Then, we manually curated the TE consensus database, as described before [35]. Briefly, we calculated a multiple sequence alignment of TE consensus sequences with MAFFT v7.475 [92] and clustered sequences, removed gaps, and deleted divergent sites using T-COFFEE v11.00.8cbe486 [93] and CIAlign v1.0.18 [94]. We manually adjusted overhangs to define TE boundaries with the help of AliView [95] and TE-Aid v.0-dev [92]. Then, the manually curated TE database, RepBase v20181026, and the SINE sequences for Eg-R1 (X86077.1) and Egh24 (Z21962.1) extracted from GenBank (absent in RepBase) were merged and CD-HIT-EST [96] was used to eliminate duplicated sequences. We then conducted genome-wide re-annotation of TEs in the genome of B. hordei DH14 with RepeatMasker [91] using the manually curated TE consensus library as database.
Detection and annotation of lncRNAs
First, we mapped the trimmed total RNA-seq read data to the B. hordei isolate DH14 genome [10] using HISAT2 [45]. We extracted mapping reads for transcriptome assembly with StringTie [43] and then filtered out coding genes from the transcriptome via Gffcompare [46], transcripts shorter than 200 bp using Gffread [46], transcripts with coding potential using CPC2 [47], and transcripts accounting for ribosomal RNA, transfer RNA, small nuclear RNA, and small nucleolar RNA via CMscan search against the Rfam database [48]. Then, we performed lncRNA annotation and classification with FEELnc [49], predicting 17,226 putative lncRNAs in the B. hordei genome. We used a WebApollo instance [44] to manually inspect the predicted lncRNA models, and to annotate all possible transcript isoforms for each lncRNA gene. For analysis of the distribution of lncRNAs in the genome of B. hordei DH14, we generated density maps of coding genes, lncRNAs, and TEs in 100-kb windows using BEDtools v2.25.0 [97] and visualized these data using the R package Rcircos v1.2.2 [98].
Functional annotation of newly discovered gene models in B. hordei
Peptide sequences deriving from gene models newly discovered in the genome of B. hordei DH14 [10] during annotation of lncRNAs were functionally analyzed for signal peptides using SignalP5.0 [99], transmembrane domains via TMHMM v2 [100], sequence homology using BLASTP (https://blast.ncbi.nlm.nih.gov accessed May 2023), and functional domains using InterProScan v5.62–94.0 [101] and hmmscan via the HMMER web server v2.41.2. accessed May 2023 [102, 103].
Cloning of lncRNA candidates
We amplified lncRNAs from isolated RNA or cDNA from samples used for RNA-seq or qRT-PCR and lncRNA-specific primer pairs (Supplementary Table 7). The PCR products were purified using Phusion High-Fidelity DNA polymerase (New England Biolabs, Frankfurt a.M., Germany) and subsequently ligated into the vector system pCR-Blunt II-TOPO via the Zero Blunt™ TOPO™ PCR cloning kit (Thermo Fisher Scientific). The lncRNA sequences were confirmed by Sanger chain-termination sequencing provided by eurofins Genomics (Ebersberg, Germany).
Time-course transcript expression analysis with RNA-seq data
We mapped the trimmed total RNA-seq data to the genomes of B. hordei isolate DH14 [10] or H. vulgare cv. ‘Morex’ [34] using HISAT2 [45] with ‘--max-intronlen 500’ and parsed output files with SAMtools v1.9 [104] and BEDtools v2.25.0 [97]. Read count tables were generated using ballgown [105]. Normalized expression was calculated as transcripts per million (TPM) and transcripts at TPM < 1 were filtered prior to further analysis using R v4.1.2 [106]. We next used TCseq [37] to cluster transcripts (lncRNAs, mRNAs, and TEs) according to their time-course specific expression patterns. For identifying co-expression connection networks, we further used WGCNA with a soft threshold of 8 [55]. Then, the unsigned adjacency was calculated via Pearson correlation to assign transcripts into putative co-expression modules of at least 10 transcripts. The module-trait correlation was calculated with infection time points as traits. Network visualization was done using Cytoscape v3.9.1 [107]. Functional enrichment of mRNA gene sets was performed using gene ontology (GO) terms with the web tool ShinyGO v0.77 [51]; GO terms were summarized by using REVIGO [52].
Availability of data and materials
All raw RNA sequencing data generated in this study are deposited at https://www.ncbi.nlm.nih.gov/sra under BioProject ID PRJNA835302. The codes used for the TE identification pipeline and the B. hordei TE consensus database generated in this work are available at https://github.com/qjiangzhao/TransposableElement-identification-alpha. All other data generated or analysed during this study are included in this published article and its supplementary information files.
Abbreviations
- cv.:
-
Cultivar
- GO:
-
Gene ontology
- hpi:
-
Hours post inoculation
- KEGG:
-
Kyoto Encyclopedia of Genes and Genomes
- LINE:
-
Long interspersed nuclear element
- lncRNA:
-
Long noncoding RNA
- LTR:
-
Long terminal repeat
- mRNA:
-
Messenger RNA
- NMDS:
-
Non-metric multidimensional scaling
- ONT:
-
Oxford Nanopore Technologies
- PC:
-
Principal component
- phasiRNA:
-
Phased secondary small interfering RNAs
- qRT-PCR:
-
Quantitative reverse transcriptase polymerase chain reaction
- RIP:
-
Repeat-induced point mutation
- RNAi:
-
RNA interference
- RNA-seq:
-
Whole-transcriptome shotgun sequencing
- RPKM:
-
Reads per kilobase of transcript per million mapped reads
- SINE:
-
Short interspersed nuclear element
- sRNA:
-
Small RNA
- TE:
-
Transposable element
- TPM:
-
Transcripts per million
- WGCNA:
-
Weighted gene co-expression network analysis
References
Wicker T, Sabot F, Hua-Van A, Bennetzen JL, Capy P, Chalhoub B, et al. A unified classification system for eukaryotic transposable elements. Nat Rev Genet. 2007;8:973–82.
Lanciano S, Cristofari G. Measuring and interpreting transposable element expression. Nat Rev Genet. 2020;21:721–36.
Chénais B, Caruso A, Hiard S, Casse N. The impact of transposable elements on eukaryotic genomes: From genome size increase to genetic adaptation to stressful environments. Gene. 2012;509:7–15.
Chuong EB, Elde NC, Feschotte C. Regulatory activities of transposable elements: From conflicts to benefits. Nat Rev Genet. 2017;18:71–86.
Slotkin RK, Martienssen R. Transposable elements and the epigenetic regulation of the genome. Nat Rev Genet. 2007;8:272–85.
Glawe DA. The powdery mildews: A review of the world’s most familiar (yet poorly known) plant pathogens. Annu Rev Phytopathol. 2008;46:27–51.
Braun U, Cook RTA. Taxonomic Manual of the Erysiphales (Powdery Mildews). Utrecht, The Netherlands: CBS-KNAW Fungal Biodiversity Centre; 2012.
Kusch S, Qian J, Loos A, Kümmel F, Spanu PD, Panstruga R. Long‐term and rapid evolution in powdery mildew fungi. Mol Ecol. 2023;mec.16909. https://doi.org/10.1111/mec.16909.
Spanu PD, Abbott JC, Amselem J, Burgis TA, Soanes DM, Stüber K, et al. Genome expansion and gene loss in powdery mildew fungi reveal tradeoffs in extreme parasitism. Science. 2010;330:1543–6.
Frantzeskakis L, Kracher B, Kusch S, Yoshikawa-Maekawa M, Bauer S, Pedersen C, et al. Signatures of host specialization and a recent transposable element burst in the dynamic one-speed genome of the fungal barley powdery mildew pathogen. BMC Genomics. 2018;19:381.
Nottensteiner M, Zechmann B, McCollum C, Hückelhoven R. A barley powdery mildew fungus non-autonomous retrotransposon encodes a peptide that supports penetration success on barley. J Exp Bot. 2018;69:3745–58.
Sabelleck B, Panstruga R. Novel jack-in-the-box effector of the barley powdery mildew pathogen? J Exp Bot. 2018;69:3511–4.
Gladyshev E. Repeat-induced point mutation and other genome defense mechanisms in fungi. Fungal Kingd. American Society of Microbiology; 2017. p. 687–99. Available from: http://www.asmscience.org/content/book/10.1128/9781555819583.chap33.
Frantzeskakis L, Németh MZ, Barsoum M, Kusch S, Kiss L, Takamatsu S, et al. The Parauncinula polyspora draft genome provides insights into patterns of gene erosion and genome expansion in powdery mildew fungi. mBio. 2019;10:381.
Kusch S, Singh M, Thieron H, Spanu PD, Panstruga R. Site-specific analysis reveals candidate cross-kingdom small RNAs, tRNA and rRNA fragments, and signs of fungal RNA phasing in the barley–powdery mildew interaction. Mol Plant Pathol. 2023;24:570–87.
Kusch S, Frantzeskakis L, Thieron H, Panstruga R. Small RNAs from cereal powdery mildew pathogens may target host plant genes. Fungal Biol. 2018;122:1050–63.
Quinn JJ, Chang HY. Unique features of long non-coding RNA biogenesis and function. Nat Rev Genet. 2015;17:47–62.
Ulitsky I, Bartel DP. lincRNAs: Genomics, evolution, and mechanisms. Cell. 2013;154:26–46.
Zhang X, Wang W, Zhu W, Dong J, Cheng Y, Yin Z, et al. Mechanisms and functions of long non-coding RNAs at multiple regulatory levels. Int J Mol Sci. 2019;20:5573.
Bologna NG, Voinnet O. The diversity, biogenesis, and activities of endogenous silencing small RNAs in Arabidopsis. Annu Rev Plant Biol. 2014;65:473–503.
Mattick JS, Amaral PP, Carninci P, Carpenter S, Chang HY, Chen L-L, et al. Long non-coding RNAs: Definitions, functions, challenges and recommendations. Nat Rev Mol Cell Biol. 2023;24:430–47.
Zhao X-Y, Lin JD. Long noncoding RNAs: A new regulatory code in metabolic control. Trends Biochem Sci. 2015;40:586–96.
Choi G, Jeon J, Lee H, Zhou S, Lee Y-H. Genome-wide profiling of long non-coding RNA of the rice blast fungus Magnaporthe oryzae during infection. BMC Genomics. 2022;23:132.
Wang Y, Ye W, Wang Y. Genome-wide identification of long non-coding RNAs suggests a potential association with effector gene transcription in Phytophthora sojae. Mol Plant Pathol. 2018;19:2177–86.
Wang Z, Jiang Y, Wu H, Xie X, Huang B. Genome-wide identification and functional prediction of long non-coding RNAs involved in the heat stress response in Metarhizium robertsii. Front Microbiol. 2019;10:2336.
Kopp F, Mendell JT. Functional classification and experimental dissection of long noncoding RNAs. Cell. 2018;172:393–407.
Kalwa M, Hänzelmann S, Otto S, Kuo C-C, Franzen J, Joussen S, et al. The lncRNA HOTAIR impacts on mesenchymal stem cells via triple helix formation. Nucleic Acids Res. 2016;44:10631–43.
Jampala P, Garhewal A, Lodha M. Functions of long non-coding RNA in Arabidopsis thaliana. Plant Signal Behav. 2021;16:1925440.
Till P, Mach RL, Mach-Aigner AR. A current view on long noncoding RNAs in yeast and filamentous fungi. Appl Microbiol Biotechnol. 2018;102:7319–31.
Li J, Liu X, Yin Z, Hu Z, Zhang K-Q. An overview on identification and regulatory mechanisms of long non-coding RNAs in fungi. Front Microbiol. 2021;12:638617.
Kalem MC, Panepinto JC. Long non-coding RNAs in Cryptococcus neoformans: Insights into fungal pathogenesis. Front Cell Infect Microbiol. 2022;12:858317.
Lv Y, Hu F, Zhou Y, Wu F, Gaut BS. Maize transposable elements contribute to long non-coding RNAs that are regulatory hubs for abiotic stress response. BMC Genomics. 2019;20:864.
Hacquard S, Kracher B, Maekawa T, Vernaldi S, Schulze-Lefert P, van Themaat EVL. Mosaic genome structure of the barley powdery mildew pathogen and conservation of transcriptional programs in divergent hosts. Proc Natl Acad Sci U S A. 2013;110:E2219–28.
Mascher M, Wicker T, Jenkins J, Plott C, Lux T, Koh CS, et al. Long-read sequence assembly: A technical evaluation in barley. Plant Cell. 2021;33:1888–906.
Goubert C, Craig RJ, Bilat AF, Peona V, Vogan AA, Protasio AV. A beginner’s guide to manual curation of transposable elements. Mob DNA. 2022;13:7.
Labbé J, Murat C, Morin E, Tuskan GA, Le Tacon F, Martin F. Characterization of transposable elements in the ectomycorrhizal fungus Laccaria bicolor. PLoS ONE. 2012;7:e40197.
Wu M, Gu L. TCseq: Time course sequencing data analysis. 2022.
Nicolás FE, Ruiz-Vázquez RM. Functional diversity of RNAi-associated sRNAs in fungi. Int J Mol Sci. 2013;14:15348–60.
Hunt M, Banerjee S, Surana P, Liu M, Fuerst G, Mathioni S, et al. Small RNA discovery in the interaction between barley and the powdery mildew pathogen. BMC Genomics. 2019;20:610.
Gebert D, Hewel C, Rosenkranz D. unitas: The universal tool for annotation of small RNAs. BMC Genomics. 2017;18:644.
Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–9.
Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, et al. Integrative Genomics Viewer. Nat Biotechnol. 2011;29:24–6.
Pertea M, Pertea GM, Antonescu CM, Chang T-C, Mendell JT, Salzberg SL. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol. 2015;33:290–5.
Lee E, Helt GA, Reese JT, Munoz-Torres MC, Childers CP, Buels RM, et al. Web Apollo: A web-based genomic annotation editing platform. Genome Biol. 2013;14:R93.
Kim D, Langmead B, Salzberg SL. HISAT: A fast spliced aligner with low memory requirements. Nat Methods. 2015;12:357–60.
Pertea G, Pertea M. GFF Utilities: GffRead and GffCompare. F1000Res. 2020;9:304.
Kang Y-J, Yang D-C, Kong L, Hou M, Meng Y-Q, Wei L, et al. CPC2: A fast and accurate coding potential calculator based on sequence intrinsic features. Nucleic Acids Res. 2017;45:W12–6.
Kalvari I, Argasinska J, Quinones-Olvera N, Nawrocki EP, Rivas E, Eddy SR, et al. Rfam 13.0: Shifting to a genome-centric resource for non-coding RNA families. Nucleic Acids Res. 2018;46:D335–42.
Wucher V, Legeai F, Hédan B, Rizk G, Lagoutte L, Leeb T, et al. FEELnc: A tool for long non-coding RNA annotation and its application to the dog transcriptome. Nucleic Acids Res. 2017;45:e57.
Kusch S, Ahmadinejad N, Panstruga R, Kuhn H. In silico analysis of the core signaling proteome from the barley powdery mildew pathogen (Blumeria graminis f.sp. hordei). BMC Genomics. 2014;15:843.
Ge SX, Jung D, Yao R. ShinyGO: A graphical gene-set enrichment tool for animals and plants. Bioinformatics. 2020;36:2628–9.
Supek F, Bošnjak M, Škunca N, Šmuc T. REVIGO summarizes and visualizes long lists of Gene Ontology terms. PLoS ONE. 2011;6:e21800.
Pedersen C, van Themaat EVL, McGuffin LJ, Abbott JC, Burgis TA, Barton G, et al. Structure and evolution of barley powdery mildew effector candidates. BMC Genomics. 2012;13:694.
Spanu PD. Cereal immunity against powdery mildews targets RNase-Like Proteins associated with Haustoria (RALPH) effectors evolved from a common ancestral gene. New Phytol. 2017;213:969–71.
Langfelder P, Horvath S. WGCNA: An R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559.
Bindschedler LV, Panstruga R, Spanu PD. Mildew-Omics: How global analyses aid the understanding of life and evolution of powdery mildews. Front Plant Sci. 2016;7:123.
Barsoum M, Kusch S, Frantzeskakis L, Schaffrath U, Panstruga R. Ultraviolet mutagenesis coupled with next-generation sequencing as a method for functional interrogation of powdery mildew genomes. Mol Plant Microbe Interact. 2020;33:1008–21.
Till P, Pucher ME, Mach RL, Mach-Aigner AR. A long noncoding RNA promotes cellulase expression in Trichoderma reesei. Biotechnol Biofuels. 2018;11:78.
Fouché S, Oggenfuss U, Chanclud E, Croll D. A devil’s bargain with transposable elements in plant pathogens. Trends Genet. 2021;38:222–30.
Müller MC, Praz CR, Sotiropoulos AG, Menardo F, Kunz L, Schudel S, et al. A chromosome-scale genome assembly reveals a highly dynamic effector repertoire of wheat powdery mildew. New Phytol. 2019;221:2176–89.
Saur IML, Panstruga R, Schulze-Lefert P. NOD-like receptor-mediated plant immunity: From structure to cell death. Nat Rev Immunol. 2021;21:305–18.
Maeda T, Kobayashi Y, Kameoka H, Okuma N, Takeda N, Yamaguchi K, et al. Evidence of non-tandemly repeated rDNAs and their intragenomic heterogeneity in Rhizophagus irregularis. Commun Biol. 2018;1:87.
Dallaire A, Manley BF, Wilkens M, Bista I, Quan C, Evangelisti E, et al. Transcriptional activity and epigenetic regulation of transposable elements in the symbiotic fungus Rhizophagus irregularis. Genome Res. 2021;31:2290–302.
Kapusta A, Kronenberg Z, Lynch VJ, Zhuo X, Ramsay L, Bourque G, et al. Transposable elements are major contributors to the origin, diversification, and regulation of vertebrate long noncoding RNAs. PLoS Genet. 2013;9:e1003470.
Cho J. Transposon-derived non-coding RNAs and their function in plants. Front Plant Sci. 2018;9:600.
Ariel FD, Manavella PA. When junk DNA turns functional: Transposon-derived non-coding RNAs in plants. J Exp Bot. 2021;72:4132–43.
Johnson R, Guigó R. The RIDL hypothesis: Transposable elements as functional domains of long noncoding RNAs. RNA. 2014;20:959–76.
Fort V, Khelifi G, Hussein SMI. Long non-coding RNAs and transposable elements: A functional relationship. Biochim Biophys Acta BBA - Mol Cell Res. 2021;1868:118837.
Blein T, Balzergue C, Roulé T, Gabriel M, Scalisi L, François T, et al. Landscape of the noncoding transcriptome response of two Arabidopsis ecotypes to phosphate starvation. Plant Physiol. 2020;183:1058–72.
Wang D, Qu Z, Yang L, Zhang Q, Liu Z, Do T, et al. Transposable elements (TEs) contribute to stress-related long intergenic noncoding RNAs in plants. Plant J. 2017;90:133–46.
Kornienko AE, Nizhynska V, Morales AM, Pisupati R, Nordborg M. Population-level annotation of lncRNAs in Arabidopsis reveals extensive expression variation associated with transposable element-like silencing. Plant Cell. 2023;koad233. https://doi.org/10.1093/plcell/koad233.
Fouché S, Badet T, Oggenfuss U, Plissonneau C, Francisco CS, Croll D. Stress-driven transposable element de-repression dynamics and virulence evolution in a fungal pathogen. Mol Biol Evol. 2020;37:221–39.
Kusch S, Larrouy J, Ibrahim HMM, Mounichetty S, Gasset N, Navaud O, et al. Transcriptional response to host chemical cues underpins the expansion of host range in a fungal plant pathogen lineage. ISME J. 2022;16:138–48.
Hill R, Buggs RJA, Vu DT, Gaya E. Lifestyle transitions in fusarioid fungi are frequent and lack clear genomic signatures. Mol Biol Evol. 2022;39:msac085.
O’Connell RJ, Thon MR, Hacquard S, Amyotte SG, Kleemann J, Torres MF, et al. Lifestyle transitions in plant pathogenic Colletotrichum fungi deciphered by genome and transcriptome analyses. Nat Genet. 2012;44:1060–5.
Both M, Eckert SE, Csukai M, Müller E, Dimopoulos G, Spanu PD. Transcript profiles of Blumeria graminis development during infection reveal a cluster of genes that are potential virulence determinants. Mol Plant Microbe Interact. 2005;18:125–33.
Lo Presti L, Lanver D, Schweizer G, Tanaka S, Liang L, Tollot M, et al. Fungal effectors and plant susceptibility. Annu Rev Plant Biol. 2015;66:513–45.
Stergiopoulos I, De Wit PJGM. Fungal effector proteins. Annu Rev Phytopathol. 2009;47:233–63.
Liao W, Nielsen ME, Pedersen C, Xie W, Thordal-Christensen H. Barley endosomal MONENSIN SENSITIVITY1 is a target of the powdery mildew effector CSEP0162 and plays a role in plant immunity. J Exp Bot. 2023;74:118–29.
Schmidt SM, Kuhn H, Micali C, Liller C, Kwaaitaal M, Panstruga R. Interaction of a Blumeria graminis f. sp. hordei effector candidate with a barley ARF-GAP suggests that host vesicle trafficking is a fungal pathogenicity target: Blumeria graminis effector candidates. Mol Plant Pathol. 2014;15:535–49.
Cai J, Jiang Y, Ritchie ES, Macho AP, Yu F, Wu D. Manipulation of plant metabolism by pathogen effectors: More than just food. FEMS Microbiol Rev. 2023;47:fuad007.
Djamei A, Schipper K, Rabe F, Ghosh A, Vincon V, Kahnt J, et al. Metabolic priming by a secreted fungal effector. Nature. 2011;478:395–8.
Both M, Csukai M, Stumpf MPH, Spanu PD. Gene expression profiles of Blumeria graminis indicate dynamic changes to primary metabolism during development of an obligate biotrophic pathogen. Plant Cell. 2005;17:2107–22.
Li L, Collier B, Spanu P. Isolation of powdery mildew haustoria from infected barley. Bio-Protoc. 2019;9:e3299.
Bolger AM, Lohse M, Usadel B. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–20.
Li H. Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094–100.
Leger A, Leonardi T. pycoQC: Interactive quality control for Oxford Nanopore Sequencing. J Open Source Softw. 2019;4:1236.
Pennington HG, Li L, Spanu PD. Identification and selection of normalization controls for quantitative transcript analysis in Blumeria graminis. Mol Plant Pathol. 2015;17:625–33.
Livak KJ, Schmittgen TD. Analysis of relative gene expression data using real-time quantitative PCR and the 2−ΔΔCT method. Methods. 2001;25:402–8.
Ou S, Su W, Liao Y, Chougule K, Agda JRA, Hellinga AJ, et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 2019;20:275.
Smit AFA, Hubley R, Green P. Masker Open-4.0. 2013–2015. 2016; Available from: http://www.repeatmasker.org.
Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol Biol Evol. 2013;30:772–80.
Notredame C, Higgins DG, Heringa J. T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol. 2000;302:205–17.
Tumescheit C, Firth AE, Brown K. CIAlign: A highly customisable command line tool to clean, interpret and visualise multiple sequence alignments. PeerJ. 2022;10:e12983.
Larsson A. AliView: A fast and lightweight alignment viewer and editor for large datasets. Bioinformatics. 2014;30:3276–8.
Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: Accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012;28:3150–2.
Quinlan AR, Hall IM. BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–2.
Zhang H, Meltzer P, Davis S. RCircos: An R package for Circos 2D track plots. BMC Bioinformatics. 2013;14:244.
Almagro Armenteros JJ, Tsirigos KD, Sønderby CK, Petersen TN, Winther O, Brunak S, et al. SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat Biotechnol. 2019;37:420–3.
Krogh A, Larsson B, von Heijne G, Sonnhammer ELL. Predicting transmembrane protein topology with a hidden markov model: Application to complete genomes. J Mol Biol. 2001;305:567–80.
Jones P, Binns D, Chang H-Y, Fraser M, Li W, McAnulla C, et al. InterProScan 5: Genome-scale protein function classification. Bioinformatics. 2014;30:1236–40.
Eddy SR. Accelerated profile HMM searches. PLoS Comput Biol. 2011;7:e1002195.
Potter SC, Luciani A, Eddy SR, Park Y, Lopez R, Finn RD. HMMER web server: 2018 update. Nucleic Acids Res. 2018;46:W200–4.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–9.
Frazee AC, Pertea G, Jaffe AE, Langmead B, Salzberg SL, Leek JT. Ballgown bridges the gap between transcriptome assembly and expression analysis. Nat Biotechnol. 2015;33:243–6.
R Core Team. R: A language and environment for statistical computing. R Found Stat Comput Vienna Austria. 2018; Available from: http://www.r-project.org/.
Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498–504.
Acknowledgements
We are grateful to Xinyi Liu, Marion Müller, and Ralph Hückelhoven (TUM, Munich, Germany) for constructive feedback and collaboration on the TE consensus database. The analysis was performed with computing resources granted by RWTH Aachen University under project IDs rwth0146 and rwth0933.
Funding
Open Access funding enabled and organized by Projekt DEAL. Open Access funding enabled and organized by Projekt DEAL. This study was funded under the Excellence Strategy of the Federal Government and the Länder via the RWTH Aachen Exploratory Research Space (ERS) [RWTH start-up grant StUpPD362-20 to S.K.] and by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) project number 274444799 [grant 861/14–2 awarded to R.P.] in the context of the DFG-funded priority program SPP1819 “Rapid evolutionary adaptation – potential and constraints”.
Author information
Authors and Affiliations
Contributions
S.K. conceived and designed the study; J.Q. and S.K. were responsible for experiment conception. J.Q. and F.K. collected samples for RNA isolation and performed RNA purification. M.E. performed qRT-PCRs of TEs; J.Q. analyzed the RNA-seq data, developed the lncRNA identification pipeline, and cloned selected lncRNAs. H.M.M.I. conceived and performed the alternative splicing analysis and identified lncRNA splice variants. S.K. and J.Q. drafted the figures. R.P. provided lab space and consumables, S.K. funding for the RNA-seq experiment. S.K. and J.Q. wrote the first draft of the manuscript, all authors contributed to editing. All authors read the manuscript and approved its final version.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1: Supplementary Fig. 1.
Integrated pipeline for high-quality genome-wide TE annotation. We initially detected transposable elements (TEs) the genome assembly of B. hordei isolate DH14 [10] using the extensive de novo annotator (EDTA) pipeline [90], which enables de novo discovery of long terminal repeat (LTR), terminal inverted repeat (TIR), DNA helitron, and long interspersed nuclear element (LINE) TEs. We performed an initial curation of the database using the Blumeria TE consensus sequences from RepBase v20181026 via RepeatMasker [91]. Then, we manually curated the TE consensus database, as described in [35]. Briefly, we generated the consensus repeat database via multiple sequence alignment with MAFFT v7.475 [92] and clustered sequences, removed gaps, and deleted divergent sites using T-COFFEE v11.00.8cbe486 [93] and CIAlign v1.0.18 [94]. We manually adjusted overhangs to define TE boundaries with the help of AliView [95] and TE-Aid v.0-dev [92]. Blumeria TE consensus sequences from RepBase v20181026 and short interspersed nuclear element (SINE) TE sequences extracted from GenBank accessions X86077.1 (Eg-R1) and Z21962.1 (Egh24) were merged with the manually curated database and duplicated sequences subsequently eliminated with CD-HIT-EST [96]. Genome-wide re-annotation of TEs in the genome assembly of B. hordei isolate DH14 [10] was done with RepeatMasker [91] using the manually curated repeat consensus library as database (Table 1; Supplementary Table 2). The number of annotated TEs at various stages of the pipeline is indicated in grey and the number of TEs without an annotation, denoted as unknown, is shown in red.
Additional file 2:
Supplementary Fig. 2. B. hordei TEs exhibit time point-specific upregulation during host infection. We performed quantitative reverse transcriptase-polymerase chain reaction (qRT-PCR) analysis of eleven selected TE families. The dot plots show the relative transcript abundance (according to ΔΔCT analysis; y-axis) of the respective TE family, indicated on the top of each panel, in B. hordei isolate K1AC at seven time points of host infection (hours post inoculation (hpi); x-axis). TE transcript levels were normalized to B. hordei GAPDH (BLGH_00691); we conducted three independent replicates (n = 3) consisting of three technical replicates each. The shades of green indicate the replicate each data point belongs to; the black bar shows the median.
Additional file 3: Supplementary Fig. 3.
B. hordei lncRNAs are ubiquitously distributed throughout the genome. The circos plot displays the density of coding genes (purple), lncRNAs (orange), and TEs (green) on all scaffolds larger than 500 kb in the genome assembly of B. hordei DH14 [10]. We generated density maps of B. hordei DH14 coding genes, lncRNAs, and TEs in 100-kb windows using BEDtools v2.25.0 [97]; these windows are plotted as bar plots in the three tracks. The data was visualized using the R package Rcircos v1.2.2 [98].
Additional file 4: Supplementary Fig. 4.
B. hordei ROPIP1 expression peaks in appressoria prior to haustorium formation. A The ROPIP1 gene, which encodes a host-translocated peptide that disturbs barley microtubules [11], locates to the Eg-R1 consensus sequence (654 bp in length). The upper lane displays the annotated ROPIP1 transcript on Eg-R1, indicated in magenta. The scale below the black horizontal line indicates the TE length and position in bp. B The ROPIP1 and Eg-R1 expression patterns throughout the infection cycle of B. hordei is shown as dot plots, as indicated on top of each plot. Expression values are indicated as transcripts per million (TPM; y-axis) during six time points of host infection (x-axis; see Fig. 1A). The black bar shows the median of three independent replicates (n = 3).
Additional file 5: Supplementary Fig. 5.
Components of the RNAi machinery in B. hordei are expressed in a time point-dependent manner. The left panel shows the simplified canonical RNAi pathway, excluding microRNA biogenesis. Briefly, the RNA-dependent RNA polymerase (RdRP) binds single-stranded RNA (ssRNA) and associates with a helicase (quelling-deficient 3; QDE-3) and replication factor A1 (RPA1). Upon synthesis of the complementary strand to generate double-stranded RNA (dsRNA), Dicer binds and generates 20–30 nucleotides-long short interfering RNAs (siRNAs). Subsequently, AGO (syn. QDE-2 in Neurospora crassa), the AGO-binding protein QDE-2 interacting protein (QIP) and the RISC nuclease (RISCn) are recruited to form the pre-RNA-induced silencing complex (RISC). The mature RISC is formed upon release of the complementary strand, leaving the target siRNA strand, which facilitates complementary binding to target RNA molecules. The RISC complex then catalyzes target mRNA degradation or RNAi-directed DNA methylation by recruitment of DNA methylase (DNMT). The respective B. hordei orthologs are assigned on the right panel; dotted lines indicate B. hordei paralogs of RPA2 (BLGH_03758) and RPA3 (BLGH_02795) and a Piwi domain-containing protein with similarity to Dicer (BLGH_05601) whose homology and role in RNAi are unclear [16]. The right panel shows a heatmap of normalized and time point-scaled relative expression of the ortholog genes encoding RNAi components in B. hordei; purple indicates high relative expression compared to average throughout the time points. The maximum expression throughout all time points in transcripts per million (TPM) is shown on the right with green circles; the respective TPM values are indicated on the right from the circles.
Additional file 6: Supplementary Fig. 6.
H. vulgare displays infection stage-dependent expression patterns of coding genes upon infection with B. hordei. A We clustered time-resolved coding gene expression patterns in the host H. vulgare cv. ‘Margret’ upon infection with B. hordei K1AC using TCseq [37]. The RNA-seq data was mapped to the H. vulgare cv. ‘Morex’ reference genome [34] using HISAT2 [45] with ‘–max-intronlen 500’ and parsed output using SAMtools v1.9 [104] and BEDtools v2.25.0 [97]. The lines represent single transcripts; the color shade denotes the cluster membership by Spearman correlation (the darker the shade of blue, the higher the correlation R with the respective expression cluster according to the color scheme). The numbers at the bottom-right of each plot indicates the number of genes that made up each of the time course clusters. The x-axis denotes the time points of infection in hours post inoculation (hpi; Fig. 1A), which were 0 hpi (conidiospore germination), 6 hpi (appressorium formation), 18 hpi (host cell penetration), 24 hpi (haustorium formation), 72 hpi (epiphytic colonization), and 120 hpi (conidiogenesis). The y-axis indicates the relative z-score based on reads per kb of transcript per million mapped reads (RPKM). B We performed gene ontology (GO) enrichment analysis for the H. vulgare coding genes from each cluster (A) using ShinyGO v0.77 [51] accessed online at http://bioinformatics.sdstate.edu/go/, and summarized GO terms with REVIGO [52]. The dot plot displays the fold enrichment of functional terms compared to the full set of genes of H. vulgare cv. ‘Morex’ (dot size); the fill color indicates the − Log10(FDR-adjusted enrichment p value) according to the color scheme on the right. The GO term accessions are indicated on the x-axis; colored dots denote summarizing GO descriptions indicated in the legend (top-right). The respective co-expression cluster (A) is indicated on the y-axis.
Additional file 7: Supplementary Fig. 7.
Co-expression analysis of TEs, coding genes, and lncRNAs in B. hordei. A We built co-regulation networks using WGCNA [55] using the time course expression data of B. hordei coding genes (mRNAs), lncRNAs, and consensus TEs. The time points of infection were 0 hpi (spore germination), 6 hpi (appressorium formation), 18 hpi (early primary haustorium), 24 hpi (mature primary haustorium), 72 hpi (host colonization), and 120 hpi (conidia formation). The twenty identified co-expression clusters are indicated on the y-axis and the time points on the x-axis. The column SP indicates the number of genes encoding putative secreted proteins (SPs) in each cluster. The Pearson correlation coefficient of each cluster with the respective time point is shown on a scale from green to red, indicating negative to positive correlation. The exact Pearson correlation value and the p value are shown for each cluster and time point. B The colored circles indicate transcripts and lines significant correlation between two transcripts. The circles are colored according to the WGCNA-assigned cluster colors (A); SP indicates the number of genes encoding putative secreted proteins in the respective co-expression cluster. The clusters correspond to the clusters in Fig. 6.
Additional file 8: Supplementary Table 1.
General statistics of the RNA-seq datasets generated for this study. Supplementary Table 2. TE composition in the B. hordei DH14 genome based on our manually curated transposon database. Supplementary Table 3. Functional annotations of the proteins encoded by the 43 coding genes newly discovered in the B. hordei genome. Supplementary Table 4. TPM expression values for mRNAs and lncRNAs of B. hordei K1AC during infection of H. vulgare cv. ‘Margret’. Supplementary Table 5. Gene ontology (GO) enrichment of B. hordei mRNAs induced at specific time points of infection. Supplementary Table 6. Gene ontology (GO) enrichment of H. vulgare mRNAs induced at specific time points of infection. Supplementary Table 7. Pearson correlation coefficient table for co-regulation of lncRNAs with mRNAs in B. hordei. Supplementary Table 8. Pearson correlation coefficient table for co-regulation of lncRNAs with consensus TEs in B. hordei. Supplementary Table 9. List of oligonucleotides used for PCR amplification and qRT-PCR.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Qian, J., Ibrahim, H.M.M., Erz, M. et al. Long noncoding RNAs emerge from transposon-derived antisense sequences and may contribute to infection stage-specific transposon regulation in a fungal phytopathogen. Mobile DNA 14, 17 (2023). https://doi.org/10.1186/s13100-023-00305-6
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s13100-023-00305-6