Small RNAs from the plant pathogenic fungus Sclerotinia sclerotiorum highlight candidate host target genes associated with quantitative disease resistance

Plant pathogenic fungi secrete effector proteins and secondary metabolites to cause disease. Additionally, some produce small RNAs (sRNAs) that silence transcripts of host immunity genes through RNA interference. The fungus Sclerotinia sclerotiorum infects over 600 plant species, but little is known about its molecular interactions with its hosts. In particular, the role of sRNAs in S. sclerotiorum pathogenicity has not been determined. By sequencing sRNAs in vitro and during infection of two host species, we found that S. sclerotiorum produces at least 374 highly abundant sRNAs. These sRNAs mostly originated from polymorphic repeat-rich genomic regions. Predicted gene targets of these sRNAs, from 10 different host species, were enriched for immunity-related functional domains. Predicted A. thaliana gene targets of S. sclerotiorum sRNAs were significantly more down-regulated during infection than other genes. A. thaliana gene targets were also more likely to contain single nucleotide polymorphisms (SNPs) associated with quantitative disease resistance. In conclusion, sRNAs produced by S. sclerotiorum are likely capable of silencing immunity components in multiple hosts. Prediction of fungal sRNA targets in host plant genomes can be combined with other global approaches, such as genome wide association studies and transcriptomics, to assist identification of plant genes involved in disease resistance.

produces sRNAs in vitro (11). Whether it also produces sRNAs during plant infection, and the potential roles of these sRNAs in plant infection are, thus far, undescribed.
We found that S. sclerotiorum produces 374 highly abundant sRNAs during infection of its two host species Phaseolus vulgaris and A. thaliana. In A. thaliana, predicted target genes were more likely to be significantly down-regulated during infection than other genes. Among the A. thaliana targets, there were significantly more genes associated with QDR by GWAS than expected by chance.
Our data indicate that plastic regions of the S. sclerotiorum genome generate transposable elementderived sRNAs that potentially target numerous plant genes likely involved in QDR. They also suggest that fungal sRNA sequencing and identification of their plant targets can be complementary to other available approaches for the search of genes controlling plant QDR.

Fungal cultures and inoculation of host plants
Sclerotinia sclerotiorum isolate 1980 was grown on potato dextrose agar plates for 4 days at 24°C.
Arabidopsis thaliana accession Col-0 and Phaseolus vulgaris genotype G19833 were grown for 4 weeks before inoculation, under controlled conditions at 22°C, under a light intensity of 120 µmol/m2/s for 9 hours per day. Five mm-wide plugs containing actively growing S. sclerotiorum mycelium were inoculated onto fully developed leaves. Inoculated plants were placed in Percival AR-41 chambers at 80% humidity under the same day/light conditions as for plant growth for the duration of the infection.
Samples for RNA extraction were harvested as described in (26), separating the centre and periphery of 2.5 mm wide disease lesions with a scalpel blade. downloaded from Rfam (version 13.0) and S. sclerotiorum non-coding RNAs downloaded from Rfam were discarded sequentially. Only reads that passed all of these steps and that were subsequently exactly mapped to the reference genome of S. sclerotiorum 1980 (30) were kept. The same procedure was followed for P. vulgaris using sRNA sequences predicted for this species obtained by (33). For differential expression analysis, the two filtering procedures for the two different hosts were both applied to the in vitro sample to create two different filtered in vitro datasets for analysis. This controlled for differences in filtering procedures that may cause artificial observation of changes in sRNA expression.

Differential expression analyses of small RNAs
Differential expression analysis was performed using DESeq version 1.22.11 (34). Each unique sRNA was considered a single entity with a raw read count in this analysis. All in planta samples were compared with in vitro samples using a negative binomial test. Unique reads that exhibited a P adjusted value of below 0.05 were considered differentially expressed.

Small RNA target prediction
To predict targets of S. sclerotiorum sRNAs, we used the psRNATarget online server (35). For the over-representation test of GO terms in A. thaliana, we used an E value of <= 3 resulting in 1,368 predicted sRNA target genes (1,789 including alternately spliced transcripts). For all other tests, we used a more stringent E value of <= 2.5 resulting in 408 predicted sRNA target genes (539 including alternately spliced transcripts). We used the same technique to predict targets of B. cinerea sRNAs in the A. thaliana genome. All other parameters were default. For all species in which we predicted sRNA targets, we first compared protein sequences against RepBase version 23.05 (36) using BLASTp. Any mRNAs whose proteins were homologous to sequences in RepBase with an e value of <= 1.0e -10 were not used in target prediction. This is because these proteins are likely host transposable element genes, which could be similar to fungal sRNAs by virtue of their shared evolutionary origins.

Test for over-representation of functional domains
To test for over-representation of Gene Ontology (GO) terms among A. thaliana predicted targets of S. sclerotiorum sRNAs, we used the R bioconductor package TopGO (37). We performed the test separately for the molecular function and biological process categories of GO terms. In each instance, we used terms with 5 or more annotated genes. P-values for GO and PFAM enrichment were obtained using Fisher's exact test and adjusted with the Benjamini-Hochberg correction (38). We also predicted targets in 10 different host transcriptomes from Phytozome. These included all species highlighted in supplementary figure 1. In all of these species, we performed an over-representation test of PFAM domains. To do this, we first created a list of all PFAM domains within the predicted annotations. We then created a list of PFAM domains present in predicted targets of sRNAs. The proportion of each PFAM domain in each set, either sRNA targets or non-targets, was compared using Fisher's exact test. The PFAM domains with a p value of < 0.05 and an increased proportion in sRNA target genes were considered significantly over-represented.

Messenger RNA sequencing and differential expression analysis
The expression of A. thaliana Col-0 genes during S. sclerotiorum infection corresponds to expression at the edge of necrotic lesions relative to non-challenged plants as reported in (26)

Test for down-regulated expression of fungal sRNA target genes in A. thaliana
To test whether predicted targets of fungal sRNAs in A. thaliana were more likely to be downregulated during infection, we assessed the log2(fold change) (LFC) in expression of these genes during fungal infection relative to controls. First, we compared the median LFC of all sRNA targets with non-targets using a Wilcoxon test. Second, we used a non-parametric randomisation test based on the tests described in (40). A randomisation in our test consisted of assigning the status of 'sRNA target' and 'non-sRNA target' to A. thaliana genes randomly and calculating the difference between mean LFC values (∆LFC). For each randomisation, we calculated ∆LFC as the mean LFC of non-sRNA targets minus the mean LFC of sRNA targets. The p values obtained from the test were the number of times ∆LFC was lower than the actual mean ∆LFC between sRNA targets and non-targets.
We performed this analysis on four data sets: 1) Predicted A. thaliana targets of S. sclerotiorum

Test for over-representation of significant quantitative disease resistance scores among sRNA targets
To determine whether targets of S. sclerotiorum sRNAs were more likely to be significantly associated with QDR, we used the previously assessed markers from (26). We considered only genes that contained SNP markers, and only the most significant (highest -log10(P)) SNP in each gene. First, we conducted a Wilcoxon rank sum test of difference in median -log10(P). Second, we used a similar randomisation test to the test used for down-regulated expression of sRNA targets. In this instance, each randomisation recorded the proportion of targets and non-targets that exhibited a -log10(P) value of above 1.3 (or an untransformed score of below 0.05). The score for each randomisation was the proportion of targets with a significant SNP minus the proportion of non-targets with a significant SNP. This was termed the ∆%HQS. The P value derived from this test was the proportion of randomisations with a ∆%HQS higher than the actual data. Third, we performed a Fisher's exact test of over-representation to test whether genes with significant SNPs were over-represented among genes that were predicted S. sclerotiorum sRNA targets.

Numerous Sclerotinia sclerotiorum sRNAs are highly expressed in two hosts
To identify S. sclerotiorum sRNAs that may be important for infection of host plants, we focused on sRNAs matching all of the following criteria: (i) corresponding to non-redundant sRNA sequences, (ii) of length of between 18 and 26 nucleotides, (iii) harbouring a 5' uridine, and (iv) exhibiting over 100 reads per million in planta, in all three replicates collected from both host plants (either centre or border samples). This identified 374 abundantly expressed S. sclerotiorum core sRNA sequences in total ( Figure 2A). Among these sRNAs, 322 were not significantly up-regulated during infection ( Figure 2B), and 52 sRNAs (14%) were significantly up-regulated during plant infection relative to in vitro ( Figure 2C). We did not find any sRNA reads that exhibited significant changes in abundance between the centres and borders of infection lesions (P adjusted > 0.05). To test whether the nature of the host plant affected the repertoire of sRNAs expressed by S. sclerotiorum, we considered sRNAs matching criteria i-iii above, and significantly up-regulated during infection of at least one host species relative to in vitro ( Figure 2A). This identified a total of 94 sRNAs induced in planta (P adjusted < 0.05) ( Figure 2D). Among these, 27 were up-regulated on A. thaliana only, 55 were upregulated on P. vulgaris only, and 12 were shared between both hosts.
Together, these data show that a number of sRNAs are highly abundant in planta and many are significantly up-regulated during infection. Fungal cells residing at the centres and borders of disease lesions expressed similar sRNAs. In many cases, significant up-regulation of sRNAs occurred specifically in one host. This conclusion should be taken with caution considering the lack of methods explicitly designed to quantify sRNA expression and the relatively low coverage in our in vitro samples which may bias differential expression analysis. Therefore, we focused the following analyses on the

Sclerotinia sclerotiorum sRNAs map to transposable element sequences and are in gene-poor polymorphic genome compartments
To determine loci from which the 374 highly abundant sRNAs originated in the S. sclerotiorum genome, we analysed overlapping annotations for genomic regions they mapped to, with a particular interest in the previously published REPET analysis of transposable elements in S. sclerotiorum (30).
We found that these 374 sRNAs mapped to 3,676 loci, with 356 mapping to more than one locus and 18 mapping to a single locus. We found that 3,669 (99 %) of sRNA loci (including multiple mappings) were within 526 genomic regions annotated as transposable element sequence. The highest percentage of sRNA loci overlaps (32.5 %) were with long interspersed nuclear elements (LINEs, To determine whether the sRNA loci were more polymorphic than other regions of the genome, we assessed the number of polymorphisms in 10 Kb sliding windows throughout the genome from a panel of 25 S. sclerotiorum isolates (BioRxiv: doi/xxx). We counted polymorphisms if at least one individual did not exhibit the reference genome allele. The median number of polymorphisms was 235 for windows including at least one sRNA locus, but only 174 for windows not containing sRNA loci (P < 0.001, W = 54084, Wilcoxon's rank sum test) (Figure 3 B). Strikingly, the proportion of windows showing 300 polymorphisms or more was 36% for windows including sRNA loci but only 12% otherwise. Because accurate mapping of short sequence reads to repetitive regions is challenging, we used variants that were filtered based on excess coverage and quality (BioRxiv: doi/xxx). Nevertheless, the excess of polymorphisms at sRNA loci, although robust, could have been over-estimated.
To determine the extent to which sRNA loci reside in gene-sparse regions, we calculated the distance between the 5' and 3' ends of all sRNA loci and the nearest gene borders, and for all S. sclerotiorum protein-coding genes, including predicted effector genes from (30) ( Figure 3C). The mean 5' distance to nearest gene was 4898.56 bp for sRNA loci and 2017.2 bp for protein-coding genes (P < 2.2e-16, W = 3053600). Similarly, the mean 3' distance to nearest gene was 4551.631 bp for sRNA loci and 2018.7 bp for protein-coding genes (P < 2.2e-16, W = 31125000). The genes encoding effector candidates also resided further from the nearest gene than other protein-coding genes, albeit to a lesser extent than sRNA loci. The mean distance to the nearest gene was 2451.0 bp for effector candidate genes on the 5' side (P = 0.01005, W = 442360) and 2977.9 bp on the 3' side (P = 5.37e-05, W = 480430). Together, these data indicate that S. sclerotiorum sRNAs are derived from gene-sparse repetitive regions that are more polymorphic than the rest of the genome.

Predicted targets of Sclerotinia sclerotiorum sRNAs in Arabidopsis thaliana are more likely to be down-regulated during infection
To support host gene silencing by S. sclerotiorum sRNAs, we compared the expression of A. As a complementary analysis, we performed a randomisation to test for the likelihood of finding a lower ∆LFC in random A. thaliana gene sets than observed for fungal sRNA targets. During B. cinerea infection, only 13 % (equivalent to a one tailed P value of 0.13) of random A. thaliana gene samples had a lower ∆LFC than was observed from targets of B. cinerea sRNAs ( Figure 5C), but 71% had a lower ∆LFC than targets of S. sclerotiorum sRNAs (~p=0.71, Figure 5D). Conversely, during S. sclerotiorum infection, 68% of random A. thaliana gene samples had a lower ∆LFC than targets of B. cinerea sRNAs (~p=0.68, Figure 5E), whereas only 1% had a lower ∆LFC than targets of S. sclerotiorum sRNAs (~p=0.01, Figure 5F). Together, these analyses suggest that S. sclerotiorum sRNAs could have a negative impact on the expression of their A. thaliana target genes during infection.

Predicted targets of Sclerotinia sclerotiorum sRNAs in Arabidopsis thaliana are more likely to be associated with quantitative disease resistance
To support a role in disease resistance for predicted targets of S. sclerotiorum sRNAs in A. thaliana, we tested for the association of these genes with a QDR phenotype to S. sclerotiorum measured in A. thaliana natural accessions. For this, we exploited the probability of association with QDR against S. sclerotiorum for 204,648 single nucleotide polymorphisms (SNPs), obtained through a genome wide association study (GWAS) in 84 European accessions of A. thaliana (26). We identified 122,034 SNPs distributed in 23,688 gene models, and determined a score of association per gene (-log10 of the p-value for the most significant SNP for each gene). First, we found that the median score of association for predicted targets of S. sclerotiorum sRNAs was 0.82, significantly higher than the median score for non-targets (median = 0.67, Wilcoxon's rank sum test P = 4.872e -05 , W = 4159300) ( Figure 6A). Genes with an association score >1.3 (corresponding to a pvalue < 0.05) represented 18.7% of S. sclerotiorum sRNA targets, but only 14.9% of other genes (Difference in proportion of high QDR score ∆%HQS= 3.8), Fisher's exact test P=0.04). Second, we performed a randomisation test to determine the likelihood of having a ∆%HQS ≥ 3.8 in random A. thaliana gene sets of the same size ( Figure 6B). We obtained a ∆%HQS < 3.8 in 98.1 % of randomisations indicating that S. sclerotiorum sRNA targets are significantly enriched in genes with an association score >1.3 (P = 0.0191).
To identify the most relevant targets of S. sclerotiorum sRNAs for plant resistance, we focused on the 65 A. thaliana predicted target genes showing (i) p<0.05 for differential expression upon S. sclerotiorum inoculation and (ii) p<0.05 for association with quantitative disease resistance in GWAS ( Figure 6C). In agreement with our previous analysis ( Figure 5), the majority of the 65 genes (39 genes, corresponding to 60%) were down regulated during S. sclerotiorum infection. Among the top 10 scores for association with QDR, 9 genes belonged to the list of 65, including 8 genes significantly down-regulated (from -1.74 to -5.03 fold) one gene significantly up-regulated (AT2G13690 encoding a PRLI-interacting factor of unknown function) and one gene that did not exhibit any change in expression (AT3G49400 encoding a Transducin/WD40 repeat-like protein of unknown function) ( Figure 6D, Table 1). Down-regulated genes included a MIDASIN1 homolog (AT1G67120) associated with the assembly of the 60S ribosomal subunit, a mitochondrial nucleotide exchange factor GrpE (AT4G26780) associated with tolerance to heat stress in A. thaliana, three genes encoding proteins of unknown function (AT4G36120, AT4G28600, AT1G22060) and three kinases.

DISCUSSION
In this study, we demonstrate that the broad host range plant pathogen S. sclerotiorum exhibits numerous abundant sRNAs expressed on two different hosts. These sRNAs exhibit characteristics typical of previously described filamentous pathogen sRNAs, such as a length distribution peaking between 20 and 25 nucleotides and a 5' uridine bias (13,14,42,43). This echoes findings of (11) who analysed sRNAs produced by S. sclerotiorum in vitro. The 5' uridine has been shown to be important for sRNA functions in distantly related species, such as Drosophila melanogaster and A. thaliana (44,45). In these species, the presence of a 5' uridine is important for directing the sRNAs to a specific Most of these sRNAs mapped to repetitive genomic regions, suggesting that sRNAs, like some fungal effectors, may be strongly associated with transposable elements. The repeat-rich regions that contain effector genes often exhibit a reduced overall gene content. This phenomenon appears to be convergent among distantly related plant pathogens in both the fungal and oomycete classes (9,(47)(48)(49)(50). In some species, effector genes within repeat-rich regions exhibit high levels of sequence diversification and tend to evolve more rapidly than the rest of the genome, an observation encapsulated in the 'two-speed' genome hypothesis (51). Clustering of repeats and effectors in specific genome niches probably results from selection against deleterious mutations in essential genes. Genome compartmentalisation seems most pronounced in fungi with a biotrophic or hemibiotrophic component to their lifestyles, which specialise on one or a few host species (49,50). In contrast, broad host-range necrotrophic fungal pathogens such as S. sclerotiorum and B. cinerea, exhibit relatively repeat poor genomes (30,52). Nevertheless, effector genes may also be associated with repeat sequences in necrotrophic fungi and fungi with a broad host range (5,(53)(54)(55). Secretome analyses highlighted a number of candidate effector-like proteins in the genome of S. sclerotiorum (30,56). Some of these predicted effector genes associate with repeat sequences. In this study, almost all sRNA loci were associated with transposable elements. Like effectors, S. sclerotiorum sRNA loci were, on average, further from gene sequences than genes were to each other. Although the number of polymorphisms in repetitive regions cannot be estimated with high accuracy, 10 Kb windows including sRNA loci were clearly more polymorphic than the rest of the genome. These findings indicate that S. sclerotiorum exhibits a fast-evolving, transposon-associated sRNA effector repertoire.
The fact that sRNA loci were highly polymorphic would support the hypothesis that these regions are an important component of adaptive evolution to the very diverse host environments that populations of broad host range fungi encounter in nature. It is possible to envisage a scenario under which random targeting of sRNAs to host immunity genes in one or more species confers a selective advantage. A rapid turnover of sRNAs and large potential for inhibition of host genes through random sequence matches could thus create a high adaptive potential of S. sclerotiorum populations.
Many of the predicted host targets of the sRNAs identified in this study exhibited domains previously shown to function in plant immune responses (57,58). Furthermore, upon infection with S. sclerotiorum, these target genes were far more likely to be down-regulated than non-targets. Although further functional assays would be required to draw firm conclusions, these data support the view that S. sclerotiorum actively suppresses host immunity genes with sRNAs.
To further understand the importance of trans-kingdom RNA silencing during plant infection with S. sclerotiorum we tested the degree of association with quantitative disease resistance for the predicted sRNA targets in A. thaliana exploiting a previous GWAS (26). The GWAS score was significantly higher in predicted S. sclerotiorum sRNA targets than non-targets. In the A. thaliana GWAS, the p-values of association with QDR remained, nevertheless, relatively modest for S. sclerotiorum sRNA targets (score ≤ 4.1, corresponding roughly to a false discovery rate of 1.0 e -06 ).
Therefore, these genes may not have been selected as relevant for QDR based on GWAS data alone.
However, the combination of GWAS, RNA sequencing, and fungal sRNA target predictions revealed novel candidate genes to be functionally characterised for association with QDR in the future. This combination of approaches may be useful in narrowing down candidate plant genes for functional characterisation in diverse plant-pathogen interactions, particularly in the absence of high resolution genetic maps and for species in which functional tests are challenging.