Badnaviruses and banana genomes: a long association sheds light on Musa phylogeny and origin

Abstract Badnaviruses are double‐stranded DNA pararetroviruses of the family Caulimoviridae. Badnaviral sequences found in banana are distributed over three main clades of the genus Badnavirus and exhibit wide genetic diversity. Interestingly, the nuclear genome of many plants, including banana, is invaded by numerous badnaviral sequences although badnaviruses do not require an integration step to replicate, unlike animal retroviruses. Here, we confirm that banana streak viruses (BSVs) are restricted to clades 1 and 3. We also show that only BSVs from clade 3 encompassing East African viral species are not integrated into Musa genomes, unlike BSVs from clade 1. Finally, we demonstrate that sequences from clade 2 are definitively integrated into Musa genomes with no evidence of episomal counterparts; all are phylogenetically distant from BSVs known to date. Using different molecular approaches, we dissected the coevolution between badnaviral sequences of clade 2 and banana by comparing badnavirus integration patterns across a banana sampling representing major Musa speciation events. Our data suggest that primary viral integrations occurred millions of years ago in banana genomes under different possible scenarios. Endogenous badnaviral sequences can be used as powerful markers to better characterize the Musa phylogeny, narrowing down the likely geographical origin of the Musa ancestor.

Usually, pararetroviral integrations have no deleterious impact on their host plants because they are untranslatable sequences. However, in some cases, integrated sequences contain a functional full-length viral genome that can be activated, leading to systemic infection of the host plant. Examples of such infective integration include Petunia vein clearing virus (genus Petuvirus) in petunia (Richert-Poggeler & Shepherd, 1997), Tobacco vein clearing virus (genus Solendovirus) in tobacco (Gregor et al., 2004), and banana streak viruses (BSVs; genus Badnavirus) in banana (Gayral et al., 2008; Harper et al., 1999; Ndowora et al., 1999). BSVs in banana are by far the most economically significant examples; indeed, banana, one of the oldest domesticated crops in the world, is ranked as the world's sixth most important food crop in terms of gross production value after cassava, potato, rice, wheat, and maize (FAOStat, 2014), and first among fruit crops.
Most modern banana cultivars arose via traditional selection processes (Perrier et al., 2011). The seedy progenitors of all domesticated banana cultivars are Musa acuminata (A genome) and Musa balbisiana (B genome) and, to a much lesser extent, Musa schizocarpa (S genome) and Musa textilis/Musa maclayi (T genome) (Carreel et al., 2002; Daniells et al., 2001a). M. acuminata exhibits large diversity based on morphological and molecular characters, and up to nine different subspecies are known (Christelova et al., 2011; Daniells et al., 2001a). M. balbisiana shows comparatively narrower diversity, with a more restricted centre of origin (Perrier et al., 2011).
Interestingly, infective endogenous BSV sequences (eBSV), found exclusively in the M. balbisiana B genome to date, belong to three distinct BSV species: Banana streak GF virus, Banana streak IM virus, and Banana streak OL virus (Gayral et al., 2008; Iskra-Caruana et al., 2010). Several reports have also noted the presence of partial badnaviral sequences in M. acuminata and/or M. balbisiana genomes (Geering et al., 2001; Ndowora et al., 1999), but most significant was the description of 33 distinct groups of banana endogenous viruses (BEV) related to either M. acuminata or M. balbisiana genomes.
The genomes of badnaviruses contain three main open reading frames (ORFs), with the largest encoding a movement protein, a capsid protein, an aspartic protease, a reverse transcriptase (RTase), and a ribonuclease H (RNase H). Badnavirus genetic diversity (based on partial sequences of the RTase and RNase H genes) in banana appears large and complex, as viral sequences generated to date are distributed over three different clades within the diversity of the genus Badnavirus (Gayral & Iskra-Caruana, 2009; Harper et al., 2005) (Figure 1). Importantly, the 11 full-length sequenced episomal BSV species responsible for banana streak disease described to date
FIGURE 1 Maximum-likelihood phylogeny based on RTase/RNase H region. Statistical aLRT SH-like branch supports given above nodes when >0.6. Virus or sequence names (GenBank numbers): BSUCV (AJ968464), BSUDV (AJ968465), BSUFV (AJ968469), BSUGV (Geering et al., 2011; Harper & Hull, 1998; James et al., 2011

| Are badnaviral sequences from clades 2 and 3 integrated in Ugandan EAH AAA genotypes?
To establish whether badnavirus sequences corresponding to spe- Similar patterns were observed with a BSUFV probe (Figure S1).
We also performed a parallel immunosorbent electron microscopy experiment on eight symptomless banana samples containing only clade 2 sequences (also analysed in the Southern blot experiment) and on nine samples with symptoms containing clade 3 and clade 2 sequences (including samples 7 and 8 in Figure 2, and sample 9 in Figure S1). As expected, viral particles were observed only for the samples with symptoms containing clade 3 sequences (Figure S3).
EAH AAA banana genetic diversity encompasses the following five clone sets: Nakitembe, Musakala, Nakabulu, Nfuuka, and Nbide (beer cultivar).
Consequently, and given the absence of corresponding episomal virus in the samples tested so far and identity with any known viral species far below the threshold of 80%, we definitively conclude that clade 2 BSUDV and BSUFV in our tested banana samples are exclusively endogenous badnaviral-related sequences.
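The 80% identity threshold used in this reasoning can be illustrated with a minimal sketch. The Python below is not the authors' pipeline: it assumes two sequences that are already aligned to equal length (for example, from the RTase/RNase H region), ignores columns containing a gap (an assumption, since other gap treatments give slightly different values), and uses hypothetical function names and toy sequences rather than data from this study.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent nucleotide identity between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are skipped
    (an assumption made for this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Threshold (in percent identity) quoted in the text.
SPECIES_THRESHOLD = 80.0


def same_species(aligned_a: str, aligned_b: str) -> bool:
    """True if identity reaches the threshold (hypothetical helper)."""
    return percent_identity(aligned_a, aligned_b) >= SPECIES_THRESHOLD


# Toy sequences, invented for illustration only.
query = "ATGCATGCAT"
reference = "ATGCTTGC-T"
identity = percent_identity(query, reference)
```

A sequence whose identity to every known viral species falls well below the threshold would return `False` from `same_species` for each comparison, which is the situation described for the clade 2 sequences.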
According to the PCR results (

| Are clade 2 badnaviral sequences present in the Musa diversity?
We first analysed the diversity of clade 2 badnaviral sequences using PCR primers specific for different clade 2 BEVs (Table 1) (Table 3).
All PCR product sequences were aligned to generate a phylogenetic tree (Figure 4). Our analysis included BEVs that are closely related to our sequences from among the different subgroups defined by Geering et al. (2005) (Table 3). As observed in the phylogenetic tree (Figure 4), BEV UF is divided into two subgroups; one is more closely linked to BEV UD (nucleotide identity c. 85%), which could be derived from BEV UF.

Other species of the family Musaceae close to our BEV groups are included.

| What can BEVs tell us about the badnavirus/banana coevolution?
Badnaviral sequences linked to banana plants are distributed over three main clades (Figure 4). Surprisingly, they are as diverse as all the other viruses of the genus Badnavirus, with which they share a common ancestor. Interestingly, the clade to which these sequences belong is associated with a particular status (episomal and/or integrated) as a result of specific interactions between the virus and its banana host.
Clade 1. Although those eBSV exhibit a strongly rearranged structure, with inverted and duplicated sequences attesting to past integration, pseudogenization has not progressed to the point where they can no longer reconstitute an infectious viral genome (Chabannes & Iskra-Caruana, 2013; Iskra-Caruana et al., 2010). Because nucleic acid identity between a given eBSV and the corresponding BSV is >99%, it is likely that the episomal BSOLV, BSGFV, and BSIMV observed now are due mainly, or exclusively, to the awakening of an endogenous counterpart (Gayral et al., 2008).

FIGURE 6 Schematic phylogenetic tree of BEV fixation events relative to speciation events within the family Musaceae constructed using nodes supported from the latest Musa phylogenies (Christelova et al., 2011; Janssens et al., 2016; Li et al., 2010).
This is not too surprising considering that the number of initial integrations is low, and that only half of the genomes of these two sequenced plants are being examined, because each derives from a duplication of an initial haploid plant. On the other hand, because genomes A and B diverged 4.5 million years ago, the initial integration loci may have diverged sufficiently so as to no longer be identified during synteny analyses. Therefore, based on recently published banana phylogenies (Christelova et al., 2011; Janssens et al., 2016; Li et al., 2010), we propose a speculative scheme (Figure (Table 3) Rhodochlamys and Eumusa (Christelova et al., 2011; Li et al., 2010).
According to estimates of species divergence times within the family Musaceae (Christelova et al., 2011)

| Extraction of genomic DNA from banana
Genomic DNA was extracted from fresh or frozen banana leaf tissue using the method of Gawel and Jarret (1991)

| Rolling circle amplification
DNA was amplified using a TempliPhi Amplification kit (GE Healthcare) following the protocol described by James et al. (2011).
Reaction products were digested using 2 U of different restriction endonucleases (Promega), according to the manufacturer's instructions, and then separated by electrophoresis in 1% agarose gels.
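To make the expected gel pattern concrete, the following minimal sketch predicts fragment sizes when a circular genome is cut at every occurrence of a restriction site. It is an illustration under stated assumptions, not the protocol used here: the text does not name the enzymes, so the EcoRI site (GAATTC, cut after the first G) is an assumed example, the function name `circular_digest` is hypothetical, and the toy sequences are invented.

```python
def circular_digest(genome: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths from cutting a circular sequence at each `site`.

    Only the given strand is searched, which is sufficient for
    palindromic sites. `cut_offset` is the cut position within the site.
    Returns an empty list when the site is absent (no fragment released).
    """
    genome = genome.upper()
    site = site.upper()
    n = len(genome)
    # Append the first bases so that sites spanning the origin are found.
    extended = genome + genome[: len(site) - 1]
    cuts = sorted(
        {(i + cut_offset) % n for i in range(n) if extended.startswith(site, i)}
    )
    if not cuts:
        return []
    # Distance from each cut to the next one, wrapping around the circle.
    # A single cut gives a distance of 0, i.e. one genome-length fragment.
    lengths = [
        (cuts[(k + 1) % len(cuts)] - cuts[k]) % n or n for k in range(len(cuts))
    ]
    return sorted(lengths, reverse=True)


# Toy 18-nt circle with two assumed EcoRI sites.
toy_genome = "GAATTCAAAAGAATTCAA"
fragments = circular_digest(toy_genome, "GAATTC", cut_offset=1)
```

In this model, an enzyme with a single site in the circle yields one fragment equal to the full genome length, whereas several sites yield fragments whose sizes sum to the genome length.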

| Phylogenetic analysis
Badnaviral sequences were aligned using MAFFT (Katoh & Standley, 2013). Phylogenetic trees were constructed using the maximum-likelihood method with PhyML 3.0 (Guindon et al., 2010) and visualized using DARwin 5 software (Perrier et al., 2003). The robustness of trees was tested with aLRT SH-like statistical support (Anisimova et al., 2011). The new sequences produced during this work have GenBank accession numbers KJ720037-KJ720154 and KJ734678-KJ734703.
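For readers who wish to retrieve these records, the two accession ranges can be expanded into individual identifiers. The short Python sketch below assumes that each range is contiguous and that every number within it was assigned to this study, which the text implies but does not state explicitly; the helper name `expand_accession_range` is hypothetical.

```python
import re


def expand_accession_range(spec: str) -> list[str]:
    """Expand a range such as 'KJ720037-KJ720154' into single accessions."""
    start, end = spec.split("-")
    m_start = re.fullmatch(r"([A-Z]+)(\d+)", start)
    m_end = re.fullmatch(r"([A-Z]+)(\d+)", end)
    if not m_start or not m_end or m_start.group(1) != m_end.group(1):
        raise ValueError(f"unsupported accession range: {spec}")
    prefix = m_start.group(1)
    width = len(m_start.group(2))  # keep zero padding of the numeric part
    first, last = int(m_start.group(2)), int(m_end.group(2))
    if last < first:
        raise ValueError(f"range end precedes start: {spec}")
    return [f"{prefix}{number:0{width}d}" for number in range(first, last + 1)]


# The two ranges quoted in the text.
ranges = ["KJ720037-KJ720154", "KJ734678-KJ734703"]
accessions = [acc for spec in ranges for acc in expand_accession_range(spec)]
```

Under the contiguity assumption, the two ranges expand to 118 and 26 identifiers respectively.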

ACKNOWLEDGMENTS
We are very grateful to Jerome Kubiriba from the National Agricultural Research Organisation (NARO), Jim Lorenzen (previously IITA), and farmers in Uganda for assistance with field sampling. We would like to thank the Guadeloupe Centre de Ressources Biologiques Plantes Tropicales and especially Nilda Paulo de la Reberdière and Danièle Roques for providing plant material. We also thank CIRAD for funding.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available