Genome-wide estimates of recombination, mutation, and positive selection enlighten diverse drivers of Mycobacterium bovis evolution

Genome sequencing has revitalized the field of infectious disease research, shedding light on disease epidemiology, pathogenesis, host-pathogen interactions, and the evolutionary processes acting on pathogens. The Mycobacterium tuberculosis complex (MTBC) includes Mycobacterium bovis, one of its animal-adapted members that cause tuberculosis (TB) in terrestrial mammals, and is a paradigmatic model of bacterial evolution. Like other MTBC members, M. bovis is assumed to be a strictly clonal, slowly evolving pathogen, with no apparent signs of recombination or horizontal gene transfer. In this work, we applied comparative genomics to a whole-genome sequencing (WGS) dataset of 70 M. bovis genomes from different lineages (clonal complexes from Europe and Africa) to gain insights into the evolutionary forces shaping M. bovis genetic diversity. Three different methods were used to search for signs of recombination. Overall, a small number of recombination events were identified and confirmed with solid support by two independent methods. Nevertheless, compared with mutation, recombination has a weaker effect on M. bovis diversity (overall r/m = 0.037). The different r/m averages obtained for the M. bovis clonal complexes in our dataset are consistent with the general notion that the extent of recombination may vary greatly among lineages assigned to the same taxonomic species. Based on this work, recombination in M. bovis cannot be ruled out and should be the subject of further comparative genomics research, for which WGS of large datasets from different epidemiological scenarios worldwide is crucial. An additional analysis was then performed on a smaller M. bovis dataset (n = 42) from a multi-host TB endemic scenario, identifying more than 1,800 loci at which at least one strain showed a single nucleotide polymorphism (SNP).
Most of these SNPs (87.1%) are located in coding regions, and the global ratio of non-synonymous to synonymous changes (dN/dS) exceeds 1.5, indicating that positive selection is an important evolutionary force acting on M. bovis. A higher proportion of SNPs was detected in genes of the functional categories "lipid metabolism", "cell wall and cell processes", and "intermediary metabolism and respiration", suggesting their potential importance in M. bovis biology and evolution. A closer look at genes acquired by MTBC ancestors through horizontal gene transfer and at genes of the 3R (DNA repair, replication, and recombination) system reveals a globally negative average value of Tajima's D neutrality test, suggesting that past selective sweeps and population expansion after a recent bottleneck remain the main evolutionary drivers of the obligate pathogen M. bovis in its struggle with the host.
The Mycobacterium tuberculosis complex (MTBC) is one of the most successful groups of bacterial pathogens and a paradigmatic case of bacterial evolution. Its members show remarkably high nucleotide identity at the genome level (>99%)1,2. Different MTBC ecotypes cause tuberculosis (TB), an infectious granulomatous disease, in a wide range of host species, from small mammals to humans3,4,5. Currently, the complex includes human-adapted pathogens [M. tuberculosis (Mtb) and Mycobacterium africanum] and animal-adapted pathogens (Mycobacterium bovis, Mycobacterium caprae, Mycobacterium pinnipedii, Mycobacterium microti, Mycobacterium mungi, Mycobacterium orygis, Mycobacterium suricattae, the "chimpanzee bacillus" and the "dassie bacillus")5,6. M. canettii (one of the so-called "smooth tubercle bacilli") shares an average nucleotide identity of 98% with the aforementioned mycobacteria, and comparative genomics work has shown that M. canettii and the rest of the MTBC diverged from a common ancestor7. In light of this, some authors consider M. canettii a member of the MTBC8.
The MTBC is commonly described as a strictly clonal complex whose population structure is governed by reduced diversity, bottlenecks, selective sweeps, and genetic drift9,10. Under strictly clonal evolution, lost polymorphisms, such as deleted genomic regions, cannot be restored by recombination. On this premise, successive genomic deletion events of regions of difference (RD) and of TbD1 (M. tuberculosis-specific deletion 1) have been proposed as molecular markers of MTBC evolution2,5,11. Comparative genomics and WGS work supports the division of human-adapted members into nine lineages (M. tuberculosis L1 to L4, L7, and L8; and M. africanum L5, L6, and L9), with lineages L2 to L4 sharing the deletion of the TbD1 region2,11,12,13. In addition, animal-adapted members are proposed to share a common ancestor, defined by clade-specific deletions of the RD7, RD8, RD9 and RD10 regions2,5,14.
Horizontal gene transfer (HGT) and recombination events are considered rare and to have occurred in the MTBC ancestor, rather than throughout the diversification history of MTBC members15,16,17. Two early reports, by Hughes and collaborators (2002) and Gutacker and collaborators (2006), suggested that recombination events may have helped shape the polymorphisms at specific loci of M. tuberculosis strains18,19. Proposed reasons for the apparent lack of recombination in the MTBC are: (1) loss of the mechanisms and capacity for HGT; (2) the rarity of HGT events; and (3) the lack of opportunity for recombination in the MTBC niche14,17. Recently, WGS studies of MTBC strains20 and of M. bovis21 have provided evidence of recombination, the former showing that MTBC strains frequently exchange small DNA fragments, events that had gone unnoticed owing to limited nucleotide sequence variation.
M. bovis is the MTBC member most commonly recovered from livestock (mainly cattle), although it can also be isolated from free-ranging and fenced wild animals4,22,23,24. M. bovis has been classified into five major clonal complexes [European 1 (Eu1), European 2 (Eu2), European 3 (Eu3), African 1 (Af1) and African 2 (Af2)] according to spoligotype profiles, specific deletions, and single nucleotide polymorphisms (SNPs) in specific genes25,26,27,28,29. These clonal complexes reflect the diverse structure of the M. bovis population and its association with geographic regions. In addition, recent WGS work by Zimpel and collaborators (2020) reconstructed an SNP-based phylogeny of more than 1,900 M. bovis genomes, revealing at least four distinct lineages (named Lb1 to Lb4) that are not fully concordant with the previously defined clonal complexes, although geographic specificity was also confirmed30. These authors performed phylogenetic and molecular dating analyses but did not study recombination30.
Previous work using different molecular techniques, such as spoligotyping, MIRU-VNTR (mycobacterial interspersed repetitive units-variable number of tandem repeats) typing, and more recently SNP typing, revealed a certain level of genetic diversity among M. bovis strains31,32,33,34,35. Characterizing genetic variation has become an important tool in epidemiological studies, contributing to a deeper understanding of pathogenesis, virulence, and disease transmission. The advent of WGS makes it possible to uncover the evolutionary drivers acting on the M. bovis genome during adaptation to, and persistence in, different hosts and epidemiological scenarios.
In this work, we applied comparative genomic analyses to a diverse M. bovis dataset (n = 70), including isolates from different clonal complexes, to gain insights into the evolutionary processes of M. bovis, particularly to resolve phylogenetic relationships and recombination events. As a complement to this analysis, a subset of M. bovis isolates (n = 42) from a well-characterized multi-host TB area in Portugal31,36 was further explored to infer the relative ratio of non-synonymous (dN) to synonymous (dS) nucleotide substitutions, as well as the evolutionary contribution of specific genes reported in the literature as acquired by MTBC ancestors through HGT37,38 and of genes encoding components of the 3R (DNA repair, replication, and recombination) system39. Genes acquired through HGT were chosen because they may represent ancient polymorphisms and are therefore expected to contain a higher proportion of synonymous changes. Genes of the 3R system were selected because previous work on M. tuberculosis strains indicated widespread negative (purifying) selection acting on these genes, which may play an important role in evolution39. Another goal of this work was to infer the existence of recombination events. Because our Portuguese dataset contains only genomes of the Eu2 clonal complex and strains not assigned to any clonal complex, we included publicly available genome data to obtain representatives of all clonal complexes and to improve the robustness and breadth of the results.
The 42 newly sequenced M. bovis genomes from the Portuguese endemic multi-host TB scenario (details below), previously characterized from an epidemiological perspective36, are the focus of this work. Because the Portuguese dataset only includes representatives of the Eu2 clonal complex and strains without an assigned complex, publicly available WGS data were added to expand the dataset to representatives of all M. bovis clonal complexes. Three sources of WGS data were therefore used: complete/draft genome assemblies with up to 10 scaffolds deposited in NCBI (National Center for Biotechnology Information) (n = 15 isolates); Illumina FASTQ files deposited in the SRA (Sequence Read Archive) representing M. bovis clonal complex diversity (n = 12 isolates)30; and the 42 newly sequenced genomes from Portugal. M. bovis BCG (Bacille Calmette-Guérin) was excluded from the NCBI search. M. bovis AF2122/97, commonly used as the reference genome, was included in the dataset. Because no whole-genome assemblies representing the Af1 clonal complex were publicly available, and only a small number of genomes represented Af2 and Eu1, raw sequencing data from the SRA were used in these cases. The work of Zimpel and collaborators (2020) helped identify genomes from these clonal complexes and select the M. bovis isolates included in the dataset. Only one Eu3 genome has been described (Branger et al., 2020), so it is the sole representative of the Eu3 complex in our dataset.
Overall, the dataset includes 70 M. bovis isolates from 8 host species in 12 countries, collected between 1985 and 2016. Of these, 36 were assigned to Eu2, 7 to Eu1, 1 to Eu3, 3 to Af1, and 4 to Af2, while 19 could not be attributed to any clonal complex (details below). Detailed information on the M. bovis genomes used in this study, including accession numbers, is provided in Table 1 and Supplementary Table 1.
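The group sizes above can be cross-checked with a few lines of Python. The counts are copied from this paragraph; the European and European-plus-African subtotals are included because they match the subset sizes used later for the core alignments (44 and 51 genomes). This is only an arithmetic sketch, not part of the published workflow.

```python
# Clonal complex composition of the 70-genome dataset, copied from the text.
COMPOSITION = {
    "Eu2": 36,
    "Eu1": 7,
    "Eu3": 1,
    "Af1": 3,
    "Af2": 4,
    "no complex": 19,
}

total = sum(COMPOSITION.values())
european = COMPOSITION["Eu1"] + COMPOSITION["Eu2"] + COMPOSITION["Eu3"]
european_african = european + COMPOSITION["Af1"] + COMPOSITION["Af2"]

# Percentage of the full dataset contributed by each group.
shares = {group: 100 * n / total for group, n in COMPOSITION.items()}

if __name__ == "__main__":
    print(f"total = {total}, European = {european}, European + African = {european_african}")
    for group, pct in shares.items():
        print(f"{group:>10}: {COMPOSITION[group]:2d} genomes ({pct:.1f}%)")
```

Running it shows that the Portuguese-driven Eu2 group makes up just over half of the dataset, while unassigned genomes account for roughly a quarter.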
The 42 newly sequenced M. bovis genomes at the core of this study were collected over more than 12 years in an animal TB hotspot in Portugal, where a potential wildlife-livestock disease system has been regularly monitored31,36 (Supplementary Fig. 1). The strains were isolated from cattle (n = 14), red deer (n = 16), and wild boar (n = 12) between 2003 and 2015, following the procedures below. Tissue samples were collected and processed according to the protocols recommended in the OIE Terrestrial Manual and inoculated onto Stonebrink and Löwenstein-Jensen with pyruvate solid media and into liquid medium. Cultures were incubated at 37 °C and checked for growth weekly for at least 12 weeks. Colonies were stored directly in glycerol solution at −80 °C. To obtain DNA for WGS, the archived isolates underwent a single in vitro passage in mycobacterial selective medium (Middlebrook 7H9, BD Diagnostics). For this, frozen culture stocks were re-cultured at 37 °C in Middlebrook 7H9 supplemented with 5% sodium pyruvate and 10% ADS (50 g albumin, 20 g glucose, and 8.5 g sodium chloride in 1 L water). After 4 weeks of growth, the medium was renewed and the cultures were regularly monitored until growth was observed. Cells were harvested by centrifugation, and the pellet was resuspended in 500 µL phosphate-buffered saline (PBS), heated at 99 °C for 30 min, and centrifuged; the supernatant was stored at −20 °C until WGS. All procedures were carried out in biosafety level 3 facilities.
Paired-end genomic libraries were prepared with a unique index for each DNA sample and sequenced on Illumina MiSeq (2 × 250 bp; 40 isolates) and HiSeq (2 × 150 bp; two isolates) platforms (Eurofins Genomics, Germany). Genomic DNA was sequenced according to the manufacturer's instructions using the Illumina Genome Analyzer with the paired-end module attached, and libraries were constructed with the Illumina Nextera XT DNA Library Prep Kit.
For the data recovered from the SRA (n = 12), clonal complex assignments were available as metadata in the corresponding publications30,41,43. For the complete genomes, except M. bovis AF2122/97 and M. bovis 3601, which are recognized members of the Eu1 and Eu3 clonal complexes, respectively25,29, genomes were aligned to the complete genome of M. tuberculosis H37Rv (NCBI accession number NC_000962.3). Genome alignment was performed with MAFFT (a multiple alignment program for amino acid or nucleotide sequences, version 7.458) using the --addfragments parameter48. The alignments were then searched for the deletions and/or SNPs characteristic of the different clonal complexes.
Raw reads from the newly sequenced M. bovis isolates (n = 42) and from the draft genome assemblies (n = 3) were aligned to the reference genome M. tuberculosis H37Rv with the vSNP pipeline and searched for the deletions and/or SNPs characteristic of the different clonal complexes.
Information on characteristic deletions, presence/absence of SNPs, and spoligotype profiles was combined to assign the genomic data to the corresponding clonal complexes. For four draft assemblies, the spoligotype profile could not be inferred, so they were included in the "no complex" group.
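The assignment logic described above can be sketched as a rule-based lookup. The marker table below is an illustrative and partial assumption, to be checked against refs 25-29 before any real use: it covers only the three deletion-defined complexes (RDEu1 for Eu1, RDAf1 for Af1, RDAf2 for Af2) together with the spoligotype spacers commonly reported as absent in each, and it omits the SNP-defined Eu2 and Eu3. The function name `assign_complex` is hypothetical. As in the text, a genome whose spoligotype cannot be inferred falls into the "no complex" group.

```python
from typing import Optional

# Illustrative, partial marker table (assumption): defining deletion plus the
# spoligotype spacers expected to be absent. Eu2 and Eu3 are omitted here.
MARKERS = {
    "Eu1": {"deletion": "RDEu1", "absent_spacers": {11}},
    "Af1": {"deletion": "RDAf1", "absent_spacers": {30}},
    "Af2": {"deletion": "RDAf2", "absent_spacers": {3, 4, 5, 6, 7}},
}


def assign_complex(deletions: set, absent_spacers: Optional[set]) -> str:
    """Return the first complex whose markers are all observed, else 'no complex'.

    absent_spacers is None when the spoligotype could not be inferred
    (e.g. some draft assemblies); no assignment is made in that case.
    """
    if absent_spacers is None:
        return "no complex"
    for name, marker in MARKERS.items():
        has_deletion = marker["deletion"] in deletions
        spacers_missing = marker["absent_spacers"] <= absent_spacers  # subset test
        if has_deletion and spacers_missing:
            return name
    return "no complex"


if __name__ == "__main__":
    print(assign_complex({"RDAf1"}, {30, 31}))
    print(assign_complex({"RDEu1"}, None))
```

Requiring both the deletion and the spacer pattern mirrors the idea of combining several lines of evidence; a genome that matches only one of them stays unassigned.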
The bioinformatics workflow of this work combines de novo assembly and a reference-mapping strategy, with the goal of exploring recombination events and genome-specific polymorphisms. Figure 1 provides a flowchart of the steps followed. For the recombination analysis, all genomes were used to increase the robustness of the inferences and related metrics.
To reduce errors in generating genome consensus sequences, we first obtained de novo assemblies and then core-genome multiple alignments. The Unicycler pipeline (https://github.com/rrwick/Unicycler)49 was used to perform de novo assembly of 54 sequenced genomes (42 newly sequenced and 12 FASTQ files recovered from the SRA). Briefly, before assembly, read quality was analyzed with FastQC version 0.11.7 (https://github.com/s-andrews/FastQC), and Trimmomatic version 0.36 (http://www.usadellab.org/cms/?page=trimmomatic)50 was applied with the ILLUMINACLIP step ("cut adapter and other Illumina-specific sequences from the read") and the TRAILING step ("cut bases off the end of a read, if below a threshold quality"), using a quality threshold of 20. SPAdes, as optimized within Unicycler49, was then used for genome assembly, and Pilon (version 1.18)51 was used for post-assembly polishing. The conservative bridging mode was selected to avoid misassemblies, and the k-mer size was searched between 20% and 95% of the read length. Following the SPAdes guidelines and considering read length, contigs shorter than 300 bp were removed and a read-depth coverage cutoff of 20 was established52. In the de novo assembly strategy, genomic regions such as the highly repetitive proline-glutamate (PE) and proline-proline-glutamate (PPE) paralogs were not removed.
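The trimming and assembly settings above can be written down as command lines. The sketch below only builds argument lists (it does not run anything) and uses documented Trimmomatic steps (`PE`, `ILLUMINACLIP`, `TRAILING`) and Unicycler options (`--mode`, `--min_kmer_frac`, `--max_kmer_frac`, `--min_fasta_length`). The adapter file `NexteraPE-PE.fa`, the `2:30:10` clipping thresholds, the file names, and the helper names are assumptions, not settings reported by the authors; the 20x read-depth cutoff is not modeled here.

```python
import shlex


def trimmomatic_cmd(r1, r2, prefix, jar="trimmomatic-0.36.jar",
                    adapters="NexteraPE-PE.fa"):
    """Paired-end Trimmomatic call: adapter clipping plus trailing-quality trim at Q20."""
    return [
        "java", "-jar", jar, "PE", "-phred33", r1, r2,
        f"{prefix}_1P.fq.gz", f"{prefix}_1U.fq.gz",  # forward paired / unpaired
        f"{prefix}_2P.fq.gz", f"{prefix}_2U.fq.gz",  # reverse paired / unpaired
        # Adapter file and 2:30:10 clipping thresholds are assumptions.
        f"ILLUMINACLIP:{adapters}:2:30:10",
        "TRAILING:20",
    ]


def unicycler_cmd(r1, r2, outdir):
    """Unicycler short-read assembly in conservative bridging mode."""
    return [
        "unicycler", "-1", r1, "-2", r2, "-o", outdir,
        "--mode", "conservative",
        # k-mer search range as a fraction of read length (20% to 95%).
        "--min_kmer_frac", "0.2", "--max_kmer_frac", "0.95",
        # Exclude contigs shorter than 300 bp from the final FASTA.
        "--min_fasta_length", "300",
    ]


if __name__ == "__main__":
    trim = trimmomatic_cmd("sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample")
    asm = unicycler_cmd("sample_1P.fq.gz", "sample_2P.fq.gz", "sample_unicycler")
    print(shlex.join(trim))
    print(shlex.join(asm))
```

Building argument lists rather than shell strings keeps file names safe to pass to `subprocess.run` without quoting issues.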
The quality of the de novo assemblies was assessed with the QUAST pipeline (http://quast.sourceforge.net/quast.html), which compares the contigs against the M. bovis AF2122/97 reference genome (NCBI accession number LT708304.1) (see Supplementary Table 1 for quality parameters).
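N50 is one of the standard assembly metrics reported by tools such as QUAST. The sketch below computes contig count, total length, and N50 after applying the 300 bp contig cutoff used in the assembly step; it is an illustration, not QUAST's implementation, and the contig lengths in the usage line are hypothetical.

```python
def assembly_stats(contig_lengths, min_length=300):
    """Return contig count, total length and N50 after dropping short contigs."""
    kept = sorted((n for n in contig_lengths if n >= min_length), reverse=True)
    total = sum(kept)
    n50 = 0
    running = 0
    for length in kept:
        running += length
        # N50: length of the contig at which the cumulative sum reaches half the assembly.
        if running >= total / 2:
            n50 = length
            break
    return {"contigs": len(kept), "total_length": total, "n50": n50}


if __name__ == "__main__":
    print(assembly_stats([5000, 4000, 3000, 250]))  # hypothetical contig lengths
```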
Using the vSNP pipeline (https://github.com/USDA-VS/vSNP), FASTQ files of the newly sequenced M. bovis isolates were aligned to the M. bovis AF2122/97 reference genome (LT708304.1). Standard filtering parameters or variant quality score recalibration were applied following the Genome Analysis Toolkit (GATK) best-practice recommendations53,54,55. Results were filtered using a minimum SAMtools quality score of 150 and AC = 2. Reads were also checked with Kraken (http://ccb.jhu.edu/software/kraken/) to rule out contamination. The vSNP pipeline used for the reference-mapping strategy in our work examines a series of defined SNPs and targets and also excludes mixed-infection scenarios. Read genome coverage was above 99% (Supplementary Table 1).
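The quality filter above (minimum score 150 and AC = 2) can be illustrated on individual VCF records. This sketch assumes a single-sample, biallelic VCF in which AC = 2 corresponds to a homozygous alternative call under diploid genotyping; it is not vSNP's code, and the function name and example record are hypothetical.

```python
def passes_quality_filter(vcf_line, min_qual=150.0, required_ac="2"):
    """Keep a VCF record whose QUAL >= min_qual and INFO AC equals required_ac."""
    if vcf_line.startswith("#"):
        return False  # header line
    fields = vcf_line.rstrip("\n").split("\t")
    qual, info = fields[5], fields[7]  # VCF columns: QUAL is 6th, INFO is 8th
    if qual == ".":
        return False  # missing quality value
    info_fields = dict(
        item.split("=", 1) if "=" in item else (item, "")  # flags have no value
        for item in info.split(";")
    )
    return float(qual) >= min_qual and info_fields.get("AC") == required_ac


if __name__ == "__main__":
    record = "LT708304.1\t1000\t.\tA\tG\t200.5\t.\tAC=2;AF=1.00;AN=2;DP=45\tGT\t1/1"
    print(passes_quality_filter(record))
```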
To avoid mapping errors and false SNPs, a variant was filtered out if (1) it was supported by fewer than 20 reads, (2) it was found at a frequency below 0.9, or (3) it was present in at least one strain but fell in a gap in at least one other strain. Integrative Genomics Viewer (IGV) version 2.4.19 (http://software.broadinstitute.org/software/igv/)56 was used to visually verify SNPs and positions with mapping or alignment problems. Because the proline-glutamate (PE) and proline-proline-glutamate (PPE) genes are highly repetitive members of multigene families, they are prone to Illumina sequencing errors and mismapping, and they are therefore routinely removed from MTBC bioinformatics workflows when SNPs are confirmed with a reference-mapping strategy. Accordingly, we filtered PE/PPE genes and indels out of this analysis.
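The three filtering rules, together with the PE/PPE exclusion, can be expressed as a single per-position check. In this sketch the `Call` record, the `keep_position` helper, and the PE/PPE interval coordinates are hypothetical; real coordinates would come from the M. bovis AF2122/97 annotation.

```python
from dataclasses import dataclass


@dataclass
class Call:
    strain: str
    position: int
    depth: int            # reads covering the position
    alt_fraction: float   # fraction of reads supporting the alternative base
    is_gap: bool = False  # position missing or gapped in this strain


def keep_position(calls, pe_ppe_intervals, min_depth=20, min_freq=0.9):
    """Return True if a candidate SNP position passes all filters (sketch)."""
    pos = calls[0].position
    # Exclude positions inside PE/PPE genes.
    if any(start <= pos <= end for start, end in pe_ppe_intervals):
        return False
    variant_calls = [c for c in calls if not c.is_gap and c.alt_fraction > 0]
    # Rule 3: variant in at least one strain but a gap in another.
    if variant_calls and any(c.is_gap for c in calls):
        return False
    # Rules 1 and 2: every variant call needs >= 20 reads and frequency >= 0.9.
    for c in variant_calls:
        if c.depth < min_depth or c.alt_fraction < min_freq:
            return False
    return bool(variant_calls)


if __name__ == "__main__":
    calls = [Call("A", 100, 25, 0.95), Call("B", 100, 30, 0.0)]
    print(keep_position(calls, pe_ppe_intervals=[]))
```

The same depth and frequency thresholds are reused later when confirming the SNPs in the putative recombination regions.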
All SNPs were assigned to functional categories according to BoviList (http://genolist.pasteur.fr/BoviList/). The SnpEff pipeline (https://pcingola.github.io/SnpEff/) was used to infer SNP effects (synonymous or non-synonymous changes), for which a new database was built from the M. bovis AF2122/97 genome (LT708304.1).
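The synonymous/non-synonymous distinction that SnpEff reports comes down to translating the reference and mutated codons. The sketch below does this with the standard genetic code, whose amino acid assignments are shared by the bacterial code (translation table 11), which differs only in alternative start codons. It is not SnpEff's implementation, and `classify_snv` is a hypothetical helper.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snv(codon: str, position_in_codon: int, alt_base: str) -> str:
    """Classify a single-base change within a codon (position 0, 1 or 2)."""
    codon = codon.upper()
    mutated = codon[:position_in_codon] + alt_base.upper() + codon[position_in_codon + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutated]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    return "non-synonymous"


if __name__ == "__main__":
    print(classify_snv("CTG", 2, "A"))  # Leu -> Leu
    print(classify_snv("GAT", 0, "A"))  # Asp -> Asn
```

Note that a simple count of non-synonymous versus synonymous changes is not the same as a site-normalized dN/dS, which also divides by the number of non-synonymous and synonymous sites in the sequences.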
Core-genome multiple alignments were performed with Parsnp v1.2 (https://github.com/marbl/parsnp)57, using 69 complete genomes/draft assemblies (with the -c option) and M. bovis AF2122/97 (LT708304.1) as the reference. Four core multiple alignments were generated: (1) Eu2 clonal complex members only (n = 37); (2) all European clonal complex members (n = 44); (3) the union of the European and African clonal complexes (n = 51); and (4) all M. bovis genomes in this study (n = 70).
The core alignments generated by Parsnp were used to infer maximum likelihood (ML) phylogenetic trees with RAxML on the CIPRES Science Gateway v3.3 (http://www.phylo.org/)58, with 1,000 bootstrap replicates.
The presence of recombination events was examined in parallel with three different algorithms and bioinformatics tools: SplitsTree4, the Gubbins (Genealogies Unbiased By recomBinations In Nucleotide Sequences) pipeline, and RDP4 (Recombination Detection Program, version 4.101 beta).
The split decomposition method implemented in SplitsTree4 v4.15.1 (http://www.splitstree.org/)59 was used to compute unrooted phylogenetic networks, with statistical validation by the Phi (pairwise homoplasy index) test at a significance threshold of p = 0.05. The Parsnp core multiple alignments were used as input, and split decomposition was used as the network criterion.
The Gubbins pipeline v2.3.1 (https://github.com/sanger-pathogens/gubbins)60 was run with default parameters as an additional approach to evaluate the impact of recombination on M. bovis. The algorithm implemented in the pipeline reconstructs the clonal genealogy of the complete genomes/draft assemblies in our dataset and the reference genome (M. bovis AF2122/97, LT708304.1), and scans the distribution of SNPs along each branch of the tree to detect SNP clusters indicative of recombination events. The null hypothesis for each branch is the absence of recombination, under which the SNPs occurring on that branch should be uniformly distributed along the genome. The Parsnp core multiple alignments and the best-scoring ML trees from RAxML were used as input files.
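To make the null hypothesis concrete, the sketch below tests whether any window on one branch holds more SNPs than a uniform distribution would allow. It is a deliberately simplified stand-in, not Gubbins's algorithm: Gubbins uses its own, more elaborate scan with variable window sizes and refinement of block boundaries, whereas this sketch uses fixed, non-overlapping 500 bp windows, a binomial tail test, and a Bonferroni correction, all of which are assumptions. The function names and positions are hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def snp_dense_windows(snp_positions, genome_length, window=500, alpha=0.05):
    """Flag fixed, non-overlapping windows with more SNPs than a uniform model allows.

    Under the null (no recombination), each of the n SNPs on a branch falls in
    a given window with probability window / genome_length.
    """
    n = len(snp_positions)
    n_windows = genome_length // window
    if n == 0 or n_windows == 0:
        return []
    p = window / genome_length
    threshold = alpha / n_windows  # Bonferroni correction across windows
    hits = []
    for start in range(0, n_windows * window, window):
        end = start + window
        k = sum(start <= x < end for x in snp_positions)
        if k >= 2 and binomial_tail(k, n, p) < threshold:
            hits.append((start, end, k))
    return hits


if __name__ == "__main__":
    clustered = [1000 + 10 * i for i in range(10)] + list(range(10000, 100000, 9000))
    print(snp_dense_windows(clustered, genome_length=100000))
```

A window flagged this way would be a candidate recombination block; SNPs spread evenly along the branch produce no hits.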
Finally, to confirm the recombination events suggested by Gubbins, six algorithms implemented in RDP4 software67 (RDP61, GENECONV62, BootScan63, MaxChi64, Chimaera65 and SiScan66) were applied to the Parsnp core multiple alignments under default settings. We required that at least three of the RDP4 algorithms consistently detect a significant signal to validate each recombination event.
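The "at least three of six methods" acceptance rule can be written as a small consensus check. The significance level of 0.05 and the input format (method name mapped to a p-value, with methods that missed the event omitted or set to None) are assumptions for illustration; `event_supported` is a hypothetical helper, not part of RDP4.

```python
METHODS = ("RDP", "GENECONV", "BootScan", "MaxChi", "Chimaera", "SiScan")


def event_supported(p_values, alpha=0.05, min_methods=3):
    """Return (accepted, supporting_methods) for one candidate recombination event."""
    supporting = [
        m for m in METHODS
        if p_values.get(m) is not None and p_values[m] < alpha
    ]
    return len(supporting) >= min_methods, supporting


if __name__ == "__main__":
    # Hypothetical p-values for one candidate event.
    print(event_supported({"RDP": 1e-4, "GENECONV": 0.01, "MaxChi": 0.003, "SiScan": 0.2}))
```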
Because both Gubbins and RDP4 search for recombination signals by scanning the core multiple alignments in windows of up to 500 bp, and to confirm that retaining PE/PPE genes in the de novo assemblies did not interfere with the recombination signals detected, the neighborhoods of the genes affected by recombination events were further examined by synteny analysis. Synteny maps of the complete genomes were constructed with Mauve multiple genome alignment (http://darlinglab.org/mauve/mauve.html) to rule out local genome translocations or inversions. In addition, synteny analysis of amino acid sequences was performed on the complete genomes with the SyntTax web server (https://archaea.i2bc.paris-saclay.fr/SyntTax/).
A more in-depth analysis of the genome dataset from the Portuguese multi-host TB system examined polymorphisms in genes reported in the literature as acquired by MTBC ancestors through HGT37,38 and in genes encoding components of the 3R (DNA repair, replication, and recombination) system39. Alignments were built with ClustalX v2.1 (http://www.clustal.org/clustal2/) and used as input for DnaSP v6.12.03 (http://www.ub.edu/dnasp/) to calculate gene diversity, nucleotide diversity (π), and Tajima's D neutrality test.
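Tajima's D compares nucleotide diversity (π, the mean number of pairwise differences) with Watterson's estimator (S divided by a1, where S is the number of segregating sites). The sketch below implements the standard Tajima (1989) formulas on a toy alignment of equal-length sequences. It assumes no gaps or undefined bases (DnaSP has its own rules for handling those) and is not DnaSP's code; the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def segregating_sites(seqs):
    """Number of alignment columns with more than one base (no gaps/N assumed)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs, per_site=False):
    """Average number of pairwise differences (pi); optionally per site."""
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    k = total_diffs / len(pairs)
    return k / len(seqs[0]) if per_site else k


def tajimas_d(seqs):
    """Tajima's (1989) D; returns None when there are no segregating sites."""
    n = len(seqs)
    S = segregating_sites(seqs)
    if S == 0:
        return None
    pi = nucleotide_diversity(seqs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))


if __name__ == "__main__":
    singleton = ["AAAAA", "AAAAA", "AAAAA", "AAAAT"]  # one rare variant
    print(nucleotide_diversity(singleton), tajimas_d(singleton))
```

A negative D, as in the toy example, reflects an excess of rare variants relative to neutral expectations, the pattern associated with selective sweeps or population expansion; a positive D reflects an excess of intermediate-frequency variants.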
An ML phylogenetic tree based on 69 M. bovis isolates and the reference genome was obtained (Figure 2A). This strategy yields more robust trees than single-gene or multilocus approaches, which do not capture whole-genome variability and therefore have lower discriminatory power68,69. The topology of the ML tree is broadly consistent with the clonal complex classification: Eu2 genomes cluster in one clade, and Af1 genomes also cluster together (Figure 2A). The result is also consistent with the known evolutionary relationships of M. bovis, namely a clear separation between Eu1 members and a group formed by all other clonal complexes and the genomes not assigned to any clonal complex30. The minor inconsistencies between clonal complex assignments and the relationships observed in the phylogenetic tree can be explained by the fact that clonal complexes are defined from specific genomic regions, whereas the phylogenetic tree is based on core-genome multiple alignments representative of the entire genome.
Maximum likelihood phylogenetic trees (GTR model) built from the core-genome alignment of the M. bovis genomes before (A) and after (B) removal of recombinant sites. Branch colors represent the M. bovis clonal complexes: European 1 in purple, European 2 in red, European 3 in blue, African 1 in orange, and African 2 in green. The trees are rooted and drawn to scale, with branch lengths measured in substitutions per site.
The MTBC is described as clonally evolving, and most of the evidence accumulated over the years supports the view that ongoing HGT and recombination events do not occur at detectable levels in the MTBC15,17,18.
Previous work has suggested that limited recombination may occur among MTBC strains20,21, while other studies failed to identify measurable recombination events70,71. Here we revisit this issue with a focus on M. bovis. Unlike previous work that considered only M. tuberculosis70,71, treated the MTBC as a whole with very few M. bovis representatives20, or considered only a restricted M. bovis dataset, this work screened a total of 70 strains representing all clonal complexes for recombination. The dataset was analyzed at four cumulative levels: (1) Eu2 members; (2) all European clonal complex members (i.e., European); (3) the European and African clonal complexes (Eu + Af); and (4) the entire dataset, including genomes not assigned to any described clonal complex.
To examine this hypothesis further, split-decomposition networks were built to assess the presence of recombination events among genomes, because this method visualizes ancestral relationships between individuals and displays conflicting phylogenetic signals. The networks of all four datasets showed reticulations (loops, i.e., regions that do not resolve into a single tree), but the Phi test gave no statistical support (Eu2, p = 0.0956; European, p = 0.1637; Eu + Af, p = 0.2774; entire dataset, p = 0.2451), providing weak evidence for recombination (Figure 3A-D).
Split-decomposition networks of the European 2 genomes (n = 37) (A), the European genomes (n = 44) (B), the European and African genomes (n = 51) (C), and the entire dataset (n = 70) (D).
Following this analysis, and given the reticulations observed in all networks, the algorithm implemented in the Gubbins pipeline was applied to reconstruct the clonal genealogy and to estimate the effect of recombination on the M. bovis genome. The cumulative number of recombination events was inferred, and most events occurred on terminal branches (i.e., in single genomes) (Table 2). These metrics were consistent across datasets and indicate that point mutations occur roughly 180 to 270 times more often than recombination events, since the rho/theta parameter, which represents the relative rate of recombination to point mutation on a branch, lies between 0.0037 and 0.0056 (Table 3). A recently published study of 38 M. bovis strains reported a higher rho/theta value (0.1) than that obtained for our dataset; however, Patané and colleagues inferred the recombination parameters from reference-based assemblies, a procedural detail that has been associated with an abundance of putative recombination events on terminal branches.
Next, the r/m parameter, which represents the ratio of diversity introduced by recombination relative to mutation, had average values between 0.025 and 0.037, indicating that recombination has a lower overall impact on the genetic diversity of M. bovis than mutation (Table 3). For a broader comparison, a similar approach applied to an MTBC dataset of 23 genomes yielded an average r/m of 0.486 in that study20, while the dataset of 38 M. bovis genomes of Patané and colleagues yielded an average of 0.98. In the first study, only two of the 23 genomes were M. bovis (M. bovis BCG and the reference strain), so the value obtained may be biased by the overrepresentation of M. tuberculosis genomes. In the second study, the M. bovis population analyzed was recovered mainly from the United States and from livestock hosts. In contrast, our dataset represents more geographic locations and host species and includes genomes from different clonal complexes with distinct population genetic features, providing deeper and broader knowledge of the population. The different r/m averages obtained with our dataset are consistent with the notion that the extent of recombination varies greatly among lineages assigned to the same taxonomic species; these results therefore suggest that recombination may have a different impact on each M. bovis clonal complex, as also proposed by Didelot & Maiden72. Nevertheless, substantially expanding this dataset with a larger number of M. bovis genomes would help to clarify this point. Both the r/m and rho/theta parameters vary among branches, a result consistent with reports on other bacterial species72,73.
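Both parameters are ratios of per-branch counts: r/m divides substitutions attributed to recombination by those attributed to point mutation, and rho/theta divides the number of recombination events by the number of point mutations. The sketch below shows two ways of averaging them over a tree (pooled totals versus an unweighted mean of per-branch ratios). The text does not state which averaging was used, and the `BranchStats` record and example counts are hypothetical rather than Gubbins output.

```python
from dataclasses import dataclass


@dataclass
class BranchStats:
    snps_in_recombination: int       # substitutions attributed to recombination (r)
    snps_outside_recombination: int  # substitutions attributed to point mutation (m)
    recombination_blocks: int        # number of recombination events


def pooled_ratios(branches):
    """Ratios computed from totals summed over all branches."""
    m = sum(b.snps_outside_recombination for b in branches)
    if m == 0:
        return {"r/m": float("nan"), "rho/theta": float("nan")}
    r = sum(b.snps_in_recombination for b in branches)
    events = sum(b.recombination_blocks for b in branches)
    return {"r/m": r / m, "rho/theta": events / m}


def mean_branch_ratios(branches):
    """Unweighted mean of per-branch ratios (branches with m == 0 skipped)."""
    usable = [b for b in branches if b.snps_outside_recombination > 0]
    if not usable:
        return {"r/m": float("nan"), "rho/theta": float("nan")}
    n = len(usable)
    return {
        "r/m": sum(b.snps_in_recombination / b.snps_outside_recombination for b in usable) / n,
        "rho/theta": sum(b.recombination_blocks / b.snps_outside_recombination for b in usable) / n,
    }


if __name__ == "__main__":
    tree = [BranchStats(2, 100, 1), BranchStats(0, 50, 0)]  # hypothetical counts
    print(pooled_ratios(tree), mean_branch_ratios(tree))
```

Inverting the reported rho/theta range (1/0.0056 and 1/0.0037) gives roughly 179 to 270 point mutations per recombination event, which is how the relative frequency of mutation and recombination follows from Table 3.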
Finally, to confirm the recombination events identified by the Gubbins pipeline, six different algorithms implemented in RDP4 were used to test the different core multiple alignments independently. Overall, fewer than half of the events identified by Gubbins were confirmed by RDP4 (Tables 4 and 5). For the entire dataset, three recombination events were confirmed: two involving internal nodes and one involving a single genome on a terminal branch that could not be assigned to any clonal complex (Tables 4 and 5). Events identified on terminal branches may indicate that recombination is still ongoing in contemporary M. bovis strains, or may result from misalignment70. In this putative recombination region, approximately 20% of the positions contain undefined nucleotides (N), which affects the recombination signal (Supplementary Figure 2). Moreover, this region affects the rrs gene, encoding the 16S ribosomal RNA, which is expected to be highly conserved, so this putative recombination signal may result from sequencing errors or misalignment. A whole-genome alignment between Mb0003 and M. bovis AF2122/97 confirmed the presence of the undefined nucleotides and SNPs, indicating that any misalignment-related problems did not arise from the bioinformatics procedures implemented in this work.
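The share of undefined nucleotides in a putative recombination region can be measured directly on the aligned sequence. In this sketch the helper name, the coordinates, and the toy sequence are hypothetical; only N/n characters are counted, so alignment gaps would need separate handling.

```python
def undefined_fraction(aligned_seq, start, end, undefined="Nn"):
    """Fraction of alignment columns in [start, end) holding an undefined base."""
    region = aligned_seq[start:end]
    if not region:
        raise ValueError("empty region")
    return sum(base in undefined for base in region) / len(region)


if __name__ == "__main__":
    toy = "ACGT" * 4 + "NNNN" + "ACGT" * 4  # hypothetical aligned sequence
    print(undefined_fraction(toy, 0, 20))  # 4 of 20 positions are N
```

A region in which about one position in five is undefined, as reported for Mb0003, leaves little reliable sequence from which to infer a recombination signal.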
No gaps or undefined nucleotides were found in the recombination regions of the internal nodes (Figures 4 and 5). Of these events, one involves only Eu2 genomes and affects the pks12 gene, encoding a putative polyketide synthase, whereas the other is registered in Eu1 genomes and affects the narX gene, encoding a putative nitrate reductase (Table 4). Overall, the recombination analysis shows a limited number of statistically supported recombinant fragments, and the inferred metrics indicate that recombination has a low impact on M. bovis lineages. A low recombination signal is expected, which makes distinguishing true evolutionary signal from background noise a challenging task. To reduce the noise introduced by reference-based assembly and misalignment70,71, all genomes except the complete genomes were assembled de novo, and assembly quality was checked with the QUAST pipeline (Supplementary Table 1). In addition, a series of complementary analyses was performed to ensure the robustness and accuracy of the overall investigation. The sequencing quality of the narX and pks12 genes was evaluated by read mapping against M. bovis AF2122/97, and the putative SNP positions in the recombination regions were confirmed by applying the criteria described in the Methods (at least 20 reads and a variant frequency of 0.9). The narX polymorphisms were fully confirmed in two genomes (Mb1792361 and Mb7240415; 2.3%), and the pks12 polymorphisms in genomes Mb0891, Mb1711, Mb1789, Mb1870, Mb17046, Mb1756 and Mb12. However, for genome Mb2043, six of the eight positions did not meet the read-depth criterion, as the SNPs were supported by at most 17 reads, below the established cutoff of 20. Therefore, recombination at this locus can be confirmed in six genomes (8.6%) (Figures 4 and 5).
Detailed visualization of the alignment of the recombination region affecting the narX gene, which encodes a putative nitrate reductase, in the M. bovis dataset. No gaps or undefined nucleotides were found in the recombination region of the internal nodes. This event is recorded in Eu1 genomes. The sequencing quality of narX was evaluated by read mapping against M. bovis AF2122/97. The proposed SNP positions in the recombination region were confirmed by applying the criteria described in the Methods section (at least 20 reads and a variant frequency of 0.9). The narX polymorphism was fully confirmed in genomes Mb1792361 and Mb7240415 (2.3%).
Detailed visualization of the alignment of the recombination region affecting the pks12 gene in the Mycobacterium bovis dataset. No gaps or undefined nucleotides were found in the recombination region of the internal nodes. The event affecting pks12, which encodes a putative polyketide synthase, involves only Eu2 genomes. The sequencing quality of pks12 was evaluated by read mapping against M. bovis AF2122/97. The proposed SNP positions in the recombination region were confirmed by applying the criteria described in the Methods section (at least 20 reads and a variant frequency of 0.9). The polymorphisms were fully confirmed in genomes Mb0891, Mb1711, Mb1789, Mb1870, Mb1758, Mb2043, and Mb1960.
PE and PPE genes contain repetitive regions that are prone to Illumina sequencing errors and mismapping, so they are usually excluded from bioinformatic workflows for M. tuberculosis complex members, although only when a reference-mapping strategy is used. The inference of recombination events applied in this work is based on de novo assemblies, without filtering out PE/PPE genes. We believe that, by combining three complementary methods and algorithms (SplitsTree, the Gubbins pipeline, and RDP4), the strategy applied is robust in handling and filtering out putative recombinant regions caused by spurious signals. Nevertheless, to rule out interference of PE/PPE genes with the identification of SNP clusters by Gubbins and RDP4, and therefore with the detection of the putative recombinant regions affecting narX and pks12, the genomic neighborhood of these genes was examined (Supplementary Figs. 3–5). In M. bovis AF2122/97, narX is flanked by narK2 and Mb1764c, while pks12 is flanked by Mb2075c and Mb2073c (Supplementary Figs. 3–5). The synteny map of the complete genomes generated with MAUVE provides information on gene order conservation and rearrangements, showing four collinear blocks and no evidence of genomic translocations or inversions. In addition, a complementary analysis of amino acid sequences confirmed homology across all complete genomes, and no PE/PPE genes were found in the regions adjacent to narX or pks12. For narX, one genome (Mb0030) had a lower synteny score because narX was identified as two fragments (fragments 1891 and 1890). Similarly, for pks12, Mb0030 and Mb003 showed lower synteny scores, as pks12 was identified as two and three fragments, respectively, representing different domains of the protein (Supplementary Figs. 3–5).
Given this information, and because both Gubbins and RDP4 scan the core multiple alignment using windows of at most 500 bp, we conclude that PE/PPE genes do not interfere with the recombination signal affecting narX and pks12.
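The argument above rests on an interval check: if no PE/PPE gene lies within one scan window (at most 500 bp) of a putative recombinant region, those genes cannot contribute SNPs to the windows that define it. A minimal Python sketch, assuming 1-based inclusive coordinates and gene families already annotated; the `Feature` type, the `family` field, the function names, and the coordinates are hypothetical and do not reproduce the internals of Gubbins or RDP4:

```python
from typing import NamedTuple

WINDOW_BP = 500  # maximum scan window size quoted in the text


class Feature(NamedTuple):
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    family: str  # e.g. "PE", "PPE" or "other" (hypothetical annotation field)


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def pe_ppe_near(region: Feature, genes: list[Feature], window: int = WINDOW_BP) -> list[str]:
    """Names of PE/PPE genes overlapping the region extended by `window` bp on each side."""
    lo, hi = region.start - window, region.end + window
    return [
        g.name
        for g in genes
        if g.family in {"PE", "PPE"} and overlaps(lo, hi, g.start, g.end)
    ]


if __name__ == "__main__":
    region = Feature("putative_recombinant", 1000, 1500, "other")
    genes = [
        Feature("geneA", 200, 400, "other"),
        Feature("pe_x", 1900, 2100, "PE"),    # starts 400 bp after the region end
        Feature("ppe_y", 2100, 2300, "PPE"),  # starts 600 bp after the region end
    ]
    print(pe_ppe_near(region, genes))       # ['pe_x']
    print(pe_ppe_near(region, genes, 300))  # []
```

An empty result for the real narX and pks12 regions would correspond to the conclusion stated in the text.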
Although the recombination signals detected in this dataset may be considered residual, recombination in M. bovis cannot be ruled out, so it should remain the subject of further analysis, for which sequencing whole genomes from different epidemiological scenarios will be important.
Comparing the ML phylogenetic trees obtained before and after recombination correction (Figure 2A, B) revealed no significant changes in the inferred phylogenetic relationships, and the M. bovis strains clustered into the same groups.
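The text does not state how the two ML trees were compared. One common way to quantify topological change is the Robinson–Foulds (RF) distance, which counts clades present in one tree but not in the other; a value of 0 means identical topologies. A minimal Python sketch for rooted trees written as nested tuples of leaf names; treating the trees as rooted and the tuple representation are assumptions, and real analyses would normally rely on a phylogenetics library:

```python
from typing import Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of child subtrees


def clades(tree: Tree) -> set[frozenset[str]]:
    """Leaf sets of all internal nodes of a rooted tree given as nested tuples."""
    found: set[frozenset[str]] = set()

    def walk(node: Tree) -> frozenset[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Rooted Robinson-Foulds distance: size of the symmetric difference of clade sets."""
    return len(clades(t1) ^ clades(t2))


if __name__ == "__main__":
    before = (("A", "B"), ("C", "D"))
    after_same = (("B", "A"), ("D", "C"))  # same clades, different child order
    after_diff = (("A", "C"), ("B", "D"))
    print(rf_distance(before, after_same))  # 0
    print(rf_distance(before, after_diff))  # 4
```

Because clades are stored as unordered leaf sets, swapping the order of children does not change the distance, which is the property needed when comparing trees drawn differently.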
After mapping the reads of the 42 newly sequenced M. bovis genomes against the reference genome M. bovis AF2122/97, a SNP alignment containing 1,816 polymorphic positions was obtained. Most SNPs (87.1%) were located in coding regions, and the affected genes were classified according to the functional categories defined in BoviList (Figure 6A, B). Taking into account the total number of genes in each functional category, genes in the "lipid metabolism" category showed the most SNPs, followed by "cell wall and cell processes" and "intermediary metabolism and respiration", suggesting that these categories are important in M. bovis evolution.
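Ranking categories "taking into account the total number of genes" amounts to dividing each category's SNP count by the number of genes the category contains in the genome, so that large categories are not favored simply by size. A minimal Python sketch of that normalization; the function name, locus tags, category assignments, and gene totals are hypothetical toy values, not BoviList data:

```python
from collections import Counter


def snps_per_category_gene(
    snp_loci: list[str],
    locus_category: dict[str, str],
    genes_in_category: dict[str, int],
) -> dict[str, float]:
    """SNP count of each functional category divided by its total number of genes.

    snp_loci holds one locus tag per coding SNP (a gene with 3 SNPs appears 3 times).
    """
    snp_counts = Counter(locus_category[locus] for locus in snp_loci)
    return {cat: snp_counts[cat] / n for cat, n in genes_in_category.items()}


if __name__ == "__main__":
    snp_loci = ["locus1", "locus1", "locus2", "locus3"]
    locus_category = {"locus1": "lipid", "locus2": "lipid", "locus3": "cell wall"}
    genes_in_category = {"lipid": 4, "cell wall": 2, "regulatory": 5}
    density = snps_per_category_gene(snp_loci, locus_category, genes_in_category)
    print(sorted(density, key=density.get, reverse=True))
    # ['lipid', 'cell wall', 'regulatory']
```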
Hierarchical analysis of the M. bovis dataset from Portugal (n = 42). (A) Total number of SNPs and affected genes recorded for each functional category. (B) Total number of synonymous and non-synonymous changes recorded for each functional category.
Globally, the average dN/dS ratio was greater than 1.5, indicating that the overall evolutionary pressure drives divergence from the ancestral state, consistent with a scenario of positive (diversifying or directional) selection and/or relaxed purifying selection. In the categories "virulence, detoxification, adaptation", "insertion sequences and phages", and "regulatory proteins", more than two-thirds of the SNPs were non-synonymous (Figure 6B).
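The text does not describe how dN/dS was estimated. As an illustration only, the Python sketch below classifies a SNP as synonymous or non-synonymous by translating the reference and mutant codons, and computes an uncorrected pN/pS ratio from a simplified Nei–Gojobori-style count of synonymous and non-synonymous sites. It assumes an in-frame coding sequence without its terminal stop codon and at most one SNP per codon; changes to stop codons are counted as non-synonymous, and no multiple-hit (Jukes–Cantor) correction is applied. Function names are hypothetical. The codon table is the standard genetic code, whose amino acid assignments are shared by the bacterial translation table 11.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(ref_codon: str, pos_in_codon: int, alt_base: str) -> str:
    """'S' if the substitution keeps the encoded residue, otherwise 'NS'."""
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:pos_in_codon] + alt_base.upper() + ref_codon[pos_in_codon + 1:]
    return "S" if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon] else "NS"


def synonymous_sites(codon: str) -> float:
    """Synonymous sites in a codon: each of its 9 possible point changes weighs 1/3."""
    codon = codon.upper()
    synonymous = sum(
        classify_snp(codon, pos, base) == "S"
        for pos in range(3)
        for base in "ACGT"
        if base != codon[pos]
    )
    return synonymous / 3


def pn_ps(cds: str, snps: list[tuple[int, str]]) -> float:
    """Uncorrected pN/pS for SNPs given as (0-based position in cds, alternative base)."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    s_sites = sum(synonymous_sites(c) for c in codons)
    n_sites = 3 * len(codons) - s_sites
    s_diff = n_diff = 0
    for pos, alt in snps:
        if classify_snp(codons[pos // 3], pos % 3, alt) == "S":
            s_diff += 1
        else:
            n_diff += 1
    if s_diff == 0:
        return float("inf") if n_diff else float("nan")
    return (n_diff / n_sites) / (s_diff / s_sites)


if __name__ == "__main__":
    print(classify_snp("GGG", 2, "A"))  # S  (GGG and GGA both encode Gly)
    print(classify_snp("ATG", 0, "C"))  # NS (ATG Met -> CTG Leu)
    # Toy CDS "GGG ATG": 1 synonymous site and 5 non-synonymous sites.
    print(pn_ps("GGGATG", [(2, "A"), (3, "C")]))  # 0.2
```

Normalizing by the number of available sites is what makes a ratio above 1 meaningful: non-synonymous sites are far more numerous than synonymous ones, so raw NS counts alone would overstate selection.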
Every category contains genes with multiple SNPs, resulting in an average mutation rate (i.e., the mean number of SNPs per gene) greater than 1 (Figure 6A). Pks12 (Mb2074c), with 15 SNPs, and fas (Mb2553c), with 8 SNPs, showed the highest values. Both genes are involved in fatty acid metabolism. pks genes encode polyketide synthases (PKSs), multifunctional enzymes involved in the biosynthesis of mycobacterial cell-wall lipids74,75; pks12 encodes a multifunctional polypeptide involved in the synthesis of mycoketides74,76. The fas gene is involved in the synthesis of mycolic acids. Both genes therefore play an important role in the biosynthesis of the cell wall, the structure in contact with the host.
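Here the "average mutation rate" is the mean number of SNPs per affected gene within a category; since every affected gene carries at least one SNP, any category containing a gene with two or more SNPs has a mean above 1. A minimal Python sketch of that calculation that also lists the most polymorphic genes; the function name, locus tags, and category labels are hypothetical toy values:

```python
from collections import Counter, defaultdict


def mean_snps_per_affected_gene(
    snp_loci: list[str], locus_category: dict[str, str]
) -> dict[str, float]:
    """Mean SNPs per polymorphic gene in each category (one entry per SNP in snp_loci)."""
    per_gene = Counter(snp_loci)
    per_category: dict[str, list[int]] = defaultdict(list)
    for locus, n_snps in per_gene.items():
        per_category[locus_category[locus]].append(n_snps)
    return {cat: sum(v) / len(v) for cat, v in per_category.items()}


if __name__ == "__main__":
    snp_loci = ["locusA"] * 3 + ["locusB"] * 2 + ["locusC"]
    locus_category = {"locusA": "lipid", "locusB": "lipid", "locusC": "regulatory"}
    print(mean_snps_per_affected_gene(snp_loci, locus_category))
    # {'lipid': 2.5, 'regulatory': 1.0}
    print(Counter(snp_loci).most_common(2))
    # [('locusA', 3), ('locusB', 2)]
```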
To further study M. bovis evolution, two specific gene sets were analyzed. Previously published work using sequence-composition and phylogenetic methods identified genes that were acquired through HGT by MTBC ancestors before diversification37,38. These genes are listed in Supplementary Table 2. The SNP distribution across a total of 77 putative HGT-related genes was analyzed, and 26 polymorphic sites were identified, most of which (78%) resulted in non-synonymous (NS) changes (Supplementary Table 2). Previous work on MTBC genomes showed that putative HGT regions exhibit a higher proportion of NS SNPs than the rest of the genome. If these putative HGT regions were acquired by MTBC ancestors and therefore mainly harbor ancient polymorphisms, a higher proportion of synonymous changes would be expected, because NS substitutions, which may alter protein function through amino acid changes, tend to be removed by negative (purifying) selection. Our results therefore suggest that substitutions in HGT-like genes may have functional consequences, reflecting the importance of these genes as a source of adaptive genetic diversity.
In parallel, the genes encoding components of the 3R (DNA repair, replication, and recombination) system were thoroughly examined, following the list previously published by dos Vultos and colleagues (2008)39. The exchange of identical DNA fragments cannot be observed directly, although it may be frequent among closely related bacteria such as those in this dataset; moreover, this process may be key to DNA repair mechanisms72 and may thus play a role in homologous recombination. A total of 26 polymorphic positions were identified across the 54 genes analyzed (Supplementary Table 3). In this gene set, NS changes accounted for about 65% of the substitutions, consistent with previous reports on M. tuberculosis strains.


Post time: Oct-21-2021