Genome Analysis of Streptococcus gordonii SK12

1 Faculty of Dentistry, University of Malaya, Kuala Lumpur, Malaysia
2 Genome Informatics Research Laboratory (GIRG), High Impact Research (HIR) Building, University of Malaya, Kuala Lumpur, Malaysia
3 Centre for Oral Health Research, School of Dental Sciences, Newcastle University, United Kingdom
4 Genome Solutions Sdn Bhd, Innovation Incubator UM, Research Management & Innovation Complex, University of Malaya, Kuala Lumpur, Malaysia

ORIGINAL ARTICLE


Streptococcus, Staphylococcus and Enterococcus (2). Streptococcus gordonii is among several species of streptococci that are common agents of infective endocarditis (IE) (3, 4). More recently, S. gordonii has been reported to be an opportunistic agent causing bacteraemia in immunocompromised patients, contributing to approximately 40% of cases involving neutropenic cancer patients (5). It is also one of the pioneer species of bacteria that initiate dental plaque formation (3). Its ability to colonize early in dental plaque is thought to be related to the large number of cell surface adhesin proteins that S. gordonii produces, which mediate adhesion both to the salivary pellicle coating the tooth surface and to other oral bacteria (1).
The Gram-positive, mesophilic and nonmotile S. gordonii grows in pairs or bead-like chains. It belongs to the mitis group of oral streptococci, which are generally considered commensals in the human oral cavity. S. gordonii can enter the bloodstream via inflamed gums or other oral tissues, or even during daily toothbrushing. Normally, bacteraemia is only transient, but occasionally the presence of bacteria in the bloodstream can lead to IE. In this study, we sequenced and characterised the S. gordonii SK12 genome using a variety of bioinformatics approaches. We aimed to gain a better understanding of the biology, genetics and pathogenicity of this oral bacterium in order to better target and combat IE.

MATERIALS AND METHODS
1. Bacterial DNA extraction and sequencing.
S. gordonii SK12 was isolated from the oral cavity of a volunteer in Denmark by Kilian and colleagues (6). Genomic DNA was extracted as previously described (7). The SK12 strain was sequenced on the Illumina HiSeq 2000 platform (8).
2. Data pre-processing and genome assembly.
To ensure high-quality genomic sequence data, raw sequencing reads generated by the Illumina HiSeq machine were quality-checked using PRINSEQ lite version 0.20.3. CLC Genomics Workbench 5.1.5 was used to remove adaptor sequences, low-quality reads and low-quality sequences. The pre-processed reads were then assembled into contigs and scaffolds in CLC Genomics Workbench 5.1.5.
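The read-level quality filtering described above can be illustrated with a minimal Python sketch. It is not the PRINSEQ or CLC implementation; it assumes Phred+33 encoded FASTQ input, and the function names and the thresholds (`min_mean_q`, `min_len`) are hypothetical values chosen only for illustration.

```python
from typing import Iterator, List, Tuple


def parse_fastq(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality_string) from 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        yield lines[i][1:].strip(), lines[i + 1].strip(), lines[i + 3].strip()


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not quality:
        return 0.0
    return sum(ord(c) - offset for c in quality) / len(quality)


def filter_reads(lines: List[str],
                 min_mean_q: float = 20.0,   # hypothetical threshold
                 min_len: int = 1) -> List[Tuple[str, str]]:
    """Keep reads that are long enough and whose mean quality passes the cutoff."""
    return [
        (read_id, seq)
        for read_id, seq, qual in parse_fastq(lines)
        if len(seq) >= min_len and mean_phred(qual) >= min_mean_q
    ]
```

In practice the cutoffs would be taken from the settings actually used in the quality-control tool, which are not stated here.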

3. Genome annotation.
To predict genes and their functions, we annotated the assembled genome using the Rapid Annotation using Subsystem Technology (RAST) pipeline (9). With the assembled genome sequence as input to the RAST server, functional elements such as protein-coding genes, tRNA genes and rRNA genes were predicted. The functions of the protein-coding genes were also predicted using the RAST subsystem technology.

4. Phylogenetic analysis.
Phylogenetic analysis was performed to determine the taxonomic position of SK12 and to infer its phylogenetic relationship with closely related Streptococcus strains and species. 16S rRNA sequences from other Streptococcus species/strains were extracted using the RNAmmer program (10). Next, the 16S rRNA sequences were aligned using MAFFT (11). Lastly, the 16S rRNA-based phylogenetic tree was generated using Molecular Evolutionary Genetics Analysis version 5 (MEGA5) software with 1,000 bootstrap replicates (12).
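The bootstrap step mentioned above resamples alignment columns with replacement and rebuilds the tree from each replicate. The following sketch shows only the resampling, not the MEGA5 implementation; the function names and the `seed` parameter are assumptions introduced for illustration.

```python
import random
from typing import Dict, List


def bootstrap_alignment(aln: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Resample the columns of an alignment with replacement (one replicate)."""
    length = len(next(iter(aln.values())))
    if any(len(seq) != length for seq in aln.values()):
        raise ValueError("all aligned sequences must have the same length")
    # The same sampled column indices are applied to every sequence,
    # so each replicate column is a real column of the original alignment.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in aln.items()}


def bootstrap_replicates(aln: Dict[str, str], n: int = 1000,
                         seed: int = 0) -> List[Dict[str, str]]:
    """Generate n pseudo-replicate alignments from one input alignment."""
    rng = random.Random(seed)
    return [bootstrap_alignment(aln, rng) for _ in range(n)]
```

A tree would then be inferred from each replicate, and the support for a clade is the fraction of replicate trees in which that clade appears.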
5. Prophage and Genomic Island analysis.
PHAST (PHage Search Tool) (13) was used to identify putative prophages in the genome of SK12. Putative Genomic Islands (clusters of genes of probable horizontal origin) were predicted using the IslandViewer online tool (14).

6. Virulence factors analysis.
To identify putative virulence genes in the sequenced genome of SK12, the RAST-predicted protein-coding genes were searched with BLAST against the Virulence Factor Database (VFDB), which stores manually curated virulence genes from the literature (15). Putative virulence genes were predicted on the basis of protein sequence homology. The virulence profile of SK12 was visualized as a heat map generated using in-house scripts.
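A homology search of this kind usually ends with a filtering step over the BLAST hits. The sketch below is not the in-house script used in this study; it assumes BLAST+ tabular output (`-outfmt 6`, default 12 columns), and the identity and E-value cutoffs are hypothetical, since the thresholds applied are not stated.

```python
from typing import Dict, Iterable, Tuple


def best_hits(blast_lines: Iterable[str],
              min_identity: float = 50.0,    # hypothetical cutoff (%)
              max_evalue: float = 1e-5       # hypothetical cutoff
              ) -> Dict[str, Tuple[str, float, float]]:
    """Return the best-scoring VFDB hit per query from BLAST tabular output.

    Columns assumed (outfmt 6): qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        identity = float(fields[2])
        evalue = float(fields[10])
        bitscore = float(fields[11])
        if identity < min_identity or evalue > max_evalue:
            continue
        # Keep only the highest bit score for each query protein.
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, identity, bitscore)
    return best
```

Each retained query is then treated as a putative virulence gene, labelled with the name of its best-matching database entry.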

Genome characteristics
The assembled genome of S. gordonii SK12 consists of 27 contigs, with a contig N50 of 226,260 bp. The size of the sequenced genome is approximately 2,145,851 bp, with a G+C content of 40.63%, which is similar to the average G+C content of the published S. gordonii genomes (16).
Figure 1 shows the subsystem distribution of S. gordonii SK12 based on the RAST genome annotation. RAST predicted 2,097 coding sequences (CDSs) and 56 rRNA/tRNA genes in the SK12 genome. RAST functional annotation predicted that most of these genes are likely to be involved in basic functions such as those associated with carbohydrates (238 genes), amino acids and derivatives (204 genes), cofactors, vitamins, prosthetic groups and pigments (87 genes), DNA metabolism (78 genes), membrane transport (60 genes) and RNA metabolism (99 genes). No genes were predicted in the functional categories of photosynthesis or nitrogen metabolism.
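For readers unfamiliar with these assembly statistics, the contig N50 and the G+C content can be computed as in the following sketch. The function names are illustrative, and ambiguous bases (such as N) are assumed to be excluded from the G+C denominator.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def gc_content(contigs: Iterable[str]) -> float:
    """Percentage of G and C among unambiguous (A/C/G/T) bases."""
    gc = at = 0
    for seq in contigs:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    return 100.0 * gc / (gc + at) if (gc + at) else 0.0
```

For example, contigs of 10, 8, 5, 3 and 2 bp total 28 bp; the two longest already cover 18 bp, which exceeds half of the total, so the N50 is 8 bp.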

TAXONOMIC CLASSIFICATION
To identify the taxonomic position of S. gordonii SK12, we reconstructed a neighbour-joining phylogenetic tree using 16S rRNA gene sequences. The tree includes 24 species obtained from the National Center for Biotechnology Information (NCBI). The 16S rRNA sequences were 1,500 bp in length and were compiled into a single file. The sequences were aligned using the multiple sequence alignment program MAFFT (11) and the phylogenetic tree was generated using Molecular Evolutionary Genetics Analysis (MEGA) 6.0 software (12). Overall, S. gordonii SK12 clustered in a large clade that included all the Streptococcus species (Figure 2). S. gordonii Challis was the closest neighbour of our strain SK12.
To further confirm the relationships between these species, we constructed a more robust core-genome SNP-based tree. All genome sequences used in this analysis were uploaded to the Pan-Genome Sequence Analysis (PANSEQ) tool for alignment and SNP identification. The SNPs identified in the core (conserved) genomic regions shared by all genomes were extracted and aligned, and were then used to generate a phylogenetic tree (Figure 3). S. gordonii SK12 again fell within a large clade with the rest of the Streptococcus species, and its closest neighbour was S. gordonii str. Challis substr. CH1. This result is consistent with that obtained from the 16S rRNA gene-based phylogenetic tree.
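The idea of extracting core-genome SNPs can be sketched as follows. This is a simplified illustration and not the PANSEQ algorithm: it assumes the core regions have already been aligned into equal-length sequences, and it treats a column as a core SNP only when every genome has an unambiguous base there and at least two different bases occur.

```python
from typing import Dict, List, Tuple


def core_snps(aln: Dict[str, str]) -> Tuple[List[int], Dict[str, str]]:
    """Return SNP column indices and the SNP-only alignment.

    A column qualifies when all genomes carry A, C, G or T (no gaps or
    ambiguity codes) and the column is variable.
    """
    names = list(aln)
    length = len(aln[names[0]])
    if any(len(aln[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")
    positions = []
    for i in range(length):
        column = [aln[name][i].upper() for name in names]
        if all(base in "ACGT" for base in column) and len(set(column)) > 1:
            positions.append(i)
    snp_aln = {name: "".join(aln[name][i] for i in positions) for name in names}
    return positions, snp_aln
```

The concatenated SNP alignment returned here is the kind of input from which a SNP-based tree can be built.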

GENOMIC ISLAND ANALYSIS
Genomic Islands (GIs) are parts of the genome that have likely arisen from horizontal gene transfer. Horizontal gene transfer is very important in the evolution of bacteria and can influence traits such as antibiotic resistance, symbiosis, fitness and adaptation to different environments (17). GIs are usually characterised by their large size (mostly more than 10 kb), their frequent association with tRNA-encoding genes and a G+C content that differs from that of the rest of the genome (17). GIs are postulated to have important roles in bacterial adaptation and pathogenicity as well as other important functions. Identifying the GIs in our strain will provide more insight into new capabilities that this bacterium might have acquired through horizontal gene transfer (18).
Using a bioinformatics approach, we found 10 putative GIs in the genome of SK12 (Table 1). Of these, GI 1 contains the gene encoding the bacitracin ABC transporter Bcr. Bacillus subtilis cells carrying bcr genes show bacitracin resistance and collateral detergent sensitivity. The transporter consists of three main components: the two hydrophobic proteins BcrB and BcrC, which presumably form a diffusion channel, and BcrA, which is present as two identical ATP-binding subunits. The hydrophobic protein BcrC mediates partial bacitracin resistance by binding to the antibiotic and is responsible for the collateral detergent sensitivity by binding to the detergent, which may induce the disruption of the neighbouring membrane. Resistance to bacitracin has been shown to be closely related to detergent sensitivity: the higher the degree of resistance to bacitracin, the more sensitive the strain is to detergent. The key component in this process appears to be the membrane protein BcrC, which alone can provide resistance to bacitracin and simultaneously render the strain sensitive to detergent. The role of BcrA is to provide energy for the transport of bacitracin across the cell membrane, but the presence of BcrA does not contribute to detergent sensitivity (19). The presence of this gene in the horizontally transferred GI 1 suggests that SK12 might have acquired the capability for bacitracin resistance and collateral detergent sensitivity.
In GI 2, genes encoding a transcriptional regulator of the TetR family and a cadmium efflux system accessory protein were identified. Transcriptional regulators of the TetR family are involved in the regulation of multidrug efflux pump expression, antibiotic biosynthesis pathways, responses to osmotic stress and toxic chemicals, control of catabolic pathways, differentiation processes and pathogenicity (20). A cadmium efflux system accessory protein has also been found in Listeria monocytogenes Lm_1889. On the basis of its predicted molecular functions, the putative role of this protein is in DNA binding and transcriptional regulation.
In GI 8, genes encoding CRISPR-associated proteins were detected. There were eight types of CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)-associated proteins. CRISPR is believed to participate in the defence against viruses in bacteria and archaea. These systems have been found in the genomes of approximately 40% of sequenced bacteria and 90% of sequenced archaea (21). CRISPRs consist of identical repeated DNA sequences (repeats) interspaced by highly variable sequences referred to as spacers. The spacers originate from either phages or plasmids. CRISPR-associated (cas) genes encode conserved proteins that, together with CRISPRs, make up the CRISPR/Cas system, which is responsible for defending the prokaryotic cell against invaders. CRISPR-mediated resistance involves three stages: (i) CRISPR adaptation, in which the invader DNA is encountered by the CRISPR/Cas machinery and a short invader-derived DNA fragment is incorporated into the CRISPR array; (ii) CRISPR expression, in which the CRISPR array is transcribed and the transcripts are processed by Cas proteins; and (iii) CRISPR interference, in which the invader's nucleic acid is recognized by complementarity to the crRNA and neutralized. One application of the CRISPR/Cas system is the immunization of industry-relevant prokaryotes (or eukaryotes) against invasion by mobile genetic elements (22). The presence of these CRISPR-associated genes in the genomic island might suggest that SK12 is able to prevent invasion by foreign DNA through the CRISPR/Cas system.
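One of the GI signatures mentioned above, an atypical G+C content, can be illustrated with a simple sliding-window scan. This sketch is not the method implemented by the Island Viewer tool used in this study; the window size, step and deviation threshold are hypothetical values for illustration only.

```python
from typing import List, Tuple


def gc_percent(seq: str) -> float:
    """G+C percentage of a sequence (all characters counted in the length)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def atypical_gc_windows(genome: str,
                        window: int = 10000,        # hypothetical window (bp)
                        step: int = 5000,           # hypothetical step (bp)
                        min_deviation: float = 5.0  # hypothetical cutoff (% points)
                        ) -> List[Tuple[int, int, float]]:
    """Flag windows whose G+C content deviates from the genome-wide value.

    Returns (start, end, gc) tuples using 0-based, end-exclusive coordinates.
    """
    genome_gc = gc_percent(genome)
    flagged = []
    for start in range(0, max(len(genome) - window, 0) + 1, step):
        chunk = genome[start:start + window]
        gc = gc_percent(chunk)
        if abs(gc - genome_gc) >= min_deviation:
            flagged.append((start, start + len(chunk), gc))
    return flagged
```

A scan like this only highlights candidate regions; dedicated GI predictors combine composition with further evidence such as mobility genes and comparative genomics.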
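The repeat-spacer organisation of a CRISPR array described above can be made concrete with a short sketch. It assumes perfectly identical repeats and a known repeat sequence; real CRISPR finders tolerate repeat variation, and the sequences used with it here are hypothetical.

```python
from typing import List


def extract_spacers(array_seq: str, repeat: str) -> List[str]:
    """Return the spacer sequences lying between consecutive identical repeats."""
    if not repeat:
        raise ValueError("repeat sequence must not be empty")
    parts = array_seq.split(repeat)
    # parts[0] precedes the first repeat and parts[-1] follows the last one,
    # so only the inner pieces are spacers flanked by repeats on both sides.
    return [piece for piece in parts[1:-1] if piece]
```

Each spacer recovered this way could then be compared against phage and plasmid sequences to suggest which invaders the strain has previously encountered.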

VIRULENCE FACTORS
Virulence factors are genes that increase the severity of infections. On the basis of the heat map (Figure 4), we predicted a total of 118 virulence factors in S. gordonii SK12. No virulence gene is specific to SK12, as indicated in the heat map (Figure 4). The virulence factors that have homologues in all the other bacteria analyzed are listed in Table 2.
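The two comparisons made above, genes specific to one strain and genes with homologues in all strains, reduce to set operations on per-strain virulence profiles. The sketch below is not the in-house script behind Figure 4, and the strain and gene names used with it are hypothetical.

```python
from typing import Dict, Set, Tuple


def unique_and_shared(profiles: Dict[str, Set[str]],
                      target: str) -> Tuple[Set[str], Set[str]]:
    """Return (genes found only in the target, target genes found in every strain)."""
    target_genes = profiles[target]
    others = [genes for name, genes in profiles.items() if name != target]
    found_elsewhere: Set[str] = set().union(*others)
    unique = target_genes - found_elsewhere
    shared = set(target_genes)
    for genes in others:
        shared &= genes
    return unique, shared
```

An empty first set corresponds to the observation that no virulence gene is strain-specific, while the second set corresponds to the list of factors with homologues in all other genomes.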
A previous study showed that fibronectin-binding proteins such as fbp54 and pavA allow the bacteria to adhere to host cells (23). In invasive streptococci, fibronectin-binding proteins help to trigger the uptake of streptococci by endothelial cells via Rac1-dependent phagocytosis, which follows the classical endocytic pathway with lysosomal (23). Additionally, fibronectin-binding proteins initiate the process that contributes to deep tissue tropism and leads to invasion of bacteria into the vascular endothelial lining. Hence, we suggest that in S. gordonii SK12 fibrinogen-binding proteins may mediate adherence to multiple tissues and play major roles in the pathogenesis of septic arthritis, endocarditis and other infectious diseases (23).
In addition, the cpsY transcriptional regulator predicted in SK12 is believed to be associated with systemic infection and is required for survival in neutrophils but not in macrophages. cpsY knockout strains have been shown to have growth defects when cultured in vitro in human plasma. cpsY has also been found to regulate the methionine metabolic pathway, in addition to contributing to systemic infection. Furthermore, cpsY is believed to be essential for bacterial invasion and survival in whole blood during systemic dissemination (24).
Another virulence factor that we found in SK12 is the sortase gene srtA. In Streptococcus sanguinis, SrtA has been annotated as a putative housekeeping sortase, which is involved in the covalent attachment of the majority of substrates (25). However, the sortases of S. sanguinis have been found to have only a modest effect on competitive colonization at the onset of IE.

PROPHAGE ANALYSIS
As mobile DNA elements, phages are vectors for lateral gene transfer between bacteria (26). The integration of phages into the bacterial genome can introduce new genes, which may affect phenotypes such as drug resistance and virulence. We found one putative intact (complete) prophage region in the sequenced genome of SK12 (Figure 5). This prophage is about 36,564 bp in size, spanning positions 1,597,011 to 1,633,575. It is located in contig 8 and has a G+C content of 41.25%, which is slightly higher than the average G+C content of the SK12 genome (27). To identify the origin of this prophage, we extracted its sequence and searched it with BLAST against the NCBI nucleotide database. This analysis showed that the putative intact prophage sequence is highly similar to Streptococcus phage PH15, with 93% sequence identity, suggesting a close relationship with this known phage. According to the PHAST prediction results, several genes were predicted in the prophage (Figures 5 and 6). For instance, a gene encoding a terminase protein was predicted in this intact prophage. The product of this gene has terminase activity: it binds the lambda DNA and proheads and packages the DNA into the prohead. Terminase also has endonuclease activity and cleaves the phage DNA at a specific site known as cos, so that single genome lengths are packaged into each phage head (28).
Another gene, predicted to encode a phage head protein, is also present in the predicted prophage. Head proteins contribute markedly to immunological memory of the phage and consist of a highly antigenic outer capsid protein and a major capsid protein (16). A phage tail protein is also present in the predicted prophage and has a structurally well-conserved dodecameric portal at the capsid. The portal plays critical roles in head assembly, genome packaging, neck/tail attachment, and genome ejection. In addition, a phage-like protein and a hypothetical phage protein are also present. The hypothetical phage protein has no specific assigned function, whereas phage-like proteins help the bacteria to survive in harsh environments and to wait for the next opportunity to infect other bacteria.

CONCLUSION
Here we report a new genome sequence of S. gordonii SK12. As expected, this clinically derived strain has a genome size and G+C content consistent with those of other published S. gordonii strains. Our phylogenomic analysis confirms that this strain is indeed S. gordonii, as it is closely related to the well-studied S. gordonii Challis. Through horizontal gene transfer, SK12 has acquired genes that might confer bacitracin resistance and collateral detergent sensitivity on this potential pathogen through an ABC transporter. Moreover, this strain might have a strong immune/defence system to prevent the invasion of foreign DNA, as suggested by the presence of multiple CRISPR-associated genes in the genomic island. In addition, SK12 has numerous virulence genes, which may explain how this apparent commensal colonizer of the oral cavity is able to cause serious disease in some circumstances. The addition of this

Figure 1: RAST functional analysis. Different functional categories/features are represented by different colours. Numbers in brackets represent the number of genes in each functional category.

Figure 2: 16S rRNA-based phylogenetic tree of oral streptococci. S. gordonii SK12 was closely related to S. gordonii Challis.

Figure 3: Core-genome SNP-based phylogenetic tree. The closest bacterial species to SK12 was S. gordonii Challis.

Figure 4: A heat map showing the virulence gene profiles across different species or strains.

Fibronectin-binding protein Fibrinogen-binding protein Putative fibronectin-binding protein-like protein A Putative fibrinogen-binding protein-like protein A PavA Adherence and virulence protein A Fibronectin/fibrinogen binding protein Fibronectin-binding protein-like protein A Fibronectin-binding protein A, putative Hypothetical protein

Figure 5: Bacterial genome map. An intact prophage was predicted in the SK12 genome by the PHAST software.

Table 1: Ten predicted genomic islands in the SK12 genome.

Table 2: List of predicted virulence factors in the SK12 genome.