PhyML [44] was used to infer phylogenies for each ortholog group, and phylogenetic confidence was determined by the approximate likelihood-ratio test for branches (aLRT) method [45]. PhyML was also used to infer the core genome phylogeny by concatenating the aligned sequences of each ortholog group with one representative sequence in each strain and removing conserved alignment positions. Recombination between Pav lineages was detected by identifying gene trees in which Pav BP631 formed a monophyletic group with one or both of the other Pav strains.

In addition to the whole-genome ortholog analysis, we identified T3SE pseudogenes and gene fragments by using tBLASTn to search all of the T3SE amino acid sequences in the database at http://www.pseudomonas-syringae.org against the Pav genome sequences and 24 other draft Psy genome sequences. Homologous DNA sequences were extracted and examined for truncations, frameshifts, contig breaks (usually caused by the presence of transposases or other multi-copy elements disrupting the coding sequences), and chimeric proteins. Sanger sequencing with primers flanking each gap was used to fill contig gaps in Pav T3SE orthologs and to confirm frameshift mutations and transposon insertions. Sequences lacking frameshifts were translated to amino acid sequences, aligned using MUSCLE, and back-translated to DNA alignments using TranslatorX [43]. Sequences with frameshifts were added to the nucleotide alignments using MAFFT [46]. Phylogenies were inferred for each alignment using PhyML. Gains and losses of each T3SE family were mapped onto the core genome phylogeny by identifying clades in each T3SE gene tree that are congruent with the core genome phylogeny, allowing for gene loss in some lineages.

Divergence times were estimated for the most recent common ancestor of each of the Pav lineages and for P. syringae as a whole using the MLSA dataset from Wang et al. [6]. This included partial sequences of four protein-coding genes for ten phylogroup 1 Pav strains and twelve phylogroup 2 Pav strains, as well as 110 additional P. syringae strains. Analyses were carried out using an uncorrelated lognormal relaxed molecular clock in BEAST v1.6.2 [47] with unlinked trees and substitution models, allowing for recombination between loci. The HKY substitution model was used with gamma-distributed rate variation, with separate partitions for codon positions 1 + 2 and for third positions. Substitution rates were set to published rates based on the split of Escherichia coli and Salmonella [22] and the emergence of methicillin-resistant Staphylococcus aureus (MRSA) [21]. Two independent Markov chains were run for 50 million generations, and results were combined for parameter estimates.
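The codon-position partitioning described above can be illustrated with a minimal sketch. It is not part of the original analysis pipeline, where partitions would normally be defined within the BEAST setup itself. The function name `split_codon_partitions` and the toy sequences are hypothetical, and the sketch assumes every aligned sequence is in frame, starting at codon position 1.

```python
def split_codon_partitions(alignment):
    """Split in-frame aligned sequences into two partitions:
    codon positions 1+2 and codon position 3.

    `alignment` maps a sequence name to an aligned nucleotide string.
    Assumption: each sequence starts at codon position 1 and its
    length is a multiple of three.
    """
    pos12, pos3 = {}, {}
    for name, seq in alignment.items():
        if len(seq) % 3 != 0:
            raise ValueError(f"{name}: length {len(seq)} is not a multiple of 3")
        # Keep the first and second base of every codon.
        pos12[name] = "".join(base for i, base in enumerate(seq) if i % 3 != 2)
        # Keep the third base of every codon.
        pos3[name] = seq[2::3]
    return pos12, pos3


# Hypothetical toy alignment of three codons per strain.
toy_alignment = {
    "strain_A": "ATGGCCAAA",
    "strain_B": "ATGGCTAAG",
}
first_second, third = split_codon_partitions(toy_alignment)
```

In this toy case the two strains differ only at third positions, which is the usual motivation for giving third positions their own substitution-rate parameters.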

