The VIROME web-application provides summaries of MGOL BLAST hit data for viral metagenome peptides according to each of these environmental features using a weighted scheme.

Figure 4. Flow-chart of VIROME environmental annotation. For each predicted viral metagenome ORF, E-scores (E<0.01) of top-hits against each unique library in the MGOL database are summed. Ratios of E-score distribution for each unique MGOL library are ...

This process is illustrated for the "Ecosystem" environmental feature (Figure 4). For each viral metagenome peptide, all significant MGOL BLAST hits are considered (E ≤ 0.001) and the -log E-scores of the top hits against each unique ecosystem are summed.
Subsequently, the ratio of the top-hit -log E-score for each unique ecosystem to the summed -log E-score across all top ecosystem hits is calculated, thus providing a weighting of the BLAST homology across ecosystems. The ecosystem having the lowest E-score BLAST hit (i.e., the highest quality hit) would have the largest share of the ecosystem environmental feature characterization for an individual viral metagenome peptide. The weights for every peptide having similarity to one or more MGOL sequences can then be summed and used to characterize an entire library by a given environmental feature. Because this weighted scoring system considers the top hit against every ecosystem with a significant MGOL hit, and not just the single best BLAST hit, it provides a robust picture of the proportions of viral genetic diversity that are specific to a given environmental context or more broadly shared across contexts.
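The weighting described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the VIROME implementation: the function names `peptide_weights` and `library_profile`, the `(category, evalue)` input layout, and the `E_FLOOR` stand-in for E-values reported as 0.0 are all hypothetical. Base-10 logarithms are used here, although the choice of base does not affect the ratios.

```python
import math
from collections import defaultdict

E_CUTOFF = 1e-3    # significance threshold stated in the text (E <= 0.001)
E_FLOOR = 1e-180   # assumption: replaces E-values of 0.0 so the log stays finite


def peptide_weights(hits, cutoff=E_CUTOFF):
    """Weight one peptide's significant hits across categories.

    hits: iterable of (category, evalue) pairs, e.g. ("Estuary", 1e-30).
    Returns {category: weight}; the weights sum to 1, or the dict is
    empty if no hit passes the cutoff.
    """
    best = {}  # category -> -log10(E) of its top (lowest E) hit
    for category, evalue in hits:
        if evalue > cutoff:
            continue  # not significant, ignored
        score = -math.log10(max(evalue, E_FLOOR))
        if category not in best or score > best[category]:
            best[category] = score
    total = sum(best.values())
    if total <= 0:
        return {}
    return {category: score / total for category, score in best.items()}


def library_profile(hits_by_peptide, cutoff=E_CUTOFF):
    """Sum per-peptide weights to characterize a whole library.

    hits_by_peptide: {peptide_id: iterable of (category, evalue)}.
    """
    totals = defaultdict(float)
    for hits in hits_by_peptide.values():
        for category, weight in peptide_weights(hits, cutoff).items():
            totals[category] += weight
    return dict(totals)


# Hypothetical example: one peptide with hits in two ecosystems.
example_hits = [
    ("Estuary", 1e-30),  # top Estuary hit
    ("Estuary", 1e-10),  # weaker Estuary hit, superseded by the one above
    ("Marine", 1e-10),   # top Marine hit
    ("Soil", 0.5),       # above the cutoff, ignored
]
example_weights = peptide_weights(example_hits)
```

In this made-up example the top Estuary hit scores about 30 and the top Marine hit about 10, so Estuary receives roughly three quarters of the peptide's weight and Marine the remaining quarter. Each peptide with at least one significant hit contributes a total weight of 1 to the library profile.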
The category with the largest weighted frequency for a given MGOL environmental feature often matches that of the query library. For instance, a query viral library from the Chesapeake Bay, which itself would be defined as an "Estuary" ecosystem, would show "Estuary" as the largest weighted fraction of MGOL hits according to the "Ecosystem" environmental feature. This common observation indicates that many viral genes show specificity to a particular environmental context, supporting reports from previous viral metagenomic studies [30,31].

Implementation

The sequence quality (Figure 1A) and sequence analysis (Figure 1B) components of the VIROME bioinformatics pipeline are run using a workflow management system called Ergatis.
Ergatis has direct access to the executable component scripts and algorithms that comprise the pipeline and can execute computation locally or on a computational grid running Sun Grid Engine. Data from the sequence processing and BLAST analysis components are stored in a MySQL database. Subsequent analyses of these data, which assign viral metagenome peptides to VIROME categories and summarize the distribution of these peptides according to functional or environmental criteria, are done using the MySQL database and custom scripts.