inChorus: Multi-Engine Protein ID

Increase Coverage and Confidence

PEAKS inChorus, a multi-engine protein ID search tool, allows users to integrate PEAKS results with results from additional proteomics tools for greater coverage and confidence. Much like musical notes played in unison, inChorus launches multiple search engines and presents their findings together in one simple report, where matching proteins and peptides are scored together. This powerful solution gives users a stable and reliable validation tool and increased coverage!

PEAKS inChorus Workflow

Protein View

Several approaches have been devised to aid scientists in the identification of proteins and peptides from tandem mass spectrometric data. Most involve comparing the measured masses of peptide fragments to theoretical masses calculated from protein sequence databases. Because of poor-quality data, unforeseen post-translational modifications, and sequence variations, these database search engines are unable to confidently identify all spectra. A single search engine may be able to identify only 5% of the spectra in a sample. Because false positive assignments are possible, even these peptide assignments must be verified.

Two or more search engines, when used together, not only provide suitable automatic validation for peptide assignments, but can double the number of confident peptide assignments [1].

The following chart summarizes each program's ability to assign peptide sequences to spectra by database search, as compared to the consensus results. By using a consensus approach rather than an individual search engine, the number of confidently explained spectra increases by at least 50%. This increase is gained solely by using agreement between two search engines as the only measure of confidence. In this way, low-scoring (but nevertheless correct) matches returned by two search engines are given high confidence. These low-scoring matches would otherwise have been rejected by any single search engine. Furthermore, each search engine returns a small number of unique, high-scoring, and correct matches that can be added to the consensus results to further improve coverage. Concurrence between X!Tandem and Mascot alone had a high rate of false positives, so consensus results where only Mascot and X!Tandem agreed were rejected from the analysis.

 

This bar graph summarizes the amount and quality of each program's contribution to the consensus results. Notably, 3079 peptides were determined by consensus between programs. Of these, 2981 (97%) were correct. The percentage of correct assignments was high and fairly uniform where two of SEQUEST, PEAKS, or X!Tandem were involved. Four-way consensus made up the bulk of the consensus results. X!Tandem was, marginally, the largest individual contributor to consensus results, and consensus results involving PEAKS had the lowest incidence of incorrect assignments. Evaluation of consensus and correctness on all 37044 spectra took a total of ~8 minutes.

PEAKS inChorus bar graph

 

What does this really mean?

  • The high percentage of correctness among results obtained by consensus between two or more protein identification programs speaks clearly for the advantage of using many search tools together. Automated comparison, even using a script as inefficient as the one used for this analysis, is far quicker than painstaking manual cross-referencing.
  • Gains in coverage of a protein by matching more peptides are another benefit of using more than one protein identification program. Coverage can be gained by considering peptides on which two separate programs agreed but to which each assigned a very low score. Coverage can also be gained by considering peptides that only one program could identify.
  • Confidence scores provided by individual programs, while useful for result evaluation in some cases, can be extremely misleading in others. Agreement between two protein identification programs may provide more definitive answers. A complex scoring algorithm to evaluate the strength of a consensus is not required.
  • The benefit of this approach is improved sensitivity in identification of peptides from MS/MS data, without sacrificing accuracy.
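
The consensus rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the inChorus implementation: the function name `consensus_assignments`, the input layout (engine name mapped to spectrum ID mapped to peptide sequence), and the example spectra and peptides are all hypothetical. The sketch accepts an assignment when at least two engines report the same peptide for the same spectrum, and rejects agreement that involves only Mascot and X!Tandem, as in the analysis described earlier. It does not cover the separate step of adding unique, high-scoring matches from a single engine.

```python
from collections import defaultdict

# Engine pairings whose agreement alone is not trusted. The analysis above
# rejected results where only Mascot and X!Tandem agreed.
UNTRUSTED_PAIRS = {frozenset({"Mascot", "X!Tandem"})}


def consensus_assignments(results):
    """Return {spectrum_id: (peptide, engines)} for assignments backed by agreement.

    `results` maps engine name -> {spectrum_id: peptide_sequence}.
    This input layout is an assumption made for illustration.
    """
    # Collect the set of engines that reported each (spectrum, peptide) pair.
    votes = defaultdict(set)
    for engine, assignments in results.items():
        for spectrum_id, peptide in assignments.items():
            votes[(spectrum_id, peptide)].add(engine)

    accepted = {}
    for (spectrum_id, peptide), engines in votes.items():
        if len(engines) < 2:
            continue  # no agreement: a single engine reported this peptide
        if frozenset(engines) in UNTRUSTED_PAIRS:
            continue  # agreement only between an untrusted pairing
        # If two peptides both reach consensus for one spectrum,
        # keep the one supported by more engines.
        current = accepted.get(spectrum_id)
        if current is None or len(engines) > len(current[1]):
            accepted[spectrum_id] = (peptide, frozenset(engines))
    return accepted


# Illustrative data only: spectrum IDs and peptide assignments are made up.
results = {
    "PEAKS": {"s1": "LVNELTEFAK", "s2": "AEFVEVTK", "s4": "YLYEIAR"},
    "SEQUEST": {"s1": "LVNELTEFAK", "s2": "AEFVEVTK"},
    "Mascot": {"s1": "LVNELTEFAK", "s3": "QTALVELLK"},
    "X!Tandem": {"s1": "LVNELTEFAK", "s3": "QTALVELLK", "s4": "HLVDEPQNLIK"},
}
accepted = consensus_assignments(results)
for spectrum_id in sorted(accepted):
    peptide, engines = accepted[spectrum_id]
    print(spectrum_id, peptide, sorted(engines))
```

In this example, spectrum `s1` is accepted by four-way agreement and `s2` by agreement between PEAKS and SEQUEST. Spectrum `s3` is rejected because only Mascot and X!Tandem agree, and `s4` is rejected because the two engines report different peptides. Note that no score threshold appears anywhere: agreement is the only measure of confidence, which is why low-scoring matches can still be accepted.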

Not only do PEAKS users get validation and consensus reporting (with the ability to launch multiple new searches within PEAKS), but they also get the industry-leading de novo sequencing algorithm, a database search engine, and PTM analysis, all in one software package.

Footnote:

  1. Rogers I, Haskins W. Drastically increased coverage by using four search engines for Protein Identification. ASMS 2006: MP 328.