|
Here is a collection of BSI research conducted using PEAKS.
|
Zhang J, Xin L, Shan B, Chen W, Xie M, Yuen D, Zhang W, Zhang Z, Lajoie G, Ma B. PEAKS DB: De Novo sequencing assisted database search for sensitive and accurate peptide identification. Mol Cell Proteomics. 2011 Dec 20. [Epub ahead of print] |
|
Many software tools have been developed for the automated identification of peptides from tandem mass spectra. The accuracy and sensitivity of the identification software via database search are critical for successful proteomics experiments. A new database search tool, PEAKS DB, has been developed by incorporating the de novo sequencing results into the database search. PEAKS DB achieves significantly improved accuracy and sensitivity over two other commonly used software packages. Additionally, a new result validation method, decoy fusion, has been introduced to solve the issue of over-confidence that exists in the conventional target-decoy method for certain types of peptide identification software. |
|
Shan B, Xin L, Ma B. Integrating de novo Sequencing and Database Search for Peptide Identification. ABRF 2012: 116 |
|
Peptide identification with high sensitivity and accuracy is vital in mass spectrometry-based proteomics. Database searching is the primary method for identifying tandem mass spectra. Unfortunately, standard database searching is limited to the identification of spectra for which peptides are present in the database, preventing the identification of peptides from mutated or alternatively spliced sequences. De novo sequencing can provide alternative peptide identifications, as it does not require a protein database; however, identification without a database potentially reduces the accuracy. One approach to increasing the confidence of peptide identification is high resolution tandem mass spectrometry on both precursor and fragment steps. A workflow is presented to combine de novo and database search for peptide identification on high resolution data. |
|
Xie M, Zhang J, Xin L, Shan B, Ma B. A Robust and Effective Strategy for Combining Results of Multiple Peptide Identification Engines. HUPO 2011: P1397. |
|
Many software packages have been developed for identifying peptides from mass spectrometry data. Their abilities are often complementary to one another. It is therefore useful to combine multiple search engines’ results to improve the overall peptide identification performance.
Empirical statistical methods have been developed for unifying the scores of different engines and combining the results together. Implementations of these methods include the Trans-Proteomic Pipeline [1] and the Scaffold software [2]. While these methods have contributed greatly to proteomics research, the complexity of the statistical model makes it difficult or impossible for an end-user to add a new search engine.
We propose a simple model for combining multiple engines’ results and demonstrate its effectiveness. |
|
Zhang J, Shan B, Xin L, Ma B. Identifying More Peptides at a Lower False Discovery Rate with PEAKS DB Software. HUPO 2011: P1363. |
|
In mass spectrometry based proteomics, researchers frequently face the dilemma between keeping more identified peptides and maintaining a lower false discovery rate (FDR). Such a situation can only be improved with new analytical software that uses more accurate scoring functions to better separate the true and false identifications. Here we present a new software tool, PEAKS DB, for identifying significantly more peptides with lower FDR than other software commonly in use. |
|
Shan B, Zhang Z, Ma B. ETD is Better, Period. HUPO 2011: P1211. |
|
Electron-Transfer Dissociation (ETD) is widely known as a better fragmentation technology than Collision Induced Dissociation (CID) for identifying post-translationally modified peptides and peptides with higher charge states. However, for general proteomics studies that aim to identify all peptides, regardless of modifications and charge states, ETD has not been demonstrated to be superior to CID. In this abstract we show that with the recent advancement of peptide identification software using ETD MS/MS data, ETD is indeed better even for general peptide identification. |
|
Zhang J, Xin L, Ma B. More Accurate Control of the False Discovery Rate in Mass Spectrometry Based Peptide Identification. HUPO 2011: P1151. |
|
The large volume of mass spectrometry data requires a reliable automated method for quality control of the peptides identified by software and submitted to public databases. The commonly used target-decoy method estimates the false discovery rate (FDR) of the software’s results. However, in this abstract we illustrate that the target-decoy method makes some unrealistic assumptions about the analytical software, and is critically over-confident. We further propose a decoy-fusion method to solve this problem. |
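For orientation, the conventional target-decoy estimate that this abstract critiques can be sketched as follows. This is a minimal illustration, assuming a decoy database of the same size as the target database and a list of (score, is_decoy) pairs; the function name and data layout are hypothetical. The decoy-fusion method itself is not sketched, since the abstract does not describe its details.

```python
def target_decoy_fdr(psms, threshold):
    """Conventional target-decoy FDR estimate at a score threshold.

    psms: iterable of (score, is_decoy) pairs, one per peptide-spectrum match.
    Assumes decoy hits occur about as often as false target hits.
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # nothing accepted, so nothing to be wrong about
    # Decoy hits above the threshold stand in for false target hits.
    return min(1.0, decoys / targets)


# Illustrative data only (not from the abstract).
example = [(90, False), (80, False), (70, True), (60, False), (50, True)]
print(target_decoy_fdr(example, 60))
```

The estimate is only as good as the assumption that false matches land on targets and decoys equally often, which is the assumption the abstract calls into question.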
|
Shan B, Xin L, Xie M, Ma B. PEAKS DB: New Software for Substantially Improved Peptide Identification from Orbitrap ETD Mass Spectrometry. ASMS 2011: M2170. |
|
Two new techniques, Orbitrap and ETD, are being rapidly adopted in mass spectrometry based proteomics. The adoption of these new technologies requires new analytical software to take full advantage of the new data types. In this study we present such new software, PEAKS DB, for peptide identification with Orbitrap ETD MS/MS data. The new software outperforms other tools commonly in use. Moreover, the combination of the new tool with other existing tools together provides even better results. |
|
Zhang J, Ma B. De Novo Sequencing vs. Database Search. ASMS 2011: M2336. |
|
De novo sequencing and database search with un-interpreted spectra are widely known as the two different approaches for peptide identification from MS/MS. De novo sequencing is the only choice for novel peptide identification. For peptides in a sequence database, however, researchers would immediately assume that the database search approach provides better performance. In this study we show the opposite with today’s standard software, PEAKS de novo sequencing and Mascot database search, respectively. We first demonstrate that de novo sequencing performs remarkably well in determining significantly long sequence tags. Then we show that an approximate de novo sequence tag search outperforms the conventional database search approach even when the target peptides are in a known sequence database. |
|
Shan B, Xin L, Xie M, Ma B. Improvement in Analytical Software Makes a Difference on the Decision Tree Driven. ASMS 2011: M3133. |
|
Purpose: To develop an alternative fragmentation technique to improve peptide identification.
Method: Use data-dependent decision tree logic to determine the fragmentation method most likely to result in a successful identification.
Result: New developments in analytical software improved performance of the decision tree-driven CID/ETD fragmentation and demand adjustment of decision tree parameters. |
|
Shan B, Xin L, Xie M, Ma B. New Computational Method for Identifying Peptides with Unspecified Modifications. ASMS 2011: T3056. |
|
Purpose: To identify modified peptides from tandem mass spectrometry (MS/MS), without the need to specify the expected post-translational modifications (PTM) types. Instead, all known PTM types from the Unimod database are used.
Methods: 1. PeaksPTM utilizes a two-pass search approach and a new scoring function to identify modified peptides by turning on all modifications from the Unimod database. 2. A simple but effective strategy is used to combine multiple PTM search engines’ results to further improve the identification rates.
Results: 1. This consensus strategy helps to identify more modified peptides with high confidence.
2. PeaksPTM contributes more identifications than three other widely used search engines. |
|
Yuen, D. SPIDER: Reconstructive Protein Homology Search with De Novo Sequencing Tag. University of Waterloo, April 24, 2011. |
|
|
Zhang J, Xin L, Shan P, Ma B. PEAKS DB - Substantially Improved Peptide Identification. ASMS 2011: Sanibel. |
|
Peptide identification is a central task in mass spectrometry based proteomics. Existing approaches include: (1) protein sequence database search with uninterpreted spectra, (2) de novo sequencing, (3) database search of de novo sequence tags, and (4) spectral library search. These approaches are usually performed separately according to the circumstances. In this abstract, we present the PEAKS-DB software that combines the first three approaches to significantly improve the sensitivity and reduce the FDR of peptide identification. |
|
Xin L. Probability Scoring System for De Novo And Protein Identification with Tandem Mass Spectrometry. University of Western Ontario, 2010. |
|
Based on the PEAKS raw ion score, we propose some new features to distinguish correct matches from false matches. We then build statistical models on these features to establish a probability scoring system. Not only does the new scoring function provide automated result validation, but it also improves the accuracy of the PEAKS algorithm. In addition, we propose a novel local search method for improving the de novo sequencing algorithm of PEAKS. The thesis is divided into two parts according to two different approaches. In the first part, we calculate a probability score for each amino acid from de novo sequencing results. In the second part, probability scoring systems are established for both peptide matches and protein hits. Experimental results show that the new probability scoring system outperforms the PEAKS 4.5 scoring system in both probability accuracy and the ability to distinguish correct matches from false matches. |
|
Han X, Shan P, Ma B. Precursor Mono-Isotopic Mass and Charge Determination with Almost 100% Accuracy. ASMS 2010: WP001. |
|
Design an algorithm to determine the correct charge state and mono-isotopic mass of the precursor ion from high-resolution MS data. By generating all the potential isotopic envelope candidates in the given m/z range, we compare their theoretical isotopic distributions with the observed distribution and pick the most similar one. This greatly increases the accuracy of the mono-isotopic mass and charge state of the precursor ion. |
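The candidate-and-compare idea described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' algorithm: the theoretical envelope is a crude Poisson model whose rate constant (`rate_per_da`) is an illustrative guess, similarity is plain cosine similarity, and all function names are hypothetical.

```python
import math

NEUTRON = 1.00335  # approximate spacing between isotopic peaks, in Da
PROTON = 1.007276  # proton mass, in Da


def poisson_envelope(mass, n_peaks=4, rate_per_da=0.0006):
    """Crude theoretical isotope intensities (assumed model, not from the abstract)."""
    lam = mass * rate_per_da
    probs = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(n_peaks)]
    total = sum(probs)
    return [p / total for p in probs]


def intensity_at(peaks, mz, tol=0.01):
    """Highest observed intensity within tol of mz, or 0.0 if no peak is there."""
    return max((inten for m, inten in peaks if abs(m - mz) <= tol), default=0.0)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_envelope(peaks, candidate_mzs, charges=(1, 2, 3, 4), n_peaks=4):
    """Return (similarity, mono_mz, charge, neutral_mass) of the best candidate."""
    best = None
    for mono_mz in candidate_mzs:
        for z in charges:
            mass = (mono_mz - PROTON) * z
            if mass <= 0:
                continue
            theoretical = poisson_envelope(mass, n_peaks)
            # Isotopic peaks of a charge-z ion are spaced NEUTRON / z apart.
            observed = [intensity_at(peaks, mono_mz + k * NEUTRON / z)
                        for k in range(n_peaks)]
            score = cosine(theoretical, observed)
            if best is None or score > best[0]:
                best = (score, mono_mz, z, mass)
    return best
```

Every observed peak in the m/z window is tried as a possible mono-isotopic peak at every charge, so a wrong pick of the first isotope or of the charge shows up as a poorer fit to the expected envelope.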
 |
He L, Ma B. No News is Good News: de novo Determination of Amino Acids when Peaks are Missing. ASMS 2010: MP018. |
|
To improve the accuracy of de novo sequencing through the determination of amino acids when peaks are missing. The probability that the fragment ions between a pair of amino acids are missing (learned from the NIST database) is used to determine the local sequence when peaks are missing. The end result is an increase in the prediction correctness of de novo sequencing. |
|
Xin L, Shan P, Ma B. Determining the False Discovery Rate for Peptide Identification without a Decoy Database. ASMS 2010: MP025. |
|
The ability to estimate the FDR (false discovery rate) of MS/MS results is of crucial importance. Instead of the time-consuming (although popular) method of estimating the FDR by running a search on both the target and decoy databases, we propose to use the "second best matches" of the spectra on the target database to learn the distribution of the false matches, and use that distribution to estimate the FDR of the first matches. This method showed excellent performance without any penalty on searching speed. |
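One naive way to picture the idea is to treat the second-best scores as an empirical null distribution. The sketch below is a deliberately simple stand-in, not the authors' estimator: it assumes every second-best match is false and that a false first match would score like one, which tends to over-estimate the number of false first matches. The function name and inputs are hypothetical.

```python
def second_best_fdr(first_scores, second_scores, threshold):
    """Rough FDR of first matches, using second-best scores as the null.

    first_scores: best-match score of each spectrum.
    second_scores: second-best-match scores, all assumed to be false matches.
    """
    if not second_scores:
        raise ValueError("need second-best scores to model false matches")
    accepted = sum(1 for s in first_scores if s >= threshold)
    if accepted == 0:
        return 0.0
    # Fraction of known-false matches that would pass the threshold.
    null_tail = sum(1 for s in second_scores if s >= threshold) / len(second_scores)
    # Treat every first match as potentially false (an upper-bound assumption).
    expected_false = null_tail * len(first_scores)
    return min(1.0, expected_false / accepted)
```

No decoy database is searched here, which is the point of the approach: the null distribution comes from the same target search that produced the first matches.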
|
Shan P, Chen W, Ma B. Systematic Assessment of the Reproducibility of Relative Quantification Based on LC-MS with Replicates. ASMS 2010: ThP016. |
|
Knowing the reproducibility of the system provides more confident expression ratios, which are used as criteria for differential analysis. An understanding of the system's reproducibility can be achieved through a reproducibility assessment, which will minimize the number of false positives and negatives in differential analyses. This approach ensures greater experimental success via the incorporation of replicates. |
|
Liu X, Shan B, Xin L, Ma B. Better score function for peptide identification with ETD MS/MS spectra. BMC Bioinformatics. 2010 Jan 18;11 Suppl 1:S4. |
|
Background: Tandem mass spectrometry (MS/MS) has become the primary way for protein identification in proteomics. A good score function for measuring the match quality between a peptide and an MS/MS spectrum is instrumental for the protein identification. Traditionally the to-be-measured peptides are fragmented with the collision induced dissociation (CID) method. More recently, the electron transfer dissociation (ETD) method was introduced and has proven to produce better fragment ion ladders for larger and more basic peptides. However, the existing software programs that analyze ETD MS/MS data are not as advanced as they are for CID.
Results: To take full advantage of ETD data, in this paper we develop a new score function to evaluate the match between a peptide and an ETD MS/MS spectrum. Experiments on real data demonstrated that this newly developed score function significantly improved the de novo sequencing accuracy of the PEAKS software on ETD data.
Conclusion: A new and better score function for ETD MS/MS peptide identification was developed. The method used to develop our ETD score function can be easily reused to train new score functions for other types of MS/MS data. |
|
Liu X, Han Y, Yuen D, Ma B. Automated protein (re)sequencing with MS/MS and a homologous database yields almost full coverage and accuracy. Bioinformatics. 2009 Sep 1;25(17):2174-80. Epub 2009 Jun 17. |
|
Motivation: Bottom-up tandem mass spectrometry (MS/MS) is regularly used in proteomics nowadays for identifying proteins from a sequence database. De novo sequencing software is also available for sequencing novel peptides with relatively short sequence lengths. However, automated sequencing of novel proteins from MS/MS remains a challenging problem. Results: Very often, although the target protein is novel, it has a homologous protein included in a known database. When this happens, we propose a novel algorithm and automated software tool, named Champs, for sequencing the complete protein from MS/MS data of a few enzymatic digestions of the purified protein. Validation with two standard proteins showed that our automated method yields greater than 99% sequence coverage and 100% sequence accuracy on these two proteins. Our method is useful to sequence novel proteins or "re-sequence" a protein that has mutations compared with the database protein sequence. |
|
Chen C, Shan P, Zhang J, Bonneil E, Voyer J, Lajoie G, Thibault P, Ma B. New Algorithm for Label-Free Protein Quantification. ASMS 2009: MPB 043. |
|
Label free quantitative proteomics analysis is a flexible approach enabling the profiling of protein expression across different datasets. The success of this approach relies not only on the efficient detection of peptides over a wide range of ion abundance but also on the capability of correlating their precise coordinates in different LC-MS runs. Several approaches have been previously studied to achieve these goals including the use of normalized LC retention time for data acquired on high resolution mass spectrometry instruments. PEAKS Q offers this new algorithm as its approach for label-free quantification. We report a new approach termed "feature vector" that analyzes multiple samples simultaneously to increase the accuracy of feature detection and the protein coverage. |
|
Xin L, Shan P, Xie M, Lajoie G, Ma B. PTM Finder Based on PEAKS De Novo Sequencing Result. ASMS 2009: MPL 295. |
|
Identification of post-translational modifications (PTMs) by tandem mass spectrometry is still a major challenge in proteomics, especially if the PTMs are unknown. In typical existing software, tandem mass spectra are searched against an enlarged database that includes all possible combinations of modified peptides. Because the search time grows exponentially with the number of allowed modifications, only a small number of known variable modifications can be included in each search. We propose a new approach based on de novo sequencing results to identify unknown variable PTMs from an MS/MS dataset. |
|
Liu X, Shan P, Ma B. Modeling ETD Fragmentation with Bayesian Network for Peptide Identification. ASMS 2009: ThPA 024. |
|
For each test spectrum-peptide pair, we randomly mutate the peptide sequence by replacing three consecutive residues with three other residues with the same total mass. If our model is good, then it should give the mutated sequence a lower score than the real sequence. By using the score function described above, 97.3% of the mutated sequences have scores lower than or equal to the real sequence. We also compared our score function with that of PEAKS Studio 5.0. We used PEAKS Studio 5.0 to do de novo sequencing for each test spectrum. From the resulting peptide of PEAKS Studio 5.0, we used a local search method to find a better peptide based on our score function. PEAKS Studio 5.0 was able to correctly compute 40.6% of all the amino acids in the test peptides. Our strategy improved this to 48.4%. |
 |
Shan P, Xin L, Yang W, Lajoie G, Ma B. Automated Multiple Round Searches to Increase Coverage of Peptide-Protein Identification. ASMS 2009: ThPA 003. |
|
One of the challenges researchers face in mass spectrometry-based proteomics investigations is that a significant number of high-quality spectra often remain un-interpreted due to PTMs and errors in MS/MS data and protein sequence databases. Specifying many variable PTMs in the protein identification software can increase the coverage, but also drastically slows down the search. This dilemma can be partially solved with a two-round search approach: the first round searches a large database with only a few PTMs, followed by a second round on only the identified proteins but with many variable PTMs specified. However, this still requires human knowledge about the variable PTMs in the sample, in order to specify them correctly in the second round search. We propose to use PEAKS de novo sequencing results to automatically discover the variable PTMs existing in the sample. In addition, we propose a workflow for multi-round searches which results in higher protein coverage. |
|
Ma, B. and Lajoie, G. De Novo Interpretation of Tandem Mass Spectra. Current Protocols in Bioinformatics. 25:13.10.1–13.10.8.
March 1, 2009. |
|
De novo sequencing is an effective method for identifying unknown peptide sequences from their tandem mass spectra. This unit briefly introduces how this can be done manually. A protocol for using the PEAKS online software for automated de novo sequencing is described. Finally, we show how to use the PEAKS scores to validate the de novo sequencing results. |
|
Xie M, Zhang W, Yang W, Chen W, Lajoie G, Ma B. PEAKSOnline: A Free MS/MS de novo Sequencing and Protein ID Online Public Server. ASMS 2008: WP 629. |
|
By distributing the computation to multiple computers, de novo sequencing and database search throughput are increased remarkably. We describe a free server for high-throughput MS data interpretation supporting both de novo sequencing and database search approaches. |
|
Ma B, Yuen D. SPIDER: Novel Scoring Function Improves Homology Searches using MS/MS de novo Sequencing Results. ASMS 2008: ThP 648. |
|
Proteomic MS/MS database search algorithms rely upon existing databases and are vulnerable to mutation differences between the protein sample and the database used. The process of de novo sequencing can result in mass segment replacement errors. In a case where both of these would typically yield low confirmation, our previously introduced algorithm, SPIDER [1], finds database sequences that are homologous to the real peptide by using the partially correct sequence tag, and has proven accurate for correct peptide reconstruction from the partially correct tag and the homologous database sequence. The primary objective is to develop a new score that is statistically meaningful and can be compared across different spectra, experiments, or instruments. When the correctness probability of each amino acid in a de novo sequencing result is known, the score should also take advantage of it. The second objective is to develop an efficient algorithm, based on the new score, to search for homologous peptides and reconstruct the real peptides from the partially correct de novo sequencing result. |
|
Xin L, Lajoie G, Hughes C, Ma B, Smith D. New Quantitation Software Package Based on PEAKS Protein ID. ASMS 2008: TP 653. |
|
Isotopic labeling for protein expression analysis has become routine for quantitative proteomics studies. Reagents such as iTRAQ, ExacTag and ICAT are common tools used in this area. Label-free techniques can also be used in cases where isotopic labeling is impractical to perform. As a subsequent step to protein identification, some search engines provide modules for quantitation analysis based on these techniques. Here, we present a new software package designed to automatically quantify proteins from experiments using isotopic labeling or label-free techniques based on PEAKS protein identification results. |
|
Xin L, Lajoie G, Ma B. New Method for the Validation of de novo Sequencing Results. ASMS 2008: WP 645. |
|
Since de novo sequencing does not depend on protein databases, the validation and confidence methods developed in the database search approach such as the reverse-database query cannot be applied. Here we present a general validation algorithm which uses any de novo sequencing scores to calculate the correctness probabilities of each amino acid in the de novo sequencing results. In addition to result validation, these probabilities can also be used in other protein identification software such as SPIDER. |
|
Yuen D, Ma B, Rogers I. Improving de novo Sequencing Accuracy for Ion Trap data in PEAKS Software. ASMS 2007: MPK 175. |
|
De novo sequencing from MS/MS data is a widely used method for sequencing peptides from organisms of unknown sequences, directly from their MS/MS spectra, or identifying peptides that vary from their database equivalents by some modification or mutation. De novo sequencing programs typically require scoring functions that evaluate the fitness between a peptide sequence and the spectrum. Ma et al. demonstrated that two scoring functions, used together, can improve de novo sequencing accuracy, but the relative importance of each scoring function was not thoroughly evaluated. In this work, the optimal weighting between multiple de novo sequencing score components is trained on a large dataset, and is demonstrated to provide a significant accuracy improvement in PEAKS Studio. |
|
Yang W, Yuen D, Ma B, Rogers I. Improving Protein Coverage by de novo Sequence Homology Searching with SPIDER. ASMS 2007: MPK 176 |
|
Database search of tandem-MS spectra has been a widely used technique for protein identification. But several proteomics problems require more coverage and more scrutinous results than this technique can provide. Sequence homology searching based on peptide de novo sequences allows us to identify peptides that are not present in a database. This approach, when coupled with standard search techniques, means we can better explain the data and improve coverage on the identified proteins. Alternatively, we can better explain peptides from organisms that are not present in any database [1]. In this work we build and evaluate a workflow involving PEAKS auto de novo sequencing [2] and SPIDER [3], a unique tool for peptide sequence tag based homology searching. |
|
Yuen D, Ma B, Rogers I. Peptide Sequence Reconstruction from de novo Sequences and their Homologues. ASMS 2007: ThPP 269 |
|
Because protein sequence databases will never be complete, contain gene prediction errors, and can't account for mutations between individuals, it is often necessary to derive a peptide sequence from MS/MS data where no exact match can be found in the database. De novo sequencing provides a useful technique for sequencing peptides without a database, but completely correct sequences are difficult to find. However, when coupled with a sequence tag homology search like SPIDER [1], similar peptides can be returned from a protein sequence database. Here we present a technique for constructing the real peptide sequences from de novo sequences derived by PEAKS Studio [2] and homologous entries from a database. |
|
Wang J, Ma B, Chen W. Disulfide bonded Dipeptide Analysis with PEAKS and Q-TOF Mass Spectrometry. ASMS 2007: MPK 171 |
|
Proteins and peptides are commonly studied using mass spectrometry; however, the most commonly used tools for MS data analysis are built with the assumption that peptides are linear. Disulfide bonds, creating complexes involving two or more peptides bonded together, cause problems for this kind of analysis. Chemical reduction, using 1,4-dithiothreitol (DTT), can break the disulfide bonds, making the peptides acceptable for standard analysis. But since this makes determination of the disulfide bond location more ambiguous, analysis of intact dipeptides becomes necessary. Also, since chemical reduction can be incomplete, even reduced samples can benefit from this analysis. Here we present an algorithmic solution for the analysis of MS/MS data of disulfide bonded dipeptides. |
|
Xu C, Ma B. Software for computational peptide identification from MS-MS data. Drug Discov Today. 2006 Jul;11(13-14):595-600. |
|
Protein identification in biological samples is an important task in drug discovery research. Protein identification is nowadays regularly performed by tandem mass spectrometry (MS-MS). Because of the difficulty of measuring intact proteins using MS-MS, typically a protein is enzymically digested into peptides and the MS-MS spectrum of each peptide is measured. Computational methods are then invoked to identify the peptides, which are later combined together to identify the protein. The most recognized peptide identification software packages can be classified into four categories: database searching, de novo sequencing, sequence tagging and consensus of multiple engines. |
|
Ma B, Rogers I. Application Note: PEAKS de novo performance on LTQ Orbitrap data. June 2006. |
|
High resolution, high mass accuracy instruments like Thermo’s LTQ Orbitrap, promise to significantly enhance proteomics analysis. De novo sequencing is one of the applications of peptide mass spectrometry that will be most affected by the increase in data quality. Here the authors present the improvement in results obtainable by PEAKS peptide de novo sequencing when using an LTQ Orbitrap mass spectrometer. During this demonstration of the accuracy of PEAKS de novo sequencing on a Thermo LTQ Orbitrap mass spectrometer, 97% accuracy is achieved. |
 |
Rogers I, Haskins W. Drastically increased coverage by using four search engines for Protein Identification. ASMS 2006: MP 328. |
|
This poster demonstrates the improvement in coverage by using more than one search engine. It should not be viewed as a benchmark comparison of search engines, as the performance shown is dependent on arbitrary score filter values. More important is the low error and high sensitivity when using a sequence tag hybrid approach (PEAKS) and a pure peptide fragment fingerprinting approach (like SEQUEST or MASCOT) together -- regardless of score! |
|
Chen W, Morey J, Rogers I. Filtering out MS/MS spectra of insufficient quality before database searching. ASMS 2006: MP 329 |
|
In studying proteins using liquid chromatography coupled tandem mass spectrometry (LC-MS/MS), researchers are often faced with very large data sets. Since each data set may contain thousands of spectra, a manual inspection of each one becomes impossible. Confounding the problem, electrical noise, poor detection and contaminants scanned by the MS mean that only a small portion of these data are quality MS/MS spectra representing peptides. The following presents a method of filtering out the poor quality spectra prior to de novo sequencing or database searching for protein identification. Database search engines and de novo sequencing tools are adequate in discarding the bad spectra; nevertheless, false positives abound, and plenty of time is wasted analyzing nothing. |
|
Ma B, Lajoie G. Improved positional confidence score in MS/MS peptide de novo sequencing. ASMS 2006: MP 348. |
|
De novo sequencing from MS/MS data is used widely for peptide and protein identification. However, due to the imperfections of the data and/or software, the results are not always reliable. Very often, only partially correct sequences can be obtained by de novo sequencing. If the correct portions of the sequences are known, they can be used as sequence tags to identify the proteins through a homology search. It is therefore very useful for de novo sequencing software to give a positional confidence for each individual amino acid in the peptide it computes from the MS/MS data. We describe here a new method to perform this task. |
|
Chen W, Rogers I. Intact Peptide Charge Determination from Ion Trap MS/MS. ASMS 2006: MP327. |
|
In identifying proteins using tandem mass spectrometry, researchers can match measured masses of peptides, and fragments of peptides, to theoretical masses calculated from a protein sequence database. Because a mass spectrometer measures mass-to-charge ratio (m/z), the peptide's charge (z) must be known to determine the mass used for database searching. When using an ion trap however, a peptide's charge state is often difficult to determine by the usual method: examination of the initial MS survey scan of a peptide. It has become common practice, then, to allow a database search engine to determine the charge on a peptide by choosing the charge that allows the best match to the database. This is a poor practice since, instead of inferring results from the data, we are determining what data will best fit the results. |
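The dependence of the search mass on the assumed charge can be made concrete with a short sketch. It assumes positive-mode protonated ions and uses the standard relation mass = z × (m/z − proton mass); the function names and the example m/z value are illustrative only.

```python
PROTON = 1.007276  # proton mass, in Da


def neutral_mass(mz, z):
    """Neutral peptide mass for an ion observed at m/z with z added protons."""
    return z * (mz - PROTON)


def candidate_masses(mz, charges=(1, 2, 3)):
    """Neutral mass implied by each charge state that cannot be ruled out."""
    return {z: neutral_mass(mz, z) for z in charges}


# One precursor m/z implies very different peptide masses depending on z.
for z, mass in candidate_masses(600.0).items():
    print(z, round(mass, 4))
```

Because each candidate charge sends the search to an entirely different mass window, letting the search engine choose among them means the charge is chosen to fit the result rather than measured from the data.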
|
Yang W, Chen W, Rogers I, Ma B, Bendall S, Lajoie G, Smith D. PEAKS Q: Software for MS-based quantification of stable isotope labeled peptides. ASMS 2006: WP531. |
|
Several mass spectrometry-based stable isotope labeling technologies have been developed for global proteome profiling. These include methods for in vivo labeling, such as 14N/15N and SILAC (Stable Isotope Labeling with Amino Acids in Cell Culture), and in vitro isotope labeling of target peptides at their N/C terminal or at specific residues. In this work we describe a new software, PEAKS-Q, designed to automatically identify and quantify proteins from these isotope labeling experiments. The software is written in Java and includes an intuitive graphical user interface. |
 |
Han Y, Ma B, Zhang K. SPIDER: Software for Protein Identification from Sequence Tags Containing De Novo Sequencing Error. J Bioinform Comput Biol. 2005 Jun;3(3):697-716. |
|
For the identification of novel proteins using MS/MS, de novo sequencing software computes one or several possible amino acid sequences (called sequence tags) for each MS/MS spectrum. Those tags are then matched, accounting for amino acid mutations, against the sequences in a protein database. If the de novo sequencing gives correct tags, the homologs of the proteins can be identified by this approach, and software such as MS-BLAST is available for the matching. However, de novo sequencing very often gives only partially correct tags. The most common error is that a segment of amino acids is replaced by another segment with approximately the same mass. We developed a new efficient algorithm to match sequence tags with errors to database sequences for the purpose of protein and peptide identification. A software package, SPIDER, was developed and made available on the Internet for free public use. This paper describes the algorithms and features of the SPIDER software. |
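The segment-replacement error described above can be illustrated with standard monoisotopic residue masses. The sketch below only checks whether two segments are indistinguishable by mass within a tolerance; it is not the SPIDER matching algorithm, and the function names and default tolerance are assumptions made for illustration.

```python
# Standard monoisotopic residue masses, in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def segment_mass(segment):
    """Total residue mass of a run of amino acids."""
    return sum(RESIDUE_MASS[aa] for aa in segment)


def mass_equivalent(a, b, tol=0.02):
    """True if two segments differ in mass by at most tol Da."""
    return abs(segment_mass(a) - segment_mass(b)) <= tol


print(mass_equivalent("GG", "N"))            # same elemental composition
print(mass_equivalent("AG", "Q"))            # same elemental composition
print(mass_equivalent("AG", "K", tol=0.02))  # differs by about 0.036 Da
print(mass_equivalent("AG", "K", tol=0.05))  # merges at a looser tolerance
```

Pairs such as GG/N and AG/Q cannot be separated by mass at any resolution, while AG/K only becomes ambiguous when the fragment mass tolerance is loose, which is why a tag can be partially wrong yet still match the spectrum well.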
 |
Rogers I. Assessment of an Amalgamative Approach to Protein Identification. ASMS 2005. |
|
When studying proteins using mass spectrometry, researchers can identify which proteins are in a sample by matching measured masses to the calculated masses of peptides and sequence tags in a protein sequence database. Because of large databases and experimental data sets, this process is necessarily automated using protein identification software. However, because of instrumental and experimental limitations, analysis is made difficult by noise, contamination and inconclusive data. The problem becomes one of validation. The researcher must accept the software's suggestion and scoring scheme, or spend countless hours manually validating the results. Conclusions based on imperfect data, processed by imperfect software and inferred from non-validated results will always be suspect. |
 |
Ma B, Lajoie G. Improving the de novo Sequencing Accuracy by Combining Two Independent Scoring Functions in PEAKS Software. ASMS 2005. |
|
De novo sequencing from MS/MS data is a standard method for peptide sequencing that does not require the sequences to be in a database, and therefore is best for novel proteins. De novo sequencing is also better at finding PTMs. In addition, when homologues of the novel proteins are in the database, they can be found by sequence homology search after de novo sequencing. Even for proteins in a database, if de novo sequencing computes the correct sequence without looking at the database, the confidence is much higher than simply finding the sequence from the database. A de novo sequencing program typically requires a scoring function that evaluates the fitness between a peptide sequence and the spectrum. The choice of scoring functions affects the program's accuracy significantly. In this abstract we demonstrate that better accuracy can be achieved by combining two independent scoring functions. |
 |
Rogers I, Hendrie C, Li M. Protein ID: Comparing De Novo Based and Database Search Methods. ASMS 2004. |
|
Using the correct tool for the job is as important in proteomics as it is in any other discipline. When identifying proteins from MS/MS data there are a number of tools to choose from. In the case where the data comes from a well-studied organism, the researcher may choose a standard database search tool. In the case where the results from a database search are questionable, some validation is necessary. In a situation where no database program turns up a hit, the researcher must rely on de novo sequencing, either manually or using automatic de novo software. PEAKS is a powerful and intuitive software package, combining remarkably accurate de novo sequencing with a new approach to protein identification. In this poster we show that PEAKS' new method is able to identify proteins just as well as standard database search software. In this light, we compare Mascot and PEAKS. Further, we show PEAKS to be the validation tool for standard database search software. Finally, and perhaps most importantly, we show PEAKS to be the best automatic de novo sequencing software. |
 |
Liang C, Smith JC, Hendrie C, Li M, Siu M. A Comparative Study of Peptide Sequencing Software Tools for MS/MS. ASMS 2003. |
|
A current bottleneck in proteomics is automated and accurate sequencing of enzymatically cleaved peptides. It is estimated that over two thirds of the MS/MS spectra produced by high end quadrupole-TOF and TOF-TOF instruments in proteomics-research based corporations do not provide useful information [1]. An important contributing factor in this is the lack of high-quality software. The software currently available for MS/MS peptide sequencing mainly falls into two categories: (1) database searching by assigning a peptide sequence based on scoring against a protein (or peptide) database; and (2) de novo sequencing by deriving a (partial) sequence directly from an MS/MS spectrum. This study compares several programs representative of these two categories. |
 |
Ma B, Zhang K, Hendrie C, Liang C, Li M, Doherty-Kirby A, Lajoie G. PEAKS: powerful software for peptide de novo sequencing by tandem mass spectrometry. Rapid Commun Mass Spectrom. 2003;17(20):2337-42. |
|
A number of different approaches have been described to identify proteins from tandem mass spectrometry (MS/MS) data. The most common approaches rely on the available databases to match experimental MS/MS data. These methods suffer from several drawbacks and cannot be used for the identification of proteins from unknown genomes. In this communication, we describe a new de novo sequencing software package, PEAKS, to extract amino acid sequence information without the use of databases. PEAKS uses a new model and a new algorithm to efficiently compute the best peptide sequences whose fragment ions can best interpret the peaks in the MS/MS spectrum. The output of the software gives amino acid sequences with confidence scores for the entire sequences, as well as an additional novel positional scoring scheme for portions of the sequences. The performance of PEAKS is compared with Lutefisk, a well-known de novo sequencing software, using quadrupole-time-of-flight (Q-TOF) data obtained for several tryptic peptides from standard proteins. |
|