In a tandem mass spectrometer, the peptide is fragmented along the peptide backbone and the resulting fragment ions are measured to produce the MS/MS spectrum (Figure 1). Different fragmentation methods produce different fragment ion types. The most widely used fragmentation methods today are Collision-Induced Dissociation (CID) and Electron-Transfer Dissociation (ETD). CID produces mostly b- and y-ions, whereas ETD produces mostly c- and z-ions.
Figure 1. In CID MS/MS, many copies of the same peptide are fragmented at the peptide backbone to form b- and y-ions. The spectrum consists of peaks at the m/z (mass-to-charge) values of the corresponding fragment ions. A good-quality spectrum often contains many (but not necessarily all) of the theoretical fragment ions.
The main idea of de novo sequencing is to use the mass difference between two adjacent fragment ions of the same type to calculate the mass of an amino acid residue on the peptide backbone. This mass usually determines the residue uniquely (leucine and isoleucine, which have identical masses, are a well-known exception). For example, the mass difference between the y7 and y6 ions in Figure 1 is approximately 129 Da, which is the mass of residue E (glutamic acid). Similarly, the next residue, between y6 and y5, can be determined as L from the mass difference. This process can be continued until all the residues are determined. A mass table of amino acids is provided for reference.
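The mass-difference idea described above can be sketched in a few lines of Python. This is only an illustrative sketch, not how PEAKS or any other tool works internally. It assumes a clean, complete series of singly charged y-ions. The function names (`y_ion_mz`, `residues_for_delta`, `read_ladder`), the 0.02 Da tolerance and the example peptide `AELGK` are all hypothetical choices made for this illustration. The residue masses are the standard monoisotopic values, rounded to five decimals.

```python
# Monoisotopic residue masses in daltons (standard values, rounded).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O
PROTON = 1.00728   # mass of the charge-carrying proton


def y_ion_mz(peptide):
    """Return singly charged y-ion m/z values for a peptide, y1 first.

    Used here only to build a toy spectrum for the example.
    """
    total = WATER + PROTON
    mzs = []
    for aa in reversed(peptide):      # y-ions grow from the C-terminus
        total += RESIDUE_MASS[aa]
        mzs.append(total)
    return mzs


def residues_for_delta(delta, tol=0.02):
    """Return all residues whose mass is within `tol` Da of `delta`."""
    return sorted(aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol)


def read_ladder(y_mzs, tol=0.02):
    """Turn consecutive y-ion mass differences into residue candidates.

    Peaks are walked from the highest m/z down, so the candidates come out
    in N-terminal to C-terminal order. The residue contained in the smallest
    y-ion is not resolved, because no lower peak is available to subtract.
    """
    ordered = sorted(y_mzs, reverse=True)
    return [residues_for_delta(hi - lo, tol) for hi, lo in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    toy_peaks = y_ion_mz("AELGK")     # hypothetical peptide
    print(read_ladder(toy_peaks))
```

For the toy peptide, each gap maps to a single residue except the third one, which returns both `I` and `L` because the two residues have the same mass. A real spectrum would also contain noise peaks, missing ions and ions of other types, so an actual algorithm has to do considerably more than this sketch.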
Thus, if one can identify either the y-ion or the b-ion series in the spectrum, the peptide sequence can be determined. However, the spectrum obtained from the mass spectrometer does not indicate the ion type of each peak. The ion types must be worked out by either a human expert or a computer algorithm during de novo sequencing. During this process, a few factors can cause difficulties:
Because of these factors, de novo sequencing may recover only a partially correct sequence tag from the spectrum.
Manual de novo sequencing requires human experts and is very time-consuming. A reliable automated de novo sequencing solution saves expert time and greatly reduces labor costs in labs. Automated de novo sequencing has been studied extensively in the bioinformatics community, and multiple algorithms have been developed. Although computer algorithms rely on the same basic principle as manual de novo sequencing, they usually carry out the computation through a very different procedure from manual analysis.
First released in 2002, PEAKS Studio has become the industry-standard software for automated de novo sequencing, and is well known for its accuracy, speed, and ease of use. The following video highlights the de novo sequencing features of PEAKS Studio.
De novo sequencing was historically thought to be slow. It has therefore mostly been used when no protein database was available. However, with recent developments in computer algorithms such as PEAKS, speed is no longer an issue. This makes de novo sequencing a viable choice for every mass spectrometry analysis in proteomics. Even when a database is available, de novo sequencing can contribute to peptide identification in the following ways.
These considerations are incorporated into the standard workflow of PEAKS Studio, as illustrated in Figure 2.
Figure 2. The standard PEAKS Studio workflow to take advantage of de novo sequencing.