De Novo Simple

Simple Explanation of De Novo Sequencing

The ability to sequence peptides and discover the protein(s) present in a sample has a proven impact on pharmaceutical and biotechnological research. As more genomic information becomes available, it becomes crucial to leverage that information with targeted proteomics research.

Determining which proteins are expressed, under what conditions they are expressed or suppressed, and how they interact with the cellular environment gives us a better understanding of how to identify and characterize drug targets. To do this, we identify the proteins that result from imposed experimental conditions.

When attempting to identify a protein, or sequence a peptide, scientists traditionally separate the proteins in a sample using gel electrophoresis, then digest a protein (with an enzyme of their choice) and extract it from the gel. The resulting peptides are then sorted by mass, using chromatography and mass spectrometry, to isolate homogeneous groups of peptides. In the final stage of mass spectrometry (MS/MS), the peptides, now grouped by peptide (because the members of each group all have the same mass), are fragmented. Individual copies of a peptide fragment at different positions (where any one copy breaks is a random process), so a range of fragments is produced, and a mass spectrum is constructed from them. The different fragments have different masses, and it is the masses of these fragments that appear on the MS/MS spectrum.

The masses of the amino acids that form a peptide sequence are known constants. Peptides also fragment in predictable ways. Together, these two facts are what allow us to identify peptide sequences from mass spectrometry data.
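Because the residue masses are constants, the mass of any peptide or fragment can be worked out from its sequence alone. The Python sketch below illustrates this for VDVEK, the example peptide used later in this tutorial. It is not how PEAKS itself is implemented: the function names are hypothetical, the table holds the standard monoisotopic residue masses rounded to five decimals, and ion-type and charge offsets are deliberately left out.

```python
# Monoisotopic residue masses in daltons (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # an intact peptide carries one extra H2O (H on one end, OH on the other)


def residue_sum(sequence):
    """Sum of the residue masses in `sequence` (hypothetical helper)."""
    return sum(RESIDUE_MASS[aa] for aa in sequence)


def peptide_mass(sequence):
    """Neutral monoisotopic mass of the intact, unmodified peptide."""
    return residue_sum(sequence) + WATER


def prefix_masses(sequence):
    """Residue-mass sums for every prefix of `sequence` (V, VD, VDV, ...).

    Simplification: real fragment ions also carry a fixed offset that depends
    on the ion type and charge; that offset is omitted here.
    """
    return [(sequence[:i], residue_sum(sequence[:i]))
            for i in range(1, len(sequence) + 1)]


if __name__ == "__main__":
    print(f"VDVEK intact mass: {peptide_mass('VDVEK'):.5f} Da")
    for fragment, mass in prefix_masses("VDVEK"):
        print(f"{fragment:<6}{mass:.5f}")
```

Each successive prefix differs from the previous one by exactly one residue mass, which is the property that de novo sequencing relies on.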


Figure 1: An example annotated spectrum and sequence alignment from PEAKS Studio.

For example, if the peptide sequence is VDVEK, the peptide will break into many fragments, among them VDV and VDVE.

The masses of these fragments will appear as peaks in the spectrum (shown in Figure 1). PEAKS may see that the mass difference between the peaks corresponding to VDVE and VDV is the same as the mass of E, and so assign E to that part of the sequence. By determining the mass differences between the peaks, and assigning the amino acid residue that corresponds to each mass difference, it is possible to sequence the peptide. This is a simplified example: peptides do not fragment that cleanly, peptides may have been modified after translation (altering their masses), there may be unrelated ions present, and instrument error and noise muddy the waters significantly.
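The mass-difference idea described above can be sketched in a few lines of Python. This is only an illustration of the principle, not the algorithm PEAKS uses: `read_ladder` and its `tolerance` parameter are hypothetical, the peak list is an idealized, noise-free ladder of prefix masses for VDVEK starting from zero, and modifications, missing peaks and unrelated ions are ignored.

```python
# Monoisotopic residue masses in daltons (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(peaks, tolerance=0.01):
    """Assign a residue to each gap between consecutive peaks.

    `peaks` are fragment masses in daltons; `tolerance` is the allowed
    mass error (an assumed value, not a PEAKS setting). A gap matching
    several residues is reported as e.g. "L/I"; a gap matching none
    is reported as "?".
    """
    ordered = sorted(peaks)
    calls = []
    for lower, upper in zip(ordered, ordered[1:]):
        difference = upper - lower
        matches = [aa for aa, mass in RESIDUE_MASS.items()
                   if abs(difference - mass) <= tolerance]
        calls.append("/".join(matches) if matches else "?")
    return calls


# Idealized ladder: 0, V, VD, VDV, VDVE, VDVEK (sums of residue masses).
ladder = [0.0, 99.06841, 214.09535, 313.16376, 442.20635, 570.30131]

if __name__ == "__main__":
    print("".join(read_ladder(ladder)))
```

For this idealized ladder the gaps read back as V, D, V, E, K. The sketch also shows two limits of the simple approach: leucine and isoleucine have identical masses and cannot be told apart by mass difference alone, and K and Q differ by only about 0.036 Da, so a looser tolerance would make them ambiguous as well.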

The process described above is called de novo sequencing. The term de novo refers to the fact that the sequence is derived directly from the new data, rather than by comparing the data to existing protein databases in the hope of finding a match.