The SPIDER algorithm tries to match the de novo sequence tags against the database proteins. When a significant similarity is found, the algorithm tries to explain the differences using both de novo sequencing errors and homology mutations (Figure 1). More specifically, it reconstructs a “real” sequence that minimizes the sum of two costs: the de novo errors between the real sequence and the de novo sequence, and the homology mutations between the real sequence and the database sequence.
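The reconstruction step can be written compactly as an optimization. The notation below is introduced here for illustration only and is not taken from the SPIDER description: X is the de novo sequence, Z is the matched database sequence, Y ranges over candidate real sequences, and d_err and d_mut are hypothetical cost functions for de novo errors and homology mutations respectively.

```latex
Y^{*} = \arg\min_{Y} \left[ d_{\text{err}}(X, Y) + d_{\text{mut}}(Y, Z) \right]
```

Under this reading, a difference between X and Z is attributed to whichever explanation (sequencing error, mutation, or a combination of the two) gives the lowest total cost.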
It should be pointed out that a general-purpose homology search tool such as BLAST is not the best option for searching with de novo sequence tags. It is very common for some fragment ions to be missing from a peptide’s MS/MS spectrum, which can lead to de novo sequencing errors. An appropriate de novo tag homology search should therefore tolerate common de novo sequencing errors such as (AT/TA) and (N/GG). BLAST, however, was designed for a different purpose: it penalizes those errors too heavily, which may significantly reduce search sensitivity. Moreover, BLAST will not attempt to reconstruct the real peptide sequence.
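One way to see why errors such as (AT/TA) and (N/GG) are hard to avoid is that the two alternatives have the same total residue mass, so a spectrum lacking the fragment ions between them cannot tell them apart. The following is a minimal sketch of that check, not part of SPIDER itself: the function names and the 0.01 Da tolerance are assumptions chosen for illustration, and only the residues needed for the example are listed.

```python
# Monoisotopic residue masses in daltons, limited to the residues used below.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "T": 101.04768,
    "N": 114.04293,
    "D": 115.02694,
}


def segment_mass(segment: str) -> float:
    """Sum of residue masses for a short peptide segment."""
    return sum(RESIDUE_MASS[residue] for residue in segment)


def mass_equivalent(a: str, b: str, tol: float = 0.01) -> bool:
    """True if two segments have the same total mass within tol (assumed, in Da)."""
    return abs(segment_mass(a) - segment_mass(b)) <= tol


if __name__ == "__main__":
    # AT/TA and N/GG are mass-equal; N/D is included as a contrasting pair.
    for a, b in [("AT", "TA"), ("N", "GG"), ("N", "D")]:
        print(a, b, mass_equivalent(a, b))
```

A search that assigns such mass-equal substitutions a low cost treats them as likely sequencing errors rather than as evidence against the match, whereas a general substitution matrix scores them as ordinary mismatches.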
Besides its obvious uses in mutation detection and cross-species searching, a very useful application of SPIDER is to run it iteratively to sequence a complete protein (e.g. antibody sequencing). This is achieved by: