False Discovery Rate (FDR) Tutorial


Introduction

Software is routinely used to identify peptides from mass spectrometry data. Just as scientific experiments need controls, the peptide identifications reported by software need to be statistically validated to limit false positives. Today, the most widely accepted validation method for peptide identification is the false discovery rate (FDR). This article explains what FDR is, how it is calculated in practice, and a few common mistakes in the use of FDR controls.

Estimating FDR with the Target-Decoy Method

In practice, it is hard to tell which peptide-spectrum matches (PSMs) are false; otherwise the algorithm could simply remove them and achieve zero false discoveries. The target-decoy method [2] is therefore widely used to estimate the FDR. In this method, the software searches the concatenation of a target database and a decoy database of the same size. If the decoy is constructed properly, the software’s false identifications will be evenly distributed between the target and decoy databases. Since all decoy identifications are false, the number of decoy hits estimates the number of false target hits, and the FDR can be estimated as FDR = (# decoy hits) / (# target hits).
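The estimate above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular search engine. The function name `estimate_fdr`, the `(score, is_decoy)` pair layout, the assumption that a higher score means a better match, and the choice to return 0.0 when no target passes the threshold are all hypothetical conventions chosen for the example.

```python
from typing import Iterable, Tuple


def estimate_fdr(psms: Iterable[Tuple[float, bool]], threshold: float) -> float:
    """Estimate FDR as (# decoy hits) / (# target hits) at a score threshold.

    psms: (score, is_decoy) pairs from a search of the concatenated
          target + decoy database. Assumption: higher score = better match.
    """
    target_hits = 0
    decoy_hits = 0
    for score, is_decoy in psms:
        if score >= threshold:
            if is_decoy:
                decoy_hits += 1
            else:
                target_hits += 1
    if target_hits == 0:
        # Assumed convention for this sketch: nothing accepted, nothing false.
        return 0.0
    return decoy_hits / target_hits


# Made-up example scores, for illustration only.
psms = [
    (95.0, False), (90.0, False), (88.0, False), (80.0, False),
    (75.0, True), (70.0, False), (60.0, True), (55.0, False),
]

print(estimate_fdr(psms, 70.0))  # 5 target hits, 1 decoy hit
print(estimate_fdr(psms, 50.0))  # 6 target hits, 2 decoy hits
```

Lowering the threshold accepts more PSMs but also admits more decoy hits, so the estimated FDR rises. In practice the threshold is usually chosen as the lowest score at which the estimate stays under the desired FDR.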



Figure 3: With a properly constructed decoy, the false identifications are distributed evenly between the target and decoy databases. Thus, the number of decoy hits can be used to estimate the FDR.
 

 

The Decoy Fusion Method

There is a simple fix to avoid the first two common mistakes: the decoy fusion method proposed in the PEAKS DB paper [1]. Instead of concatenating the target and decoy databases, the decoy fusion method concatenates the decoy and target sequences of the same protein into a single “fused” sequence (Figure 5). This simple change makes some meaningful differences. For the two-round search problem, the target and decoy lengths are still the same in the second round. For the protein score problem, the same bonus is applied equally to the target and decoy parts of the same fused sequence. Thus, the “same size” and “even distribution” prerequisites are restored, and the FDR is again estimated accurately. The built-in result validation of the PEAKS software uses this decoy fusion method.
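The construction of a fused database can be illustrated with a short Python sketch. It is not the PEAKS implementation. The helper names `make_decoy` and `fuse_database` are hypothetical, and reversing the target sequence is only one commonly used way to build a decoy, assumed here for simplicity. The order of the two halves within the fused sequence is likewise an assumption.

```python
from typing import Dict, Iterable


def make_decoy(sequence: str) -> str:
    # Assumption: a reversed sequence serves as the decoy. Other decoy
    # constructions (e.g. shuffling) could be substituted here.
    return sequence[::-1]


def fuse_database(targets: Dict[str, str]) -> Dict[str, str]:
    """Append each protein's decoy to its own target sequence."""
    return {name: seq + make_decoy(seq) for name, seq in targets.items()}


def shortlist(fused: Dict[str, str], names: Iterable[str]) -> Dict[str, str]:
    """Keep only selected proteins, as a second search round might do."""
    return {name: fused[name] for name in names}


# Made-up sequences, for illustration only.
targets = {"P1": "MKTAYIAK", "P2": "GAVLK", "P3": "PEPTIDER"}
fused = fuse_database(targets)

# Whatever subset survives the first round, every kept entry still carries
# a target half and a decoy half of equal length.
second_round = shortlist(fused, ["P1", "P3"])
target_len = sum(len(targets[name]) for name in second_round)
decoy_len = sum(len(seq) for seq in second_round.values()) - target_len
print(target_len, decoy_len)
```

Because the decoy travels with its protein, any protein-level filtering or scoring is applied to both halves together, which is what keeps the target and decoy sizes balanced.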


Figure 5: The decoy fusion method “fuses” the target and decoy sequences together. Thus, the target and decoy sequences are guaranteed to have the same length even when a two-round search algorithm is used.
