RHybridFinder: An R package to process immunopeptidomic data for putative hybrid peptide discovery

The immune system recognizes foreign molecules from pathogens or tumour cells as “danger signals”, which in turn initiates a response to rid the body of the offending agent. However, in the case of proteins, the specific “danger signal”, or antigen, that triggers the immune response cannot necessarily be predicted from the intact protein sequence. This is because proteins are first broken down into peptides, which are then presented on the cell surface by major histocompatibility complex (MHC) molecules. This is further complicated by the finding that peptide fragments can also be joined together to form unique peptide antigens that are not present in the original protein sequence, known as proteasomal spliced peptides (PSPs). Immunopeptidomics, the study of these MHC-presented peptides by LC-MS/MS, serves as a tool for identifying novel antigens to facilitate vaccine development and immunotherapy, and for understanding various disease states. This paper by Saab et al. presents RHybridFinder (RHF), an R package that works with PEAKS de novo sequencing to create custom “hybrid proteome” databases that account for both unspliced and spliced peptide antigens, whose sequences cannot simply be read off the reference proteome. This approach was used in a primary research article from the Purcell laboratory published in Science Immunology in 2018 (PMID: 30315122).

How was PEAKS used?

Full details of the analysis in PEAKS Studio can be found in the 2018 Science Immunology paper and the 2021 STAR Protocols paper. In brief, LC-MS/MS data from a sample, e.g., purified MHC-bound peptides, are searched in PEAKS Studio using the database search workflow against a standard protein database for the sample's species. The database-matched peptides and the high-scoring de novo peptides are then imported into the R statistical software and analysed with RHybridFinder. This process generates a FASTA file of candidate linear peptides (i.e., direct matches to a database protein) and spliced peptides, i.e., sequences that can be reconstructed by joining two fragments from either the same protein (cis-spliced) or two different proteins (trans-spliced). Importantly, any peptide that fits none of these categories is excluded. This removes sequences with no biological explanation through antigen processing and narrows the search space when the resulting “hybrid” proteome database is used in a subsequent search of MHC-bound peptides.
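To make the linear / cis / trans logic concrete, here is a minimal Python sketch under stated assumptions; RHybridFinder itself is an R package and this is not its implementation. The sketch assumes a hypothetical in-memory proteome (a dictionary of protein ID to sequence), treats isoleucine and leucine as the same residue because they are isobaric, tries every two-fragment cut of a peptide, and prefers a cis explanation over a trans one. Real spliced-peptide inference also constrains fragment order, fragment length and the distance between fragments; the `min_frag` parameter, the `Match` record and the FASTA header format are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    category: str              # "linear", "cis", "trans" or "unexplained"
    proteins: tuple[str, ...]  # source protein ID(s)
    split: int | None = None   # index where a spliced peptide is cut in two


def normalize(seq: str) -> str:
    # I and L have identical masses, so treat them as the same residue.
    return seq.upper().replace("I", "L")


def classify(peptide: str, proteome: dict[str, str], min_frag: int = 2) -> Match:
    """Simplified sketch: min_frag is a hypothetical minimum fragment length."""
    pep = normalize(peptide)
    prot = {pid: normalize(s) for pid, s in proteome.items()}

    # Linear: the whole peptide occurs contiguously in at least one protein.
    linear = tuple(pid for pid, s in prot.items() if pep in s)
    if linear:
        return Match("linear", linear)

    # Spliced: try every cut into an N-terminal and a C-terminal fragment.
    trans: Match | None = None
    for i in range(min_frag, len(pep) - min_frag + 1):
        left, right = pep[:i], pep[i:]
        left_ids = [pid for pid, s in prot.items() if left in s]
        right_ids = [pid for pid, s in prot.items() if right in s]
        # cis: both fragments occur in the same protein
        # (fragment order and spacing are ignored in this sketch).
        for pid in left_ids:
            if pid in right_ids:
                return Match("cis", (pid,), i)
        # trans: fragments occur only in different proteins; keep the first hit
        # but continue scanning in case a later cut gives a cis explanation.
        if trans is None and left_ids and right_ids:
            trans = Match("trans", (left_ids[0], right_ids[0]), i)
    return trans if trans is not None else Match("unexplained", ())


def to_fasta(peptides: list[str], proteome: dict[str, str]) -> str:
    """Write explained peptides as FASTA records (hypothetical header format).
    Peptides with no linear or spliced explanation are dropped."""
    records = []
    for n, pep in enumerate(peptides, start=1):
        m = classify(pep, proteome)
        if m.category == "unexplained":
            continue
        records.append(f">hybrid_{n} {m.category} {'|'.join(m.proteins)}\n{pep}")
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    toy = {"P1": "MKTAYIAKQR", "P2": "GGSLLVEEW"}  # made-up sequences
    candidates = ["TAYIAK", "MKTAQR", "TAYSLLV", "WWWWWW"]
    for p in candidates:
        print(p, classify(p, toy).category)
    print(to_fasta(candidates, toy), end="")
```

With the made-up two-protein proteome, the four candidates are classified as linear, cis, trans and unexplained respectively, and only the first three are written to the FASTA output, mirroring the exclusion of peptides with no biological explanation.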

Frederic Saab, David J. Hamelin, Qing Ma, Kevin A. Kovalchik, Isabelle Sirois, Pouya Faridi, Chen Li, Anthony W. Purcell, Peter Kubiniok, Etienne Caron. RHybridFinder: An R package to process immunopeptidomic data for putative hybrid peptide discovery. STAR Protocols. 2021. 2. doi:10.1016/j.xpro.2021.100875.


Identification of proteasomal spliced peptides (PSPs) by mass spectrometry (MS) is not possible with traditional database search engines, which only match spectra to contiguous protein sequences. Here, we provide a protocol for running RHybridFinder (RHF), an R package for the computational inference of putative PSPs detected by MS. RHF extracts high-confidence de novo sequenced peptides identified by PEAKS software. These peptides are then matched to protein databases to infer cis- or trans-spliced major histocompatibility complex (MHC)-associated peptides. RHF is relatively fast and straightforward to run; however, the PSPs it identifies are putative and must be validated experimentally.
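The extraction of high-confidence de novo peptides can be pictured as a simple filter over the PEAKS de novo results. The sketch below is written in Python rather than R and is not RHF's code: the `DeNovoHit` record, the 90% ALC (average local confidence) cutoff and the 8 to 12 residue length window are hypothetical choices, and the regular expression assumes PEAKS-style mass-shift annotations such as `M(+15.99)`.

```python
from __future__ import annotations

import re
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class DeNovoHit:
    """Hypothetical record for one de novo sequenced spectrum."""
    scan: int
    peptide: str  # may carry PEAKS-style mass shifts, e.g. "M(+15.99)" (assumption)
    alc: float    # average local confidence score, 0-100


# Matches a parenthesised signed mass shift such as "(+15.99)" or "(-17.03)".
MASS_SHIFT = re.compile(r"\([+-]\d+(?:\.\d+)?\)")


def strip_mods(peptide: str) -> str:
    """Remove mass-shift annotations, leaving the bare amino-acid sequence."""
    return MASS_SHIFT.sub("", peptide)


def high_confidence(
    hits: Iterable[DeNovoHit],
    alc_cutoff: float = 90.0,  # hypothetical cutoff, not an RHF default
    min_len: int = 8,          # hypothetical length window for MHC-bound peptides
    max_len: int = 12,
) -> list[str]:
    """Return unique bare sequences passing the ALC and length filters,
    ordered from highest to lowest ALC."""
    kept: list[str] = []
    seen: set[str] = set()
    for hit in sorted(hits, key=lambda h: h.alc, reverse=True):
        seq = strip_mods(hit.peptide)
        if hit.alc < alc_cutoff or not (min_len <= len(seq) <= max_len):
            continue
        if seq not in seen:  # keep one copy per sequence
            seen.add(seq)
            kept.append(seq)
    return kept
```

Sorting by ALC before de-duplicating means the retained sequences are listed from most to least confident, which makes it easy to inspect the strongest candidates before they are matched against the protein database.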