The pathway from gene to protein is not a direct route, and alternate processing steps can result in multiple proteins, or “proteoforms”, from a single gene. These proteoforms may differ in their primary amino acid sequences, or include modifications with complex sugars, which can ultimately affect how they function. Characterizing and understanding the variation in clinically relevant proteins, such as antibodies, can provide information about disease progression and treatment responses that genetic information alone cannot. However, precisely sequencing the highly variable regions of antibodies has long been a significant challenge, even with the advancement of LC-MS/MS-based technologies. A recent paper in Analytical Chemistry by Dupré and colleagues describes how complementary MS and data analysis methods can be used to achieve near-complete sequence coverage and to catalogue the variation in monoclonal antibody proteoforms within individual clinical samples. The combined methods presented in this workflow, and the thorough description of the data interpretation process, make this paper a must-read for any researcher interested in antibody characterization.
How was PEAKS used?
PEAKS Studio was used for de novo peptide sequencing and concatenation. The assembly of the overlapping “bottom-up” sequence data was completed using the ALPS system (as described in Tran et al., 2016), which integrates PEAKS de novo sequence data, database search results, confidence scores, and homology searches to improve the accuracy of assembly. Importantly, the concatenated sequences were checked against the measured intact mass, or “top-down” data, for each proteoform to confirm correct assembly.
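The intact-mass check described above can be illustrated with a short sketch. It is not the code used in the paper or in PEAKS. It assumes monoisotopic masses, a hypothetical 10 ppm tolerance, and illustrative function names (`monoisotopic_mass`, `matches_intact_mass`). Disulfide bonds and other modifications are passed in as simple mass adjustments.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056     # one water added for the free N- and C-termini
HYDROGEN = 1.007825  # each disulfide bond removes two hydrogens


def monoisotopic_mass(seq, n_disulfides=0, mod_delta=0.0):
    """Theoretical neutral monoisotopic mass of a candidate sequence."""
    residues = sum(RESIDUE_MASS[aa] for aa in seq)
    return residues + WATER - 2 * HYDROGEN * n_disulfides + mod_delta


def ppm_error(theoretical, measured):
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def matches_intact_mass(seq, measured, tol_ppm=10.0, **kwargs):
    """True if the candidate agrees with the measured intact mass.

    tol_ppm is an assumed tolerance, not a value taken from the paper.
    """
    theoretical = monoisotopic_mass(seq, **kwargs)
    return abs(ppm_error(theoretical, measured)) <= tol_ppm


if __name__ == "__main__":
    candidate = "PEPTIDE"  # toy sequence, not a real light chain
    print(round(monoisotopic_mass(candidate), 4))
    print(matches_intact_mass(candidate, 799.36))
```

Because leucine and isoleucine have identical masses, a check like this cannot distinguish them. It only confirms that the overall composition and modifications of an assembled candidate are consistent with the intact measurement.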
Mathieu Dupré, Magalie Duchateau, Rebecca Sternke-Hoffmann, Amelie Boquoi, Christian Malosse, Roland Fenk, Rainer Haas, Alexander K. Buell, Martial Rey, and Julia Chamot-Rooke. De Novo Sequencing of Antibody Light Chain Proteoforms from Patients with Multiple Myeloma. Analytical Chemistry 2021, 93 (30), 10627-10634. doi:10.1021/acs.analchem.1c01955
In multiple myeloma diseases, monoclonal immunoglobulin light chains (LCs) are abundantly produced, with, as a consequence in some cases, the formation of deposits affecting various organs, such as the kidney, while in other cases remaining soluble up to concentrations of several g·L⁻¹ in plasma. The exact factors crucial for the solubility of LCs are poorly understood, but it can be hypothesized that their amino acid sequence plays an important role. Determining the precise sequences of patient-derived LCs is therefore highly desirable. We establish here a novel de novo sequencing workflow for patient-derived LCs, based on the combination of bottom-up and top-down proteomics without database search. PEAKS is used for the de novo sequencing of peptides that are further assembled into full length LC sequences using ALPS. Top-down proteomics provides the molecular masses of proteoforms and allows the exact determination of the amino acid sequence including all posttranslational modifications. This pipeline is then used for the complete de novo sequencing of LCs extracted from the urine of 10 patients with multiple myeloma. We show that for the bottom-up part, digestions with trypsin and Nepenthes digestive fluid are sufficient to produce overlapping peptides able to generate the best sequence candidates. Top-down proteomics is absolutely required to achieve 100% final sequence coverage and characterize clinical samples containing several LCs. Our work highlights an unexpected range of modifications.
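To see why peptides from two different digestions help, consider a toy greedy assembler that merges peptides by suffix-prefix overlap. This is only an illustration of the general idea and is not the ALPS algorithm, which also weighs de novo confidence scores. The function names, the minimum overlap of 3 residues, and the example peptides are all assumptions made for this sketch.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(peptides, min_len=3):
    """Repeatedly merge the pair of contigs with the longest overlap."""
    contigs = list(dict.fromkeys(peptides))  # drop exact duplicates
    # Drop peptides fully contained in another peptide.
    contigs = [p for p in contigs
               if not any(p != q and p in q for q in contigs)]
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    # Hypothetical peptides: the first two mimic tryptic peptides that do
    # not overlap each other; the third mimics a peptide from a second,
    # less specific digestion that bridges the gap between them.
    tryptic = ["DIQMTQSPSSLSASVGDR", "VTITCR"]
    bridging = ["SASVGDRVTIT"]
    print(greedy_assemble(tryptic))
    print(greedy_assemble(tryptic + bridging))
```

With only the tryptic-like peptides, the two fragments stay separate. Adding the bridging peptide joins them into a single contig, which is the role the second digestion plays in the workflow described above.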