This video covers the de novo sequencing and database searching, which includes automated summary reporting, de novo only findings, as well as the additional features of homology searching, multi-engine reporting and quantification.
In a typical proteomics mass spectrometry analysis, a researcher needs to identify all peptides that produce good quality MS/MS spectra, whether or not the peptides are in a protein sequence database.
PEAKS is designed to facilitate such analyses. To enable accurate and sensitive peptide identifications, PEAKS provides an integrated tool set that features:
- de novo sequencing: to identify novel peptides
- database search: to identify database peptides
- SPIDER: to find peptides with PTMs and mutations
- inChorus: to increase coverage by combining multiple database search engines
- PEAKS Q module for protein quantification (optional)
In this short video, let’s focus on a typical result that has been prepared with the newly improved PEAKS DB workflow that combines de novo sequencing and database search to increase accuracy and sensitivity.
PEAKS DB & De Novo
The result consists of a result summary, the identified proteins, the identified database peptides, and a list of peptides identified exclusively by de novo sequencing. In the first tab of the results, PEAKS summarizes the results in an easy to read summary view. The summary view is the central place for result filtration and validation. The specification of a filter is as simple as clicking this FDR button, selecting the false discovery rate on this FDR curve, and clicking the apply filter button. The result statistics are then updated dynamically.
First, this bar graph shows the score distribution of the target and decoy hits. PEAKS uses an enhanced target decoy method to estimate the false discovery rate. Here we observe that there are very few decoy hits above the score threshold, indicating highly accurate results.
Second, this scatter-plot compares the precursor mass error with the peptide score. If you are using a high-resolution instrument, you should see that the error is small for the high-scoring peptides and starts to scatter for peptides below the score threshold.
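As a rough illustration of the precursor mass error plotted here, the error is usually expressed in parts per million (ppm) of the theoretical m/z. The function names and the example m/z values below are made up for illustration; this is a minimal sketch, not the calculation PEAKS performs internally.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million (ppm)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float,
                     tol_ppm: float = 10.0) -> bool:
    """True if the absolute ppm error is inside the tolerance window."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# Hypothetical precursor: 0.0025 m/z units off at m/z 500.25 is about 5 ppm.
example_error = ppm_error(500.2525, 500.2500)
print(f"{example_error:.2f} ppm")
```

On a high-resolution instrument, high-scoring peptides would be expected to cluster within a few ppm of zero in a calculation like this.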
To see the proteins inferred from the identified peptides, we simply select the protein tab located on the left. Take a look at the proteins discovered, along with the peptides associated with each protein and each protein's sequence coverage.
To take a closer look at the peptides confidently identified from the sequence database, we select the peptide tab from the left navigation. Reported here are each peptide's -10lgP score, annotated spectrum match, ion table, and error map.
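To give a feel for the -10lgP scale, here is a small sketch that assumes the score is simply -10 times the base-10 logarithm of a P-value, as the name suggests. The function names are illustrative and are not part of PEAKS.

```python
import math


def minus_10_lg_p(p_value: float) -> float:
    """Convert a P-value into a -10lgP style score (assumed definition)."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    return -10.0 * math.log10(p_value)


def p_value_from_score(score: float) -> float:
    """Inverse conversion: score back to a P-value."""
    return 10.0 ** (-score / 10.0)


# Under this assumption, a P-value of 0.01 corresponds to a score of 20.
print(minus_10_lg_p(0.01))
```

Under this assumption, every 10 points on the score corresponds to a tenfold smaller P-value.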
Detailed Examination and Navigation
PEAKS also provides very user-friendly ways for you to examine an individual peptide that has been identified. Let’s say I want to check a peptide that contains an oxidized methionine, which has a mass shift of 15.99 daltons. Go to the peptide view, sort by the PTM type, and those peptides with the oxidized methionine will appear at the top of the list. Select an interesting peptide and the spectrum annotation will appear at the bottom. You can conveniently zoom and navigate the spectrum by using your mouse wheel, or, you may focus on a particular area by dragging the mouse and selecting a smaller area.
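The 15.99 dalton shift mentioned above can be checked with a short calculation. This is a minimal sketch using standard monoisotopic residue masses; the peptide PEPTIDEM is a made-up example, and the function name is illustrative.

```python
# Standard monoisotopic residue masses in daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056      # added once per peptide for the free termini
OXIDATION = 15.99491  # one extra oxygen atom on methionine


def peptide_mass(sequence: str, oxidized_met: int = 0) -> float:
    """Neutral monoisotopic mass, with an optional number of oxidized Met."""
    if oxidized_met > sequence.count("M"):
        raise ValueError("more oxidations than methionines")
    return sum(MONO[aa] for aa in sequence) + WATER + oxidized_met * OXIDATION


unmodified = peptide_mass("PEPTIDEM")
oxidized = peptide_mass("PEPTIDEM", oxidized_met=1)
print(f"mass shift: {oxidized - unmodified:.2f} Da")
```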
De Novo Only
The final tab, labelled “De Novo Only”, displays the results that are identified exclusively by de novo sequencing. Since these spectra do not match any significant peptide hits in the database, they are particularly interesting novel peptides that no other software can find.
As mentioned before, PEAKS Studio is an integrated proteomics toolset. In addition to the renowned de novo sequencing algorithm and the newly improved PEAKS DB module, PEAKS Studio also includes:
- SPIDER – A homology search tool that is designed to match your de novo results directly to the database. This allows you to identify peptides even when you are working with an unsequenced organism or highly variable proteins.
- inChorus – Combines the results of multiple database search engines to increase coverage, while comparing the engines’ performance side by side. inChorus can simultaneously run and compare MASCOT, X!Tandem, Sequest, OMSSA, and of course PEAKS.
- PEAKS Q (optional) – A quantification tool that quantifies proteins using all common MS and MS/MS labelling methods. These methods include iTRAQ, ICAT, SILAC, label-free, and other user-defined labels.
PEAKS Identification Walkthrough
A quick walkthrough of how to set up, and interpret a PEAKS Studio identification run. It shows how to import FASTA databases from several different public sources. Then, gives descriptions and tips for all identification parameters. You’ll also learn how to interpret PEAKS DB, PEAKS PTM, SPIDER, and de novo sequencing results.
Welcome, this video will give a quick overview of PEAKS Studio Software. PEAKS provides you with an intuitive way to identify and quantify proteins using tandem mass spectrometry data. It combines clever algorithms that accurately identify and quantify proteins with an easy to use interface. When it comes to protein identification PEAKS has 4 main algorithms: de novo sequencing, PEAKS DB, PEAKS PTM, and SPIDER. In this video I will show you how to run these 4 algorithms from one search page, to quickly identify and filter protein identifications.
PEAKS can be used without a sequence database to perform de novo sequencing, but it works best when configured with a good sequence database. So first, I’ll show how to easily configure a database. We have a configuration wizard that will guide you through the steps necessary to load a public database. To access it, click window and then config wizard. The second page will list all the instrument vendors that PEAKS supports. Click the vendors that apply to you. The instructions for how to load that vendor’s raw data automatically will be given. Click next to continue to database configuration. Four public databases are listed here. Once you’ve selected your databases, click next and the databases will begin downloading automatically. Once the download is complete, click ‘install’, then finish. The databases will then be ready for you when you set up a database search. If you already have a database you would like to search, click this configuration button, and then select the databases tab to configure your database. Use the FASTA format drop down to select the database format. If none of the options match your format, the MSDB option works well with most databases. Click the validate database button and PEAKS will scan the database to see if it can read it.
You’re now ready to get started with your first identification run. To load data, go to the top left hand corner and select the new project button. This will bring up a page where you can add data files by clicking the add data button. One of the best ways to use PEAKS is to add multiple digests to increase coverage. In this case we’ll add a Glu C and Trypsin digest of the same sample. Add each file to a new sample with this button. Then specify the enzyme, instrument, and fragmentation type for each data file. Doing this helps PEAKS suggest some appropriate parameters for your instrument. Also, PEAKS uses machine learning based on your instrument and fragmentation settings to determine the most likely ions that appear in the spectra. This makes PEAKS more accurate and sensitive.
From here, you have two options. First, you can continue through the setup wizard by clicking data refinement. This will take you through PEAKS DB and quantification set up as well. By doing this you can set up all your parameters at once, leave the computer, and come back to your results. Or, you can click the ‘finish’ button to begin loading your data. In this example I will click the finish button because there are some important points I’d like to show you in the project tree.
Once your data is loaded, up in the top left corner you will see the project tree. Data that has successfully loaded into the project will appear with this green symbol. You can set up searches from here too. Click the project level then the PEAKS DB button to search all the data in the project together. This will give you one combined result for all data files in the project. Click a sample then the PEAKS DB button to search one individual sample. This will allow you to compare results from separate samples. Or, click individual data files then the PEAKS DB button. In this example we will select the project level because we want to combine our digests to get the highest coverage possible.
The first parameter window to come up will be data refinement. Here there are some optional parameters to select. You can merge scans with similar mass and retention times, correct precursor mass and charge states, or filter based on retention time and mass. Most of these are optional but we do recommend that you turn on precursor mass correction. Data refinement will also deconvolute and centroid the MS/MS scans and detect peptide features from the LC-MS information.
Next is the PEAKS DB parameters page. If it is your first time using PEAKS, the suggested parameters for your instrument will be given. If you are returning to PEAKS, your previous search parameters will come up by default. To select the default parameters for the instrument, select the drop down menu in the top right hand corner. First, put in your precursor mass error tolerance. Since this is Orbitrap data, we will enter 10 ppm. Next, enter the fragment mass error tolerance. We’ll select 0.5 Da in this case because the MS/MS scans were collected in the linear ion trap. Then set the enzyme rules. Since we specified the enzymes when setting up the project we can select ‘specified by each sample’. In this example, this will ensure that trypsin rules are used for the trypsin sample and Glu C rules are used for the Glu C sample. The next enzyme options allow you to control the efficiency of the digest. Allow for an incomplete digest by letting one or both ends of the peptide disobey the enzyme rules with this drop down. Or allow for missed cleavages within the peptide here. Next, set the PTMs with this button. The built-in PTMs are separated into recent, common, uncommon, and artificial lists. Using PEAKS DB, it’s best to allow up to 10 variable modifications in a search. If you are interested in looking at more PTMs, it is best to enter these into PEAKS PTM. In this case we will set carbamidomethylation as fixed, because iodoacetamide was used to alkylate the cysteines after the disulfide bonds were reduced, and oxidation as a variable modification. You can also specify the maximum number of variable modifications per peptide.
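To make the enzyme rules and missed cleavages more concrete, here is a minimal in silico digest. It assumes simplified rules (trypsin cuts after K or R, Glu C cuts after E) and ignores exceptions such as a following proline. The protein sequence and function name are made up for illustration, and this is not how PEAKS itself implements digestion.

```python
def digest(protein: str, cleave_after: str, max_missed: int = 0) -> list:
    """Cut after any residue in `cleave_after`, allowing missed cleavages."""
    # Positions where the chain is cut, plus both ends of the protein.
    sites = [0]
    sites += [i + 1 for i, aa in enumerate(protein[:-1]) if aa in cleave_after]
    sites.append(len(protein))

    peptides = []
    for start in range(len(sites) - 1):
        for missed in range(max_missed + 1):
            end = start + 1 + missed
            if end >= len(sites):
                break
            peptides.append(protein[sites[start]:sites[end]])
    return peptides


protein = "AAKEGGRCCEDDK"  # hypothetical sequence
tryptic = digest(protein, "KR")
glu_c = digest(protein, "E")
tryptic_one_missed = digest(protein, "KR", max_missed=1)
print(tryptic, glu_c)
```

The two enzymes cut the same protein into different peptides, which is why combining digests can raise sequence coverage.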
Next, let’s talk about the checkboxes at the bottom, because some very important features can be enabled here. The estimate FDR checkbox will create a decoy-fusion database. This means all proteins in the database will be shuffled and fused to target proteins. This allows PEAKS to accurately remove false positives by estimating the false discovery rate. The ‘find unspecified PTMs’ checkbox will activate PEAKS PTM. By default it will search the 485 naturally occurring modifications in the Unimod database. However, if you are only interested in a subset of those PTMs or in custom PTMs, click the ‘advanced settings’ button and select the PTMs you’re interested in. This is a very powerful feature of PEAKS: in this screen there is no limit to the number of modifications you can search. The third checkbox will activate SPIDER. SPIDER is a homology search tailored to de novo sequencing. It will find peptides that are similar to what’s found in your database but have one amino acid difference. With that in mind, you now know all you need to know to run an identification search with PEAKS. After these parameters are set, click ok, and four different algorithms will be used to search your data: de novo sequencing, PEAKS DB, PEAKS PTM, and SPIDER.
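The decoy-fusion idea described above can be sketched in a few lines. This assumes a plain random shuffle as the decoy and a simple concatenation as the fusion; the exact procedure PEAKS uses is not described in this video, and the function name and accession are hypothetical.

```python
import random


def decoy_fusion(proteins: dict, seed: int = 0) -> dict:
    """Append a shuffled copy of each protein to the protein itself."""
    rng = random.Random(seed)  # fixed seed so the decoys are reproducible
    fused = {}
    for accession, sequence in proteins.items():
        residues = list(sequence)
        rng.shuffle(residues)  # stand-in for the real decoy generation
        fused[accession] = sequence + "".join(residues)
    return fused


targets = {"P00001": "MKTAYIAKQR"}  # hypothetical entry
fused = decoy_fusion(targets)
print(fused["P00001"])
```

Each fused entry keeps the same amino acid composition in its decoy half, so target and decoy peptides compete on roughly equal terms.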
Now, let’s talk about interpreting the results. Notice how four result nodes have appeared in the top left hand corner below the project level. This indicates that all four searches are complete and your results are ready. If you click the SPIDER node, it will contain results from all four algorithms from all data sets in the project. So click here to get the most information. The first thing you’ll see is the summary view. This is where you can set your filters. We recommend clicking this FDR button first. It will bring up an interactive FDR curve. The x-axis shows the number of peptide spectrum matches sorted by -10lgP score. The y-axis gives the false discovery rate. Scroll along the graph to see the score and false discovery rate at each point. Most importantly, you can select one of the common cut offs along the right hand side. A 1% FDR is usually considered to be acceptable. Once you do this, the score where a 1% false discovery rate is achieved will be shown in the -10lgP score checkbox. For protein -10lgP, a threshold of 20 generally gives accurate results, and at least one or two unique peptides per protein indicates an accurately identified protein. Once you click apply filters, all results you see will be within these thresholds.
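To illustrate how a score cut off follows from an FDR target, here is a minimal sketch of a plain target-decoy estimate, where the FDR at a given score is the number of decoy matches divided by the number of target matches at or above that score. PEAKS uses its own enhanced method, so treat this only as the general idea. The function name and the example scores are made up.

```python
def score_threshold_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score whose cumulative FDR is within max_fdr.

    psms: iterable of (score, is_decoy) pairs. Returns None if no
    threshold satisfies the target.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = 0
    threshold = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            threshold = score
    return threshold


# Hypothetical results: 150 target matches and 2 decoy matches.
psms = [(50.0 + i, False) for i in range(150)] + [(60.5, True), (55.5, True)]
print(score_threshold_at_fdr(psms, max_fdr=0.01))
```

Moving the FDR target up or down simply slides this threshold along the ranked list, which is what selecting a point on the interactive curve does conceptually.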
Next I’ll show you the protein tab. This is the most important page in your identification results. From here you can see all the proteins that were identified, their coverage and description. In the coverage view you can see the details of the peptides that support the protein identification. Each blue bar represents a distinct peptide different from all others matched to the protein. Click one to see the best spectrum that identified that peptide. Notice how multiple spectra can be grouped into the same peptide hit. The x, y, or z ions are shown in red, and the a, b, or c ions are shown in blue. Scroll over the peptide sequence to see which fragment ions are associated with which amino acid.
This can be very useful, especially when considering modified peptides. Modifications are shown with a unique letter and colour. Take this deamidation, for example. If you click on the peptide, the modified amino acid will be shown with a lower case n. Scroll over it to see the high intensity fragment ions that support this modification assignment.
This modification is considered to be confidently localized based on the fragment ions observed. If there is a pair of b or y ions showing fragmentation before and after the modification, the modification will appear above the protein sequence. Modifications without a confident localization will only appear below. We call this direct fragment ion proof. The threshold for this localization can be controlled to the right of the coverage view. Also, each modification is assigned an AScore, and this can be used as a cut off instead of fragment ion proof. The cut off we recommend for AScore is 20.
PEAKS is great at finding single amino acid variants with SPIDER. For example here, the T on the white background represents threonine replacing alanine, the amino acid in the database sequence. If you would like to manually validate this assignment, scrolling over the fragment ions that indicate the presence of a threonine shows that this variant is highly likely, given the strong y-ion signal.
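When validating a variant like this by hand, it helps to know the mass shift to expect in the fragment ions. The sketch below compares a database peptide with an observed peptide and reports a single substitution and its mass difference. The peptide sequences and the function name are hypothetical, and the mass table is deliberately partial.

```python
# Partial table of standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203,
                "V": 99.06841, "T": 101.04768}


def single_substitution(database_peptide: str, observed_peptide: str):
    """Return (position, database_aa, observed_aa, mass_delta) or None.

    Position is 1-based. None is returned unless the two peptides differ
    at exactly one position.
    """
    if len(database_peptide) != len(observed_peptide):
        return None
    diffs = [i for i, (a, b) in enumerate(zip(database_peptide, observed_peptide))
             if a != b]
    if len(diffs) != 1:
        return None
    i = diffs[0]
    db_aa, obs_aa = database_peptide[i], observed_peptide[i]
    delta = RESIDUE_MASS[obs_aa] - RESIDUE_MASS[db_aa]
    return (i + 1, db_aa, obs_aa, delta)


variant = single_substitution("LGEYGAQK", "LGEYGTQK")
print(variant)
```

For alanine to threonine the shift is about 30.01 Da, so every fragment ion that contains the variant position should be offset by that amount.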
So far all the results we’ve seen were identified with PEAKS DB, PEAKS PTM, or SPIDER. Still, a current limitation of any protein identification search is that most LC-MS/MS data sets can only match a fraction of the MS/MS scans in a file to database peptides. So the question is, what is the source of the unidentified scans? This is where PEAKS shows its true power. Using de novo sequencing, it is able to give the most likely peptide sequence for every spectrum in the data file. The de novo only tab gives the de novo sequences for all spectra that could not be matched with PEAKS DB, PEAKS PTM, or SPIDER. You can tell if a de novo result is confident based on the colour. Amino acids above 90% confidence are displayed in red, above 80% confidence are displayed in purple, and above 60% confidence are displayed in blue. Use this button to set a confidence threshold. We suggest 80%. Then, sort by tag length to see the results with the longest string of confident amino acids. These will be your best de novo results. If they match a protein in the database with 6 amino acids in a row or more, an accession number will be given in the accession column. Click the proteins button to see where the de novo result aligns to the full protein sequence.
Now that you’ve reviewed your results, you’re ready to export them. Go back to the summary view and click the export button to share with your colleagues. HTML options are available, as well as text exports that can be opened in programs such as Excel. Third party export options are also available for uploading PEAKS results to post-processing software.
Thanks for taking the time to check out this overview. For more detailed videos about specific features of PEAKS, check out our website, www.bioinfor.com.
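The tag length used for sorting can be pictured as the longest run of consecutive amino acids at or above the confidence threshold. Here is a minimal sketch of that idea; the sequence, the confidence values, and the function name are made up, and PEAKS may define the tag slightly differently.

```python
def longest_confident_tag(sequence: str, local_confidence, threshold: float = 80.0) -> str:
    """Longest run of consecutive residues with confidence >= threshold."""
    best = current = ""
    for aa, confidence in zip(sequence, local_confidence):
        # Extend the current run, or reset it when a residue falls short.
        current = current + aa if confidence >= threshold else ""
        if len(current) > len(best):
            best = current
    return best


# Hypothetical de novo result with per-residue confidence in percent.
sequence = "LVNELTEFAK"
confidence = [45, 62, 91, 95, 88, 97, 99, 70, 85, 90]
tag = longest_confident_tag(sequence, confidence, threshold=80.0)
print(tag, len(tag))
```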
Peptide Feature Intensities
PEAKS now incorporates quantitative information into your identification results. Our quantitative module has accurate and sensitive peptide feature detection that can be used to get the relative abundance of a peptide. By matching the peptide feature area to an identified MS/MS spectrum, you can:
- Determine the most abundant peptides in your sample
- Obtain quantitative information on endogenous peptides
It is often very important to integrate identification and quantitative information found in proteomic mass spectrometry data. This is why we have integrated a tool into PEAKS that provides peptide feature intensity information for identified peptides. By doing this, you can get an idea of the relative quantity of a peptide in your sample. Here you can see a graphical representation of a full proteomic LC-MS run. It is clear that there is a specific group of likely peptides represented by the high intensity peaks seen here. PEAKS then answers the question: what is the identity of those high intensity peptide signals?
It does this using a concept already used in label free quantification algorithms called peptide feature detection. A peptide found in an LC-MS experiment will appear in a predictable way. It will have a visible and predictable isotopic distribution resulting from different carbon isotopes, and its intensity will follow a gamma distribution across the retention time range in which it elutes. If the signal from the mass spec has these characteristics, we call it a peptide feature. PEAKS will automatically detect these peptide features and calculate the area under the retention time curve. It will include the area of all isotopes associated with the feature within 5% relative intensity of the most intense peak. These areas are then integrated into an XIC curve shown here. From this the area under the curve can be easily calculated.
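The area calculation can be sketched with simple trapezoidal integration. This example assumes that each isotope has its own intensity trace over retention time, and it interprets the 5% rule as keeping isotopes whose apex is at least 5% of the most intense apex. Both the interpretation and the names are assumptions, not the PEAKS implementation.

```python
def xic_area(retention_times, intensities) -> float:
    """Area under one intensity trace using the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(retention_times)):
        width = retention_times[i] - retention_times[i - 1]
        area += width * (intensities[i] + intensities[i - 1]) / 2.0
    return area


def feature_area(retention_times, isotope_traces, min_relative=0.05) -> float:
    """Sum the areas of isotope traces whose apex passes the cut off."""
    top = max(max(trace) for trace in isotope_traces)
    return sum(xic_area(retention_times, trace)
               for trace in isotope_traces
               if max(trace) >= min_relative * top)


# Hypothetical feature: retention time in minutes, three isotope traces.
rt = [10.0, 10.1, 10.2, 10.3, 10.4]
traces = [
    [0, 100, 300, 100, 0],  # monoisotopic peak
    [0, 50, 150, 50, 0],    # second isotope
    [0, 1, 3, 1, 0],        # too weak, below 5% of the top apex
]
total = feature_area(rt, traces)
print(round(total, 3))
```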
We then have a group of peptide features. If that feature is selected as a precursor ion for MS/MS, and then the MS/MS is identified we can link the two together. This is how we’re able to match peptide feature intensity with an identified peptide.
Viewing this information in PEAKS is very intuitive. Once you click on the peptide tab, the associated peptide feature intensity is found in the area column. This can be sorted to see the peptides with the highest intensity signal.
This information has been proven to be very informative. For example, in the publication shown here they reported the normalized area under the curve of peptide features associated with endogenous peptides. This gave the research group proof of the most abundant peptides eluted from their sample. We ran a subset of the data through PEAKS. What’s great is that PEAKS was able to generate similar results with one click of a button! Sorting the peptide table by feature area gives you a clear idea of the most abundant peptides in the sample.
If you would like to validate the link between identified peptides and peptide features, it’s quite easy to do. Right click on a peptide in the peptide table and select ‘show spectrum in LC/MS’. It will bring you to the location in the LC/MS heatmap where the MS/MS event occurred. The identified MS/MS will be highlighted in red. This map gives a top down view of the signal coming out of the mass spec in terms of m/z, retention time, and intensity. Peptide features that are detected will be marked with a red circle. Scroll over the circle and a box will appear showing the detected range in which the peptide feature occurred. The area under the curve of the peptide feature will be displayed in the popup. This is the area we display in the peptide table.
You can even get a more intuitive, 3D view of the peptide feature by clicking on the 3D button in the top right hand corner of the pane. From this view, the peptide feature can be seen very clearly.
I hope this has helped you become familiar with peptide feature intensities in PEAKS. Thanks for listening. Subscribe to our channel to learn more about PEAKS, complete software for proteomics.
PEAKS Multi-Round Search
Often there are many spectra remaining after a database search that have promising de novo sequences, leaving you wondering what the source of these spectra is. We call these ‘de novo only spectra’. Multi-Round search gives you the ability to search only these spectra. This helps you:
- Remove spectra produced from contaminants
- Find endogenous peptides in an enzyme-digested sample
- Refine search parameters to identify difficult targets
Multi-Round search is a very helpful feature found in PEAKS. It is designed to help you identify spectra that were left unmatched by an initial database search but that have promising de novo sequences. We call these results de novo only spectra. These can be found in any identification result by clicking on the ‘de novo only’ tab. This tab contains spectra that could not be matched by PEAKS DB, PEAKS PTM, or SPIDER. However, using de novo sequencing these spectra can still have excellent peptide spectrum matches. This makes us question why these good peptides are missed by database search.
Multi-Round search gives us the opportunity to answer that question. It takes the de novo only spectra and separates them from the scans already matched using a database search algorithm. It allows you to search just those spectra with different parameters. So, you can choose a new database, different PTMs, or different enzyme cleavage rules. The possibilities are endless.
For example, take this spectrum from a human antibody sample. Searching it against Swissprot we are able to identify a peptide sequence but not very confidently. The peptide does not come from an antibody protein, it is a poor peptide spectrum match, and it has a low score below our 1% false discovery threshold. The de novo peptide spectrum match is much better. It explains the majority of the high intensity peaks with major fragment ions. Using Multi-Round search, this spectrum will be carried forward to the next round. We searched the de novo only spectra identified in the original Swissprot search against NCBI. With our example spectrum, an exact match to the peptide sequenced by de novo was found in an antibody protein. Here is a summary of the hits we were able to find with this spectrum. Searching Swissprot we identified an unexpected protein with a low score, likely a false positive. Now with the new Multi-Round search we achieve ideal results, where we identified a peptide from an antibody protein with a high score and no mutations.
One of the best applications of this new search type is filtering out contaminant proteins. For example, you can download a contaminant database like cRAP shown here. First do a search with the contaminant database. Here you can see that we were able to identify a few contaminants in this dataset. These scans will be removed from the Multi-Round search. Then when you search your target database, this will give you a list of protein identifications without contaminants. Another good benefit of Multi-Round search is limiting your search space. By filtering out identified spectra, you can limit your search space in order to identify difficult targets. For example you can identify endogenous peptides in a digested sample. First run a search using the enzyme you used to digest the sample. Then run a Multi-Round search with no enzyme to find endogenous peptides.
To actually set up a Multi-Round search, all you have to do is select an existing database search result and click the Multi-Round search button. This will bring up a new search parameters pane where you can select any new parameters you wish to search the de novo only results with. Keep in mind: the de novo only results are controlled by the filters you set in the summary view of the initial search. Scans with identification results below the database search filters and above the de novo ALC filters will be included in the de novo only results. So, check to make sure you are satisfied with those settings before starting your Multi-Round search.
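The selection rule described above can be written as a simple filter. This is a sketch under assumptions: the record layout with `scan_id`, `db_score`, and `alc` fields is invented for illustration and does not correspond to a PEAKS export or API.

```python
def de_novo_only(scans, min_db_score: float, min_alc: float) -> list:
    """Return IDs of scans that fail the database filter but pass the ALC filter."""
    selected = []
    for scan in scans:
        db_score = scan.get("db_score")  # None means no database match at all
        unmatched = db_score is None or db_score < min_db_score
        if unmatched and scan["alc"] >= min_alc:
            selected.append(scan["scan_id"])
    return selected


# Hypothetical scans from a first-round search.
scans = [
    {"scan_id": 1, "db_score": 45.0, "alc": 90},  # confident database match
    {"scan_id": 2, "db_score": 12.0, "alc": 85},  # weak match, good de novo
    {"scan_id": 3, "db_score": None, "alc": 40},  # no match, poor de novo
    {"scan_id": 4, "db_score": None, "alc": 72},  # no match, good de novo
]
carried_forward = de_novo_only(scans, min_db_score=20.0, min_alc=50)
print(carried_forward)
```

Tightening or loosening either filter changes which scans are carried into the next round, which is why it is worth checking the summary view settings first.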
That’s all you need to know to get started with Multi-Round search in PEAKS. Thanks for listening! Subscribe to our channel to learn more about PEAKS, complete software for proteomics.
PEAKS PTM Profiling
Detect and quantify modifications with LC-MS/MS data and compare PTM profiling on proteins between samples.
PTM Profiling is a great new feature found in PEAKS 7.5. It is built to deal with a specific issue regarding post translational modifications. In many cases a protein is identified with both modified and unmodified peptides at some positions. PEAKS PTM profiling works with all PTMs. In this example we will look at phosphorylation. In this protein there are several phosphorylation sites. However, unmodified peptides were found at those positions as well. This leads to two questions. At what positions do the phosphorylations occur? And, if it is phosphorylated, how much of the protein is phosphorylated?
To answer the question of whether or not the PTM is true, PEAKS provides an AScore for each proposed PTM. The AScore is based on the evidence from the fragment ions, and is the probability that the modification is present at that position compared to other possibilities. PEAKS also has the ability to assign PTM confidence based on direct fragment ion proof. So, if the MS/MS spectrum shows fragmentation before and after the proposed modification site above 5% relative intensity, it is considered confidently modified. In either case, if the PTM is considered confident it will appear above the protein sequence.
To answer the question of how much of the protein is phosphorylated, PEAKS uses a concept implemented in label free quantification experiments. It uses the concept of peptide features, meaning the LC-MS signal of a peptide. It has been proven that the area under the curve of the peptide feature is proportional to the relative abundance of that peptide. So, for a peptide with a confident modification site shown in this LC-MS data, PEAKS will find the area under the curve of its associated LC-MS feature. It will repeat this as well for all of the modified and unmodified peptides found at this position in the protein. This table shows all the modified and unmodified peptides which were found at this position. The AScores are reported for each modified peptide. And, the peptide feature area is given as well.
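The direct fragment ion proof can be pictured as a check for cleavage evidence on both sides of the modified residue. In the sketch below, a b ion with index i and a y ion with index n - i both indicate a cleavage after residue i of a peptide of length n. The ion list format, the function name, and the handling of terminal residues (only the one existing flanking site is required) are assumptions for illustration.

```python
def has_fragment_ion_proof(peptide_length: int, mod_position: int, ions,
                           min_rel_intensity: float = 0.05) -> bool:
    """Check for fragmentation directly before and after a modified residue.

    ions: iterable of (series, index, relative_intensity), series 'b' or 'y'.
    mod_position is 1-based.
    """
    cleavage_sites = set()
    for series, index, rel_intensity in ions:
        if rel_intensity < min_rel_intensity:
            continue  # too weak to count as evidence
        if series == "b":
            cleavage_sites.add(index)
        elif series == "y":
            cleavage_sites.add(peptide_length - index)

    # Cleavage after residue (p - 1) and after residue p flank position p.
    needed = {site for site in (mod_position - 1, mod_position)
              if 0 < site < peptide_length}
    return bool(needed) and needed <= cleavage_sites


# Hypothetical 8-residue peptide with a modification on residue 4.
strong = [("b", 3, 0.40), ("y", 4, 0.60)]
weak = [("b", 3, 0.40), ("y", 4, 0.02)]
print(has_fragment_ion_proof(8, 4, strong), has_fragment_ion_proof(8, 4, weak))
```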
Using this information, PEAKS creates a bar chart that gives a ratio of the relative quantity of phosphorylated peptide versus unphosphorylated peptide at each identified phosphorylation position. Only fully digested peptides are used in this chart to give added accuracy. Scrolling over the graph will give the percentage of the total amount. You can also use the drop down menus at the top of the window to compare the phosphorylation ratios across multiple runs. Here you can see at some positions there is consistency across runs. At other positions the modification didn’t appear. So, you can see the similarities and differences in phosphorylation across multiple runs. You can also export the raw PTM profiling data to get more details.
Running PTM profiling is easy. At the top right hand corner of the coverage view, select the PTM profiling button. This will compile the data and present the PTM profiling data for all the identified modifications in that protein. Only confident PTMs are used, so be sure to select either AScore or minimal ion intensity and the desired cut off using the legend to the right of the coverage view.
That is all you need to know to get started with PTM profiling with PEAKS. Thanks for listening. Subscribe to our channel to learn more about the features of PEAKS, complete software for proteomics.
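The percentage shown in the bar chart can be thought of as the modified feature area divided by the total feature area at a site. This is a minimal sketch of that arithmetic; the input layout, the site numbers, and the areas are made up, and PEAKS may apply further rules such as the fully digested peptide restriction mentioned above.

```python
def modification_percentages(peptides) -> dict:
    """Percent of feature area that is modified, per protein position.

    peptides: iterable of (site, is_modified, feature_area).
    """
    totals = {}
    for site, is_modified, area in peptides:
        modified, total = totals.get(site, (0.0, 0.0))
        if is_modified:
            modified += area
        totals[site] = (modified, total + area)
    return {site: 100.0 * modified / total
            for site, (modified, total) in totals.items() if total > 0}


# Hypothetical feature areas for two sites in one protein.
observations = [
    (15, True, 3.0e6),   # phosphorylated peptide covering site 15
    (15, False, 1.0e6),  # unmodified peptide covering site 15
    (42, False, 2.0e6),  # only the unmodified form was seen at site 42
]
profile = modification_percentages(observations)
print(profile)
```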
Peptide De Novo Sequencing
Check out this video for a more in-depth analysis on PEAKS’ peptide de novo sequencing. This video will give a brief overview of how peptide de novo sequencing can be useful, as well as give a demonstration on how to perform a de novo analysis using PEAKS.
Welcome to the PEAKS Peptide De Novo Sequencing tutorial. In this video I will outline the benefits of de novo sequencing and how it is a part of PEAKS. I will then show you how to perform a de novo sequencing analysis using PEAKS, which is the most widely accepted tool for peptide de novo sequencing in mass spec labs.
For peptide identification with tandem mass spec, de novo sequencing derives the peptide sequence without using a protein database. This is needed when researching unsequenced organisms, antibodies, endogenous peptides, and peptides with unexpected PTMs. Even when a sequence database is available, a database search engine can fail to assign database peptides to many high quality tandem mass spectra.
Researchers choose to use PEAKS because it is fast and, most importantly, accurate. For example, in this third-party comparison of de novo sequencing algorithms, PEAKS outperformed all other algorithms compared in the paper.
Customer Testimonial (Outstanding User Interface)
PEAKS makes understanding de novo results easy, with an incredibly user-friendly interface. One publication in particular states:
“An important factor when performing large-scale de novo sequencing experiments is the ease of use and flexibility of the software. In this respect, PEAKS, being a commercial quality program, was far superior and offered the most adaptive interface, with the ability to import various formats of data from a vast array of mass spectrometers. The sequencing result displayed by PEAKS was also considerably more useful.”
Saves Time and Easy to Use
Using PEAKS is as simple as 1-2-3, as I will demonstrate for you now.
- Select your data
- Click the de novo sequencing icon
- Specify your parameters, such as the error tolerance, enzyme, and PTMs
PEAKS will then de novo sequence all of the tandem mass spectra in the data set, at a speed of up to 15 spectra per second on a regular PC, and even faster on a server.
Viewing De Novo Sequencing Results
Once the de novo sequencing process is finished, a de novo result node will appear below the selected data. It looks just like a small snow-capped mountain with “dn” in letters. Double click the node to open the result. The de novo sequences are listed in a table, along with the associated score and retention time for each sequence.
Selecting a peptide will display the matching peptide-spectrum details, such as the spectrum annotation and the ion match table. You can easily zoom and navigate in the spectrum annotation using your mouse. This is an important feature if you want to examine individual peptides, and there are several ways to navigate the spectrum. Here are some examples:
- drag to zoom
- Use your scroll wheel to zoom into the y axis
- Use your scroll wheel to zoom into the x-axis
Local Confidence Scores
To make things easy, individual amino acids are colour-coordinated, based on their respective confidence level. The confidence value can be examined by hovering your mouse over a particular peptide, red being above 90% is considered a great score, purple being between 80-90% is good, and blue representing 60-80% is acceptable. Anything below 60% is coloured black. This local confidence on each amino acid is a unique feature of PEAKS. It allows you to adjust the minimum local confidence threshold to convert the de novo sequence into a sequence tag that only contains the highly confident amino acids.
But what we really want to look at is the total local confidence (TLC) and average local confidence (ALC). The TLC score indicates the expected number of correct amino acids, while the ALC score indicates the expected percentage of correct amino acids. For example, here we have a TLC of 10.9 and an ALC of 84%, which indicates a confidently identified sequence.
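As a rough illustration of how the two scores relate, the sketch below assumes that TLC is the sum of the per-residue local confidences (taken as fractions) and that ALC is TLC divided by the peptide length. That reading follows from the descriptions above, but the function name and the confidence values are made up for the example.

```python
def tlc_alc(local_confidences):
    """Return (TLC, ALC in percent) from per-residue confidences given in percent.

    Assumes TLC = sum of confidences as fractions, ALC = TLC / length.
    """
    tlc = sum(c / 100.0 for c in local_confidences)
    alc = 100.0 * tlc / len(local_confidences)
    return tlc, alc


# Hypothetical 13-residue peptide whose confidences sum to 10.9.
example = [95, 92, 90, 88, 85, 85, 84, 82, 80, 80, 78, 76, 75]
tlc, alc = tlc_alc(example)
print(f"TLC = {tlc:.1f}, ALC = {alc:.0f}%")
```

Under this assumption, a TLC of 10.9 with an ALC of 84% corresponds to a peptide of 13 residues, since 10.9 / 13 is approximately 0.84.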
While the default is to sort the table by TLC, you can sort by other columns by clicking the column title, or you can use the search function to quickly locate a peptide.
Summary View – Exporting Results
This summary view shows the result statistics. If you like, you can filter the results by setting a score threshold. We recommend starting with the default TLC and ALC values, and then adjusting those values manually by examining the de novo sequences around this threshold.
The de novo sequencing results can be exported to text formats if you want to use PEAKS as a subroutine in your lab’s own workflow. To export the filtered results:
- Click “Export” at the top of the summary view
- Choose the format you wish to export the results in; available formats include html, csv, and xml
- Choose the location and directory name where you want to put the exported files
- Click OK
De Novo Sequencing is just the Beginning – PEAKS Workflow
For the analysis of mass spectrometry data, PEAKS does not stop at de novo sequencing. Instead, the de novo sequencing capability enables PEAKS to provide a number of unique benefits through several integrated tools. De novo sequencing is just the beginning.
First, the de novo sequencing results are used to confirm and improve PEAKS’ integrated database search. As a result, the performance of PEAKS is well above other database search software. When sequences are not found by the database search engine, the peptides found exclusively by de novo sequencing can be analyzed by the integrated PTM finder and SPIDER tools, in order to locate unexpected PTMs and peptide mutations.
With PEAKS, you are not limited to using a sequence database, which may be unavailable, incomplete, erroneous, or ineffective due to unexpected PTMs and mutations. PEAKS has all of the necessary tools to overcome these challenges, thanks to its unique de novo sequencing algorithm.
PEAKS DB: Peptide Identification
More than just de novo sequencing, PEAKS provides a sensitive and accurate tool for identifying known peptides and proteins. In this video, users will learn why the hybrid approach (de novo + database searching) is the optimal method for identification.
Welcome to the PEAKS Database Search Tutorial on peptide identification. In this video, I will be going over the benefits and features of the newly improved PEAKS DB search engine, and demonstrating how to perform an analysis.
PEAKS DB: De Novo Assisted Workflow
As this diagram illustrates, when you run PEAKS DB on raw MS/MS spectra, the spectra are automatically de novo sequenced and the results are automatically combined with the database results. This gives you improved database results and lets you differentiate sequences identified exclusively by the de novo analysis. These exclusively de novo sequenced identifications can be potential novel peptides, or peptides with mutations and PTMs.
Configure a Database
Now let’s take a closer look at PEAKS DB by performing a live demo of the workflow. Before you run PEAKS DB, you need to make sure you have a database configured. It is only necessary to configure it once. Step-by-step instructions on how to configure a database can be found in the PEAKS “Help” menu under “Help Contents”. The instructions are in Section I-Part 6 under “Configuring Sequence Databases”.
How to Run PEAKS DB
With the new interface it is easy to run a PEAKS DB workflow:
- Select your data
- Select the PEAKS DB icon
- Set the study parameters such as error tolerance, enzyme, fixed and variable PTMs, as well as the desired database.
- Once finished, click OK.
One of the best features of PEAKS is the improved summary view. In this view you can easily filter and validate your results, as well as get an overall understanding of the identifications.
To filter your results:
- Select FDR from the toolbar
- Navigate along the curve to find the desired percentage; we usually recommend using a 1% FDR
- Click “Apply Filters”
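To make the idea behind the FDR filter concrete, here is a minimal sketch of a plain target-decoy estimate. It is not the enhanced target-decoy method PEAKS uses; it simply walks down the score-sorted PSM list and reports the lowest score cutoff at which decoys divided by targets stays at or below the chosen FDR. The function name and the example data are hypothetical, and tied scores are not treated specially.

```python
def score_threshold_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms: iterable of (score, is_decoy) pairs.
    FDR at a cutoff is estimated as decoys / targets among PSMs scoring
    at or above that cutoff. Returns None if no cutoff qualifies.
    """
    targets = decoys = 0
    best = None
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            best = score
    return best


# Made-up data: 200 target PSMs scoring 50.0 down to 30.1, plus three decoys.
example_psms = [((500 - i) / 10, False) for i in range(200)]
example_psms += [(35.05, True), (31.05, True), (30.55, True)]
threshold = score_threshold_at_fdr(example_psms, max_fdr=0.01)
print("score cutoff at 1% FDR:", threshold)
```

In this toy data set the second decoy pushes the estimate above 1%, so the cutoff settles just above it, on the last target before that decoy.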
In the Results Statistics section, users are able to visualize a graphical analysis of all peptides. The first figure summarizes the number of peptide-spectrum matches, or PSMs, that are identified at the set FDR. Below, the figure on the left summarizes the number of identified PSMs and displays both the target and decoy identifications at each -10lgP score. The figure on the right shows the precursor error distribution. The error is small for high-scoring peptides and scatters for those identified below the score threshold.
In the Experiment Control section, the two figures can help check whether the instrument is well calibrated. The left graph illustrates the distribution of the precursor mass error, where a distribution around 0 indicates a very well calibrated instrument. The right graph further plots the precursor error distribution against the precursor m/z.
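The calibration check can be mimicked outside the software with a few lines of Python. The sketch below computes the precursor mass error in parts per million and takes the median as a rough indicator of systematic offset; the function name and the m/z pairs are invented for the example and are not taken from PEAKS.

```python
from statistics import median


def ppm_error(observed_mz, theoretical_mz):
    """Precursor mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


# Hypothetical (observed, theoretical) precursor m/z pairs.
pairs = [(500.0005, 500.0), (750.0, 750.00075), (1000.002, 1000.0)]
errors = [ppm_error(obs, theo) for obs, theo in pairs]
print("errors (ppm):", [round(e, 2) for e in errors])
print("median error (ppm):", round(median(errors), 2))
```

A median far from zero would suggest a systematic calibration offset, while a wide spread around zero points to lower mass accuracy in general.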
Once you have finished validating your results, it’s easy to export your results in a variety of formats. For example you can export to html, so that it may be integrated into a website. To do this:
- Select Export
- Choose html; alternatively, you can export in csv, fasta, or xml format
- Choose the location
- Click Export
The generated files can be viewed with a web browser, which makes it exceedingly easy to share the results with a colleague or submit the results to a journal.
In the Protein view we have a clear overview of each of the identified proteins. For each protein, the top pane displays the associated -10lgP score, coverage percentage, number of peptides, number of peptides unique to that protein, and a brief description of the protein. To take a closer look at a protein, select the protein and the associated peptides are displayed in the lower pane. You can also examine the protein’s coverage map by selecting the coverage tab in the lower pane.
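For readers who want to reproduce a coverage percentage from exported peptides, the following sketch assumes coverage means the share of protein residues covered by at least one identified peptide. The function name and the sequences are hypothetical, and modifications and isobaric residue ambiguity are ignored.

```python
def coverage_percent(protein, peptides):
    """Percentage of protein residues covered by at least one peptide.

    Every exact occurrence of each peptide in the protein is marked as covered.
    """
    covered = [False] * len(protein)
    for peptide in peptides:
        start = protein.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Made-up 20-residue protein and two identified peptides.
example_protein = "MKWVTFISLLLLFSSAYSRG"
example_peptides = ["WVTFISLL", "FSSAYSR"]
print(f"coverage: {coverage_percent(example_protein, example_peptides):.1f}%")
```

Overlapping peptides are counted once per residue, so adding a peptide that lies inside an already covered region does not change the result.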
In the peptide view we can see all of the identified peptides along with important details such as the -10lgP score, mass-to-charge ratio, and the protein accession. Using the search box here, you can search for a particular peptide by the scan number, peptide sequence and so forth. Or, you can sort the peptides using the column header, such as by PTM. This allows you to bring all peptides with a particular PTM to the top of the table. By selecting a peptide you are able to see its associated annotated spectrum, ion match table, and error map.
There are a couple of options available to refine the spectrum to display the information you desire. First, there is an option to filter which ions are displayed, found by selecting the wrench tool in the middle pane here. By default, all b and y ions are selected; to change this, click the respective ions to add them to or remove them from the spectrum. Once you have the desired ions displayed in the spectrum, you can then zoom into an interesting area. To focus on a specific area, click and drag your mouse, or use your mouse wheel to zoom into the x-axis or y-axis. Double click to return to the original ratio.
De Novo Only View
The De Novo Only view displays all peptides that were not found in the database. To learn more about de novo sequencing results check out our PEAKS Peptide De Novo Sequencing Tutorial.
PEAKS Q: Label Free Quantification
In addition to protein and peptide identification, PEAKS excels at accurate label free quantification. This video predominantly uses slides to illustrate the fundamentals of the method.
Welcome to the PEAKS label-free quantification tutorial. In this session, we will go over the features and benefits of the software tool for label-free quantification and demonstrate how to perform data analysis.
Workflow in PEAKS LFQ
The PEAKS label-free quantification algorithm is intensity-based. As this diagram shows, peptide features detected in the survey scans are used for quantification, while MS2 scans are used for peptide/protein identification. Combining them produces peptide and protein quantification results. Let’s go into more detail on each step.
LC-MS Heat Map
In shotgun proteomics, proteins are digested into a complex mixture of peptides, which are separated by on-line HPLC. At a given retention time, the fractions of the mixture eluted from the column are sent to a mass spec instrument and their precursor masses and intensities are recorded in a survey scan (the MS1 spectrum).
This figure shows a heat map of the mass spec signals generated by peptides eluting from the column. The map depicts all peptide features detected by the instrument, with the complexity of elution and isotope patterns.
The intensity of a peptide feature is proportional to the abundance and concentration of the peptide in the sample. The abundance ratio of a peptide between two samples can be estimated by the intensity ratio of the peptide feature in two heat maps.
There are several steps to determine the relative abundance of a peptide and protein by label-free quantification. The first step is “feature detection”.
A peptide feature is defined as a group of peaks in a heat map, characterized by its elution pattern in terms of retention time and its isotope pattern in terms of mass-to-charge ratio.
Deconvolution of overlapping peptide features and retention time alignment between runs are key factors in the data analysis, because overlapping peptide feature clusters cannot be avoided even with today’s high-resolution instruments and LC separation techniques. PEAKS label-free quantification deconvolutes overlapping peptide features using an expectation-maximization algorithm.
RT Alignment and Feature Matching
The second step is retention time alignment and feature matching.
The retention time of a peptide feature may change between two LC-MS runs, depending on the LC column conditions and so forth.
To match the same peptide features in different runs, retention time alignment is required. Here are two LC-MS runs; you can see where the retention time changed.
After alignment, the peptide features are matched.
The next step is ratio calculation. The relative abundance ratio is calculated from the areas of the extracted ion chromatograms (XICs) in two runs.
In each scan, intensities of isotopic peaks are summed when the XIC is generated.
Here are two XICs of a peptide feature; the red one is from run 1, and the blue one is from run 2. The abundance ratio can be estimated by the ratio of the areas of the two XICs.
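The ratio calculation can be sketched as follows. This is an illustrative approximation, not the PEAKS implementation: isotopic peak intensities are summed per scan to build the XIC, the area is obtained with the trapezoidal rule (an assumed integration method), and all names and numbers are invented.

```python
def summed_xic(isotope_traces):
    """Sum the intensities of the isotopic peaks scan by scan."""
    return [sum(scan) for scan in zip(*isotope_traces)]


def xic_area(retention_times, intensities):
    """Area under an XIC using the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(retention_times)):
        width = retention_times[i] - retention_times[i - 1]
        area += 0.5 * (intensities[i] + intensities[i - 1]) * width
    return area


# Hypothetical data: five scans, two isotopic peaks per run.
rts = [10.0, 10.1, 10.2, 10.3, 10.4]
run1 = summed_xic([[0, 100, 200, 100, 0], [0, 50, 100, 50, 0]])
run2 = summed_xic([[0, 200, 400, 200, 0], [0, 100, 200, 100, 0]])
ratio = xic_area(rts, run2) / xic_area(rts, run1)
print("abundance ratio (run 2 / run 1):", ratio)
```

Because every intensity in run 2 is twice that in run 1, the area ratio comes out as 2, independent of the scan spacing.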
Next, we make a significance assessment. Technical replicates are used to evaluate the variation of a feature between runs. A quality value is associated with a feature in terms of its intensity, isotope and eluting patterns. The feature quality is defined as 1 log (sigma), where sigma is the average variation.
Given the observed variation of a feature between two biological states, a significance value is calculated, which is defined as -10logP, where P is the p-value of observing such variation in the replicate runs.
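The conversion between a p-value and the -10logP significance score is simple arithmetic, sketched here with hypothetical helper names. How PEAKS derives the p-value itself from the replicate runs is not shown.

```python
import math


def significance_score(p_value):
    """Convert a p-value into a -10*log10(P) significance score."""
    return -10.0 * math.log10(p_value)


def p_value_from_score(score):
    """Convert a -10*log10(P) score back into a p-value."""
    return 10.0 ** (-score / 10.0)


for p in (0.05, 0.01, 0.001):
    print(f"p = {p}: score = {significance_score(p):.1f}")
```

On this scale every additional 10 points corresponds to a tenfold smaller p-value, so a score of 20 matches a p-value of 0.01 and a score of 30 matches 0.001.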
The last step is peptide feature identification. This is done using the MS/MS spectra associated with the feature. PEAKS label-free quantification is seamlessly integrated with PEAKS database search for peptide identification; thus, data analysis for label-free quantification is much easier than switching software or exporting from one format to another.
PEAKS Q: Labelling Quantification – SILAC
PEAKS Studio 8 provides our improved precursor ion quantification module. In this video, the improvements we have made to this labelling quantification option, specifically, SILAC, will be discussed using data from a recent publication.
Welcome to today’s tutorial. Today, I will be describing the features of our improved precursor ion quantification module in PEAKS 8. In this video, I will discuss the improvements we’ve made to this labelling quantification option, which applies to such experiments as SILAC, using data from this recent publication.
How Can PEAKS 8 Help?
PEAKS 8 offers enhanced computational features, which ensure more accurate and reliable SILAC quantification results. Firstly, the algorithm has been improved such that PEAKS 8 detects, highlights, and quantifies peptides at the feature level. Additionally, it provides more manual control for users, enabling the removal of peptides that are deemed to be less confident quantification candidates, an assessment that is based on characteristics of the feature vector labels. Finally, PEAKS 8 employs a computational solution to correct for the problematic conversion of heavy free arginine to proline. Overall, this will provide improved reliability and confidence in the results from your analysis.
How to Set Up An Analysis
SILAC quantification results can be generated by employing the easy-to-use workflow in PEAKS, which can be initiated just by starting a new project. Select the “New Project” folder icon and add your raw data files from the Project Wizard screen. As PEAKS 8 allows you to analyse individual, as well as grouped, raw data files, we recommend adding each data file as a separate sample using the multiple beaker button. After you’ve added your data, simply choose your enzyme, instrument, and fragmentation option, and then move on through the workflow. While the workflow will take you through data refinement and identification, the focus of this tutorial will be on setting up a quantification project. For SILAC experiments, select “Precursor Ion Quantification”, choose your experimental method, and apply your chosen filters. You can then choose to group replicate data before you initiate your analysis. Finally, apply the R-to-P correction to ensure that peptides with divided heavy feature intensities are added together before PEAKS calculates the peptide ratios.
The PEAKS 8 Peptide view makes it very convenient for users to identify confident peptides. For each peptide, the view is divided into four quadrants: an MS2 scan, a survey scan, the LC-MS view of the peptide’s labelled features, and a pane that combines an extracted ion chromatogram with an isotopic distribution diagram. These four panes apply specifically to a selected peptide feature vector, which you’ll see when you click the “All vectors” button. Clicking “All vectors” will generate a table, which shows all identified features. By default, the feature vector with the highest quality is initially chosen.
Filtration Options: Summary View
Before examining your data, it’s best to select an appropriate set of filters. To change the feature vector filters, select the “Edit…” button in the second row of the filtration options to open the Filtration window. Set a -10lgP threshold through the drop-down menu or by clicking the false discovery rate button. Choose a quality threshold, the score for which is based on a comparison between feature vector characteristics, which include m/z and RT differences, XIC shape similarity, and feature intensities. Set average feature areas and a charge range. Finally, choose whether or not you only want to display peptide features that include a reference label, and set the minimum number of labels that must be present. This reference label can be set within “Experiment Settings” at the top of the filtration options.
You can also choose which proteins you want to filter from your display by selecting the “Edit…” button in the third row of the filtration options to open the Protein Filtration window. Set protein significance scores by choosing either an appropriate significance threshold or by setting a Benjamini-Hochberg FDR. Because significance scores are -10lgP scores, if you choose to set a threshold, we recommend setting it to a value of 20, as this corresponds to a p-value of 0.01. Select a protein fold change cut-off and choose the minimum number of peptides that you want to include. Additionally, choose whether to exclude variably modified peptides from quantification and decide if you want to calculate significance scores using the ANOVA or the PEAKS Q method.
As mentioned, PEAKS 8 SILAC quantification is heavily feature-based. The LC-MS and XIC panes show you the characteristics of the feature vectors used in peptide quantification. The LC-MS view highlights and zooms in to where the feature vectors are located. The information displayed in the pop-up windows that appear when you hover over the feature markers in this view is also displayed in the feature vector window. As such, PEAKS 8 enables easier viewing of the labelled feature vector characteristics. Ultimately, the ratio of the quantified peptide is calculated using the average light and heavy labels of every identified feature vector.
The ratio of the protein to which a peptide belongs is calculated from the ratio of the light and heavy labels, where the labelled areas represent the sum of all peptide areas that were used in the analysis.
Managing Quality Control: Removing Less Confident Peptides
PEAKS 8 allows users greater control to decide which peptides are used in protein quantification. If you identify a peptide that appears to be a poor choice for your quantification calculation, simply uncheck the peptide to remove it from the analysis. Select the “Apply” button, which is found in the filtration settings within the Summary view, to incorporate this change. When you return to the quantification pane, the area values of the protein will now only reflect the sums of the remaining check-marked peptides.
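The effect of unchecking a peptide can be illustrated with a small sketch. It assumes the protein ratio is the summed heavy area divided by the summed light area over the peptides that remain checked; the direction of the ratio, the field names, and the numbers are all assumptions made for the example.

```python
def protein_ratio(peptides):
    """Heavy/light protein ratio from the summed areas of the checked peptides.

    Each peptide is a dict with hypothetical keys: light_area, heavy_area, used.
    """
    used = [p for p in peptides if p["used"]]
    light = sum(p["light_area"] for p in used)
    heavy = sum(p["heavy_area"] for p in used)
    return heavy / light


# Made-up peptides for one protein; the third looks like an outlier.
peptides = [
    {"light_area": 1000.0, "heavy_area": 2000.0, "used": True},
    {"light_area": 500.0, "heavy_area": 1000.0, "used": True},
    {"light_area": 400.0, "heavy_area": 100.0, "used": True},
]
print("all peptides:", round(protein_ratio(peptides), 3))

peptides[2]["used"] = False  # equivalent to unchecking the peptide
print("outlier removed:", round(protein_ratio(peptides), 3))
```

Summing areas before dividing means that intense peptides weigh more heavily in the protein ratio than weak ones, which is one reason a single poor but intense peptide is worth removing.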
Isotopic Distributions and Peak Height
Apart from the sample profile, the XIC also provides insight into how good a candidate a certain peptide may be for quantification. An isotopic distribution diagram and an XIC, for both labels, are displayed in the bottom-right pane of the Peptide view. Feature vectors with well-aligned peaks and well-matching distributions are better candidates for quantification, while feature vectors with poorly aligned peaks and poorly matched distributions are less confident candidates.
Computational Solution for Arginine to Proline Conversion
PEAKS 8 provides a computational correction for the problematic arginine to proline conversion in heavy labelled peptides. When the R-to-P correction is selected back in the Quantification set-up, the total intensity of the heavy feature label is recognized, as seen in the LC-MS view where two heavy regions are highlighted. PEAKS 8 will then sum the areas of the two heavy labels and display the combined value in the feature vector table. This ensures that the final ratio of the peptide’s associated protein is not affected by any conversion of arginine.
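A worked example shows why summing the two heavy areas matters. The sketch below is arithmetic only, with hypothetical names and values; it does not show how PEAKS detects the converted feature.

```python
def corrected_heavy_area(heavy_area, converted_heavy_area):
    """Total heavy signal: the expected heavy feature plus the proline-converted one."""
    return heavy_area + converted_heavy_area


def heavy_to_light(light_area, heavy_area):
    """Heavy/light ratio for one peptide feature vector."""
    return heavy_area / light_area


# Made-up areas: 30% of the heavy signal has shifted to the converted feature.
light, heavy, converted = 1000.0, 700.0, 300.0
print("without correction:", heavy_to_light(light, heavy))
print("with correction:", heavy_to_light(light, corrected_heavy_area(heavy, converted)))
```

Without the correction the peptide would appear down-regulated in the heavy sample even though the true ratio in this example is 1.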
Overall, PEAKS 8 offers many improvements to the previous PEAKS 7.5. This includes an easy-to-use quantification workflow, multiple options for filtering your results, ratio calculations based on the areas of the labeled feature vectors, a better display that includes options for more control of what peptides you want to examine and include in the ratio calculations, and an accurate arginine to proline conversion correction.
Thank you for listening. If you’d like to try PEAKS Q with your own data, you can request a demo at biosoft.ca. Also, subscribe to our channel to learn more about PEAKS, complete software for proteomics.
PEAKS Q: Labelling Quantification – TMT/iTRAQ
PEAKS Studio 8 now contains an excellent tool for quantification by isobaric labelling methods such as TMT and iTRAQ. This video will highlight some of the benefits of this tool and how to use them for your research.
Accuracy and sensitivity are a main focus of this tool. There are three important points to keep in mind to ensure accurate and sensitive isobaric labelling results: supporting the most accurate methods such as multi-notch MS3, using computational methods to select only high-quality spectra for protein quantification, and reliable protein significance prediction. The next important point when it comes to TMT and iTRAQ is scalability. Currently, the largest number of samples you can use in a single experiment is 10. If more samples are required in the study, multiple experiments must be compared. We will discuss how this can be done with PEAKS.
Accuracy and Sensitivity for TMT/iTRAQ
One of the main problems with isobaric labelling is interference. MS2 spectra can contain signal not only from the target precursor ion, but also from interfering contaminants in the sample. Since the whole sample is labelled with the quantification tags, there is no way to separate the reporter ion signal of the target from that of the contaminants. This is simulated by the experiment described here. A yeast digest was labelled using TMT 6plex labels in a relative dilution curve forming the ratio 10 to 4 to 1 with the three lighter labels and back up to 10 using the heavier labels. A human cell line was labelled as well, using only the 3 heaviest labels in the 6plex set to simulate interference. These figures show that, using the typical MS2 quantification method, the heavier label ratios of the yeast cell lysate would not follow the expected ratio due to human cell line interference. With multi-notch MS3 quantification, the resulting MS3 spectrum was observed to produce ratios that closely matched the expected ratios of the yeast digest. So interference was greatly reduced.
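The distortion described above can be reproduced with a toy calculation. The yeast pattern follows the 10 to 4 to 1 dilution curve and its mirror image; the amount of human signal leaking into the three heaviest channels is a made-up value, and treating it as equal in each channel is an assumption.

```python
def observed_signal(target, interference):
    """Reporter ion signal when interfering ions are co-isolated with the target."""
    return [t + i for t, i in zip(target, interference)]


def ratios_to_first(channels):
    """Express every channel relative to the first one."""
    return [c / channels[0] for c in channels]


yeast = [10, 4, 1, 1, 4, 10]   # expected dilution curve across the 6plex
human = [0, 0, 0, 5, 5, 5]     # hypothetical interference in the 3 heaviest channels

print("expected ratios: ", ratios_to_first(yeast))
print("with interference:", ratios_to_first(observed_signal(yeast, human)))
```

The lighter channels keep their expected ratios, while the heavier channels are inflated, which is the pattern the MS2 figures show and which multi-notch MS3 is meant to suppress.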
PEAKS allows you to analyze multi-notch MS3 data easily. In this example, an 8plex experiment was set up where an E. coli digest followed a 10 to 5 to 2 to 1 ratio using the lighter labels and the reverse for the heavier labels. The four heavier labels were used to simulate contamination with the human cell culture. This is how the results appear in the PEAKS heatmap. Red indicates up-regulation and green indicates down-regulation. With this type of display it is easy to see that the E. coli proteins follow the expected dilution curve, while the human proteins show intense signal in the heavier channels and, as expected, almost no signal in the lighter channels. We’ll now talk about how to set up this kind of data in PEAKS.
Setting up the project is easy. Click the create project button indicated here and add the data. Enter the enzyme, instrument, and fragmentation type of the MS2 scans. You can then click the data refinement button to proceed through the workflow. When setting up identification parameters, it is not necessary to add the labelling tag as a fixed modification if you are using the workflow. Once you select your method in the quantification step, it will automatically be added to the identification search. However, if you are not using the workflow, you must remember to add it as a fixed modification. During quantification, first set up your labelling method. This will make all of the labels appear in the experiment groups. Select all samples and add them to the right with this button. Select the mass error tolerance; in the case of MS3, high-resolution mass spectrometry is typically used, so a tight error tolerance can be given. Select the mass spectrometry level where the reporter ions should be found. Then select the identification cut-off method. This is important for ensuring that only high-quality peptides will be used for protein quantification. If you choose to use a decoy database, a 1% FDR cut-off is recommended. You can then click finish and let PEAKS work! It will create quantification results without any more input.
Selection of High Quality Spectra
Another important step in ensuring accurate and sensitive quantification results is selecting high quality spectra. An easy to use display is essential for this so that manual inspection can be used to ensure that the quantification results are reliable. PEAKS does this by putting all the important information in one display. This is the peptide view where all the peptide quantification info can be seen in the top pane. The identification result can be seen in the middle. Then a view of the quantification labels can be seen at the bottom. If MS3 was used, this will be the MS3 scan. If MS2 was used, a zoomed in view of the labels in the MS2 scan will be shown. Also, filters can be used from the summary view to select high quality spectra using this edit button. Identification quality plays a major role in quantification, so set an identification -10lgP threshold. PEAKS also calculates a quality score, which considers identification -10lgP, noise around reporter ions, and mass error of the reporter ions. Higher intensity reporter ions are also more reliable, so an intensity threshold can be set. You can set a minimum number of channels as well to prevent missing values from affecting your protein quantification results.
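The combination of filters described here can be pictured as a simple predicate over each spectrum. The sketch below is hypothetical throughout: the field names, the threshold values, and the choice to count a channel as present only when its reporter intensity reaches the intensity threshold are assumptions, not the PEAKS implementation.

```python
def passes_filters(spectrum, min_score, min_quality, min_intensity, min_channels):
    """Return True if a spectrum is good enough to be used for quantification.

    spectrum: dict with hypothetical keys
      score     - identification -10lgP
      quality   - quantification quality score
      reporters - reporter ion intensities, None for a missing channel
    """
    present = [x for x in spectrum["reporters"]
               if x is not None and x >= min_intensity]
    return (spectrum["score"] >= min_score
            and spectrum["quality"] >= min_quality
            and len(present) >= min_channels)


# Made-up spectra from a 6plex experiment.
spectra = [
    {"score": 35.0, "quality": 12.0, "reporters": [5e4, 4e4, 6e4, 5e4, 7e4, 6e4]},
    {"score": 35.0, "quality": 12.0, "reporters": [5e4, None, None, 5e2, 7e4, 6e4]},
    {"score": 12.0, "quality": 12.0, "reporters": [5e4, 4e4, 6e4, 5e4, 7e4, 6e4]},
]
kept = [s for s in spectra
        if passes_filters(s, min_score=20.0, min_quality=8.0,
                          min_intensity=1e3, min_channels=4)]
print(f"{len(kept)} of {len(spectra)} spectra pass the filters")
```

Here the second spectrum fails because too few channels have usable reporter signal, and the third fails on identification score alone.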
Protein Ratio Estimation
Once these filters are set, the protein display will only show supporting peptides that pass these filters. These are the high-quality peptides that are used for protein ratio calculation. You also have manual control: click the checkbox in the used column to remove a peptide. This will remove it from the protein ratio and significance calculations.
Quantification significance is calculated at the protein level. Select either the ANOVA or PEAKS Q significance options. Either one is a calculation of the likelihood that the observed change between conditions is significant. In either case, the -10lg of the p-value is used. So, a cut-off of 20 is suggested. You can also select a Benjamini-Hochberg cut-off. Select the modified exclusion checkbox to exclude peptides with variable modifications from protein ratio calculation. Modified peptides have a different ionization efficiency than unmodified ones, so we give you the option to exclude them to prevent this from affecting your quantification results.
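For readers unfamiliar with the Benjamini-Hochberg cut-off, here is a minimal sketch of the standard step-up procedure applied to a list of protein p-values. The function name and the example values are invented, and how PEAKS applies the procedure internally is not covered.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if it is significant at FDR alpha.

    Standard step-up procedure: find the largest rank k such that
    p_(k) <= k / m * alpha, then accept the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant


# Hypothetical protein p-values (a -10lgP of 20 corresponds to p = 0.01).
example_p = [0.039, 0.001, 0.205, 0.008]
print(benjamini_hochberg(example_p, alpha=0.05))
```

Unlike a fixed score cut-off, the threshold here adapts to how many proteins are tested, so the same p-value can pass in a small list and fail in a large one.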
The end result is a confident list of protein quantification results that can then be exported and shared with your colleagues from the summary view.
Combine Multiple Experiments
Now let’s talk about another major problem with isobaric labelling: multiplexing. Since the largest experiment you can currently run is a 10plex experiment, you are limited in the number of samples you can use. So, if more samples are required, more experiments are required. This is when a global reference standard should be used. For example, here 131 is used as a reference to link experiments 1 and 2. The 131 channels in experiments 1 and 2 are replicates, so the abundance of the peptides in these two channels should be similar. This means they can be used for inter-experiment normalization. Intra-experiment normalization methods are provided as well. In this case, two 6plex experiments were combined together. This allowed us to clearly find several proteins that were consistently differentially expressed between the two experiments.
Inter Experimentation Normalization
If you are using this type of experiment, it can be configured once quantification is complete from this experiment settings button. From this page, select ‘all experiments’ from the select experiment drop-down menu. Select the ‘perform inter experiment normalization’ checkbox, and add all but the reference channels to the right in the experiment groups section. This will ensure that only the experimental samples are shown in the heatmap. In this case, select 131 as the spiked channel for both samples. Then, click the ‘exclude spike channel for significance’ button. The reference channels are not expected to change between experiments, and significance is by definition a measure of change, so including them would negatively impact the significance score. That is why we give you the opportunity to remove them.
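One simple way to picture inter-experiment normalization is to express every channel as a ratio to the shared reference channel, which puts both experiments on a common scale. The sketch below assumes that approach; it is not a description of the exact PEAKS calculation, and the function name and intensities are invented.

```python
def ratios_to_reference(channels, reference="131"):
    """Express each non-reference channel relative to the reference channel.

    channels: dict mapping channel label to intensity for one protein.
    """
    ref = channels[reference]
    return {label: value / ref
            for label, value in channels.items() if label != reference}


# Made-up intensities for the same protein in two experiments.
experiment_1 = {"126": 200.0, "127": 400.0, "131": 100.0}
experiment_2 = {"126": 600.0, "127": 300.0, "131": 300.0}

print("experiment 1:", ratios_to_reference(experiment_1))
print("experiment 2:", ratios_to_reference(experiment_2))
```

The raw intensities in experiment 2 are about three times higher overall, but once both are expressed relative to channel 131 the 126 samples agree and only the 127 samples differ.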
Intra Experimentation Normalization
The next step is to perform intra experiment normalization. Click the normalization button. From here, select auto normalization. Auto normalization sums the intensity of all reporter ion channels of all quantifiable peptides. This is then used as a global ratio within the experiment.
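The auto normalization idea can be sketched as follows, under one reading of the description above: the reporter intensities of all quantifiable peptides are summed per channel, and the ratios between those totals are used as global scaling factors. The function names, the choice of the first channel as the base, and the numbers are assumptions.

```python
def channel_totals(peptide_intensities):
    """Sum reporter ion intensities per channel over all quantifiable peptides."""
    return [sum(column) for column in zip(*peptide_intensities)]


def normalization_factors(totals, base_index=0):
    """Global factor per channel that scales its total to the base channel's total."""
    return [totals[base_index] / total for total in totals]


def normalize(peptide_intensities, factors):
    """Apply the per-channel factors to every peptide."""
    return [[x * f for x, f in zip(row, factors)] for row in peptide_intensities]


# Made-up data: three peptides, two channels; channel 2 was loaded at twice the amount.
peptides = [[100.0, 200.0], [300.0, 600.0], [50.0, 100.0]]
totals = channel_totals(peptides)
factors = normalization_factors(totals)
print("totals:", totals)
print("factors:", factors)
print("normalized:", normalize(peptides, factors))
```

This kind of correction assumes that most peptides do not change between channels, so a global difference in total signal reflects loading or labelling differences rather than biology.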
With these options set it is now possible to compare multiple experiments with TMT or iTRAQ labeling.
Thank you for listening; if you’d like to try PEAKS Q with your own data you can request a demo at biosoft.ca. Also, subscribe to our channel to learn more about PEAKS, complete software for proteomics.