PEAKS Online

High-throughput, Multi-User, LC-MS Proteomic Analysis Software Solution

Highlights

  • High-throughput, parallel processing to support very large-scale projects
  • Dual interfaces to provide visualisation and automation
  • Administrative control and multi-user support
  • Establish standardised workflows, search parameters, and databases for routine analyses
  • Easily compare samples using the statistics table and the multi-sample protein coverage view

Overview

Use PEAKS Online X to take advantage of powerful, shared computing resources to perform LC-MS/MS protein and peptide identification and quantification analyses. With the ability to run on any all-in-one installation, multi-CPU cluster, or cloud server, the restructured software platform allows large datasets to be processed efficiently by multiple users at the same time.

De Novo Sequencing Scoring
As in PEAKS Studio, hover over the peptide sequence to see the local confidence at the amino acid level. Each residue is also colour-coded so that confident sequence tags are easy to identify. The local confidence score in PEAKS is the likelihood that a given amino acid assignment in the peptide is correct. The interactive annotated spectrum view, error distribution and ion match table appear below for further assessment.

Detailed Visualization for Easy Interpretation
The PEAKS Online X interface allows projects to be analysed on all levels, from the raw data to spectrum annotation and graphical visualization of the identification and quantification results. Result filters can be easily set, and statistical tools are available at every stage of analysis. With the PEAKS Q module add-on, results can be visualized using heat maps, volcano plots, and extracted ion chromatograms (XICs).

Multi-Sample Protein Coverage Comparison
Like PEAKS Studio, the “Protein Coverage” view visually maps the supporting peptides and de novo tags to the protein selected in the Protein table. Click on a peptide of interest from the Protein Coverage view and the annotated spectrum for the corresponding peptide spectrum match (PSM) will appear. In PEAKS Online X, the Protein Coverage view also maps the peptides on a sample basis for projects with multiple samples. This multi-sample protein coverage view allows users to quickly estimate the peptide abundance across all samples in the identification search using a heatmap.

PEAKS Online means high-throughput proteomics data analysis for multiple users on a network. Although the term ‘Online’ is often associated with the World Wide Web, PEAKS Online should not be confused with public access. Instead, PEAKS Online is a software package that includes a Server licence and Client licences. It can be hosted on a public cloud such as AWS or Google Cloud, or deployed on a private network such as your own high-performance computing cluster. This cloud-based architecture allows PEAKS Online to be fully parallelised and scalable to your lab’s needs, making it ideal for processing large-scale projects.

PEAKS Online provides users with the ability to utilise the established PEAKS workflows more efficiently and on a larger scale. The interactive tool used to send data to and retrieve data from the server is called PEAKS Client, and the results are presented in a manner similar to the PEAKS Studio desktop solution. Through the Web Client Interface or Client Command Line Interface (CLI), multiple users can access the PEAKS Online server at the same time, supporting parallelism at the project and data level.

In PEAKS Online X, dual interfaces are available to ensure easy integration into any proteomics workflow. The Web Client Interface allows users to set up and submit projects visually, as well as review and validate their results. The Client CLI, on the other hand, can be integrated into existing pipelines for automated data processing.


PEAKS Workflow

PEAKS Online X is designed to facilitate accurate and sensitive proteomic analysis using the PEAKS X software workflows, but with higher performance on a shared resource. Users can perform de novo Sequencing, PEAKS DB Database Search, Spectral Library Search, PEAKS PTM, and SPIDER, as in the PEAKS Studio package. The optional PEAKS Q and PEAKS IMS modules can be enabled for labelled and label-free protein quantification and for ion mobility data support, respectively.

As in PEAKS Studio, raw data from the mass spectrometer do not need to be converted before being added to the PEAKS Online X workflow. Data can be added to PEAKS Online projects from the local file system (through the browser or command line client) or from remote data repositories (for data that is not stored locally). This makes it easy to submit projects from any computer with access to your network, even if it does not have direct access to your data file storage.

Once data has been uploaded to the PEAKS Online X server, set up a workflow or select one from a predefined list. With PEAKS Online, users can define standardized workflows to implement lab protocols for regulatory compliance and establish project-specific approaches that can be used across the whole group.

If an analysis needs to be rerun with a new set of parameters, the whole workflow does not need to be reprocessed. Using the PEAKS technology, samples can be added to or removed from a project and workflows can be modified “on the fly”. PEAKS Online X will only process the information that needs to be updated, while keeping any relevant information generated previously.
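
The idea of reprocessing only what has changed can be illustrated with a small sketch. This is not how PEAKS Online X is implemented; it is a generic caching pattern, and the step names, parameter names, and the `run_workflow` helper are all hypothetical. Each step's result is stored under a key derived from the step, its parameters, and the key of its input, so an unchanged upstream step is reused when a later step's parameters change.

```python
import hashlib
import json

# Hypothetical result cache: maps a step's cache key to its stored result.
cache = {}
# Records which steps were actually executed, so that reuse is visible.
executed = []


def step_key(name, params, upstream_key):
    """Derive a key from the step name, its parameters, and its input's key."""
    payload = json.dumps([name, params, upstream_key], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_workflow(steps):
    """Run (name, params) steps in order, reusing cached results where possible."""
    upstream_key = "raw-data"
    for name, params in steps:
        key = step_key(name, params, upstream_key)
        if key not in cache:
            executed.append(name)
            # Placeholder standing in for the real computation.
            cache[key] = f"result of {name} with {params}"
        upstream_key = key
    return cache[upstream_key]


# Hypothetical step and parameter names, for illustration only.
first = [("de_novo", {"tolerance_ppm": 10}), ("db_search", {"fdr": 0.01})]
run_workflow(first)

# Rerun with a changed database-search parameter: only that step is recomputed.
second = [("de_novo", {"tolerance_ppm": 10}), ("db_search", {"fdr": 0.05})]
run_workflow(second)
print(executed)
```

In this sketch the second run executes only the database-search step again, because the de novo step's key is unchanged.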

Once the desired results have been generated, users can export the results for each individual result node, like PEAKS Studio, or export all the results at once in a single step.


PEAKS Online X uses the latest distributed computing technology to fully utilise the computing power of your hardware. The PEAKS Online X architecture is built on the highly popular Apache Cassandra database system, which is used in heavy-load applications like Facebook and Netflix. PEAKS Online X further distributes its computational workload to different workers using the Akka Actor system. This new setup gives PEAKS Online X the ability to do many things that were not feasible in the PEAKS Studio desktop version. In particular, PEAKS Online X can readily take on large-scale projects by handling the increased workload of large-cohort proteomics studies more efficiently. In addition, it is a ready-to-scale, high-performance setup in which throughput and performance can be adjusted dynamically by changing the hardware configuration. Given the right resources, PEAKS Online X can process data at least 10 times faster than PEAKS Studio and can handle 1000 samples or more.

In a recent benchmarking study, we measured how PEAKS Online X scales as the number of computing resources increases. The test data set contained 56 samples, each made up of 12 fractions. In total, there were 672 three-hour MS runs comprising about 5 million MS1 scans and 29 million MS2 scans. With the standard PEAKS Online X 32-thread licence, it took about 10 days to finish the project, covering data loading, data refinement, de novo sequencing, PEAKS DB, PEAKS PTM and SPIDER. However, as we increased the number of CPU cores, performance increased linearly. With 512 cores, the whole analysis took a little more than half a day.

Summary of Dataset Used for Benchmark Analysis

# of Samples                     56
# of MS/MS Runs (180 min/run)    672
# of MS Scans                    5106521
# of MS/MS Scans                 28858408
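
As a rough check of the benchmark figures, the arithmetic can be sketched as follows. The only inputs are numbers quoted above; the `ideal_days` helper is a hypothetical name, and it assumes perfectly linear scaling from the 32-thread baseline, which is an idealisation rather than a measured curve.

```python
# Figures taken from the benchmark description above.
SAMPLES = 56
FRACTIONS_PER_SAMPLE = 12
HOURS_PER_RUN = 3
BASELINE_THREADS = 32
BASELINE_DAYS = 10.0


def ideal_days(threads):
    """Estimated runtime in days, assuming perfectly linear scaling."""
    return BASELINE_DAYS * BASELINE_THREADS / threads


total_runs = SAMPLES * FRACTIONS_PER_SAMPLE
instrument_hours = total_runs * HOURS_PER_RUN

print(total_runs)        # 672
print(instrument_hours)  # 2016
for threads in (32, 64, 128, 256, 512):
    print(threads, ideal_days(threads))
```

Under this assumption, 512 threads give 0.625 days (15 hours), which is consistent with the reported "a little more than half a day".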

PEAKS Online is distributed as a software package that includes a Server licence and multiple Client licences. The base package includes 4 Client licences and can utilise up to 32 logical cores/threads. However, PEAKS Online is ready to scale, and researchers can increase the performance and/or number of users to meet the needs of any research group. Both the capacity of the PEAKS Online Server and the number of Client licences can be purchased to match your requirements.

# of Clients    # of Cores/Threads
4               32
5               64
6               128
7               256
8               512

*Note: The number of cores/threads indicates the number of usable cores or threads that can be allocated to the PEAKS Online Server’s worker nodes.

Users can purchase additional individual client licences or a site client licence to meet their lab’s needs. For quantification and ion mobility data support, purchase the optional PEAKS Q and PEAKS IMS add-ons, respectively.

Overall, PEAKS Online X is composed of 3 components: the Database Node(s), Master Node, and Worker Node(s). These can be distributed to different computers, or placed on the same machine. Each component has a unique pattern of computing resource usage and hardware requirements.

Database Node(s) of PEAKS Online X store all application data and are the base for all proteomics processing. Since PEAKS Online X is a distributed computing framework that can run from multiple machines, we use the popular distributed database system Cassandra as the main data storage to provide I/O performance at scale. Each data node, as part of a Cassandra cluster, has high demand for memory space and disk I/O speed.

The Master Node is the central hub of PEAKS Online X’s computing framework. It takes charge of scheduling, dispatching, and synchronisation of computing tasks. Although it does not perform any data processing, it handles the web-based user interface, the loading of raw data and the exporting of result data, so it benefits most from high-performance CPUs.

Worker Nodes are responsible for the actual data processing and computation. PEAKS Online X is easily scalable, so a worker node can be configured to use a customised number of CPU threads. As a rule of thumb, a worker node needs 2 GB of available memory for each computing thread and another 2 GB of spare memory for its own usage. Other than a few GB for logging, a worker node generally has no particular requirement for hard disk I/O speed or space.
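
The rule of thumb for worker memory can be written as a one-line formula. The sketch below is only a sizing aid based on the figures in the paragraph above; the function name `worker_memory_gb` is hypothetical and not part of any PEAKS tool.

```python
GB_PER_THREAD = 2  # rule of thumb from the text: 2 GB per computing thread
GB_OVERHEAD = 2    # spare memory for the worker's own usage


def worker_memory_gb(threads):
    """Minimum memory (GB) suggested by the rule of thumb for one worker node."""
    if threads < 1:
        raise ValueError("a worker needs at least one thread")
    return GB_PER_THREAD * threads + GB_OVERHEAD


print(worker_memory_gb(8))   # 18
print(worker_memory_gb(32))  # 66
```

An 8-thread worker comes out at 18 GB under this rule, slightly below the 20 GB per worker listed in the hardware configurations, which leaves some headroom.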

32 Threads Server Licence

  • Master: 16 Threads, 20 GB Memory
  • Cassandra: 8 Threads, 16 GB Memory, SSD (5 TB+)
  • Workers (4x): 32 Threads Total (8 Threads Per), 80 GB Memory (20 GB Per)
  • Suggested Configuration: All-in-one Installation
      Intel Xeon Gold 5120 Processor x2 (56 Threads including hyper-threading)
      Memory: 144 GB
      SSD: 500 GB (for OS)
      SSD: 5 TB (for Cassandra)

128 Threads Server Licence

  • Master: 16 Threads, 32 GB Memory
  • Cassandra: 8 Threads Each, 16 GB Memory Each, SSD (5 TB+)
  • Workers (16x): 128 Threads Total (8 Threads Per), 320 GB Memory (20 GB Per)
  • Suggested Configuration: Cluster with 2 machines, each with the following hardware
      Intel Xeon Gold 6238 Processor x2 (88 Threads including hyper-threading)
      Memory: 256 GB
      SSD: 500 GB (for OS)
      SSD: 5 TB (for Cassandra)

Note: Although PEAKS Online X is best used in a cluster setting with dozens to hundreds of nodes, it can still be used in standalone mode if you only have limited computing resources, such as one powerful workstation or server. In fact, all components of PEAKS Online can be started on the same machine. In this case, however, you must make sure the machine has enough computing resources to accommodate the needs of all components. Cassandra uses the hard drive very intensively. If everything is installed on the same machine, it is highly recommended to have a separate high-speed drive dedicated to Cassandra.
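
To see whether a single machine can accommodate all components, the per-component figures for the 32-thread licence can simply be added up. The sketch below uses only the numbers listed in the hardware configurations above; the dictionary layout and the `total` and `fits` helpers are hypothetical, chosen for illustration.

```python
# Per-component needs for the 32-thread licence, from the configuration above.
COMPONENTS = {
    "master": {"threads": 16, "memory_gb": 20},
    "cassandra": {"threads": 8, "memory_gb": 16},
    "workers": {"threads": 32, "memory_gb": 80},  # 4 workers x 8 threads, 20 GB each
}

# Suggested all-in-one machine: 2 x Xeon Gold 5120, 144 GB memory.
MACHINE = {"threads": 56, "memory_gb": 144}


def total(resource):
    """Sum one resource (threads or memory_gb) over all components."""
    return sum(component[resource] for component in COMPONENTS.values())


def fits(machine):
    """True if every resource the machine offers covers the combined need."""
    return all(total(resource) <= machine[resource] for resource in machine)


print(total("threads"), total("memory_gb"), fits(MACHINE))
```

The combined need is 56 threads and 116 GB of memory, so the suggested all-in-one machine covers the thread count exactly and leaves some memory spare for the operating system.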

Resources:

PEAKS Online X Brochure