Tutorials

ZOOM provides two sets of sample data in the “Sample_Data” directory. The “Solexa” directory contains an Illumina/Solexa test data set, and the “SOLiD” directory contains an AB SOLiD test data set. The following instructions are based on the provided sample data; however, the same steps can be repeated with any data from these instruments.

1. Load Data
2. Monitor the Job
3. Display Mapping Results
4. Finding SNP Candidates
5. Exporting Data into Files
6. Change Parameters to get More Mapping Results
7. Show Mapping Results of Several Jobs Together
8. Paired-end/Mate-Pair Read Mapping

1. Loading Data

Select the data relevant for your study. In this case ZOOM provides two sets of sample data in the “Sample_Data” directory.

In the Solexa directory, there are two directories:

  • “single_end” directory: “read.fastq” and “reference.fa”
  • “paired_end” directory: “read_1.fastq”, “read_2.fastq” and “reference.fa”

In the SOLiD directory, there are two directories:

  • “single_end” directory: “read.fastq” and “reference.fa”
  • “paired_end” directory: “read_F3.csfasta”, “read_F3_QV.qual” and “reference.fa”

Create a Job

In this simple example, we use only one read file and one reference file; however, the same process applies to jobs with a read directory, multiple read files, or multiple reference files.

Start by clicking the "Create new job" toolbar icon or selecting "New Job" from the File menu. The following window will appear:


The Basic Information page is used to assign a name and a directory to store the data related to your job. After completion, you can load the job to display results or perform post-analysis.

Start by entering a name for the job in the blank field beside the "Job Label", for example "Solexa_single_end_test".
Press the "browse" button to specify the directory in which to save the job, for example "F:\ZOOMDB".

Enter a brief description about the job for future reference.
Click "Next" on the bottom of the window to continue.

Next, input the reads. There are two ways to input read files: by selecting individual read files or by selecting directories. If you select a directory, ZOOM will automatically search it for all read files. Please note that read files must be in one of the standard formats of next-generation sequencing technologies: for example, “*_seq.txt”, “*_qual.txt”, “*.fastq”, or “*.fasta” files for Illumina data, or “*.csfasta”, “*_QV.qual”, or “*.fastq” files for AB SOLiD data.
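As a rough illustration of the file-name patterns listed above, the following Python sketch picks out the names that would count as read files for each platform. The names READ_FILE_PATTERNS and select_read_files are hypothetical; only the patterns themselves come from this tutorial, and how ZOOM actually scans a directory is not described here.

```python
from fnmatch import fnmatchcase

# Patterns taken from the tutorial text; the grouping into a dict is an assumption.
READ_FILE_PATTERNS = {
    "illumina": ["*_seq.txt", "*_qual.txt", "*.fastq", "*.fasta"],
    "solid": ["*.csfasta", "*_QV.qual", "*.fastq"],
}


def select_read_files(file_names, platform):
    """Return the names that match one of the read-file patterns for a platform."""
    patterns = READ_FILE_PATTERNS[platform]
    return [
        name
        for name in file_names
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    names = ["read.fastq", "reference.fa", "notes.txt", "s_1_seq.txt"]
    print(select_read_files(names, "illumina"))
```

Case-sensitive matching is used here so that the result does not depend on the operating system.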


Click the "Add read file/dir(s) to list" button, navigate to the "Sample_Data\Solexa\single_end" directory and select the "read.fastq" file. The file will be added to the read file list.

Click the "Add read files/dir(s) to list" button again to select other read files. For example, select the "read.csfasta" file in the "Sample_Data\SOLiD\single_end" directory. The "read.csfasta" file will then be loaded into the read file list too.

Note that ZOOM also recognizes that the "read.csfasta" file has a corresponding quality file, "read_QV.qual", and loads the quality file as well.

By clicking and dragging the boundary between the “read file” and “quality file” headers, you can adjust the column widths of the table to show the full names of the quality files.

ZOOM recognizes the corresponding quality file by its file name, so please make sure that the read sequence file is in the same directory as the quality score file and that the two file names share the same prefix. For Illumina/Solexa data, “_seq.txt” is matched with “_qual.txt”. For ABI SOLiD data, “.csfasta” is matched with “_QV.qual”. Quality scores in FASTQ files are loaded directly. Click the “Next” button at the bottom of the window to continue.
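The naming rule for quality files can be sketched in a few lines of Python. This is only an illustration of the suffix matching described above, not ZOOM's own code; the function name expected_quality_file is hypothetical.

```python
from pathlib import Path

# (read suffix, quality suffix) pairs as stated in the tutorial.
QUALITY_SUFFIXES = [("_seq.txt", "_qual.txt"), (".csfasta", "_QV.qual")]


def expected_quality_file(read_path):
    """Return the quality-file path implied by the naming rule, or None.

    FASTQ files carry their own quality scores, so no separate file is expected.
    """
    path = Path(read_path)
    for read_suffix, quality_suffix in QUALITY_SUFFIXES:
        if path.name.endswith(read_suffix):
            prefix = path.name[: -len(read_suffix)]
            # Same directory, same prefix, different suffix.
            return path.with_name(prefix + quality_suffix)
    return None


if __name__ == "__main__":
    print(expected_quality_file("read.csfasta"))
    print(expected_quality_file("read.fastq"))
```

Before creating a job, a check like this can confirm that each read file has its quality file in the same directory.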

Assign the reference sequences to which the reads will be mapped. Press the “Add file/dir(s) to list” button and choose the reference sequence file “reference.fa” in the “Sample_Data\Solexa\single_end” directory. The sequences in the reference files must be in FASTA format. Multiple reference files or a directory can also be loaded. Click the “Next” button at the bottom of the window to continue.

For Mapping Parameters, use the following default parameters:


Click the “Finish” button. A new job will be created, together with a directory named “Solexa_single_end_test” in which all information about the job is stored. You can copy this directory anywhere. When you load this directory in ZOOM, the job is shown again and post-analysis can be carried out on it.

2. Monitoring the Job

After the job is created, it is shown in the “Job View” panel on the left side of the interface. For each job, ZOOM automatically creates a “task” to map the reads on the assigned server. If the number of reads is large, ZOOM automatically partitions the reads into several parts and launches several tasks to handle them. ZOOM schedules these tasks automatically until all reads have been handled. You can monitor the running status of the jobs and tasks through the corresponding progress bars in the “Running Monitor” window.
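The idea of splitting a large read set into parts can be pictured with the following Python sketch. The partition size that ZOOM uses is not stated in this tutorial, so reads_per_task is a purely hypothetical parameter, as is the function name.

```python
def partition_reads(read_count, reads_per_task):
    """Split read indices 0..read_count-1 into consecutive (start, end) ranges.

    Each range is half-open: it covers start up to, but not including, end.
    """
    if reads_per_task <= 0:
        raise ValueError("reads_per_task must be positive")
    return [
        (start, min(start + reads_per_task, read_count))
        for start in range(0, read_count, reads_per_task)
    ]


if __name__ == "__main__":
    # Ten reads with at most four reads per task give three parts.
    print(partition_reads(10, 4))
```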

The progress bar is useful when loading large data files. A progress bar will pop up showing the progress of loading the data; the loading time depends on the size of the read data files. ZOOM will not respond until the progress bar has disappeared.

After loading the data, you will see the job in the Job View panel.


The “Job View” panel, in the upper left-hand corner, displays the organization of each job. Use the ‘+’ and ‘-’ boxes to expand and collapse a job to see how it is organized. Each job node contains a “Scheduling” node and a “Results” node.

The “Scheduling” node shows all the tasks into which this job has been split and which have been scheduled on the server.
The “Results” node will not appear until all reads mapping tasks are finished. It will contain the uniquely mapped results (suffixed by “[UNIQUE]”) and the top N mapping results (suffixed by “[ALL]”) according to the running parameters.

Click on the job node, and the Running Monitor will show the progress of the job.
Click the “Job Properties” button to display the properties of this job, including the read files, the reference files, the parameters used, and any descriptive notes.


Click on a task node. The progress of the task will be shown.

3. Display Mapping Results

When the job icon changes to indicate completion, the job is finished. You can now show mapping results or carry out SNP analysis. Make sure that you select a node under the “Results” node when choosing the data to be analyzed.

1. Select the “UNIQUE” node under the “Results” node in the “Job View” panel and click the “Display mapping result” toolbar icon.


2. ZOOM will assemble the mapped reads into a consensus sequence and show the read depth overview along the reference sequence. This will take some time depending on the amount of mapped reads and the length of the reference sequence. A progress bar will pop up.

3. After the progress bar disappears, a tabbed window containing the mapping results appears on the right-hand side of the ZOOM main window, as follows:

The line in the graph is the overview of the read depth of those mapped reads along the reference sequence. The horizontal ruler denotes the positions on the reference sequence. The vertical ruler denotes the read depth.
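To make the notion of read depth concrete, here is a minimal Python sketch that computes the depth at every reference position from a list of alignment intervals. It assumes 0-based, half-open (start, end) intervals and is not a description of how ZOOM computes its overview.

```python
def read_depth(reference_length, alignments):
    """Return a list with the number of reads covering each reference position.

    alignments is a list of (start, end) pairs, 0-based with end exclusive
    (an assumed convention for this sketch).
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    delta = [0] * (reference_length + 1)
    for start, end in alignments:
        delta[start] += 1
        delta[end] -= 1

    depth = []
    running = 0
    for change in delta[:reference_length]:
        running += change
        depth.append(running)
    return depth


if __name__ == "__main__":
    # Two reads of length 3 overlapping at position 2.
    print(read_depth(6, [(0, 3), (2, 5)]))
```

Plotting the returned list against the position index gives the kind of line graph described above, with position on the horizontal axis and depth on the vertical axis.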
4. Press the zoom-in button to zoom in on the graph, or press the zoom-out button to zoom out.

5. Click the left button on your mouse and drag along the graph to form a rectangle region, and then release the mouse button. The selected rectangle region will be enlarged to the full window of the “Mapping Results Displaying Window” as follows:

6. Rest the cursor on a position on one of the peaks for a second. The average read depth at this position will be shown in a tooltip beside the mouse pointer.

7. Click on a place in the “Mapping Results Displaying window”. The detailed alignments of the mapped reads along the reference sequence will be shown as follows:

The sequence at the bottom of the window is the reference sequence. The sequence with green background over the reference sequence is the consensus sequence generated by the mapped reads along the reference sequence. The orange background of the nucleotides on the read or the consensus sequence highlights the difference from the nucleotide on the position of the reference sequence. The default display of the read is in the nucleotide space. For AB SOLiD data, the default display is the decoded nucleotide reads according to the mapping results.
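The relationship between the mapped reads, the consensus sequence and the highlighted differences can be sketched as follows. This Python example uses a simple majority vote per column, which is an assumption made for illustration; the tutorial does not state how ZOOM builds its consensus. The function names are hypothetical.

```python
from collections import Counter


def build_consensus(reference, alignments):
    """Majority-vote consensus over gapless alignments.

    alignments is a list of (start, read_sequence) pairs with 0-based starts.
    Positions not covered by any read are reported as "N".
    """
    columns = [Counter() for _ in reference]
    for start, sequence in alignments:
        for offset, base in enumerate(sequence):
            columns[start + offset][base] += 1
    return "".join(
        column.most_common(1)[0][0] if column else "N" for column in columns
    )


def differing_positions(reference, sequence):
    """Positions where a covered base differs from the reference base."""
    return [
        index
        for index, (ref_base, base) in enumerate(zip(reference, sequence))
        if base != "N" and base != ref_base
    ]


if __name__ == "__main__":
    reference = "ACGTAC"
    reads = [(0, "ACGA"), (1, "CGAA"), (2, "GAAC")]
    consensus = build_consensus(reference, reads)
    print(consensus, differing_positions(reference, consensus))
```

The positions returned by differing_positions correspond to the nucleotides that would be shown with an orange background.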

Press the corresponding toolbar button to switch the read display from nucleotide space to color space, and vice versa. The reads shown in color space look like the following:
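For readers unfamiliar with color space, the Python sketch below shows the standard SOLiD two-base encoding, in which each color (0 to 3) encodes the transition between two adjacent bases. It performs a naive left-to-right decoding from the leading primer base; this differs from ZOOM, which according to the text above decodes reads according to the mapping results. The function names are hypothetical, and reads containing "." (missing color calls) are not handled.

```python
BASES = "ACGT"


def decode_color_space(cs_read):
    """Decode a color-space read such as "T0123" into nucleotides.

    The first character is the primer base; it seeds the decoding and is
    not part of the returned sequence. With A=0, C=1, G=2, T=3, the color
    of two adjacent bases is the XOR of their indices.
    """
    previous = cs_read[0]
    decoded = []
    for color in cs_read[1:]:
        previous = BASES[BASES.index(previous) ^ int(color)]
        decoded.append(previous)
    return "".join(decoded)


def encode_color_space(primer_base, sequence):
    """Encode nucleotides into color space, starting from a primer base."""
    colors = []
    previous = primer_base
    for base in sequence:
        colors.append(str(BASES.index(previous) ^ BASES.index(base)))
        previous = base
    return primer_base + "".join(colors)


if __name__ == "__main__":
    print(decode_color_space("T0123"))
    print(encode_color_space("T", "TGAT"))
```

Because every decoded base depends on the previous one, a single color error changes all bases after it in this naive scheme, which is one reason mapping is usually done in color space first.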

8. Click or drag the horizontal scrollbar to navigate along the reference sequence.

9. Click or drag the vertical scrollbar on the right to show more reads aligned to this region when all the reads mapped here cannot fit in the “Mapping Results Window”.

Click on the “Reference Sequence Selecting Bar”. The reference sequence name list will be displayed. If there are multiple reference sequences, there will be a dropdown list where you can choose one reference sequence to show the alignments on it.

In this case, there is only one reference sequence named “reference sequence”.

10. Click on the “Locating bar”.

The “2513-2590” (you may see different numbers) gives the offsets on the reference sequence of the range currently shown in the “Mapping Results Illustrating window”. Click on “remember current position”, and click the “Locating bar” again. You will see that “0:2513-2590” is recorded there; by selecting this entry, you can go back to this region at any time.

Enter a new position or a position range in the “Locating bar”, such as “1234” or “1234-4560”. The read alignments in the new region will then be shown in the “Mapping Results Illustrating window”.
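The two accepted input forms, a single position or a "start-end" range, can be illustrated with a small parser. This Python sketch is hypothetical and only mirrors the examples given above; it does not handle the "0:" prefix seen in remembered entries, whose meaning is not explained in this tutorial.

```python
def parse_location(text):
    """Parse "1234" or "1234-4560" into a (start, end) tuple of integers.

    A single position is returned as a range whose start equals its end.
    """
    parts = text.strip().split("-")
    if len(parts) == 1:
        start = end = int(parts[0])
    elif len(parts) == 2:
        start, end = int(parts[0]), int(parts[1])
    else:
        raise ValueError(f"unrecognized location: {text!r}")
    if start > end:
        raise ValueError(f"start is larger than end: {text!r}")
    return start, end


if __name__ == "__main__":
    print(parse_location("1234"))
    print(parse_location("1234-4560"))
```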

11. Enter a single position such as “1234” in the “Locating bar” or click on a column in the “Mapping Results Illustrating window”. A light blue bar will highlight this position as follows:

12. Click on any read in the “Mapping Results Illustrating window”.

The read will be highlighted by a red rectangle. At the same time, more information about this mapped read will be shown in the “read information” tab window below the “Mapping Results Illustrating window”:

Note that the direction of the alignment shown in the “read information” tab is the same as the direction of the read sequence in the read files. If a read is mapped to the reverse strand of the reference sequence, the reference segment is reversed and the left offset is larger than the right offset, as shown in the above picture.

13. Click the “Copy the read sequence” button. The read name and the read sequence will be copied to the system clipboard.

14. Click the “Solexa_single_end_test” job node and click the job summary toolbar icon.

A summary of the mapping results will be shown in the pop up window. The summary includes the total number of reads in the read data files, the number of reference sequences and the length of the reference sequences:

Click on the “Unique Mapping Results” tab to show the number of uniquely mapped reads and the statistics of the unique mapping results:

15. Click the “UNIQUE” results node, and click the toolbar icon.

The summary of the uniquely mapped results will be shown in a “Mapping Summary” tab window beside the “read information” tab.

4. Finding SNP Candidates

We suggest finding SNP candidates using only the uniquely mapped reads (i.e. using the [UNIQUE] result node rather than the [ALL] result node). Because the [ALL] result node contains the top N mapping results for each read, reads mapped to multiple positions on the reference sequence would make the SNP finding process unreliable.

1. Click the “Solexa_single_end_test[UNIQUE]” result node, and click the “filter SNP candidates” toolbar icon (or select “SNP Filter” from the “Tools” menu).

A window showing “Filter criteria” will pop up as follows:
There are five filtering criteria which you can apply for the SNP finding.

2. Click on the checkbox of the filtering criterion “At least … reads are mapped to this position” and revise the value to 10.

3. Press the “OK” button. SNP finding will then be carried out on all the reference sequences. A progress bar will pop up:

4. When all SNP candidates are located, a table containing SNP candidates will appear in the “SNP Caller” tab as follows:

Each row of the table is a SNP candidate. The table has 9 fields showing 9 features of each SNP.
Click the “SNP Summary” button. The number of SNP candidates satisfying the filtering criteria, together with the filtering criteria applied, will be shown:

5. Double click the first row in the table to show the first SNP.

The light blue bar will highlight the SNP position. You can check the alignment around this position in detail. You can double click each row in the table to see the SNP candidate details.

6. Click one read at this position and click the “read information” tab. You can check the quality score of this read at this position to judge whether the SNP candidate is more likely a true SNP or a sequencing error.


7. Click the “SNP Caller” tab to show the SNPs. Click the “<” or the “>” button to jump to the previous or the next SNP candidate.

8. Click the “Read Depth” field in the header of the SNP table to sort the candidates according to the read depth in ascending order. Click it again to sort in descending order. Similarly each field in the SNP table can be sorted.

9. Click the “Export all SNPs” button to export the SNP candidates into a file. Each SNP candidate is exported as one line containing the same nine fields as in the SNP table.
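The read-depth criterion from step 2 and the sorting from step 8 can be pictured with the following Python sketch. The nine fields of the SNP table are not listed in this tutorial, so the dictionary keys "position" and "read_depth" are hypothetical stand-ins, and the sample values are made up purely to show the mechanics.

```python
def filter_by_depth(candidates, min_depth=10):
    """Keep candidates with at least min_depth reads mapped to their position."""
    return [c for c in candidates if c["read_depth"] >= min_depth]


def sort_by_depth(candidates, descending=False):
    """Sort candidates by read depth, ascending by default."""
    return sorted(candidates, key=lambda c: c["read_depth"], reverse=descending)


if __name__ == "__main__":
    # Illustrative values only; they do not come from the sample data.
    candidates = [
        {"position": 120, "read_depth": 14},
        {"position": 455, "read_depth": 6},
        {"position": 901, "read_depth": 10},
    ]
    kept = filter_by_depth(candidates, min_depth=10)
    print([c["position"] for c in sort_by_depth(kept)])
```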

5. Export data into files

The mapping results and consensus sequence can be exported to files. Note that only results nodes can be exported.

1. Select the “Solexa_single_end_test[UNIQUE]” result node.

2. Select “Export” from the “File” menu. Select “Mapping Results” from the popup menu. There are four output formats to export mapping results into.


3. Select “Export” from the “File” menu. Select “Consensus Sequences” from the popup menu. The consensus sequence built according to the mapping results will be exported in FASTA format. Note that we suggest building a consensus sequence only on the [UNIQUE] result node, for the same reason as for SNP finding.

6. Change parameters to get more mapping results

For the unmapped reads of this job, adjusting parameters such as the reference sequences or the number of mismatches allowed between reads and reference sequences may yield more mapping results.

1. Click the “Solexa_single_end_test” job node and click the “reprocess unmapped reads” toolbar icon.


The following window will pop up:

The process is similar to creating a new job, except that the reads data is the unmapped reads of the selected job. Assign a name to the new job for these unmapped reads. The default name is the original name suffixed by “.more”.
2. Click “Next” twice to the “mapping parameters” step.

3. Change the radio button selection from “the unique …” to “top…”, and set the value to 2, to keep up to 2 mapping results for each read.

4. Modify the mismatch number from 2 to 4, which will allow up to four mismatches between the reads and the reference sequences.

5. Click the "Achieve high sensitivity (more mapping results but lower mapping speed)" box.
This will achieve full sensitivity to find all the mapping results with up to 4 mismatches.
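What the mismatch limit means for a single gapless alignment can be shown with a short Python sketch. It simply counts differing positions between a read and a reference segment of the same length; it is not ZOOM's mapping algorithm, and the function names are hypothetical.

```python
def count_mismatches(read, reference_segment):
    """Count positions at which two equal-length sequences differ."""
    if len(read) != len(reference_segment):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(read, reference_segment))


def is_accepted(read, reference_segment, max_mismatches):
    """True if the alignment stays within the allowed number of mismatches."""
    return count_mismatches(read, reference_segment) <= max_mismatches


if __name__ == "__main__":
    read = "ACGTACGTAC"
    segment = "ACCTACGAAT"  # differs from the read at three positions
    print(count_mismatches(read, segment))
    print(is_accepted(read, segment, 2), is_accepted(read, segment, 4))
```

In this example the alignment is rejected with a limit of 2 but accepted with a limit of 4, which is why raising the limit can recover previously unmapped reads.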

6. Click the “Finish” button to create this new job. A new job “Solexa_single_end_test.more” will be created and processed. After the new job is finished, there will be an additional job appearing in the “Job View panel” as follows:

The new job has two Results nodes, the [UNIQUE] node and the [ALL] node, because we set the parameters to collect the top two mapping results for each read. The uniquely mapped results are in the [UNIQUE] result node, while the top two mapping results are in the [ALL] node.

Click the job summary toolbar icon. The job summary window will appear:

1983 reads are unmapped in the “Solexa_single_end_test” job. There are two summaries for uniquely mapped results and the top two mapped results, respectively.

7. Click the “Unique Mapping Results” tab. You can see that 1783 reads are mapped after increasing the mismatch number from 2 to 4 between the reads and the reference sequence.

8. Click the “All Mapping Results” tab. There are 1795 mapping positions in the top two mapping results. Note that this is the number of mapping positions rather than the number of mapped reads, because one read might be mapped to multiple positions.
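The difference between counting mapped reads and counting mapping positions can be made explicit with a small Python sketch. The data structure (a dictionary from read name to a list of mapping positions) and the function name are assumptions for illustration and say nothing about how ZOOM stores its results.

```python
def summarize_mappings(mappings):
    """Summarize a dict that maps each read name to its list of positions."""
    mapped_reads = sum(1 for positions in mappings.values() if positions)
    unique_reads = sum(1 for positions in mappings.values() if len(positions) == 1)
    mapping_positions = sum(len(positions) for positions in mappings.values())
    return {
        "mapped_reads": mapped_reads,
        "uniquely_mapped_reads": unique_reads,
        "mapping_positions": mapping_positions,
    }


if __name__ == "__main__":
    # r2 maps to two places, so positions outnumber mapped reads.
    example = {"r1": [10], "r2": [5, 90], "r3": []}
    print(summarize_mappings(example))
```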

7. Show Mapping Results of Several Jobs Together

If two or more jobs have the same reference sequence, you can choose to merge the mapping results of these jobs to show the mapping results together.

1. Hold down the “Ctrl” key on the keyboard, and click the “Solexa_single_end_test[UNIQUE]” Results node and the “Solexa_single_end_test.more[UNIQUE]” Results node. Release the “Ctrl” key.


2. Click the “Display mapping result” toolbar icon to display the merged mapping results in the “mapping results window”.

You can perform any operation on the merged results that is available for a single result node, including SNP finding.

8. Paired-end/Mate-pair read mapping example

We assume that you have gone through the above single-end reads mapping process. Now we will explain how to map paired-end/mate-pair reads, focusing only on the operations that are different from mapping single-end reads.

1. Click the "Create new job" toolbar icon to create a new job named “ABI_mate_pair_test” as follows:

2. Click the “Next” button to move to the “Input reads” step.
Click the “paired files mode” button to switch to the mode for inputting mate-pair reads, as follows:

The read file list window is split into two windows, each of which loads one end of the mate-pair read files. Make sure that the two files in the same row of the left and right windows are paired.

3. Click the “Auto find pair read files into list” button. Choose both “read_F3.csfasta” and “read_R3.csfasta” files in “Sample_Data\SOLiD\mate-pair\” directory. ZOOM will automatically recognize the possible paired files and put them in one row together with their quality file if any.

ZOOM automatically finds paired read files according to the suffix of the files: _F3.csfasta will be paired with _R3.csfasta; and _1.fastq will be paired with _2.fastq.
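The suffix rule for pairing files can be sketched in Python as follows. Only the two suffix pairs come from the text above; the function name find_paired_files and the way candidates are searched are assumptions for illustration.

```python
# (forward suffix, reverse suffix) pairs as stated in the tutorial.
PAIR_SUFFIXES = [("_F3.csfasta", "_R3.csfasta"), ("_1.fastq", "_2.fastq")]


def find_paired_files(file_names):
    """Return (forward, reverse) name pairs that satisfy the suffix rule."""
    names = set(file_names)
    pairs = []
    for name in sorted(names):
        for forward_suffix, reverse_suffix in PAIR_SUFFIXES:
            if name.endswith(forward_suffix):
                mate = name[: -len(forward_suffix)] + reverse_suffix
                # Only report the pair if the mate file is actually present.
                if mate in names:
                    pairs.append((name, mate))
    return pairs


if __name__ == "__main__":
    names = [
        "read_F3.csfasta",
        "read_R3.csfasta",
        "read_1.fastq",
        "read_2.fastq",
        "reference.fa",
    ]
    print(find_paired_files(names))
```

A forward file without a matching mate is skipped in this sketch, which corresponds to the case where the files must be entered by hand.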

If you choose a directory, ZOOM will automatically pair the files that satisfy the naming rule. Thus, if you want ZOOM to pair the read files for you, please make sure the file suffixes are correct. You can also add patterns for recognizing paired-end files; otherwise, you will need to enter the read files pair by pair yourself, as follows:

Double click the left “forward read file” window, and select the “Sample_Data\Solexa\pair-end\read_1.fastq”.


Double click the right “reverse read file” window, and select the “Sample_Data\Solexa\pair-end\read_2.fastq”.

Please keep in mind that the two read files in one row are paired. When you select one file, both files in the row are selected, as follows:

Select the Solexa data and click the “Remove pairs from list” button to remove the pair. We will not use this data set in the rest of this tour.

4. Click the “Next” button to move on to the “Reference sequences” step. Choose the “reference.fa” file in the “Sample_Data\SOLiD\mate-pair\” directory. Click “Next” to move on to the “Mapping Parameters” step.

5. The estimated range of the distance between two reads of one mate-pair is [800, 2000]. Set the paired-end parameters as follows:

Keep the top two mapping results for each read:

Click “Finish” to create the “ABI_mate_pair_test” job.

6. After the job is finished, click the “ABI_mate_pair_test[UNIQUE]” result node and click the “Display mapping result” toolbar icon to show the mapping results.

7. Click any place in the “Mapping Results Illustrating window”, select a read, and press the “Find its mate-pair” button. ZOOM will then jump to the pair of the selected read.
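The distance range [800, 2000] from step 5 can be read as a simple acceptance test on the two mapped positions of a mate-pair. The Python sketch below assumes the distance is the absolute difference between the two mapping positions on the same reference sequence; the tutorial does not say exactly how ZOOM measures it, so treat both the function name and that definition as hypothetical.

```python
def is_within_pair_distance(position_a, position_b, min_distance=800, max_distance=2000):
    """True if the two mapped positions are separated by an allowed distance."""
    distance = abs(position_b - position_a)
    return min_distance <= distance <= max_distance


if __name__ == "__main__":
    print(is_within_pair_distance(1000, 2200))  # 1200 apart
    print(is_within_pair_distance(1000, 1500))  # 500 apart
    print(is_within_pair_distance(1000, 3500))  # 2500 apart
```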