Massively parallel sequencing, also known as next-generation sequencing, is a technology
enabling high-throughput sequencing of genomes or loci of interest. A single
locus of sequences is provided for three samples for which variant detection is
the quality of the sequence reads
- Map the reads to a reference
- Examine sequence variation
- Visualize the mapping and variants on the chromosome
are as follows.
Start Galaxy at the instructor provided link. (Students who are not in the
classroom can use
First-time users should register by clicking on User and selecting Register.
This automatically logs in the user. Otherwise, log in by clicking on User and
Import the data by clicking on “Shared Data” and selecting “Published
Histories”. Search for the Name “Source Data” and click on the “Source Data”
link. Next click on the link “Import history”. (A copy of the dataset is
Viewing the data:
In the Galaxy bar, click on “Analyze Data”. The imported files should appear
on the right with a green background. To view any of the files click on the
icon that looks like an eye. The files containing reads have the .fastq
extension, and the reference sequence has the .fa extension.
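
For readers curious about what the eye icon shows for a .fastq file, here is a minimal sketch of reading FASTQ records in Python. It assumes the common layout of four lines per read (header, bases, a '+' separator, and a quality string). The names `FastqRecord` and `read_fastq` and the two example reads are made up for illustration and are not part of the workshop data.

```python
from io import StringIO
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str      # read identifier, without the leading '@'
    sequence: str  # base calls
    quality: str   # encoded quality string, one character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ file that uses four lines per read.

    Reading stops at end of file or at the first empty header line.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored here
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# Made-up example data, not taken from the workshop files.
example = StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!5I\n")
records = list(read_fastq(example))
for record in records:
    print(record.name, record.sequence, record.quality)
```

The sketch does not handle multi-line sequences, which some older FASTQ files contain.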
grooming: Go to the blue panel on the left, and click on the section “NGS:
QC and manipulation” to expand it. In the list of tools, click on “FASTQ
Groomer”. For the option “File to groom”, choose the first fastq file. Use
the defaults Sanger for “Input FASTQ quality scores type” and “Hide Advanced
Options” for “Advanced Options”. Click the Execute button. Repeat these
steps for the second and third fastq files.
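
As background to the “Input FASTQ quality scores type” option, the sketch below shows how a quality string maps to Phred scores. Sanger-style FASTQ stores each score as an ASCII character with an offset of 33, while some older Illumina pipelines used an offset of 64. The function names are hypothetical, and this is only an illustration of the encoding, not the FASTQ Groomer's actual implementation.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode an ASCII quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def to_sanger(quality: str, source_offset: int = 64) -> str:
    """Re-encode a quality string so that it uses the Sanger offset of 33."""
    return "".join(chr(ord(ch) - source_offset + 33) for ch in quality)


def error_probability(score: int) -> float:
    """Convert a Phred score Q to the error probability 10^(-Q/10)."""
    return 10 ** (-score / 10)


sanger_example = "!5I"   # made-up quality string in Sanger encoding
print(phred_scores(sanger_example))

older_example = "hh"     # made-up quality string with offset 64
print(to_sanger(older_example))
```

A Phred score of 20 corresponds to an estimated error probability of 1 in 100, and a score of 40 to 1 in 10,000.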
In the same tool section, find the tool “FastQC:Read QC” and click on the link
to activate the tool. In the central panel, for “Short read data from your
current history”, select the first groomed fastq file. Enter the Title for
the output file as FastQC. Leave “Contaminant list” set to “Selection is Optional”.
Click the Execute button. To view the results, on the right panel click on
the eye icon next to the FastQC results.
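
One of the summaries in a read QC report is the quality at each position along the reads. The following is a rough sketch of that idea, assuming Sanger-encoded (offset 33) quality strings. The function name and the example strings are invented for illustration, and FastQC itself reports more than a simple mean.

```python
from statistics import mean


def mean_quality_by_position(quality_strings, offset=33):
    """Return the mean Phred score at each read position.

    Reads shorter than a given position are skipped for that position.
    """
    longest = max((len(q) for q in quality_strings), default=0)
    means = []
    for pos in range(longest):
        scores = [ord(q[pos]) - offset for q in quality_strings if len(q) > pos]
        means.append(mean(scores))
    return means


# Made-up quality strings; quality often drops toward the end of a read.
example_qualities = ["IIII", "II5!", "I5!"]
means = mean_quality_by_position(example_qualities)
for position, value in enumerate(means, start=1):
    print(position, round(value, 2))
```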
the left panel, click on the section “NGS: Mapping” to expand the list of
mapping tools. Click on the link “Map with BWA for Illumina” to activate the
mapping tool. The reference genome is the fasta file we imported, chr21.fa.
It is in our History in the right panel. In the central panel, for the option
“Will you select a reference genome from your history or use a built-in
index”, select “Use one from the history”. Then choose chr21.fa for “Select a
reference from history”. For the option “Is this library mate-paired”,
select “Single-end”, and for the “FASTQ file” option, choose the first groomed
fastq file. “BWA settings to use” are “Commonly Used”. Click on the Execute
button. Repeat these steps for the second and third groomed fastq files.
Finally, view the SAM (Sequence Alignment/Map format) output in the right
panel by clicking on the eye icon next to one of the “Map with BWA for
Format conversion: Many tools require a binary version of SAM, called BAM.
To convert SAM to BAM, go to the left panel, and click on “NGS:SAM Tools”.
Select the tool,
SAM-to-BAM. “Choose the source for the reference list” should be “History”.
“Convert SAM file” should be the first “Map with BWA …” file. “Using
reference file” should be “chr21.fa”. Repeat this for the second and third
“Map with BWA …” files.
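
To make the SAM text format more concrete, here is a minimal sketch that splits one alignment line into some of its eleven mandatory tab-separated fields and checks two FLAG bits. The class and function names are hypothetical and the example line is made up. Working with BAM itself would normally go through a dedicated library, which this sketch does not attempt.

```python
from typing import NamedTuple, Optional


class SamAlignment(NamedTuple):
    qname: str  # read name
    flag: int   # bitwise FLAG
    rname: str  # reference sequence name
    pos: int    # 1-based leftmost mapping position
    mapq: int   # mapping quality
    cigar: str  # CIGAR string
    seq: str    # read sequence
    qual: str   # quality string

    @property
    def is_unmapped(self) -> bool:
        return bool(self.flag & 0x4)

    @property
    def is_reverse(self) -> bool:
        return bool(self.flag & 0x10)


def parse_sam_line(line: str) -> Optional[SamAlignment]:
    """Parse one SAM line; header lines (starting with '@') give None."""
    if line.startswith("@"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamAlignment(
        fields[0], int(fields[1]), fields[2], int(fields[3]),
        int(fields[4]), fields[5], fields[9], fields[10],
    )


# Made-up alignment line, not taken from the workshop output.
example_line = "read1\t16\tchr21\t1000\t37\t4M\t*\t0\t0\tACGT\tIIII\n"
alignment = parse_sam_line(example_line)
print(alignment.rname, alignment.pos, alignment.cigar, alignment.is_reverse)
```

Fields 7 to 9 (mate reference, mate position, and template length) are skipped here because the reads in this exercise are single-end.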
- Pooling data: The variant caller FreeBayes can operate on pooled data. To merge the BAM files, click on “Convert, Merge, Randomize BAM datasets” under NGS:BAM Tools in the left panel. In the central panel, select the first SAM-to-BAM dataset. Use “+Insert BAM datasets to filter” to add the second file, and repeat the step for the third file. Then click the Execute button.
- Variant detection: In the blue Tools panel on the left, click on the section “NGS: Call Variant Detection” to display the tools in this section. Or scroll to the top of this panel, and type FreeBayes in the search box. In the search results, click on FreeBayes to set up the variant detection calculation. Use version 0.0.3. For the option “Choose the source for the reference list”, select History. For the option “BAM file”, choose the merged file from the previous step. The option “Using reference file” should be chr21.fa, and the Basic options should be chosen. Click the Execute button. View the results by clicking in the right panel on the eye icon next to the FreeBayes results.
- Sorting the results: In the Tools panel, click on the section “Filter and Sort” to see the list of tools. (If you used the search box, click the x next to the query term to clear the search results.) Click on the Sort tool and look at the central panel. Set the “Sort Query” option to the FreeBayes result. The “on column” option should be c6, since the QUAL column is the sixth column. “with flavor” should be set to “Numerical sort”, and “everything in” should be “Descending order”. Click the Execute button, and view the results by clicking on the eye icon next to the Sort results in the right panel.
- Viewing the mapped reads: Click on the MergedBams.bam file in the right panel to reveal the display option “display at UCSC main”. Click on main. This opens the UCSC Genome Browser with MergedBams.bam displayed. To see the top scoring variant, type chr21:27,818,520-27,818,550 in the search box, and click the go button. Then view the reads by going down to the “Custom Tracks” section and selecting full in the menu labeled Convert, Merge, Randomize. Then click refresh. The variant in the reads is now visible.
- Viewing the variant analysis results (vcf file): Return to the Galaxy window, go to the panel on the right, and click on “FreeBayes on …” to reveal the display option “display at UCSC main”. Click on main. This opens another UCSC Genome Browser with the FreeBayes results added to the display. Scroll down to the “Custom Tracks” section and change the FreeBayes menu to pack. Change the Convert, Merge, Randomize menu to dense. This yields a track displaying only the variants.
- Examining the biological context: To view a variant in an exon, go to the textbox at the top of the page, type chr21:27,061,784-27,067,061, and click the go button. Scroll down to the bar labeled “Genes and Gene Prediction Tracks”; ensure that the menu for “UCSC Genes …” is set to pack. Now the variants can be seen in the context of the tracks chosen for display in the UCSC Genome Browser. Zoom in on the first variant in this view by holding down the shift key and the left mouse button to make a rectangle enclosing it. Finally, click on a variant in the FreeBayes track to view detailed information about it.
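
The sorting step above (column c6, numerical sort, descending order) can be sketched in Python as follows. This is an illustration of the idea rather than what the Galaxy Sort tool runs internally. The function name and the three example records are made up. Header lines starting with '#' are kept at the top, and a missing QUAL value ('.') is placed last as an assumption of this sketch.

```python
def sort_vcf_by_qual(lines):
    """Return VCF lines with records ordered by QUAL (column 6), highest first."""
    header = [ln for ln in lines if ln.startswith("#")]
    records = [ln for ln in lines if ln.strip() and not ln.startswith("#")]

    def qual(record):
        value = record.split("\t")[5]  # sixth column, zero-based index 5
        return float(value) if value != "." else float("-inf")

    return header + sorted(records, key=qual, reverse=True)


# Made-up VCF content; positions and scores are placeholders.
example_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr21\t100\t.\tA\tG\t12.5\t.\t.",
    "chr21\t200\t.\tC\tT\t250\t.\t.",
    "chr21\t300\t.\tG\tA\t.\t.\t.",
]
sorted_vcf = sort_vcf_by_qual(example_vcf)
for line in sorted_vcf:
    print(line)
```

Sorting numerically matters here: a plain text sort would place "250" before "12.5" only by accident of the leading character, and would misorder values such as "9" and "85".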