The PL-Grid instance of Galaxy provides all the tools necessary for quality control of your raw reads, using FastQC, and for preparing them for alignment to the reference through trimming and filtering, using Flexbar. This tutorial introduces these tools and guides you through the process.
For reference, the figures above show where to quickly find the tools used here.
In this tutorial we will use sample single-end (SE) reads from total RNA sequencing of two different chicken lines.
The dataset is available in the list of published histories (https://galaxy.plgrid.pl/history/list_published) - look for the Adapters history. You can import it into your workspace from there.
We will use the file ‘test_dataset.fastq’, which is available for download from the Bismark homepage (it contains 10,000 reads in FastQ format with Phred33 qualities, 50 bp long, from a human directional BS-Seq library).
Link to input dataset:
File format: fastqsanger [HINT: this is important, because Galaxy recognizes all kinds of FASTQ files as the generic fastq format, whereas most tools require the more specific fastqsanger format]
You can also set the format later using "Edit Attributes" in the Data/History window.
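If you are unsure whether a file really matches fastqsanger (Phred+33 quality encoding), a quick check outside Galaxy can help before you set the format. The following is a minimal sketch, not part of Galaxy or FastQC: the function names are hypothetical, it assumes a plain four-line-per-record FASTQ file, and it uses the common heuristic that quality characters below ASCII 59 can only occur with a Phred+33 offset.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual


def guess_phred_offset(handle, max_records=10000):
    """Guess the quality offset: 33 (fastqsanger), 64, or None if undecidable.

    Heuristic only: a file containing exclusively high-quality Phred+33
    bases can be mistaken for Phred+64.
    """
    lowest = 127
    for i, (_, _, qual) in enumerate(read_fastq(handle)):
        if i >= max_records:
            break
        if qual:
            lowest = min(lowest, min(ord(c) for c in qual))
    if lowest < 59:
        return 33  # characters this low exist only in Phred+33
    if 64 <= lowest < 127:
        return 64  # consistent with Phred+64 (or very clean Phred+33)
    return None    # empty input or ambiguous range


# Example with an in-memory record; for a real file use open("reads.fastq")
sample = io.StringIO("@read1\nACGT\n+\nII#5\n")
print(guess_phred_offset(sample))
```

A result of 33 is consistent with the fastqsanger format; any other result suggests the file should be inspected or converted before running the downstream tools.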
|NGS: QC and manipulation -> FASTQC: FASTQ/SAM/BAM -> FastQC: Read QC Quality reports using FastQC|
You can also supply a "Contaminant list" if you know the basic assumptions of the library preparation step, or use the list provided by the FastQC authors (https://github.com/csf-ngs/fastqc/blob/master/Contaminants/contaminant_list.txt).
The FastQC results should be available in your history window (click the eye icon next to the name of the history step for the executed FastQC run):
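If you prepare your own contaminant list, it may be worth validating it before uploading. The sketch below assumes the layout used by the FastQC list linked above, as far as we understand it: one entry per line, a name and a sequence separated by one or more tabs, and lines starting with `#` treated as comments. Please verify this against the linked file. The function name `parse_contaminants` is hypothetical.

```python
import re


def parse_contaminants(lines):
    """Parse 'name<TAB>sequence' lines into a dict, skipping comments and blanks."""
    contaminants = {}
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Names may contain spaces, so split on tabs only
        parts = re.split(r"\t+", line, maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"line {number}: expected 'name<TAB>sequence'")
        name, sequence = parts
        contaminants[name.strip()] = sequence.strip().upper()
    return contaminants


# Example with made-up entries (not taken from the FastQC list)
example = [
    "# my custom contaminants",
    "",
    "My adapter\t\tACGTACGT",
    "Primer 1\tttgg",
]
print(parse_contaminants(example))
```

A line without a tab raises an error, which catches the common mistake of separating the name and the sequence with spaces.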
Some of these results are presented below:
Flexbar software demultiplexes barcoded runs and removes adapter sequences. Moreover, trimming and filtering features are provided. Flexbar increases read mapping rates and improves genome and transcriptome assemblies. It supports next-generation sequencing data in fasta/q and csfasta/q format from Illumina, Roche 454, and the SOLiD platform.
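To make the idea of adapter removal followed by length filtering concrete, here is a minimal sketch. It is not Flexbar's algorithm: Flexbar aligns adapters to reads and tolerates mismatches, whereas this example only does exact matching of a 3' adapter. The function names, the adapter sequence, and the `min_overlap` and `min_length` defaults are illustrative assumptions.

```python
def trim_adapter(seq, qual, adapter, min_overlap=3):
    """Remove a 3' adapter (exact match only) from a read and its qualities."""
    # Case 1: the full adapter occurs inside the read; cut from its start
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos], qual[:pos]
    # Case 2: the read ends with a prefix of the adapter (partial adapter)
    longest = min(len(adapter) - 1, len(seq))
    for overlap in range(longest, min_overlap - 1, -1):
        if seq.endswith(adapter[:overlap]):
            return seq[:-overlap], qual[:-overlap]
    return seq, qual


def trim_and_filter(records, adapter, min_length=18):
    """Trim each (name, seq, qual) record and drop reads that become too short."""
    for name, seq, qual in records:
        seq, qual = trim_adapter(seq, qual, adapter)
        if len(seq) >= min_length:
            yield name, seq, qual


# Illustrative adapter only; use the one from your library preparation
ADAPTER = "AGATCGGAAGAGC"
reads = [
    ("full", "ACGTACGTACGTACGTACGTAGATCGGAAGAGCTT", "I" * 35),
    ("partial", "ACGTACGTACGTACGTACGTAGATC", "I" * 25),
    ("too_short", "ACGTAGATCGGAAGAGC", "I" * 17),
]
for name, seq, qual in trim_and_filter(reads, ADAPTER):
    print(name, seq, len(seq))
```

The third read is discarded because only 4 bases remain after trimming, which is the kind of filtering that keeps uninformative fragments out of the alignment step.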
|NGS: QC and manipulation Personalized medicine -> Flexbar flexible barcode and adapter removal|