ℹ️ Skipped - page is already crawled
| Filter | Status | Condition | Details |
|---|---|---|---|
| HTTP status | PASS | download_http_code = 200 | HTTP 200 |
| Age cutoff | PASS | download_stamp > now() - 6 MONTH | 0.2 months ago |
| History drop | PASS | isNull(history_drop_reason) | No drop reason |
| Spam/ban | PASS | fh_dont_index != 1 AND ml_spam_score = 0 | ml_spam_score=0 |
| Canonical | PASS | meta_canonical IS NULL OR = '' OR = src_unparsed | Not set |
| Property | Value |
|---|---|
| URL | https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ |
| Last Crawled | 2026-04-02 00:06:59 (5 days ago) |
| First Indexed | 2017-03-23 11:56:36 (9 years ago) |
| HTTP Status Code | 200 |
| Meta Title | Babraham Bioinformatics - FastQC A Quality Control tool for High Throughput Sequence Data |
| Meta Description | null |
| Meta Canonical | null |
| Boilerpipe Text | Function: A quality control tool for high throughput sequence data.
Language: Java
Requirements: A suitable Java Runtime Environment; the Picard BAM/SAM Libraries (included in download)
Code Maturity: Stable. Mature code, but feedback is appreciated.
Code Released: Yes, under GPL v3 or later.
Initial Contact: Simon Andrews
Download Now
View our tutorial video
FastQC aims to provide a simple way to do some quality control checks on raw sequence
data coming from high throughput sequencing pipelines. It provides a modular set of
analyses which you can use to give a quick impression of whether your data has any
problems of which you should be aware before doing any further analysis.
The main functions of FastQC are
Import of data from BAM, SAM or FastQ files (any variant)
Providing a quick overview to tell you in which areas there may be problems
Summary graphs and tables to quickly assess your data
Export of results to an HTML based permanent report
Offline operation to allow automated generation of reports without running the interactive application
Documentation
A copy of the FastQC documentation is available for you to try before you buy (well, download...).
Example Reports
Good Illumina Data
Bad Illumina Data
Adapter dimer contaminated run
Small RNA with read-through adapter
Reduced Representation BS-Seq
PacBio
454
Changelog
01-03-23: Version 0.12.1 released
Fix a bug in file type detection on OSX
01-03-23: Version 0.12.0 released
Add total base count to basic stats
Add dup_length option to set the level of truncation for duplicate finding
Make default truncation length always 50bp
Removed the deduplicated duplication line from the duplicate plot
Improve memory handling and add a --memory option to the command line
Move BAM parsing to htsjdk
Make colours colourblind friendly
Generate SVG versions of graphs, and add a --svg option to use these in the report
Add line numbers to parsing errors
Change the default adapter sequences to search
08-01-19: Version 0.11.9 released
Fixed a bug when analysing empty files
Added support for multi-read fast5 files
Fixed a corner case bug in adapter detection
Bundled a JRE with the OSX build so you don't have to install it
Fixed a hang if the program runs out of memory
04-10-18: Version 0.11.8 released
Fixed a performance bug in highly duplicated sequences
Changed the behaviour of the sequence length module when run with --nogroup
Other minor bug fixes
10-01-18: Version 0.11.7 released
Fixed a crash if the first sequence in a file was shorter than 12bp
21-12-17: Version 0.11.6 released
Disabled the Kmer plot by default
Fixed a bug when long custom adapters were being used
Changed the tile number cutoff to accommodate the novaseq
Fixed various format changes in nanopore data from ONT
Added new Clontech sequences to the contaminant list
Added a --min-length option to remove short sequences
Added an option to specify the output name of data streamed into the program
08-03-16: Version 0.11.5 released
Fixed the smallRNA adapter sequence so that abundance isn't under-represented in the adapter content plot
Fixed a bug in the warn / error code for the per-base sequence content plot
Fixed a typo in the documentation for the duplication plot
09-10-15: Version 0.11.4 released
Changed the OSX launcher to not rely on the internal JVM framework, but use any command line java which is found
Fixed a typo in one of the adapter sequences
Fixed a bug which meant that some file extensions weren't removed from report names in non-interactive mode
Made the per-tile module not collect any stats if it's disabled in limits.txt
Fixed a bug in the calculation of duplication for highly duplicated, ordered files with very small numbers of sequences
Fixed an incorrect error flag in the per-base quality module where there were fewer than 100 observations in a read group
25-3-15: Version 0.11.3 released
Fixed a bug when disabling the per-tile plot from limits.txt
Fixed a bug which caused the program to continue when processing of multiple files was actually complete
Fixed a bug which meant format selection in the interactive application didn't work
Added checks for mis-identifying tile numbers in confusing sample ids
Added the SOLID smallRNA adapter to the standard search set
Fixed a bug when extracting casava names from uncompressed fastq files
Added support for processing files of Oxford Nanopore reads
6-6-14: Version 0.11.2 released
Fixed incorrect warn/fail defaults for per-seq quality plot
Fixed memory leaks in Kmer and per-seq quality modules
Added an option to use a custom limits file
Fixed a bug in the naming of the folder inside the zip output file
Fixed a bug in the --extract option
2-6-14: Version 0.11.1 released
Added configurable warn/fail thresholds for all modules
Allow modules to be selectively turned off
Added a per-tile quality plot for Illumina libraries
Added an adapter content plot
Improved the duplication plot
Improved the Kmer module
Used embedded graphics in the HTML output so you can distribute a single file
Added the ability to read data from stdin
Changed how base grouping works to better accommodate long reads
Dropped support for Solexa64 format (NB not Phred 64 which is still supported)
3-5-12: Version 0.10.1 released
Added a workaround to allow the analysis of concatenated gzipped files
Fixed a bug when FastQC was installed in a path containing characters needing to be escaped in a URL
Added an option to specify the location of the java interpreter on the command line
9-9-11: Version 0.10.0 released
Added a Casava mode to sanely process the multiple fastq files produced by the latest illumina pipeline
Fixed a bug in Kmer analysis which missed off the last possible Kmer in each sequence
Fixed a classpath bug if using the wrapper script under windows
31-8-11: Version 0.9.6 released
Fixed a crash in libraries where every sequence ended in poly-N
Fixed the launch wrapper to set the classpath correctly on OSX
16-8-11: Version 0.9.5 released
Fixed a bug in text output for the per-base sequence content module
Made progress reporting absolute, and not approximate
Added a print CSS style so reports are printable again
13-7-11: Version 0.9.4 released
Improved the error reporting for failed files in the offline application
16-6-11: Version 0.9.3 released
Added support for bzip2 compressed fastq files
Added new CSS theme for HTML reports, contributed by Phil Ewels
16-5-11: Version 0.9.2 released
Fixed a bug where grouped base numbers weren't reported in the per-base quality text report
Fixed a crash in the Kmer analysis when analysing small files
30-3-11: Version 0.9.1 released
Added --quiet and --nogroup options to command line
Added encoding type to the basic stats
Added detection of Illumina <1.3, 1.3, 1.5 and 1.9 encodings
10-2-11: Version 0.9.0 released
Added support for very long reads (esp 454 and PacBio)
Duplication detection now uses only the first 50bp of each read
21-1-11: Version 0.8.0 released
Made all graphs easier to interpret
Added an option to analyse only mapped sequences from a BAM/SAM file
Added an option to analyse two or more files in parallel
24-11-10: Version 0.7.2 released
Fixed bug when analysing libraries with no unique sequences
Added an option to specify a custom contaminant list on the command line
24-11-10: Version 0.7.1 released
Improved the command line interface with proper options and error handling
Added an option to force the file format where guessing from the filename doesn't work
27-10-10: Version 0.7.0 released
Added a Kmer enrichment analysis to find non-aligned enriched sequences
Cleaned up axis labels on all graphs
27-10-10: Version 0.6.1 released
Fixed a bug which caused some sequences and qualities from BAM/SAM files to be reversed
18-10-10: Version 0.6.0 released
Sequences can now be read from SAM/BAM format files
Added smoother lines to the graphs
29-09-10: Version 0.5.1 released
Fixed a formatting bug in the text output
Fixed the %GC plot to work well with reads over 100bp
Improved the fitting of the modelled curve to the %GC plot
Added more illumina oligos to the contaminants file
16-09-10: Version 0.5.0 released
Improved the fitting of the normal distribution to %GC plot
Calculated the total duplicated sequence % in the duplicate sequence module
Added pass/fail/warn icons next to each section of the HTML report
Put Icons and Images into subfolders in the HTML report
30-07-10: Version 0.4.3 released
Fixed the reporting of sequence counts in the Basic Stats module
Added a warning before overwriting reports in the interactive application
26-07-10: Version 0.4.2 released
Fixed y-axis scale on per-base quality plot
Added fail / warn checks to modules which lacked them and improved existing checks
Added a modelled distribution to the per-sequence GC plot
Scale the width of report graphs for long sequence reads
24-06-10: Version 0.4.1 released
Changed the duplicate module to reduce memory usage for long sequences
Changed the way duplicate levels are counted to be more realistic
18-06-10: Version 0.4 released
Added a sequence duplication level module
Added a launch wrapper for easier use from the command line
Added full machine parsable output for integration into pipelines
28-05-10: Version 0.3.1 released
Fixed a bug where invalid template files caused a crash
Non-interactive use now correctly reports progress for all files, not just the first one
Added some missing documentation
13-05-10: Version 0.3 released
Added support for gzip compressed fastq files
Added identification of overrepresented sequences
Improved colorspace support
Added an option to save non-interactive reports to a specific directory
06-05-10: Version 0.2 released
Added support for colorspace fastq files
Added templating support to allow customisation of HTML reports
Unzipped non-interactive reports by default, and added an option to turn this off
Added easily computer readable summary file to reports
28-04-10: Version 0.1.1 released
Fixed a bug which prevented non-interactive use on a headless system
26-04-10: Version 0.1 released
Initial set of 9 modules
Interactive and offline operation functional |
| Markdown | 
## FastQC
| | |
|---|---|
| Function | A quality control tool for high throughput sequence data. |
| Language | Java |
| Requirements | A [suitable Java Runtime Environment](https://adoptopenjdk.net/); the [Picard](http://picard.sourceforge.net/) BAM/SAM Libraries (included in download) |
| Code Maturity | Stable. Mature code, but feedback is appreciated. |
| Code Released | Yes, under [GPL v3 or later](http://www.gnu.org/copyleft/gpl.html). |
| Initial Contact | [Simon Andrews](https://www.bioinformatics.babraham.ac.uk/people.html#simon) |
| [Download Now](https://www.bioinformatics.babraham.ac.uk/projects/download.html#fastqc) | |

[View our tutorial video](http://www.youtube.com/watch?v=bz93ReOv87Y)
FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. It provides a modular set of analyses which you can use to give a quick impression of whether your data has any problems of which you should be aware before doing any further analysis.
The main functions of FastQC are
- Import of data from BAM, SAM or FastQ files (any variant)
- Providing a quick overview to tell you in which areas there may be problems
- Summary graphs and tables to quickly assess your data
- Export of results to an HTML based permanent report
- Offline operation to allow automated generation of reports without running the interactive application
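The offline operation listed above can be driven from a script. The following is a minimal sketch, not part of FastQC itself. It assumes the `fastqc` launcher is on the `PATH`; the helper names `build_fastqc_command` and `run_fastqc` and the example file names are hypothetical. The `--outdir`, `--threads`, `--extract` and `--quiet` options are taken from the FastQC command line help, so check `fastqc --help` for your installed version before relying on them.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def build_fastqc_command(
    inputs: Sequence[str],
    outdir: str,
    threads: int = 1,
    extract: bool = False,
    quiet: bool = True,
) -> List[str]:
    """Assemble an argument list for a non-interactive FastQC run (hypothetical helper)."""
    if not inputs:
        raise ValueError("at least one input file is required")
    cmd = ["fastqc", "--outdir", outdir, "--threads", str(threads)]
    if extract:
        cmd.append("--extract")  # unzip the report archive after the run
    if quiet:
        cmd.append("--quiet")  # suppress progress messages, report errors only
    cmd.extend(inputs)
    return cmd


def run_fastqc(inputs: Sequence[str], outdir: str, **options) -> None:
    """Create the output directory if needed, then run FastQC (assumes it is installed)."""
    # FastQC expects the output directory to exist already.
    Path(outdir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_fastqc_command(inputs, outdir, **options), check=True)
```

For example, `run_fastqc(["sample_R1.fastq.gz", "sample_R2.fastq.gz"], "qc_reports", threads=2)` would write one HTML report and one zip archive per input file into `qc_reports`. Keeping the command construction separate from the call to `subprocess.run` makes the argument list easy to inspect or log without FastQC being installed.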
## Documentation
A [copy of the FastQC documentation](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/) is available for you to try before you buy (well, download...).
## Example Reports
- [Good Illumina Data](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/good_sequence_short_fastqc.html)
- [Bad Illumina Data](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/bad_sequence_fastqc.html)
- [Adapter dimer contaminated run](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/RNA-Seq_fastqc.html)
- [Small RNA with read-through adapter](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/small_rna_fastqc.html)
- [Reduced Representation BS-Seq](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/RRBS_fastqc.html)
- [PacBio](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/pacbio_srr075104_fastqc.html)
- [454](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/454_SRR073599_fastqc.html)
## Changelog
- 01-03-23: Version 0.12.1 released
  - Fix a bug in file type detection on OSX
- 01-03-23: Version 0.12.0 released
  - Add total base count to basic stats
  - Add dup\_length option to set the level of truncation for duplicate finding
  - Make default truncation length always 50bp
  - Removed the deduplicated duplication line from the duplicate plot
  - Improve memory handling and add a --memory option to the command line
  - Move BAM parsing to htsjdk
  - Make colours colourblind friendly
  - Generate SVG versions of graphs, and add a --svg option to use these in the report
  - Add line numbers to parsing errors
  - Change the default adapter sequences to search
- 08-01-19: Version 0.11.9 released
  - Fixed a bug when analysing empty files
  - Added support for multi-read fast5 files
  - Fixed a corner case bug in adapter detection
  - Bundled a JRE with the OSX build so you don't have to install it
  - Fixed a hang if the program runs out of memory
- 04-10-18: Version 0.11.8 released
  - Fixed a performance bug in highly duplicated sequences
  - Changed the behaviour of the sequence length module when run with --nogroup
  - Other minor bug fixes
- 10-01-18: Version 0.11.7 released
  - Fixed a crash if the first sequence in a file was shorter than 12bp
- 21-12-17: Version 0.11.6 released
  - Disabled the Kmer plot by default
  - Fixed a bug when long custom adapters were being used
  - Changed the tile number cutoff to accommodate the novaseq
  - Fixed various format changes in nanopore data from ONT
  - Added new Clontech sequences to the contaminant list
  - Added a --min-length option to remove short sequences
  - Added an option to specify the output name of data streamed into the program
- 08-03-16: Version 0.11.5 released
  - Fixed the smallRNA adapter sequence so that abundance isn't under-represented in the adapter content plot
  - Fixed a bug in the warn / error code for the per-base sequence content plot
  - Fixed a typo in the documentation for the duplication plot
- 09-10-15: Version 0.11.4 released
  - Changed the OSX launcher to not rely on the internal JVM framework, but use any command line java which is found
  - Fixed a typo in one of the adapter sequences
  - Fixed a bug which meant that some file extensions weren't removed from report names in non-interactive mode
  - Made the per-tile module not collect any stats if it's disabled in limits.txt
  - Fixed a bug in the calculation of duplication for highly duplicated, ordered files with very small numbers of sequences
  - Fixed an incorrect error flag in the per-base quality module where there were fewer than 100 observations in a read group
- 25-3-15: Version 0.11.3 released
  - Fixed a bug when disabling the per-tile plot from limits.txt
  - Fixed a bug which caused the program to continue when processing of multiple files was actually complete
  - Fixed a bug which meant format selection in the interactive application didn't work
  - Added checks for mis-identifying tile numbers in confusing sample ids
  - Added the SOLID smallRNA adapter to the standard search set
  - Fixed a bug when extracting casava names from uncompressed fastq files
  - Added support for processing files of Oxford Nanopore reads
- 6-6-14: Version 0.11.2 released
  - Fixed incorrect warn/fail defaults for per-seq quality plot
  - Fixed memory leaks in Kmer and per-seq quality modules
  - Added an option to use a custom limits file
  - Fixed a bug in the naming of the folder inside the zip output file
  - Fixed a bug in the --extract option
- 2-6-14: Version 0.11.1 released
  - Added configurable warn/fail thresholds for all modules
  - Allow modules to be selectively turned off
  - Added a per-tile quality plot for Illumina libraries
  - Added an adapter content plot
  - Improved the duplication plot
  - Improved the Kmer module
  - Used embedded graphics in the HTML output so you can distribute a single file
  - Added the ability to read data from stdin
  - Changed how base grouping works to better accommodate long reads
  - Dropped support for Solexa64 format (NB **not** Phred 64 which is still supported)
- 3-5-12: Version 0.10.1 released
  - Added a workaround to allow the analysis of concatenated gzipped files
  - Fixed a bug when FastQC was installed in a path containing characters needing to be escaped in a URL
  - Added an option to specify the location of the java interpreter on the command line
- 9-9-11: Version 0.10.0 released
  - Added a Casava mode to sanely process the multiple fastq files produced by the latest illumina pipeline
  - Fixed a bug in Kmer analysis which missed off the last possible Kmer in each sequence
  - Fixed a classpath bug if using the wrapper script under windows
- 31-8-11: Version 0.9.6 released
  - Fixed a crash in libraries where every sequence ended in poly-N
  - Fixed the launch wrapper to set the classpath correctly on OSX
- 16-8-11: Version 0.9.5 released
  - Fixed a bug in text output for the per-base sequence content module
  - Made progress reporting absolute, and not approximate
  - Added a print CSS style so reports are printable again
- 13-7-11: Version 0.9.4 released
  - Improved the error reporting for failed files in the offline application
- 16-6-11: Version 0.9.3 released
  - Added support for bzip2 compressed fastq files
  - Added new CSS theme for HTML reports, contributed by Phil Ewels
- 16-5-11: Version 0.9.2 released
  - Fixed a bug where grouped base numbers weren't reported in the per-base quality text report
  - Fixed a crash in the Kmer analysis when analysing small files
- 30-3-11: Version 0.9.1 released
  - Added --quiet and --nogroup options to command line
  - Added encoding type to the basic stats
  - Added detection of Illumina \<1.3, 1.3, 1.5 and 1.9 encodings
- 10-2-11: Version 0.9.0 released
  - Added support for very long reads (esp 454 and PacBio)
  - Duplication detection now uses only the first 50bp of each read
- 21-1-11: Version 0.8.0 released
  - Made all graphs easier to interpret
  - Added an option to analyse only mapped sequences from a BAM/SAM file
  - Added an option to analyse two or more files in parallel
- 24-11-10: Version 0.7.2 released
  - Fixed bug when analysing libraries with no unique sequences
  - Added an option to specify a custom contaminant list on the command line
- 24-11-10: Version 0.7.1 released
  - Improved the command line interface with proper options and error handling
  - Added an option to force the file format where guessing from the filename doesn't work
- 27-10-10: Version 0.7.0 released
  - Added a Kmer enrichment analysis to find non-aligned enriched sequences
  - Cleaned up axis labels on all graphs
- 27-10-10: Version 0.6.1 released
  - Fixed a bug which caused some sequences and qualities from BAM/SAM files to be reversed
- 18-10-10: Version 0.6.0 released
  - Sequences can now be read from SAM/BAM format files
  - Added smoother lines to the graphs
- 29-09-10: Version 0.5.1 released
  - Fixed a formatting bug in the text output
  - Fixed the %GC plot to work well with reads over 100bp
  - Improved the fitting of the modelled curve to the %GC plot
  - Added more illumina oligos to the contaminants file
- 16-09-10: Version 0.5.0 released
  - Improved the fitting of the normal distribution to %GC plot
  - Calculated the total duplicated sequence % in the duplicate sequence module
  - Added pass/fail/warn icons next to each section of the HTML report
  - Put Icons and Images into subfolders in the HTML report
- 30-07-10: Version 0.4.3 released
  - Fixed the reporting of sequence counts in the Basic Stats module
  - Added a warning before overwriting reports in the interactive application
- 26-07-10: Version 0.4.2 released
  - Fixed y-axis scale on per-base quality plot
  - Added fail / warn checks to modules which lacked them and improved existing checks
  - Added a modelled distribution to the per-sequence GC plot
  - Scale the width of report graphs for long sequence reads
- 24-06-10: Version 0.4.1 released
  - Changed the duplicate module to reduce memory usage for long sequences
  - Changed the way duplicate levels are counted to be more realistic
- 18-06-10: Version 0.4 released
  - Added a sequence duplication level module
  - Added a launch wrapper for easier use from the command line
  - Added full machine parsable output for integration into pipelines
- 28-05-10: Version 0.3.1 released
  - Fixed a bug where invalid template files caused a crash
  - Non-interactive use now correctly reports progress for all files, not just the first one
  - Added some missing documentation
- 13-05-10: Version 0.3 released
  - Added support for gzip compressed fastq files
  - Added identification of overrepresented sequences
  - Improved colorspace support
  - Added an option to save non-interactive reports to a specific directory
- 06-05-10: Version 0.2 released
  - Added support for colorspace fastq files
  - Added templating support to allow customisation of HTML reports
  - Unzipped non-interactive reports by default, and added an option to turn this off
  - Added easily computer readable summary file to reports
- 28-04-10: Version 0.1.1 released
  - Fixed a bug which prevented non-interactive use on a headless system
- 26-04-10: Version 0.1 released
  - Initial set of 9 modules
  - Interactive and offline operation functional
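
The computer readable summary file mentioned in the changelog can be consumed from a pipeline. The sketch below is illustrative only. It assumes the summary is a `summary.txt` file with one tab-separated line per module, holding a status (`PASS`, `WARN` or `FAIL`), the module name and the input file name. The function names and the `SAMPLE` text are hypothetical and are not real FastQC output.

```python
from typing import Dict, List


def parse_summary(text: str) -> Dict[str, str]:
    """Map each module name to its status, given the contents of a summary file.

    Assumed layout (one line per module): STATUS<TAB>Module name<TAB>file name
    """
    results: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        status, module, _filename = line.split("\t", 2)
        results[module] = status
    return results


def failed_modules(results: Dict[str, str]) -> List[str]:
    """Return the names of modules whose status is FAIL, in file order."""
    return [module for module, status in results.items() if status == "FAIL"]


# Made-up sample text in the assumed layout, for illustration only.
SAMPLE = (
    "PASS\tBasic Statistics\tsample.fastq\n"
    "WARN\tPer base sequence content\tsample.fastq\n"
    "FAIL\tAdapter Content\tsample.fastq\n"
)
```

A pipeline step could call `failed_modules(parse_summary(open(path).read()))` on each extracted report and stop, or flag the sample, when the returned list is not empty.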
Having problems with the site? Please [let us know](mailto:simon.andrews@babraham.ac.uk) |
| Readable Markdown | null |
| Shard | 79 (laksa) |
| Root Hash | 9785297403088939879 |
| Unparsed URL | uk,ac,babraham!bioinformatics,www,/projects/fastqc/ s443 |