Description

ExtractAdaptersFastqc reads a FastQC raw report and extracts the adapter sequences that were found in it. These sequences can be used as input for a QC tool such as cutadapt. The sequences can be output as plain text, with a newline character separating the sequences, or in FASTA format.
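
To make the difference between the two output formats concrete, here is a minimal Python sketch. It is not the tool's own code: the function names `format_plain` and `format_fasta` are hypothetical, the adapter entries are for illustration only, and using the adapter name as the FASTA header is an assumption.

```python
def format_plain(sequences):
    """Plain text output: one sequence per line, newline-separated."""
    return "\n".join(sequences.values())


def format_fasta(sequences):
    """FASTA output: a '>' header line followed by the sequence, per record."""
    return "\n".join(f">{name}\n{seq}" for name, seq in sequences.items())


# Illustrative adapter hits (name -> sequence); not taken from a real report.
adapters = {
    "Illumina Universal Adapter": "AGATCGGAAGAGC",
    "Nextera Transposase Sequence": "CTGTCTCTTATA",
}

print(format_plain(adapters))
print(format_fasta(adapters))
```

The plain format suits tools that take bare sequences on the command line, while the FASTA form keeps each name attached to its sequence.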

Installation

ExtractAdaptersFastqc requires Java 8 to be installed on your device. Download Java 8 here or install via your distribution's package manager.

Download the latest version of ExtractAdaptersFastqc here. To print the usage, run:

java -jar <ExtractAdaptersFastqc_jar> --help

Manual

The tool will only find sequences that are known to FastQC. By default, these are defined in the following files:
- /Configuration/adapter_list.txt
- /Configuration/contaminant_list.txt

These files are required for this tool to find the correct adapters.

The adapter list is only available in FastQC 0.11 and later.
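
As a rough illustration of what these files contain, the sketch below parses a known-sequence list in Python. It assumes the layout used by FastQC's configuration files: one entry per line, the name and the sequence separated by one or more tabs, and `#` marking comment lines. The function name `parse_known_list` is hypothetical and this is not the tool's actual parser.

```python
import re


def parse_known_list(text):
    """Map each known name to its sequence, skipping blanks and '#' comments."""
    known = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumption: name and sequence are separated by one or more tabs.
        fields = re.split(r"\t+", line)
        if len(fields) >= 2:
            known[fields[0].strip()] = fields[-1].strip()
    return known


# Small inline sample in the assumed format.
sample = (
    "# Example comment line\n"
    "Illumina Universal Adapter\t\t\tAGATCGGAAGAGC\n"
    "\n"
    "Nextera Transposase Sequence\tCTGTCTCTTATA\n"
)

print(parse_known_list(sample))
```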

Example

A default run looks like this; the output goes to stdout:

java -jar <ExtractAdaptersFastqc_jar> \
-i <fastqc_data_file> \
--knownContamFile <contams_file> \
--knownAdapterFile <adapter_file>

To select output files:

java -jar <ExtractAdaptersFastqc_jar> \
-i <fastqc_data_file> \
--knownContamFile <contams_file> \
--knownAdapterFile <adapter_file> \
--adapterOutputFile <adapter_output_file> \
--contamsOutputFile <contams_output_file>
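
When the tool is called from a script rather than from a shell, the same command line can be assembled programmatically. The Python sketch below only builds the argument list, using the options shown in this README. The function `build_command` and its parameter names are hypothetical, and treating `--outputAsFasta` as a switch without a value is an assumption.

```python
def build_command(jar, fastqc_data, contam_file=None, adapter_file=None,
                  adapter_out=None, contams_out=None, as_fasta=False):
    """Build the argument list for an ExtractAdaptersFastqc run."""
    cmd = ["java", "-jar", jar, "-i", fastqc_data]
    if contam_file:
        cmd += ["--knownContamFile", contam_file]
    if adapter_file:
        cmd += ["--knownAdapterFile", adapter_file]
    if adapter_out:
        cmd += ["--adapterOutputFile", adapter_out]
    if contams_out:
        cmd += ["--contamsOutputFile", contams_out]
    if as_fasta:
        # Assumed to be a switch that takes no value.
        cmd.append("--outputAsFasta")
    return cmd


# Placeholder file names, for illustration only.
cmd = build_command(
    "ExtractAdaptersFastqc.jar",
    "fastqc_data.txt",
    contam_file="contaminant_list.txt",
    adapter_file="adapter_list.txt",
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the Python standard library. Passing a list instead of a single string avoids shell quoting problems with file paths.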

Usage

Usage for ExtractAdaptersFastqc:

| Option | Required | Can occur multiple times | Description |
| --- | --- | --- | --- |
| --log_level, -l | no | no | Level of log information printed. Possible levels: 'debug', 'info', 'warn', 'error' |
| --help, -h | no | no | Print usage |
| --version, -v | no | no | Print version |
| --inputFile, -i | yes | no | FastQC data file (i.e., the fastqc_data.txt file in the FastQC output) |
| --adapterOutputFile | no | no | Output file for adapters; if not supplied, output goes to stdout |
| --contamsOutputFile | no | no | Output file for contaminants; if not supplied, output goes to stdout |
| --skipContams | no | no | If set, only the adapters block is used; otherwise the contaminants are used as well |
| --knownContamFile | no | no | File containing the known contaminants from FastQC |
| --knownAdapterFile | no | no | File containing the known adapters from FastQC |
| --adapterCutoff | no | no | The fraction of an adapter in the reads must be above this value; default is 0.001 |
| --outputAsFasta | no | no | Output in FASTA format; by default only the sequences are output |
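
To illustrate the idea behind --inputFile and --adapterCutoff, the Python sketch below reads the "Adapter Content" module of a fastqc_data.txt file and lists the adapters whose value exceeds a cutoff at any position. It assumes the usual FastQC layout: a `>>Adapter Content` line, a tab-separated `#Position` header naming the adapters, one row per position, and a closing `>>END_MODULE` line. How the tool itself scales or aggregates these values is not stated in this README, so the comparison here (the maximum value as it appears in the file against the cutoff) is an assumption, and `adapters_above_cutoff` is a hypothetical name.

```python
def adapters_above_cutoff(report_text, cutoff=0.001):
    """Return adapter names whose value exceeds cutoff at any position."""
    in_module = False
    names = []
    peaks = {}
    for line in report_text.splitlines():
        if line.startswith(">>Adapter Content"):
            in_module = True
        elif line.startswith(">>END_MODULE"):
            in_module = False
        elif in_module and line.startswith("#Position"):
            # Header row: first column is the position, the rest are adapter names.
            names = line.split("\t")[1:]
            peaks = {name: 0.0 for name in names}
        elif in_module and line.strip():
            # Data row: skip the position column, track the peak per adapter.
            values = line.split("\t")[1:]
            for name, value in zip(names, values):
                peaks[name] = max(peaks[name], float(value))
    return [name for name in names if peaks[name] > cutoff]


# Tiny made-up report fragment; "Adapter A" and "Adapter B" are placeholders.
sample_report = "\n".join([
    ">>Adapter Content\tpass",
    "#Position\tAdapter A\tAdapter B",
    "1\t0.0\t0.0",
    "2\t0.5\t0.0",
    ">>END_MODULE",
])

print(adapters_above_cutoff(sample_report))
```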

About

ExtractAdaptersFastqc is part of the BIOPET tool suite, which is developed at LUMC by the SASC team. Each tool in the BIOPET tool suite is meant to offer a standalone function that can be used to perform a dedicated data analysis task or added as part of a pipeline, for example the SASC team's biowdl pipelines.

All tools in the BIOPET tool suite are Free/Libre and Open Source Software.

Contributing

The source code of ExtractAdaptersFastqc can be found here. We welcome any contributions. Bug reports, feature requests and feedback can be submitted at our issue tracker.

ExtractAdaptersFastqc is built using sbt. Before submitting a pull request, make sure all tests pass by running sbt test from the project's root. We recommend using an IDE to work on ExtractAdaptersFastqc. We have had good results with this IDE.

Contact

For any questions related to ExtractAdaptersFastqc, please use the GitHub issue tracker or contact the SASC team directly at: sasc@lumc.nl.