Description

This tool divides a fastq file into smaller fastq files, based on the number of output files specified. For example, if one specifies 5 output files, it will split the fastq file into 5 files of roughly equal size. This can be very useful if one wants to use the chunking option in a pipeline: FastqSplitter can generate exactly as many fastq files (chunks) as needed.

FastqSplitter reads the input in groups of 100 reads and distributes these groups evenly over the output FASTQ files, cycling through the output files in order as it writes each group.

Example: A fastq file is split with a group size of 100 and three output files.

Reads 1-100 will be assigned to output1
Reads 101-200 will be assigned to output2
Reads 201-300 will be assigned to output3
Reads 301-400 will be assigned to output1
Reads 401-500 will be assigned to output2
etc.

This will make sure the output fastq files are of equal size and there is no positional bias in each output file.
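The rotation described above can be sketched in Python. This is only an illustration of the scheme, not FastqSplitter's actual implementation: the function names `output_index` and `distribute` are hypothetical, reads are numbered from 1 as in the example, and output files are numbered from 0.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def output_index(read_number: int, n_outputs: int, group_size: int = 100) -> int:
    """Return the zero-based output file for a one-based read number."""
    group = (read_number - 1) // group_size  # which group of reads this read is in
    return group % n_outputs                 # rotate over the output files


def distribute(reads: Sequence[T], n_outputs: int, group_size: int = 100) -> List[List[T]]:
    """Split reads into n_outputs lists, one group at a time, in rotation."""
    outputs: List[List[T]] = [[] for _ in range(n_outputs)]
    for number, read in enumerate(reads, start=1):
        outputs[output_index(number, n_outputs, group_size)].append(read)
    return outputs
```

With three outputs, `output_index(301, 3)` returns 0, which corresponds to output1 in the example. For an input of 500 reads, `distribute` yields lists of 200, 200 and 100 reads, so in this sketch the output sizes differ by at most one group.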

Installation

FastqSplitter requires Java 8 to be installed on your device. Download Java 8 here or install via your distribution's package manager.

Download the latest version of FastqSplitter here. To print the usage information, run:

java -jar <FastqSplitter_jar> --help

Manual

FastqSplitter needs one input file and as many output files as required. If five output files are given, the input file will be split into five files.

Example

To split a file into three different files of roughly equal size:

java -jar <FastqSplitter_jar> \
-I myfastQ.fastq \
-o mySplittedFastq_1.fastq \
-o mySplittedFastq_2.fastq \
-o mySplittedFastq_3.fastq
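When the number of chunks is decided by a pipeline, the command can be assembled programmatically. The following is a minimal sketch that uses only the `-I` and `-o` options documented here. The helper name `build_split_command`, the `prefix` parameter and the output naming scheme are assumptions made for illustration.

```python
from typing import List


def build_split_command(jar_path: str, input_fastq: str, n_chunks: int,
                        prefix: str = "chunk") -> List[str]:
    """Build a FastqSplitter command line with one -o option per chunk."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    command = ["java", "-jar", jar_path, "-I", input_fastq]
    for i in range(1, n_chunks + 1):
        # Hypothetical naming scheme: <prefix>_<i>.fastq
        command += ["-o", f"{prefix}_{i}.fastq"]
    return command
```

The resulting list can be passed to `subprocess.run(command, check=True)`, with `jar_path` pointing at the downloaded FastqSplitter jar.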

Usage

Usage for FastqSplitter:

| Option | Required | Can occur multiple times | Description |
| --- | --- | --- | --- |
| --log_level, -l | no | no | Level of log information printed. Possible levels: 'debug', 'info', 'warn', 'error' |
| --help, -h | no | no | Print usage |
| --version, -v | no | no | Print version |
| --inputFile, -I | yes | no | Path to input file |
| --outputFile, -o | yes | yes (unlimited) | Path to output file. Multiple output files can be specified. |

About

FastqSplitter is part of the BIOPET tool suite, which is developed at LUMC by the SASC team. Each tool in the BIOPET tool suite is meant to offer a standalone function that can be used to perform a dedicated data analysis task or be added as part of a pipeline, for example the SASC team's biowdl pipelines.

All tools in the BIOPET tool suite are Free/Libre and Open Source Software.

Contributing

The source code of FastqSplitter can be found here. We welcome any contributions. Bug reports, feature requests and feedback can be submitted at our issue tracker.

FastqSplitter is built using sbt. Before submitting a pull request, make sure all tests pass by running sbt test from the project's root. We recommend using an IDE to work on FastqSplitter. We have had good results with this IDE.

Contact

For any questions related to FastqSplitter, please use the GitHub issue tracker or contact the SASC team directly at: sasc@lumc.nl.