SeqStat is a package of tools that generate stats from a FASTQ file, merge those stats across multiple samples, and validate the generated stats files.
The generate tool outputs several stats on a FASTQ file.
Outputted stats:
This module merges seqstat files together and keeps the sample/library/readgroup structure. If required, it is also possible to collapse this structure; the output file then does not have any sample/library/readgroup structure.
This tool validates a file produced by SeqStat. If aggregation values cannot be regenerated, the file is considered corrupt. This should only happen when the user has edited the seqstat file manually.
SeqStat requires Java 8 to be installed on your device. Download Java 8 here or install via your distribution's package manager.
Download the latest version of SeqStat here. To print the usage, run:
java -jar <SeqStat_jar> --help
By default, stats are written to stdout in JSON format. If an output file is specified, they are written to that file in JSON format instead.
When merging files, SeqStat validates both the input files and the output files. If aggregation values cannot be regenerated, the file is considered corrupt.
See example.
To run SeqStat and save the output in a JSON file:
java -jar <SeqStat_jar> generate \
-i input.fastq \
-o output.json \
--sample <sample_name> \
--library <library name> \
--readgroup <readgroup name>
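A small wrapper can make the optional second FASTQ file easier to handle. The sketch below is not part of SeqStat: the function name `seqstat_generate`, the `SEQSTAT` variable, and the default jar path `seqstat.jar` are assumptions made for illustration. It uses only the `generate` options listed in the usage table and adds `-j` when a second read file is given.

```shell
# Hypothetical helper, not shipped with SeqStat.
# Usage: seqstat_generate <sample> <library> <readgroup> <output.json> <R1.fastq> [R2.fastq]
# SEQSTAT holds the command used to start SeqStat; the default jar path is an assumption.
seqstat_generate() {
    sample=$1
    library=$2
    readgroup=$3
    out=$4
    r1=$5
    r2=${6:-}

    # Start the argument list with the tool name and the first FASTQ file.
    set -- generate -i "$r1"

    # Add the second FASTQ file only when one was supplied.
    if [ -n "$r2" ]; then
        set -- "$@" -j "$r2"
    fi

    # SEQSTAT is intentionally unquoted so "java -jar seqstat.jar" splits into words.
    ${SEQSTAT:-java -jar seqstat.jar} "$@" \
        -o "$out" \
        --sample "$sample" \
        --library "$library" \
        --readgroup "$readgroup"
}
```

For example, `seqstat_generate sample1 lib1 rg1 stats.json reads_R1.fastq reads_R2.fastq` would run `generate` on a read pair, and omitting the last argument would run it on a single file.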
Merging multiple files:
java -jar <SeqStat_jar> merge \
-i <seqstat file> \
-i <seqstat file> \
-o <output file>
Merging multiple files as collapsed format:
java -jar <SeqStat_jar> merge \
-i <seqstat file> \
-i <seqstat file> \
--combinedOutputFile <output file>
Both output formats at the same time:
java -jar <SeqStat_jar> merge \
-i <seqstat file> \
-i <seqstat file> \
-o <output file> \
--combinedOutputFile <output file>
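Because `merge` takes one `-i` per input file, typing the command by hand gets tedious for many samples. The following sketch builds the repeated `-i` arguments from a list of files. The function name `seqstat_merge`, the `SEQSTAT` variable, and the default jar path `seqstat.jar` are hypothetical and only serve the illustration.

```shell
# Hypothetical helper, not shipped with SeqStat.
# Usage: seqstat_merge <output.json> <seqstat file> [<seqstat file> ...]
# SEQSTAT holds the command used to start SeqStat; the default jar path is an assumption.
seqstat_merge() {
    output=$1
    shift

    # Rewrite the positional parameters from "a b c" to "-i a -i b -i c".
    # Each pass moves one file from the front to the back, prefixed with -i.
    n=$#
    while [ "$n" -gt 0 ]; do
        f=$1
        shift
        set -- "$@" -i "$f"
        n=$((n - 1))
    done

    # SEQSTAT is intentionally unquoted so "java -jar seqstat.jar" splits into words.
    ${SEQSTAT:-java -jar seqstat.jar} merge "$@" -o "$output"
}
```

With this in place, `seqstat_merge merged.json stats/*.json` would merge every JSON file in a `stats` directory (an example path) into `merged.json`.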
Default:
java -jar <SeqStat_jar> validate \
-i <input file>
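Since `validate` accepts a single input file, checking a batch of files needs a loop. The sketch below assumes that `validate` exits with a non-zero status when a file is corrupt; this behaviour is not stated in this document and should be verified before relying on it. The function name `seqstat_validate_all`, the `SEQSTAT` variable, and the default jar path `seqstat.jar` are likewise hypothetical.

```shell
# Hypothetical helper, not shipped with SeqStat.
# Usage: seqstat_validate_all <seqstat file> [<seqstat file> ...]
# ASSUMPTION: "validate" returns a non-zero exit status for a corrupt file.
seqstat_validate_all() {
    failed=0
    for f in "$@"; do
        # SEQSTAT is intentionally unquoted so "java -jar seqstat.jar" splits into words.
        if ${SEQSTAT:-java -jar seqstat.jar} validate -i "$f"; then
            echo "OK: $f"
        else
            echo "FAILED: $f"
            failed=$((failed + 1))
        fi
    done

    # Succeed only when every file passed validation.
    [ "$failed" -eq 0 ]
}
```

The function reports each file and returns a failure status if any file did not validate, so it can be used as a gate in a script or pipeline step.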
Usage for SeqStat:
Option | Required | Can occur multiple times | Description |
---|---|---|---|
--log_level, -l | no | no | Level of log information printed. Possible levels: 'debug', 'info', 'warn', 'error' |
--help, -h | no | no | Print usage |
--version, -v | no | no | Print version |
toolName | no | no | Name of the tool to execute |
tool args | no | yes (unlimited) | Arguments for the tool |
Option | Required | Can occur multiple times | Description |
---|---|---|---|
--log_level, -l | no | no | Level of log information printed. Possible levels: 'debug', 'info', 'warn', 'error' |
--help, -h | no | no | Print usage |
--version, -v | no | no | Print version |
--fastqR1, -i | yes | no | FastQ file to generate stats from |
--fastqR2, -j | no | no | FastQ file to generate stats from |
--output, -o | yes | no | File to write output to; if not supplied, output goes to stdout |
--sample | yes | no | Sample name |
--library | yes | no | Library name |
--readgroup | yes | no | Readgroup name |
Option | Required | Can occur multiple times | Description |
---|---|---|---|
--log_level, -l | no | no | Level of log information printed. Possible levels: 'debug', 'info', 'warn', 'error' |
--help, -h | no | no | Print usage |
--version, -v | no | no | Print version |
--inputFile, -i | yes | yes (unlimited) | Files to merge into a single file |
--outputFile, -o | no | no | Output file |
--combinedOutputFile | no | no | Combined output file |
Option | Required | Can occur multiple times | Description |
---|---|---|---|
--log_level, -l | no | no | Level of log information printed. Possible levels: 'debug', 'info', 'warn', 'error' |
--help, -h | no | no | Print usage |
--version, -v | no | no | Print version |
--inputFile, -i | yes | no | File to validate schema |
SeqStat is part of the BIOPET tool suite, which is developed at LUMC by the SASC team. Each tool in the BIOPET tool suite is meant to offer a standalone function that can be used to perform a dedicated data analysis task or be added as part of a pipeline, for example the SASC team's biowdl pipelines.
All tools in the BIOPET tool suite are Free/Libre and Open Source Software.
The source code of SeqStat can be found here. We welcome any contributions. Bug reports, feature requests and feedback can be submitted at our issue tracker.
SeqStat is built using sbt. Before submitting a pull request, make sure all tests pass by running
sbt test
from the project's root. We recommend using an IDE to work on SeqStat. We have had good results with this IDE.
For any questions related to SeqStat, please use the GitHub issue tracker or contact the SASC team directly at: sasc@lumc.nl.