This tool breaks a reference or BED file into smaller scatter regions of equal size, so that each region can be processed separately inside a pipeline.
ScatterRegions requires Java 8 to be installed on your device. Download Java 8 here or install via your distribution's package manager.
Download the latest version of ScatterRegions here. To print the usage, run:
java -jar <ScatterRegions_jar> --help
The tool always requires a reference fasta with a dict file next to it. If a BED file is supplied, the tool validates it against the given reference.
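The README does not spell out which checks the validation performs. As a rough illustration only, the sketch below assumes it means that every BED region must name a contig listed in the dict file and must lie within that contig's length. The function names `read_dict` and `validate_bed` are hypothetical and are not part of ScatterRegions.

```python
from typing import Dict, Iterable, List


def read_dict(lines: Iterable[str]) -> Dict[str, int]:
    """Collect contig name -> length from the @SQ lines of a sequence dictionary."""
    contigs: Dict[str, int] = {}
    for line in lines:
        if not line.startswith("@SQ"):
            continue  # skip @HD and other header lines
        # Every field after the record type has the form TAG:VALUE.
        fields = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
        contigs[fields["SN"]] = int(fields["LN"])
    return contigs


def validate_bed(bed_lines: Iterable[str], contigs: Dict[str, int]) -> List[str]:
    """Return one message per BED record that does not fit the reference (assumed checks)."""
    errors: List[str] = []
    for number, line in enumerate(bed_lines, start=1):
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # blank lines, comments and UCSC header lines
        chrom, start_text, end_text = line.rstrip("\n").split("\t")[:3]
        start, end = int(start_text), int(end_text)
        if chrom not in contigs:
            errors.append(f"line {number}: unknown contig {chrom}")
        elif not 0 <= start < end <= contigs[chrom]:
            errors.append(f"line {number}: {chrom}:{start}-{end} is outside the contig")
    return errors


# Small in-memory example; real input would come from open("reference.dict") etc.
example_dict = ["@HD\tVN:1.5\n", "@SQ\tSN:chr1\tLN:1000\n"]
example_bed = ["chr1\t0\t500\n", "chr1\t900\t1100\n", "chr2\t0\t10\n"]
for message in validate_bed(example_bed, read_dict(example_dict)):
    print(message)
```

Running a check like this before starting a pipeline can catch a BED file that was made for a different reference build.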
Default run:
java -jar <ScatterRegions_jar> \
-R <reference fasta> \
-o <output dir>
With scatter size:
java -jar <ScatterRegions_jar> \
-R <reference fasta> \
-o <output dir> \
-s 5000000
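The scatter size is only approximate because the tool keeps all scatters the same size. The sketch below shows one way this could work for a single contig. It is an assumption for illustration, not the actual ScatterRegions algorithm, and it ignores combining several contigs into one scatter. The function name `split_contig` is hypothetical.

```python
from typing import List, Tuple


def split_contig(name: str, length: int, scatter_size: int) -> List[Tuple[str, int, int]]:
    """Split one contig into equally sized, 0-based half-open regions near scatter_size."""
    # Round to the nearest number of pieces so each piece stays close to the target size.
    pieces = max(1, round(length / scatter_size))
    regions = []
    for i in range(pieces):
        # Integer arithmetic keeps regions contiguous and within 1 base of each other in size.
        start = i * length // pieces
        end = (i + 1) * length // pieces
        regions.append((name, start, end))
    return regions


# A 10.5 Mb contig with -s 5000000 gives two regions of 5.25 Mb each in this sketch.
for region in split_contig("chr1", 10_500_000, 5_000_000):
    print(*region, sep="\t")
```

With this approach the requested size acts as a target: the contig length is divided evenly, so the resulting regions can be somewhat larger or smaller than the value passed to `-s`.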
Usage for ScatterRegions:
Option | Required | Can occur multiple times | Description |
---|---|---|---|
--log_level, -l | no | no | Level of log information printed. Possible levels: 'debug', 'info', 'warn', 'error' |
--help, -h | no | no | Print usage |
--version, -v | no | no | Print version |
--outputDir, -o | yes | no | Output directory |
--referenceFasta, -R | yes | no | Reference fasta file, (dict file should be next to it) |
--scatterSize, -s | no | no | Approximate scatter size; the tool makes all scatters the same size. Default = 1000000 |
--regions, -L | no | no | If given only regions in the given bed file will be used for scattering |
--notCombineContigs | no | no | If set each scatter can only contain 1 contig |
--maxContigsInScatterJob | no | no | Maximum number of contigs that a single scatter job may contain |
--bamFile | no | no | When given, the regions will be scattered based on the number of reads in the index file |
--notSplitContigs | no | no | When this option is set contigs are not split. |
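When ScatterRegions is called from a pipeline script, the command line can be assembled from the options in the table above. The following is a minimal sketch: the helper names `build_command` and `run_scatter_regions` and all file paths are hypothetical, and only the flags `-R`, `-o`, `-s` and `-L` come from the table.

```python
import subprocess
from typing import List, Optional


def build_command(jar: str, reference: str, output_dir: str,
                  scatter_size: Optional[int] = None,
                  regions: Optional[str] = None) -> List[str]:
    """Build the argument list for a ScatterRegions run."""
    # -R and -o are the two required options.
    cmd = ["java", "-jar", jar, "-R", reference, "-o", output_dir]
    if scatter_size is not None:
        cmd += ["-s", str(scatter_size)]  # approximate scatter size
    if regions is not None:
        cmd += ["-L", regions]  # restrict scattering to a BED file
    return cmd


def run_scatter_regions(*args, **kwargs) -> None:
    """Run the tool and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_command(*args, **kwargs), check=True)


# Print the command instead of running it; the paths are placeholders.
print(" ".join(build_command("ScatterRegions.jar", "reference.fasta", "scatters",
                             scatter_size=5000000)))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.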
ScatterRegions is part of the BIOPET tool suite, which is developed at LUMC by the SASC team. Each tool in the BIOPET tool suite is meant to offer a standalone function that can be used to perform a dedicated data analysis task or added as part of a pipeline, for example the SASC team's biowdl pipelines.
All tools in the BIOPET tool suite are Free/Libre and Open Source Software.
The source code of ScatterRegions can be found here. We welcome any contributions. Bug reports, feature requests and feedback can be submitted at our issue tracker.
ScatterRegions is built using sbt. Before submitting a pull request, make sure all tests pass by
running sbt test
from the project's root. We recommend using an IDE to work on ScatterRegions. We have had
good results with this IDE.
For any questions related to ScatterRegions, please use the GitHub issue tracker or contact the SASC team directly at: sasc@lumc.nl.