This tool filters a SeattleSeq file. When a BED file is given, only variants inside its regions are selected. Filtering on specific fields is also possible.
This tool merges the gene counts from the filter step into one combined matrix. Genes that are missing from a sample are filled with 0.
This tool filters a SeattleSeq file. When a BED file is given, only variants inside its regions are selected. Filtering on specific fields is also possible.
SeattleSeqKit requires Java 8 to be installed on your device. Download Java 8 here or install via your distribution's package manager.
Download the latest version of SeattleSeqKit here. To print the usage, run:
java -jar <SeattleSeqKit_jar> --help
The SeattleSeq files must contain the columns 'chromosome', 'position' and 'geneList'. The gene output files are counted per gene, not per transcript. A variant can be counted more than once when its location overlaps multiple genes.
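The per-gene counting can be sketched as follows. This is a minimal illustration, not SeattleSeqKit's implementation: the function name `count_per_gene`, the comma used as separator inside 'geneList', and the sample rows are all assumptions made for the example.

```python
from collections import Counter


def count_per_gene(rows, separator=","):
    """Count variants per gene; a variant listing several genes counts once for each."""
    counts = Counter()
    for row in rows:
        # Assumption: 'geneList' holds gene names joined by `separator`.
        genes = {g.strip() for g in row["geneList"].split(separator) if g.strip()}
        for gene in genes:
            counts[gene] += 1
    return dict(counts)


# Hypothetical rows carrying the three required columns.
rows = [
    {"chromosome": "1", "position": "100", "geneList": "GENE_A"},
    {"chromosome": "1", "position": "250", "geneList": "GENE_A,GENE_B"},
]
gene_counts = count_per_gene(rows)
print(gene_counts)
```

The second row overlaps two genes, so it contributes to both counts, which is why the per-gene totals can add up to more than the number of variants.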
The per-sample count files are not required to have counts for all genes. Every gene that is missing from a sample gets a count of 0 for that sample. The number of files is unlimited; more files only require more memory.
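The zero-filling behaviour can be illustrated with a small sketch. It assumes the counts are already loaded into dictionaries keyed by sample name; the function name `merge_gene_counts` and the data are hypothetical, and the real file format is not shown here.

```python
def merge_gene_counts(samples):
    """Merge per-sample gene counts into one matrix, filling missing genes with 0."""
    # Union of all gene names across samples, sorted for a stable row order.
    genes = sorted(set().union(*(counts.keys() for counts in samples.values())))
    return {
        gene: {name: counts.get(gene, 0) for name, counts in samples.items()}
        for gene in genes
    }


# Hypothetical per-sample counts; sample2 has no entry for GENE_A.
samples = {
    "sample1": {"GENE_A": 2, "GENE_B": 1},
    "sample2": {"GENE_B": 4},
}
matrix = merge_gene_counts(samples)
for gene, row in matrix.items():
    print(gene, row)
```

Because every sample contributes a column for every gene in the union, memory use grows with both the number of files and the number of distinct genes.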
The SeattleSeq files must contain the columns 'chromosome', 'position' and 'geneList'. The gene output files are counted per gene, not per transcript. A variant can be counted more than once when its location overlaps multiple genes.
Run with region selection:
java -jar <Filter_jar> \
-i <input file> \
-o <output file> \
--intervals <bed file>
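As a rough sketch of what region selection means, the check below tests whether a variant position falls inside any BED interval. BED intervals are 0-based and half-open; the sketch assumes the 'position' column is 1-based. How SeattleSeqKit itself converts coordinates is not stated here, and the function name `in_intervals` and the example interval are hypothetical.

```python
def in_intervals(chromosome, position, intervals):
    """Return True if a 1-based position falls inside any BED interval."""
    # BED intervals are 0-based and half-open: [start, end).
    # A 1-based position p therefore matches when start < p <= end.
    return any(
        chrom == chromosome and start < position <= end
        for chrom, start, end in intervals
    )


# Hypothetical BED records as (chrom, start, end) tuples.
intervals = [("1", 100, 200)]
for chrom, pos in [("1", 100), ("1", 101), ("1", 200), ("2", 150)]:
    print(chrom, pos, in_intervals(chrom, pos, intervals))
```

A linear scan is enough for a short BED file; for large interval sets a sorted list or interval tree per chromosome would be the usual choice.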
Run where a field should contain the given text:
java -jar <Filter_jar> \
-i <input file> \
-o <output file> \
--fieldMustContain <field>=<text>
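The `<field>=<text>` form can be read as "keep a row only when the named column contains the text". The sketch below illustrates that reading under stated assumptions: matching is a case-sensitive substring test, the argument is split at the first '=', and the column name `functionGVS` and the rows are only examples. None of this is taken from SeattleSeqKit's source.

```python
def parse_field_arg(arg):
    """Split a '<field>=<text>' argument at the first '='."""
    field, sep, text = arg.partition("=")
    if not sep or not field:
        raise ValueError(f"expected <field>=<text>, got: {arg!r}")
    return field, text


def field_contains(row, field, text):
    """Keep a row only when the named field contains the given text."""
    # Assumption: a missing field is treated as empty and never matches.
    return text in row.get(field, "")


field, text = parse_field_arg("functionGVS=missense")
# Hypothetical rows; only the first one matches the filter.
rows = [
    {"functionGVS": "missense", "geneList": "GENE_A"},
    {"functionGVS": "intron", "geneList": "GENE_B"},
]
kept = [row for row in rows if field_contains(row, field, text)]
print(kept)
```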
Default run to merge 3 samples:
java -jar <MergeGenes_jar> \
-i <sample1 key>=<gene count file> \
-i <sample2 key>=<gene count file> \
-i <sample3 key>=<gene count file> \
-o <output file>
Run with region selection:
java -jar <MultiFilter_jar> \
-i <sample>=<input file> \
-o <output dir> \
--intervals <sample>=<bed file>
Run where a field should contain the given text:
java -jar <MultiFilter_jar> \
-i <sample>=<input file> \
-o <output dir> \
--fieldMustContain <field>=<text>
Usage for SeattleSeqKit:
| Option | Required | Can occur multiple times | Description |
|---|---|---|---|
| --log_level, -l | no | no | Level of log information printed. Possible levels: 'debug', 'info', 'warn', 'error' |
| --help, -h | no | no | Print usage |
| --version, -v | no | no | Print version |
| toolName | no | no | Name of the tool to execute |
| tool args | no | yes (unlimited) | Arguments for the tool |
Usage for Filter:
| Option | Required | Can occur multiple times | Description |
|---|---|---|---|
| --log_level, -l | no | no | Level of log information printed. Possible levels: 'debug', 'info', 'warn', 'error' |
| --help, -h | no | no | Print usage |
| --version, -v | no | no | Print version |
| --inputFile, -i | yes | no | Seattle seq input file |
| --outputFile, -o | yes | no | Seattle seq output file |
| --geneColapseOutput | no | no | Output file for per-gene hit counts |
| --intervals | no | no | Intervals bed file |
| --fieldMustContain | no | no | Field must contain given text |
| --fieldMustBeBelow | no | no | Field must be below given numeric value |
| --fieldMustBeAbove | no | no | Field must be above given numeric value |
Usage for MergeGenes:
| Option | Required | Can occur multiple times | Description |
|---|---|---|---|
| --log_level, -l | no | no | Level of log information printed. Possible levels: 'debug', 'info', 'warn', 'error' |
| --help, -h | no | no | Print usage |
| --version, -v | no | no | Print version |
| --inputFile, -i | yes (2 required) | yes (unlimited) | Gene counts per sample |
| --outputFile, -o | yes | no | Output file with merged gene counts |
Usage for MultiFilter:
| Option | Required | Can occur multiple times | Description |
|---|---|---|---|
| --log_level, -l | no | no | Level of log information printed. Possible levels: 'debug', 'info', 'warn', 'error' |
| --help, -h | no | no | Print usage |
| --version, -v | no | no | Print version |
| --inputFile, -i | yes | yes (unlimited) | Seattle seq input file |
| --outputDir, -o | yes | no | Output directory |
| --multiSampleTreshold | no | no | Minimum number of samples per gene, default: 2 |
| --geneColapseOutput | no | no | Output file for per-gene hit counts |
| --intervals | no | yes (unlimited) | Intervals bed file |
| --fieldMustContain | no | yes (unlimited) | Field must contain given text |
| --fieldMustBeBelow | no | yes (unlimited) | Field must be below given numeric value |
| --fieldMustBeAbove | no | yes (unlimited) | Field must be above given numeric value |
SeattleSeqKit is part of the BIOPET tool suite, which is developed at LUMC by the SASC team. Each tool in the BIOPET tool suite is meant to offer a standalone function that can be used to perform a dedicated data analysis task or be added as part of BIOPET pipelines.
All tools in the BIOPET tool suite are Free/Libre and Open Source Software.
The source code of SeattleSeqKit can be found here. We welcome any contributions. Bug reports, feature requests and feedback can be submitted at our issue tracker.
SeattleSeqKit is built using sbt. Before submitting a pull request, make sure all tests pass by running sbt test from the project's root. We recommend using an IDE to work on SeattleSeqKit. We have had good results with this IDE.
For any questions related to SeattleSeqKit, please use the GitHub issue tracker or contact the SASC team directly at: sasc@lumc.nl.