Description

Tool - Filter

This tool filters a SeattleSeq file. When a BED file is given, only variants inside its regions are selected. Filtering on specific fields is also possible.

Tool - MergeGenes

This tool merges the per-gene counts from the filter step into one combined matrix. Genes that are missing from a sample are filled with 0.

Tool - MultiFilter

This tool filters multiple SeattleSeq files, one per sample. When a BED file is given for a sample, only variants inside its regions are selected. Filtering on specific fields is also possible.

Installation

SeattleSeqKit requires Java 8 to be installed on your device. Download Java 8 here or install via your distribution's package manager.

Download the latest version of SeattleSeqKit here. To print the usage, run:

java -jar <SeattleSeqKit_jar> --help

Manual

Tool - Filter

The SeattleSeq files must have the columns 'chromosome', 'position' and 'geneList'. The gene output files contain counts per gene, not per transcript. A single variant can be counted more than once when its location overlaps multiple genes, because it is counted once for each of them.
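The per-gene counting described above can be sketched in Python as follows. This is only an illustration, not SeattleSeqKit's actual implementation. It assumes that 'geneList' holds comma-separated gene names, which may differ from the real file format, and the function name `count_per_gene` is hypothetical.

```python
from collections import Counter


def count_per_gene(variants):
    """Count variants per gene; a variant on several genes counts once for each."""
    counts = Counter()
    for variant in variants:
        # Assumption: geneList is a comma-separated string of gene names.
        genes = {g.strip() for g in variant["geneList"].split(",") if g.strip()}
        counts.update(genes)
    return counts


variants = [
    {"chromosome": "1", "position": 100, "geneList": "GENE_A"},
    {"chromosome": "1", "position": 200, "geneList": "GENE_A,GENE_B"},
]
gene_counts = count_per_gene(variants)
print(dict(gene_counts))
```

The second variant contributes to both GENE_A and GENE_B, so the sum of the per-gene counts can exceed the number of variants.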

Tool - MergeGenes

The per-sample count files are not required to have counts for all genes. Any gene that is missing from a sample gets a count of 0 for that sample. The number of files is unlimited; more files only require more memory.
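A minimal Python sketch of this zero-filling merge is shown below. It is not the tool's own code: the function name `merge_gene_counts` and the in-memory dictionary layout are assumptions made for illustration, and reading the actual count files is left out.

```python
def merge_gene_counts(samples):
    """Merge {sample: {gene: count}} into a gene-by-sample matrix, filling gaps with 0."""
    # The union of all genes seen in any sample defines the matrix rows.
    all_genes = sorted({gene for counts in samples.values() for gene in counts})
    sample_names = list(samples)
    matrix = {
        gene: [samples[name].get(gene, 0) for name in sample_names]
        for gene in all_genes
    }
    return sample_names, matrix


samples = {
    "sample1": {"GENE_A": 2, "GENE_B": 1},
    "sample2": {"GENE_B": 4},
}
header, matrix = merge_gene_counts(samples)
print("gene", *header, sep="\t")
for gene, row in matrix.items():
    print(gene, *row, sep="\t")
```

Because the sketch holds every sample's counts in memory before building the matrix, memory use grows with the number of input files, in line with the note above.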

Tool - MultiFilter

The SeattleSeq files must have the columns 'chromosome', 'position' and 'geneList'. The gene output files contain counts per gene, not per transcript. A single variant can be counted more than once when its location overlaps multiple genes, because it is counted once for each of them.

Example

Tool - Filter

Run with regions selection:

java -jar <Filter_jar> \
-i <input file> \
-o <output file> \
--intervals <bed file>
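What region selection amounts to can be sketched in Python as below. This is an illustration under stated assumptions, not the tool's implementation: BED intervals are 0-based and half-open as in the BED format, the 'position' column is assumed to be 1-based, and the function name `in_intervals` is hypothetical.

```python
def in_intervals(chromosome, position, intervals):
    """Return True if a 1-based position falls inside any BED interval."""
    # BED intervals are 0-based and half-open: start <= pos0 < end.
    pos0 = position - 1
    return any(
        chrom == chromosome and start <= pos0 < end
        for chrom, start, end in intervals
    )


intervals = [("1", 99, 200)]  # covers 1-based positions 100 to 200
variants = [("1", 100), ("1", 201), ("2", 150)]
kept = [v for v in variants if in_intervals(v[0], v[1], intervals)]
print(kept)
```

A linear scan over the intervals is enough for a sketch; for large BED files an interval tree or sorted lookup would be the usual choice.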

Run where a field should contain the given text:

java -jar <Filter_jar> \
-i <input file> \
-o <output file> \
--fieldMustContain <field>=<text>
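The following Python sketch illustrates one way such a field filter could behave. It assumes plain, case-sensitive substring matching, which the tool may or may not use. The helper names and the example column 'functionGVS' with its values are chosen purely for illustration.

```python
def parse_field_arg(arg):
    """Split a '<field>=<text>' argument at the first '='."""
    field, _, text = arg.partition("=")
    return field, text


def field_contains(record, field, text):
    """Return True if the record's value for `field` contains `text`."""
    # Assumption: case-sensitive substring match; a missing field never matches.
    return field in record and text in record[field]


records = [
    {"geneList": "GENE_A", "functionGVS": "missense"},
    {"geneList": "GENE_B", "functionGVS": "synonymous"},
]
field, text = parse_field_arg("functionGVS=missense")
kept = [r for r in records if field_contains(r, field, text)]
print(kept)
```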

Tool - MergeGenes

Default run to merge 3 samples:

java -jar <MergeGenes_jar> \
-i <sample1 key>=<gene count file> \
-i <sample2 key>=<gene count file> \
-i <sample3 key>=<gene count file> \
-o <output file>

Tool - MultiFilter

Run with regions selection:

java -jar <MultiFilter_jar> \
-i <sample>=<input file> \
-o <output dir> \
--intervals <sample>=<bed file>

Run where a field should contain the given text:

java -jar <MultiFilter_jar> \
-i <sample>=<input file> \
-o <output dir> \
--fieldMustContain <field>=<text>

Usage

Usage for SeattleSeqKit:

| Option | Required | Can occur multiple times | Description |
| --- | --- | --- | --- |
| --log_level, -l | no | no | Level of log information printed. Possible levels: 'debug', 'info', 'warn', 'error' |
| --help, -h | no | no | Print usage |
| --version, -v | no | no | Print version |
| toolName | no | no | Name of the tool to execute |
| tool args | no | yes (unlimited) | Arguments for the tool |

Usage for Tool - Filter:

| Option | Required | Can occur multiple times | Description |
| --- | --- | --- | --- |
| --log_level, -l | no | no | Level of log information printed. Possible levels: 'debug', 'info', 'warn', 'error' |
| --help, -h | no | no | Print usage |
| --version, -v | no | no | Print version |
| --inputFile, -i | yes | no | SeattleSeq input file |
| --outputFile, -o | yes | no | SeattleSeq output file |
| --geneColapseOutput | no | no | Output file for per-gene hit counts |
| --intervals | no | no | Intervals BED file |
| --fieldMustContain | no | no | Field must contain given text |
| --fieldMustBeBelow | no | no | Field must be below given numeric value |
| --fieldMustBeAbove | no | no | Field must be above given numeric value |
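The numeric field filters can be pictured with the Python sketch below. It is a hedged illustration only: the strict comparisons, the handling of values as floats, the function name `passes_thresholds` and the example column 'score' are all assumptions, not documented behavior of the tool.

```python
def passes_thresholds(record, below=None, above=None):
    """Check a record against per-field upper and lower numeric limits."""
    for field, limit in (below or {}).items():
        # Assumption: strict comparison; a value equal to the limit fails.
        if not float(record[field]) < limit:
            return False
    for field, limit in (above or {}).items():
        if not float(record[field]) > limit:
            return False
    return True


records = [
    {"position": "100", "score": "0.2"},
    {"position": "200", "score": "0.9"},
]
kept = [r for r in records if passes_thresholds(r, below={"score": 0.5})]
print(kept)
```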

Usage for Tool - MergeGenes:

| Option | Required | Can occur multiple times | Description |
| --- | --- | --- | --- |
| --log_level, -l | no | no | Level of log information printed. Possible levels: 'debug', 'info', 'warn', 'error' |
| --help, -h | no | no | Print usage |
| --version, -v | no | no | Print version |
| --inputFile, -i | yes (2 required) | yes (unlimited) | Gene counts per sample |
| --outputFile, -o | yes | no | Output file with merged gene counts |

Usage for Tool - MultiFilter:

| Option | Required | Can occur multiple times | Description |
| --- | --- | --- | --- |
| --log_level, -l | no | no | Level of log information printed. Possible levels: 'debug', 'info', 'warn', 'error' |
| --help, -h | no | no | Print usage |
| --version, -v | no | no | Print version |
| --inputFile, -i | yes | yes (unlimited) | SeattleSeq input file |
| --outputDir, -o | yes | no | Output directory |
| --multiSampleTreshold | no | no | Minimal number of samples per gene, default: 2 |
| --geneColapseOutput | no | no | Output file for per-gene hit counts |
| --intervals | no | yes (unlimited) | Intervals BED file |
| --fieldMustContain | no | yes (unlimited) | Field must contain given text |
| --fieldMustBeBelow | no | yes (unlimited) | Field must be below given numeric value |
| --fieldMustBeAbove | no | yes (unlimited) | Field must be above given numeric value |
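One plausible reading of --multiSampleTreshold is that a gene is reported only when it has hits in at least that many samples. The Python sketch below illustrates that reading; it is an assumption about the option's meaning rather than a description of the tool's code, and the function name `genes_in_enough_samples` is hypothetical.

```python
from collections import Counter


def genes_in_enough_samples(genes_per_sample, threshold=2):
    """Return the genes that have hits in at least `threshold` samples."""
    # Count each gene once per sample, however many variants hit it there.
    sample_counts = Counter(
        gene for genes in genes_per_sample.values() for gene in set(genes)
    )
    return sorted(g for g, n in sample_counts.items() if n >= threshold)


genes_per_sample = {
    "sample1": {"GENE_A", "GENE_B"},
    "sample2": {"GENE_B"},
    "sample3": {"GENE_B", "GENE_C"},
}
shared = genes_in_enough_samples(genes_per_sample)
print(shared)
```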

About

SeattleSeqKit is part of the BIOPET tool suite, which is developed at LUMC by the SASC team. Each tool in the BIOPET tool suite is meant to offer a standalone function that can be used to perform a dedicated data analysis task or be added as part of BIOPET pipelines.

All tools in the BIOPET tool suite are Free/Libre and Open Source Software.

Contributing

The source code of SeattleSeqKit can be found here. We welcome any contributions. Bug reports, feature requests and feedback can be submitted at our issue tracker.

SeattleSeqKit is built using sbt. Before submitting a pull request, make sure all tests pass by running sbt test from the project's root. We recommend using an IDE to work on SeattleSeqKit. We have had good results with this IDE.

Contact

For any questions related to SeattleSeqKit, please use the GitHub issue tracker or contact the SASC team directly at: sasc@lumc.nl.