Description

This tool searches for combinations of variants within a multi-sample VCF file. It can filter on INFO fields and on a maximum distance between the SNPs on the reference.

Installation

DigenicSearch requires Java 8 to be installed on your device. Download Java 8 here or install via your distribution's package manager.

Download the latest version of DigenicSearch here. To print the usage, run:

java -jar <DigenicSearch_jar> --help

Manual

Because of the number of possible combinations, this tool must run on a Spark cluster. If required, the tool can still run locally by submitting it to a local master; see also https://spark.apache.org/docs/latest/submitting-applications.html#master-urls

By default the tool runs on the complete genome, but with the option --regions a BED file can be provided to limit the number of locations.

Example

A default run:

java -jar <DigenicSearch_jar> \
-i <input vcf> \
-o <output dir> \
-R <reference fasta> \
-p <ped file>

A run on limited locations:

java -jar <DigenicSearch_jar> \
-i <input vcf> \
-o <output dir> \
-R <reference fasta> \
--regions <bed file> \
-p <ped file>

Usage

Usage for DigenicSearch:

| Option | Required | Can occur multiple times | Description |
| ------ | -------- | ------------------------ | ----------- |
| --log_level, -l | no | no | Level of log information printed. Possible levels: 'debug', 'info', 'warn', 'error' |
| --help, -h | no | no | Print usage |
| --version, -v | no | no | Print version |
| --inputFile, -i | yes | no | Input VCF file |
| --outputDir, -o | yes | no | Output directory for the tool |
| --reference, -R | yes | no | Reference fasta file to use; the dict file should be next to it |
| --regions | no | no | Only use the regions in this BED file |
| --aggregation | no | no | Only use the aggregation regions in this BED file; the 4th column is used for aggregation |
| --pedFile, -p | yes | yes (unlimited) | Input ped file for family relations and affected/unaffected status |
| --usingOtherFamilies | no | no | This option uses affected members from other families to check if the variant is correlated to the trait. If the variant in the other family is above the threshold fraction, then it is likely to be related to the trait. If the fraction of the variant is 0.0 in the members of the affected family, then it is still possible that it is related to the trait. If the fraction of the variant is between 0.0 and the threshold fraction, then the variant is probably not related to the trait and is filtered out. |
| --detectionMode | no | no | Detection mode, possible values: Varant, Allele, Genotype |
| --singleAnnotationFilter | no | yes (unlimited) | Filter on single variant |
| --pairAnnotationFilter | no | yes (unlimited) | Filter on paired variant, must be true for 1 of the 2 in the pair |
| --singleAffectedFraction | no | no | Minimum affected fraction for each variant |
| --pairAffectedFraction | no | no | Minimum affected fraction for at least 1 of the 2 variants |
| --singleUnaffectedFraction | no | no | Maximum unaffected fraction for each variant |
| --pairUnaffectedFraction | no | no | Maximum unaffected fraction for at least 1 of the 2 variants |
| --maxDistance | no | no | Maximum distance in base pairs. This option assumes that both variants are on the same contig |
| --binSize | no | no | Bin size in estimated base pairs |
| --maxContigsInSingleJob | no | yes (unlimited) | Max number of bins to be combined, default is 250 |
| --externalFile | no | yes (unlimited) | External file used for filtering |
| --singleExternalFilter | no | yes (unlimited) | Filter on single variant |
| --pairExternalFilter | no | yes (unlimited) | Filter on paired variant, must be true for 1 of the 2 in the pair |
| --sparkMaster | no | no | Spark master, defaults to local[1] |
| --onlyFamily | no | no | Limit execution to a single family |
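
The three cases described for --usingOtherFamilies can be read as a small decision rule. The Python sketch below is only an illustration of that description, not the tool's actual implementation: the function name `keep_variant` and the `threshold` parameter are hypothetical, the fraction is assumed to be measured among affected members of the other families, and a fraction exactly equal to the threshold is assumed to pass, since the usage text does not say how ties are handled.

```python
def keep_variant(other_family_fraction: float, threshold: float) -> bool:
    """Illustrative reading of the --usingOtherFamilies description.

    other_family_fraction: fraction of affected members in other families
    that carry the variant (hypothetical input, between 0.0 and 1.0).
    threshold: the threshold fraction mentioned in the usage text.
    """
    if other_family_fraction == 0.0:
        # Absent in the other families: may still be related to the trait.
        return True
    # Present in the other families: keep only when it is frequent enough.
    # Assumption: a fraction equal to the threshold also passes.
    return other_family_fraction >= threshold


if __name__ == "__main__":
    for fraction in (0.0, 0.2, 0.5, 0.9):
        print(fraction, keep_variant(fraction, threshold=0.5))
```

Under this reading, only variants seen in other families at a low but non-zero fraction are removed.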

About

DigenicSearch is part of the BIOPET tool suite, which is developed at LUMC by the SASC team. Each tool in the BIOPET tool suite is meant to offer a standalone function that can be used to perform a dedicated data analysis task or be added as part of BIOPET pipelines.

All tools in the BIOPET tool suite are Free/Libre and Open Source Software.

Contributing

The source code of DigenicSearch can be found here. We welcome any contributions. Bug reports, feature requests and feedback can be submitted at our issue tracker.

DigenicSearch is built using sbt. Before submitting a pull request, make sure all tests pass by running sbt test from the project's root. We recommend using an IDE to work on DigenicSearch. We have had good results with this IDE.

Contact

For any questions related to DigenicSearch, please use the GitHub issue tracker or contact the SASC team directly at sasc@lumc.nl.