This tool searches for combinations of variants within a multi-sample VCF file. It can filter on INFO fields and on a maximum distance between the SNPs on the reference.
DigenicSearch requires Java 8 to be installed on your device. Download Java 8 here or install via your distribution's package manager.
Download the latest version of DigenicSearch here. To generate the usage run:
java -jar <DigenicSearch_jar> --help
Because of the number of possible combinations, this tool needs to run on a Spark cluster. If required, the tool can still run locally by submitting it to a local master; see also https://spark.apache.org/docs/latest/submitting-applications.html#master-urls. By default this tool runs on the complete genome, but with the option --regions a BED file can be provided to limit the number of locations.
A default run:
java -jar <DigenicSearch_jar> \
-i <input vcf> \
-o <output dir> \
-R <reference fasta> \
-p <ped file>
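For a run without a cluster, the `--sparkMaster` option listed in the usage table can be given a local master URL in the Spark syntax. The sketch below is only an illustration: the file names are hypothetical, and `local[4]` (four local worker threads) is an assumed value. It assembles and prints the command so the paths can be reviewed before anything is launched.

```shell
# Hypothetical file names, used for illustration only.
JAR="DigenicSearch.jar"
INPUT_VCF="cohort.vcf"
OUTPUT_DIR="digenic_out"
REFERENCE="reference.fasta"
PED_FILE="families.ped"

# Assumption: 4 local worker threads; 'local[*]' would use all available cores.
SPARK_MASTER='local[4]'

# Assemble the command as text so it can be reviewed before it is run.
# The master URL is single-quoted because some shells treat [ ] as a glob pattern.
CMD="java -jar $JAR -i $INPUT_VCF -o $OUTPUT_DIR -R $REFERENCE -p $PED_FILE --sparkMaster '$SPARK_MASTER'"
echo "$CMD"

# To launch it after review:
# eval "$CMD"
```

Quoting the master URL matters mostly in shells such as zsh, where an unquoted `local[4]` is treated as a filename pattern and can abort the command.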
A run on limited locations:
java -jar <DigenicSearch_jar> \
-i <input vcf> \
-o <output dir> \
-R <reference fasta> \
--regions <bed file> \
-p <ped file>
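A BED file for `--regions` can be written by hand. The sketch below creates a small tab-separated file with the standard BED columns (contig, 0-based start, exclusive end, name). The contig names, coordinates and labels are made up for illustration; the contig names are assumed to match those in the reference fasta. The 4th column is included because the `--aggregation` option reads it, although how DigenicSearch uses the label is not described here.

```shell
# Write a small BED file: contig, 0-based start, end (exclusive), name.
# All values are hypothetical.
printf 'chr1\t1000000\t1500000\tgeneA\n' >  regions.bed
printf 'chr1\t2000000\t2200000\tgeneA\n' >> regions.bed
printf 'chr2\t500000\t900000\tgeneB\n'   >> regions.bed

# Sanity check: count how many regions share each label in the 4th column.
awk -F '\t' '{ n[$4]++ } END { for (k in n) print k, n[k] }' regions.bed | sort
```

With the three lines above, the check prints `geneA 2` and `geneB 1`.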
Usage for DigenicSearch:
Option | Required | Can occur multiple times | Description |
---|---|---|---|
--log_level, -l | no | no | Level of log information printed. Possible levels: 'debug', 'info', 'warn', 'error' |
--help, -h | no | no | Print usage |
--version, -v | no | no | Print version |
--inputFile, -i | yes | no | Input VCF file |
--outputDir, -o | yes | no | Output dir for the tool |
--reference, -R | yes | no | Reference fasta file to use, dict file should be next to it |
--regions | no | no | Only use the regions in this BED file |
--aggregation | no | no | BED file to use for aggregation; the 4th column is used for aggregation |
--pedFile, -p | yes | yes (unlimited) | Input ped file for family relations and affected/unaffected status |
--usingOtherFamilies | no | no | This option uses affected members from other families to check whether the variant is correlated to the trait. If the fraction of the variant in the other family is above the threshold fraction, the variant is likely to be related to the trait. If the fraction is 0.0, it is still possible that the variant is related to the trait. If the fraction is between 0.0 and the threshold fraction, the variant is probably not related to the trait and is filtered out. |
--detectionMode | no | no | Detection mode, possible values: Varant, Allele, Genotype |
--singleAnnotationFilter | no | yes (unlimited) | Filter on single variant |
--pairAnnotationFilter | no | yes (unlimited) | Filter on paired variant, must be true for 1 of the 2 in the pair |
--singleAffectedFraction | no | no | Minimum affected fraction for each variant |
--pairAffectedFraction | no | no | Minimum affected fraction for at least 1 of the 2 variants |
--singleUnaffectedFraction | no | no | Maximum unaffected fraction for each variant |
--pairUnaffectedFraction | no | no | Maximum unaffected fraction for at least 1 of the 2 variants |
--maxDistance | no | no | Maximum distance in base pairs. This option assumes that both variants are on the same contig |
--binSize | no | no | Bin size in estimated base pairs |
--maxContigsInSingleJob | no | yes (unlimited) | Max number of bins to be combined, default is 250 |
--externalFile | no | yes (unlimited) | External file used for filtering |
--singleExternalFilter | no | yes (unlimited) | Filter on single variant |
--pairExternalFilter | no | yes (unlimited) | Filter on paired variant, must be true for 1 of the 2 in the pair |
--sparkMaster | no | no | Spark master, defaults to local[1] |
--onlyFamily | no | no | Limit execution to a single family |
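
The description of `--usingOtherFamilies` in the table above amounts to a three-way decision on the fraction of affected members in another family that carry the variant. The sketch below restates that rule as a small shell function. It is not the tool's implementation: the function name `classify` and the example threshold of 0.5 are hypothetical, and treating a fraction exactly equal to the threshold as "keep" is an assumption, since the description only covers values above and below it.

```shell
# classify <fraction> <threshold>
# Prints "keep" or "filter" following the rule described for --usingOtherFamilies.
classify() {
  awk -v f="$1" -v t="$2" 'BEGIN {
    if (f == 0)      print "keep";    # variant absent: may still be related to the trait
    else if (f >= t) print "keep";    # at or above threshold (equality is an assumption)
    else             print "filter";  # between 0.0 and the threshold: filtered out
  }'
}

classify 0.0 0.5
classify 0.2 0.5
classify 0.8 0.5
```

The three calls print `keep`, `filter` and `keep`, matching the three cases in the option description.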
DigenicSearch is part of the BIOPET tool suite, which is developed at LUMC by the SASC team. Each tool in the BIOPET tool suite is meant to offer a standalone function that can be used to perform a dedicated data analysis task or be added as part of BIOPET pipelines.
All tools in the BIOPET tool suite are Free/Libre and Open Source Software.
The source code of DigenicSearch can be found here. We welcome any contributions. Bug reports, feature requests and feedback can be submitted at our issue tracker.
DigenicSearch is built using sbt. Before submitting a pull request, make sure all tests pass by running sbt test from the project's root. We recommend using an IDE to work on DigenicSearch. We have had good results with this IDE.
For any question related to DigenicSearch, please use the GitHub issue tracker or contact the SASC team directly at: sasc@lumc.nl.