VepNormalizer modifies a VCF file annotated with the Variant Effect Predictor (VEP). VEP does not write its annotations to separate INFO fields, but puts them all in one long string inside a single "CSQ" INFO tag, so the annotations need to be normalized into regular INFO fields.
VepNormalizer requires Java 8 to be installed on your device. Download Java 8 here or install via your distribution's package manager.
Download the latest version of VepNormalizer here. To print the usage, run:
java -jar <VepNormalizer_jar> --help
The tool will parse the information in the CSQ header to create INFO fields for each annotation field. The tool has two modes: `standard` and `explode`.
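As an illustration of the header parsing described above, here is a minimal Python sketch (not VepNormalizer's own code, which is built with sbt). It assumes the usual VEP header layout, in which the `Description` of the CSQ INFO line lists the annotation fields after `Format:`, separated by `|`. The function name `csq_field_names` and the sample header are hypothetical.

```python
import re


def csq_field_names(header_line):
    """Return the annotation field names listed in a VEP CSQ header line."""
    # Assumption: the sub-fields follow "Format: " inside the Description
    # and are separated by "|".
    match = re.search(r'Format: ([^">]+)', header_line)
    if match is None:
        raise ValueError("no 'Format:' section found in the CSQ header")
    return [name.strip() for name in match.group(1).split("|")]


# Hypothetical example header with four annotation fields.
header = (
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
    'annotations from Ensembl VEP. Format: Allele|Consequence|SYMBOL|Gene">'
)
print(csq_field_names(header))
```

Each returned name would become the ID of one new INFO field in the normalized VCF.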
The `standard` mode will produce a VCF according to the VCF specification. This means that every VEP INFO tag will consist of the comma-separated list of values for each transcript. In case the value is empty, the VEP INFO tag will not be shown for that specific record.
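The idea behind the standard mode can be sketched in Python as follows. This is an illustrative sketch only, not the tool's implementation. The function name `normalize_standard` and the sample data are hypothetical. It also makes two assumptions that the text above does not settle: a tag is omitted only when no transcript has a value for it, and a value missing for a single transcript is written as `.` so that positions keep matching the transcripts.

```python
def normalize_standard(csq_value, field_names):
    """Turn one CSQ string into a dict of per-field comma-separated values."""
    # Transcripts are separated by ","; values within a transcript by "|".
    transcripts = [entry.split("|") for entry in csq_value.split(",")]
    info = {}
    for index, name in enumerate(field_names):
        values = [t[index] if index < len(t) else "" for t in transcripts]
        if not any(values):
            # Assumption: skip the tag when no transcript has a value.
            continue
        # Assumption: "." marks a value missing for a single transcript.
        info[name] = ",".join(value if value else "." for value in values)
    return info


# Hypothetical record with two transcripts; HGVSp is empty for both.
fields = ["Allele", "Consequence", "SYMBOL", "HGVSp"]
csq = "A|missense_variant|BRCA2|,A|intron_variant|BRCA2|"
print(normalize_standard(csq, fields))
```

In this example the result has no `HGVSp` entry, which corresponds to the tag not being shown for that record.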
The `explode` mode will, on the other hand, create a new VCF record for each transcript it encounters. Each VEP INFO tag will therefore consist of a single value (if present at all). This can be useful if one must work on a per-transcript basis. Please note, however, that records may appear to be "duplicated", since the same variant is written once per transcript.
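The per-transcript splitting can be sketched in Python like this. It is a simplified illustration under stated assumptions, not the tool's implementation: the function name `explode_csq` and the sample data are hypothetical, and each output record is modeled as a plain dict of INFO tags rather than a full VCF record.

```python
def explode_csq(csq_value, field_names):
    """Return one dict of INFO tags per transcript in a CSQ string."""
    records = []
    # Transcripts are separated by ","; values within a transcript by "|".
    for entry in csq_value.split(","):
        values = entry.split("|")
        # Keep only the tags that have a value for this transcript.
        records.append(
            {name: value for name, value in zip(field_names, values) if value}
        )
    return records


# Hypothetical record with two transcripts; SYMBOL is empty for the second.
fields = ["Allele", "Consequence", "SYMBOL"]
csq = "A|missense_variant|BRCA2,A|intron_variant|"
for record in explode_csq(csq, fields):
    print(record)
```

Both output records would share the same position and alleles in the VCF, which is why they can look duplicated.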
The CSQ tag is by default removed from the output VCF file. If one wishes to retain it, one can set the `--do-not-remove` option.
An input file, output file and mode are required. Optionally the CSQ tag can be kept.
java -jar <VepNormalizer_jar> \
-I input.vcf \
-O output.vcf \
-m standard \
--do-not-remove
Usage for VepNormalizer:
Option | Required | Can occur multiple times | Description |
---|---|---|---|
--log_level, -l | no | no | Level of log information printed. Possible levels: 'debug', 'info', 'warn', 'error' |
--help, -h | no | no | Print usage |
--version, -v | no | no | Print version |
--InputFile, -I | yes | no | Input VCF file. Required. |
--OutputFile, -O | yes | no | Output VCF file. Required. |
--mode, -m | yes | no | Mode. Can choose between 'standard' and 'explode'. Required. |
--do-not-remove | no | no | Do not remove CSQ tag. Optional |
VepNormalizer is part of the BIOPET tool suite that is developed at LUMC by the SASC team. Each tool in the BIOPET tool suite is meant to offer a standalone function that can be used to perform a dedicated data analysis task or be added as part of a pipeline, for example the SASC team's biowdl pipelines.
All tools in the BIOPET tool suite are Free/Libre and Open Source Software.
The source code of VepNormalizer can be found here. We welcome any contributions. Bug reports, feature requests and feedback can be submitted at our issue tracker.
VepNormalizer is built using sbt. Before submitting a pull request, make sure all tests pass by running `sbt test` from the project's root. We recommend using an IDE to work on VepNormalizer. We have had good results with this IDE.
For any questions related to VepNormalizer, please use the GitHub issue tracker or contact the SASC team directly at: sasc@lumc.nl.