Description

VepNormalizer modifies a VCF file annotated with the Variant Effect Predictor (VEP). VEP does not write each annotation to its own INFO field; instead, it packs all of its annotations into one long string inside a single "CSQ" INFO tag. VepNormalizer normalizes this string into separate INFO fields.

Installation

VepNormalizer requires Java 8 to be installed on your device. Download Java 8 here or install via your distribution's package manager.

Download the latest version of VepNormalizer here. To print the usage information, run:

java -jar <VepNormalizer_jar> --help

Manual

This tool modifies a VCF file annotated with the Variant Effect Predictor (VEP). VEP does not write each annotation to its own INFO field; instead, it packs all of its annotations into one long string inside a single "CSQ" INFO tag, so the file needs to be normalized.

The tool will parse the information in the CSQ header to create INFO fields for each annotation field. The tool has two modes: standard and explode.
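The header-parsing step can be illustrated with a minimal Python sketch. It is not VepNormalizer's actual implementation: the function name `csq_field_names` and the sample header line are made up for this example, and it only assumes that the CSQ header lists its fields after "Format:", separated by "|", as VEP normally writes them.

```python
import re

# Illustrative CSQ header line; real VEP output usually lists many more fields.
CSQ_HEADER = (
    '##INFO=<ID=CSQ,Number=.,Type=String,'
    'Description="Consequence annotations from Ensembl VEP. '
    'Format: Allele|Consequence|SYMBOL|Gene">'
)


def csq_field_names(header_line):
    """Return the field names listed after 'Format:' in a CSQ header line."""
    match = re.search(r'Format: ([^">]+)', header_line)
    if match is None:
        raise ValueError("no 'Format:' section found in CSQ header")
    return match.group(1).strip().split("|")


# Hypothetical usage: each returned name would become its own INFO field.
fields = csq_field_names(CSQ_HEADER)
```

How the tool names the resulting INFO fields (for example, whether it adds a prefix) is not covered by this sketch.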

Standard mode produces a VCF that follows the VCF specification: every VEP INFO tag holds a comma-separated list of values, one per transcript. If the value is empty, the VEP INFO tag is omitted from that specific record.
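As a rough illustration of standard mode, the Python sketch below regroups one CSQ string by field. The function name `normalize_standard`, the field list and the sample values are hypothetical, and the rule for dropping a tag (only when it is empty for every transcript) is an assumption rather than documented behaviour.

```python
def normalize_standard(csq_value, field_names):
    """Turn one CSQ string into a dict of per-field, comma-separated values."""
    # CSQ entries are comma-separated per transcript, "|"-separated per field.
    transcripts = [entry.split("|") for entry in csq_value.split(",")]
    info = {}
    for index, name in enumerate(field_names):
        values = [transcript[index] for transcript in transcripts]
        # Assumption: a tag is dropped only when it is empty for every transcript.
        if any(values):
            info[name] = ",".join(values)
    return info


# Made-up example: two transcripts, HGVSp empty in both.
fields = ["Allele", "Consequence", "SYMBOL", "HGVSp"]
csq = "A|missense_variant|BRCA2|,A|intron_variant|BRCA2|"
info = normalize_standard(csq, fields)
```

In this example the empty HGVSp field yields no tag at all, while the other fields each carry one value per transcript.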

Explode mode, on the other hand, creates a new VCF record for each transcript it encounters. Each VEP INFO tag therefore holds a single value (if present at all). This can be useful when working on a per-transcript basis. Note, however, that records may then appear to be "duplicated".
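The per-transcript split can be sketched in Python as follows. This is an illustration under assumptions, not the tool's code: the function name `explode`, the field list and the sample CSQ string are invented, and only the INFO values are modelled (the other VCF columns would simply be repeated for each new record).

```python
def explode(csq_value, field_names):
    """Yield one dict of INFO values per transcript entry in a CSQ string."""
    for entry in csq_value.split(","):
        values = entry.split("|")
        # Keep only non-empty fields, so an absent value produces no tag.
        yield {name: value for name, value in zip(field_names, values) if value}


# Made-up example: two transcripts for the same variant.
fields = ["Allele", "Consequence", "SYMBOL", "HGVSp"]
csq = "A|missense_variant|BRCA2|,A|intron_variant|BRCA2|"
records = list(explode(csq, fields))
```

Here `records` contains two entries for the same variant, which is why the output records may look duplicated apart from their transcript-specific values.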

The CSQ tag is removed from the output VCF file by default. To retain it, set the --do-not-remove option.

Example

An input file, output file and mode are required. Optionally the CSQ tag can be kept.

java -jar <VepNormalizer_jar> \
-I input.vcf \
-O output.vcf \
-m standard \
--do-not-remove

Usage

Usage for VepNormalizer:

Option	Required	Can occur multiple times	Description
--log_level, -l	no	no	Level of log information printed. Possible levels: 'debug', 'info', 'warn', 'error'
--help, -h	no	no	Print usage
--version, -v	no	no	Print version
--InputFile, -I	yes	no	Input VCF file. Required.
--OutputFile, -O	yes	no	Output VCF file. Required.
--mode, -m	yes	no	Mode. Choose between 'standard' (generates a standard VCF) and 'explode' (generates a new record for each transcript). Required.
--do-not-remove	no	no	Do not remove CSQ tag. Optional

About

VepNormalizer is part of the BIOPET tool suite, which is developed at LUMC by the SASC team. Each tool in the BIOPET tool suite is meant to offer a standalone function that can be used to perform a dedicated data analysis task or be added as part of BIOPET pipelines.

All tools in the BIOPET tool suite are Free/Libre and Open Source Software.

Contributing

The source code of VepNormalizer can be found here. We welcome any contributions. Bug reports, feature requests and feedback can be submitted at our issue tracker.

VepNormalizer is built using sbt. Before submitting a pull request, make sure all tests pass by running sbt test from the project's root. We recommend using an IDE to work on VepNormalizer. We have had good results with this IDE.

Contact

For any questions related to VepNormalizer, please use the GitHub issue tracker or contact the SASC team directly at: sasc@lumc.nl.