This tool takes an input VCF file and outputs a VCF file with renamed contigs, for example chr1 -> 1. This can be useful in a pipeline where tools use different naming standards for contigs.
ReplaceContigsVcfFile requires Java 8 to be installed on your device. Download Java 8 here or install via your distribution's package manager.
Download the latest version of ReplaceContigsVcfFile here. To print the usage information, run:
java -jar <ReplaceContigsVcfFile_jar> --help
ReplaceContigsVcfFile needs a reference fasta file and an input VCF file. The reference fasta is needed to validate the contigs. The renaming of contigs can be specified in a contig mapping file. The contig mapping file has two columns: the new contig name, followed by a semicolon-separated list of alternative names. For example:
chr1 1;I;one
chr2 2;II;two
Any contig in the input VCF whose name appears in the second column is renamed to the contig name in the corresponding first column.
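The lookup implied by this format can be sketched in a few lines of Python. This is only an illustration of the file layout, not the tool's own implementation. The function name `parse_contig_mapping` is hypothetical, and the sketch assumes the two columns are separated by whitespace (a tab, going by the `.tsv` extension used in the examples).

```python
def parse_contig_mapping(text):
    """Build a lookup from each alternative contig name to its new name.

    Assumes two whitespace-separated columns per line:
    new name, then a semicolon-separated list of alternative names.
    """
    lookup = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        new_name, alternatives = line.split(None, 1)
        for alt in alternatives.split(";"):
            alt = alt.strip()
            if alt:
                lookup[alt] = new_name
    return lookup


# The example mapping file from above, assumed to be tab-separated.
example = "chr1\t1;I;one\nchr2\t2;II;two\n"
lookup = parse_contig_mapping(example)
print(lookup)
```

With this lookup, a contig named `II` in the input would be written as `chr2` in the output.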
Alternatively, options can be specified on the command line. For example '1=chr1' will convert all contigs named '1' to 'chr1'.
Mappings are NOT case sensitive by default. If you need case sensitivity, use the --caseSensitive flag.
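The difference between the two modes can be illustrated with a small Python sketch. It is a simplified model rather than the tool's actual code: the function `rename_contig`, the new name `GL000191`, and the choice to return unmapped names unchanged are all assumptions made for the example.

```python
def rename_contig(name, lookup, case_sensitive=False):
    """Return the new name for a contig, or the name itself if unmapped.

    Returning unmapped names unchanged is an assumption of this sketch.
    """
    if case_sensitive:
        return lookup.get(name, name)
    # Case-insensitive mode: compare lower-cased names on both sides.
    folded = {alt.lower(): new for alt, new in lookup.items()}
    return folded.get(name.lower(), name)


# Hypothetical mapping: the new name "GL000191" is made up for illustration.
lookup = {"chr1_GL000191_random": "GL000191"}

default_result = rename_contig("chr1_gl000191_random", lookup)
strict_result = rename_contig("chr1_gl000191_random", lookup, case_sensitive=True)
print(default_result)  # matched despite the case difference
print(strict_result)   # left unchanged in case-sensitive mode
```

In the default mode the lower-case spelling still matches, while with case sensitivity enabled it does not and the name is left as it was.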
To convert the contig names in a VCF file with case sensitivity, run:
java -jar <ReplaceContigsVcfFile_jar> \
-I input.vcf \
-o output.vcf \
-R reference.fasta \
--contigMappingFile contignames.tsv \
--caseSensitive
To convert the contig names using command line options, equivalent to the example contig mapping file given above:
java -jar <ReplaceContigsVcfFile_jar> \
-I input.vcf \
-o output.vcf \
-R reference.fasta \
--contig 1=chr1 \
--contig I=chr1 \
--contig one=chr1 \
--contig 2=chr2 \
--contig II=chr2 \
--contig two=chr2
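To see why these six options are equivalent to the example mapping file, here is a minimal Python sketch that turns `old=new` strings into the same kind of lookup. The function name `parse_contig_options` is hypothetical and the error handling is an assumption. The tool itself may parse its arguments differently.

```python
def parse_contig_options(options):
    """Turn 'old=new' strings into a lookup from old name to new name."""
    lookup = {}
    for option in options:
        old_name, sep, new_name = option.partition("=")
        if not sep or not old_name or not new_name:
            raise ValueError(f"expected OLD=NEW, got {option!r}")
        lookup[old_name] = new_name
    return lookup


# The values passed to --contig in the command above.
options = ["1=chr1", "I=chr1", "one=chr1", "2=chr2", "II=chr2", "two=chr2"]
lookup = parse_contig_options(options)
print(lookup)
```

Note that the direction is reversed compared with the mapping file: on the command line the old name comes first, while in the file the new name is in the first column.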
A contig mapping file and --contig options can be used together:
java -jar <ReplaceContigsVcfFile_jar> \
-I input.vcf \
-o output.vcf \
-R reference.fasta \
--contigMappingFile contignames.tsv \
--contig 3=chr3 \
--contig III=chr3
Usage for ReplaceContigsVcfFile:
Option | Required | Can occur multiple times | Description |
---|---|---|---|
--log_level, -l | no | no | Level of log information printed. Possible levels: 'debug', 'info', 'warn', 'error' |
--help, -h | no | no | Print usage |
--version, -v | no | no | Print version |
--input, -I | yes | no | Input VCF file |
--output, -o | yes | no | Output VCF file |
--referenceFile, -R | yes | no | Reference fasta file |
--contig | no | yes (unlimited) | Specify contig mappings on the command line. Example '1=chr1' will convert contig '1' to 'chr1' |
--contigMappingFile | no | no | File that specifies how to map contig names. The first column is the new name, the second column is a semicolon-separated list of alternative names |
--caseSensitive | no | no | If set, the tool does not try to match names that differ only in case. Example: chr1_gl000191_random will not match chr1_GL000191_random |
ReplaceContigsVcfFile is part of the BIOPET tool suite, which is developed at LUMC by the SASC team. Each tool in the BIOPET tool suite is meant to offer a standalone function that can be used to perform a dedicated data analysis task or added as part of a pipeline, for example the SASC team's biowdl pipelines.
All tools in the BIOPET tool suite are Free/Libre and Open Source Software.
The source code of ReplaceContigsVcfFile can be found here. We welcome any contributions. Bug reports, feature requests and feedback can be submitted at our issue tracker.
ReplaceContigsVcfFile is built using sbt. Before submitting a pull request, make sure all tests pass by running sbt test from the project's root. We recommend using an IDE to work on ReplaceContigsVcfFile. We have had good results with this IDE.
For any questions related to ReplaceContigsVcfFile, please use the GitHub issue tracker or contact the SASC team directly at: sasc@lumc.nl.