Description

This tool takes an input VCF file and outputs a VCF file with renamed contigs, for example chr1 -> 1. This is useful in pipelines where tools follow different contig naming conventions.

Installation

ReplaceContigsVcfFile requires Java 8 to be installed on your device. Download Java 8 here or install via your distribution's package manager.

Download the latest version of ReplaceContigsVcfFile here. To print the usage information, run:

java -jar <ReplaceContigsVcfFile_jar> --help

Manual

ReplaceContigsVcfFile needs a reference fasta file and an input VCF file. The reference fasta is needed to validate the contigs. The renaming of contigs can be specified in a contig mapping file. In this file the first column holds the new contig name and the second column holds a semicolon-separated list of alternative names, as in the following example:

chr1    1;I;one
chr2    2;II;two

Any contig in the input VCF whose name appears in the second column is renamed to the contig name in the corresponding first column. With the file above, contigs named '1', 'I' or 'one' are renamed to 'chr1'.
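
The lookup implied by this format can be sketched in Python as follows. This is only an illustration, not the tool's actual implementation: the function name `parse_contig_mapping` is hypothetical, and the sketch assumes the two columns are separated by whitespace, as in the example above.

```python
def parse_contig_mapping(lines):
    """Build a dict that maps each alternative name to its new name.

    Assumes every non-empty line has the form:
    <new name><whitespace><alt1;alt2;...>
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        new_name, alternatives = line.split(None, 1)
        for alt in alternatives.split(";"):
            alt = alt.strip()
            if alt:
                mapping[alt] = new_name
    return mapping


example = ["chr1    1;I;one", "chr2    2;II;two"]
mapping = parse_contig_mapping(example)
print(mapping["II"])  # chr2
```

Keying the dict on the alternative names means each contig found in the VCF needs only a single lookup.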

Alternatively, mappings can be specified on the command line with the --contig option. For example, '--contig 1=chr1' will convert all contigs named '1' to 'chr1'.
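
Note that the command-line form puts the old name first ('old=new'), the reverse of the column order in the mapping file. A minimal Python sketch of how such a string could be split is shown below; the helper name `parse_contig_option` is hypothetical and the error handling is an assumption, not the tool's documented behaviour.

```python
def parse_contig_option(option):
    """Split an 'old=new' string into an (old, new) tuple."""
    old, sep, new = option.partition("=")
    if not sep or not old or not new:
        raise ValueError(f"expected 'old=new', got {option!r}")
    return old, new


options = ["1=chr1", "I=chr1"]
mapping = dict(parse_contig_option(o) for o in options)
print(mapping)  # {'1': 'chr1', 'I': 'chr1'}
```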

Mappings are NOT case sensitive by default. If you need case sensitivity, use the --caseSensitive flag.
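
The difference between the two modes can be illustrated with a small Python sketch. The function `rename_contig` is hypothetical, and leaving unmapped contigs unchanged is an assumption of this sketch rather than documented behaviour of the tool.

```python
def rename_contig(name, mapping, case_sensitive=False):
    """Return the new name for `name`, or `name` itself if it is not mapped."""
    if case_sensitive:
        return mapping.get(name, name)
    # Case-insensitive mode: compare lower-cased names on both sides.
    lowered = {old.lower(): new for old, new in mapping.items()}
    return lowered.get(name.lower(), name)


mapping = {"chr1_gl000191_random": "1_gl000191_random"}

# Default: the differing case of "GL" is ignored, so the contig is renamed.
print(rename_contig("chr1_GL000191_random", mapping))
# Case-sensitive: the names differ, so the contig is left as it is.
print(rename_contig("chr1_GL000191_random", mapping, case_sensitive=True))
```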

Example

To convert the contig names in a VCF file with case sensitivity, run:

java -jar <ReplaceContigsVcfFile_jar> \
-I input.vcf \
-o output.vcf \
-R reference.fasta \
--contigMappingFile contignames.tsv \
--caseSensitive

To convert the contig names using command line options, similar to the example contig mapping file given in the manual:

java -jar <ReplaceContigsVcfFile_jar> \
-I input.vcf \
-o output.vcf \
-R reference.fasta \
--contig 1=chr1 \
--contig I=chr1 \
--contig one=chr1 \
--contig 2=chr2 \
--contig II=chr2 \
--contig two=chr2


A contig mapping file and --contig options can be used together:

java -jar <ReplaceContigsVcfFile_jar> \
-I input.vcf \
-o output.vcf \
-R reference.fasta \
--contigMappingFile contignames.tsv \
--contig 3=chr3 \
--contig III=chr3
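
Conceptually, the command above works with the union of both sources. The Python sketch below shows one way to combine them; `combine_mappings` is a hypothetical helper. How the tool treats a name that both sources map to different targets is not stated in this document, so the sketch simply rejects such input.

```python
def combine_mappings(file_mapping, cli_mapping):
    """Merge two old-name -> new-name dicts, refusing contradictory entries."""
    conflicts = sorted(
        name
        for name in file_mapping.keys() & cli_mapping.keys()
        if file_mapping[name] != cli_mapping[name]
    )
    if conflicts:
        raise ValueError(f"conflicting mappings for: {', '.join(conflicts)}")
    return {**file_mapping, **cli_mapping}


# Entries as they would be read from contignames.tsv (old name -> new name).
file_mapping = {"1": "chr1", "I": "chr1", "one": "chr1",
                "2": "chr2", "II": "chr2", "two": "chr2"}
# Entries given through the --contig options.
cli_mapping = {"3": "chr3", "III": "chr3"}

combined = combine_mappings(file_mapping, cli_mapping)
print(len(combined))  # 8
```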

Usage

Usage for ReplaceContigsVcfFile:

| Option | Required | Can occur multiple times | Description |
|:---|:---|:---|:---|
| --log_level, -l | no | no | Level of log information printed. Possible levels: 'debug', 'info', 'warn', 'error' |
| --help, -h | no | no | Print usage |
| --version, -v | no | no | Print version |
| --input, -I | yes | no | Input VCF file |
| --output, -o | yes | no | Output VCF file |
| --referenceFile, -R | yes | no | Reference fasta file |
| --contig | no | yes (unlimited) | Specify contig mappings on the command line. Example: '1=chr1' will convert contig '1' to 'chr1' |
| --contigMappingFile | no | no | File describing how to map contig names. The first column is the new name, the second column is a semicolon-separated list of alternative names |
| --caseSensitive | no | no | If set, contig names are matched case-sensitively. Example: chr1_gl000191_random will not match chr1_GL000191_random |

About

ReplaceContigsVcfFile is part of the BIOPET tool suite, which is developed at LUMC by the SASC team. Each tool in the BIOPET tool suite is meant to offer a standalone function that can be used to perform a dedicated data analysis task or be added as part of BIOPET pipelines.

All tools in the BIOPET tool suite are Free/Libre and Open Source Software.

Contributing

The source code of ReplaceContigsVcfFile can be found here. We welcome any contributions. Bug reports, feature requests and feedback can be submitted at our issue tracker.

ReplaceContigsVcfFile is built using sbt. Before submitting a pull request, make sure all tests pass by running sbt test from the project's root. We recommend using an IDE to work on ReplaceContigsVcfFile. We have had good results with this IDE.

Contact

For any questions related to ReplaceContigsVcfFile, please use the GitHub issue tracker or contact the SASC team directly at: sasc@lumc.nl.