Description

This tool takes an input GTF file and outputs a GTF file with renamed contigs, for example chr1 -> 1. This is useful in pipelines where tools use different naming standards for contigs.
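
As a rough illustration of what renaming a contig means for a single GTF record, here is a minimal Python sketch. It is not the tool's implementation (ReplaceContigsGtfFile is a Java/Scala program); the function name `rename_contig` and the sample record are hypothetical. It relies only on the fact that the contig (seqname) is the first tab-separated column of a GTF line.

```python
from typing import Dict


def rename_contig(gtf_line: str, mapping: Dict[str, str]) -> str:
    """Return the GTF line with its first (seqname) column renamed, if mapped."""
    # Comment and blank lines carry no contig, so they pass through unchanged.
    if gtf_line.startswith("#") or not gtf_line.strip():
        return gtf_line
    fields = gtf_line.split("\t")
    # Contigs without a mapping entry keep their original name.
    fields[0] = mapping.get(fields[0], fields[0])
    return "\t".join(fields)


# Hypothetical sample record, used only for illustration.
line = 'chr1\thavana\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972";'
print(rename_contig(line, {"chr1": "1"}))
```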

Installation

ReplaceContigsGtfFile requires Java 8 to be installed on your device. Download Java 8 here or install via your distribution's package manager.

Download the latest version of ReplaceContigsGtfFile here. To print the usage, run:

java -jar <ReplaceContigsGtfFile_jar> --help

Manual

ReplaceContigsGtfFile needs a reference fasta file and an input GTF file. The reference fasta is needed to validate the contigs. The renaming of contigs can be specified in a contig mapping file: the first column holds the new contig name and the second column holds a semicolon-separated list of alternative names. The contig mapping file should be in the following format.

chr1    1;I;one
chr2    2;II;two

Any contig in the input GTF whose name appears in the second column will be renamed to the contig name in the first column of the same row.
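
To make the direction of the mapping concrete, the following Python sketch reads a file in the format above into a lookup from each alternative name to its new name. It is an illustration only, not the tool's own parser. The function name `read_contig_mapping` is hypothetical, and the sketch assumes the two columns are separated by a tab (suggested by the `.tsv` file name used in the examples, but not stated explicitly).

```python
import io
from typing import Dict, TextIO


def read_contig_mapping(handle: TextIO) -> Dict[str, str]:
    """Build a lookup {alternative name -> new name} from a mapping file."""
    lookup: Dict[str, str] = {}
    for raw in handle:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip empty lines
        # Assumption: columns are tab-separated.
        new_name, alternatives = line.split("\t", 1)
        for alt in alternatives.split(";"):
            lookup[alt.strip()] = new_name.strip()
    return lookup


# The example mapping file from this manual, held in memory.
example = io.StringIO("chr1\t1;I;one\nchr2\t2;II;two\n")
mapping = read_contig_mapping(example)
print(mapping)
```

With this lookup, a contig named `II` in the input would be written as `chr2` in the output.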

Alternatively, mappings can be specified on the command line with the --contig option. For example, '--contig 1=chr1' will convert all contigs named '1' to 'chr1'.

Mappings are NOT case sensitive by default. If you need case sensitivity use the --caseSensitive flag.
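
The sketch below shows one way to think about `old=new` entries and about case-insensitive versus case-sensitive matching. It is a hedged Python illustration, not the tool's code: the helper names `parse_contig_option`, `build_lookup` and `resolve` are hypothetical, and lower-casing both sides is only one possible way to ignore case.

```python
from typing import Dict, Iterable, Tuple


def parse_contig_option(value: str) -> Tuple[str, str]:
    """Split an 'old=new' entry such as '1=chr1' into (old, new)."""
    old, sep, new = value.partition("=")
    if not sep or not old or not new:
        raise ValueError(f"expected 'old=new', got {value!r}")
    return old, new


def build_lookup(pairs: Iterable[Tuple[str, str]], case_sensitive: bool = False) -> Dict[str, str]:
    """Map each old name to its new name; keys are lower-cased unless case-sensitive."""
    return {(old if case_sensitive else old.lower()): new for old, new in pairs}


def resolve(contig: str, lookup: Dict[str, str], case_sensitive: bool = False) -> str:
    """Return the new name for a contig, or the contig itself if it is not mapped."""
    key = contig if case_sensitive else contig.lower()
    return lookup.get(key, contig)


pairs = [parse_contig_option("I=chr1")]
# Default behaviour: 'i' matches the entry for 'I'.
print(resolve("i", build_lookup(pairs)))
# Case-sensitive behaviour: 'i' is left untouched.
print(resolve("i", build_lookup(pairs, case_sensitive=True), case_sensitive=True))
```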

The output can also be a GFF file with the --writeAsGff flag.

Example

To convert the contig names in a GTF file with case sensitivity and write the output as GFF, run:

java -jar <ReplaceContigsGtfFile_jar> \
-I input.gtf \
-o output.gtf \
-R reference.fasta \
--contigMappingFile contignames.tsv \
--caseSensitive \
--writeAsGff

To convert the contig names using command-line options, equivalent to the example contig mapping file given in the manual:

java -jar <ReplaceContigsGtfFile_jar> \
-I input.gtf \
-o output.gtf \
-R reference.fasta \
--contig 1=chr1 \
--contig I=chr1 \
--contig one=chr1 \
--contig 2=chr2 \
--contig II=chr2 \
--contig two=chr2


A contig mapping file and --contig options can be used together:

java -jar <ReplaceContigsGtfFile_jar> \
-I input.gtf \
-o output.gtf \
-R reference.fasta \
--contigMappingFile contignames.tsv \
--contig 3=chr3 \
--contig III=chr3
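
Conceptually, combining both sources yields a single lookup. The Python sketch below illustrates this for the command above; it is not the tool's implementation, and `merge_mappings` is a hypothetical name. This document does not say which source wins when the same contig is mapped to two different names, so the sketch reports such a conflict instead of choosing.

```python
from typing import Dict


def merge_mappings(from_file: Dict[str, str], from_cli: Dict[str, str]) -> Dict[str, str]:
    """Combine two {old name -> new name} lookups, rejecting contradictory entries."""
    merged = dict(from_file)
    for old, new in from_cli.items():
        if old in merged and merged[old] != new:
            # Precedence is not documented here, so refuse to guess.
            raise ValueError(f"conflicting targets for contig {old!r}: {merged[old]!r} vs {new!r}")
        merged[old] = new
    return merged


# Entries from the example mapping file and from the --contig options above.
from_file = {"1": "chr1", "I": "chr1", "one": "chr1", "2": "chr2", "II": "chr2", "two": "chr2"}
from_cli = {"3": "chr3", "III": "chr3"}
merged = merge_mappings(from_file, from_cli)
print(sorted(merged.items()))
```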

Usage

Usage for ReplaceContigsGtfFile:

| Option | Required | Can occur multiple times | Description |
| --- | --- | --- | --- |
| --log_level, -l | no | no | Level of log information printed. Possible levels: 'debug', 'info', 'warn', 'error' |
| --help, -h | no | no | Print usage |
| --version, -v | no | no | Print version |
| --input, -I | yes | no | Input GTF file |
| --output, -o | yes | no | Output GTF file |
| --referenceFile, -R | yes | no | Reference fasta file |
| --contig | no | yes (unlimited) | Specify contig mappings on the command line. Example: '1=chr1' will convert contig '1' to 'chr1' |
| --writeAsGff | no | no | Write as GFF file instead of GTF file. |
| --contigMappingFile | no | no | File describing how to map contig names: the first column is the new name, the second column is a semicolon-separated list of alternative names |
| --caseSensitive | no | no | If set, the tool does not try to match case differences. Example: chr1_gl000191_random will not match chr1_GL000191_random |

About

ReplaceContigsGtfFile is part of the BIOPET tool suite, which is developed at LUMC by the SASC team. Each tool in the BIOPET tool suite is meant to offer a standalone function that can be used to perform a dedicated data analysis task or added as part of a pipeline, for example the SASC team's biowdl pipelines.

All tools in the BIOPET tool suite are Free/Libre and Open Source Software.

Contributing

The source code of ReplaceContigsGtfFile can be found here. We welcome any contributions. Bug reports, feature requests and feedback can be submitted at our issue tracker.

ReplaceContigsGtfFile is built using sbt. Before submitting a pull request, make sure all tests pass by running sbt test from the project's root. We recommend using an IDE to work on ReplaceContigsGtfFile. We have had good results with this IDE.

Contact

For any questions related to ReplaceContigsGtfFile, please use the GitHub issue tracker or contact the SASC team directly at: sasc@lumc.nl.