This tool converts an mpileup file, for instance one generated from a BAM file with samtools mpileup, into a VCF file. The tool can also read from stdin, so the mpileup file does not need to be stored on disk. Mpileup files can be very large because they describe every covered base position in the genome on a per-read basis, so storing them is usually undesirable.
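To make the per-read nature of the format concrete, here is a minimal Python sketch that tallies the alleles in the read-bases column of a single mpileup line. It assumes the standard six-column samtools mpileup layout (chromosome, position, reference base, depth, read bases, base qualities). The function names `count_bases` and `parse_line` are made up for this illustration and are not part of MpileupToVcf.

```python
import re
from collections import Counter


def count_bases(ref: str, bases: str) -> Counter:
    """Tally alleles in the read-bases column of one mpileup line."""
    counts = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            # Read start marker; the next character is the mapping quality.
            i += 2
            continue
        if c in "+-":
            # Indel written as +<len><seq> or -<len><seq>; skip it entirely.
            m = re.match(r"\d+", bases[i + 1:])
            if m is None:
                i += 1
                continue
            i += 1 + len(m.group()) + int(m.group())
            continue
        if c in ".,":
            # Match to the reference on the forward or reverse strand.
            counts[ref.upper()] += 1
        elif c.upper() in "ACGTN":
            # Mismatch: the observed base itself is written.
            counts[c.upper()] += 1
        # Anything else ('$', '*', '<', '>') carries no base call here.
        i += 1
    return counts


def parse_line(line: str):
    """Split one mpileup line (single sample) into its fields."""
    chrom, pos, ref, depth, bases, _quals = line.rstrip("\n").split("\t")[:6]
    return chrom, int(pos), ref, int(depth), count_bases(ref, bases)


if __name__ == "__main__":
    print(parse_line("chr1\t100\tA\t6\t..,^I.Gg$\tIIIIII\n"))
```

Every read covering a position contributes at least one character to the line, which is why the file grows with both genome size and sequencing depth.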
MpileupToVcf requires Java 8 to be installed on your device. Download Java 8 here or install via your distribution's package manager.
Download the latest version of MpileupToVcf here. To print the usage, run:
java -jar <MpileupToVcf_jar> --help
MpileupToVcf comes with various options. See the usage for more details. The tool can stream from stdin or accept an mpileup file. An output file and the name of the sample are always required.
To convert an mpileup file from a haploid organism to VCF, with an expected sequencing error rate of 0.010:
java -jar <MpileupToVcf_jar> \
-I input.mpileup \
-o output.vcf \
--sample Yeast5302 \
--ploidy 1 \
--seqError 0.010
To convert mpileup output piped directly from samtools, without an intermediate file:
samtools mpileup <bam> | java -jar <MpileupToVcf_jar> \
-o <output_vcf> \
--sample E.coli243
Usage for MpileupToVcf:
Option | Required | Can occur multiple times | Description |
---|---|---|---|
--log_level, -l | no | no | Level of log information printed. Possible levels: 'debug', 'info', 'warn', 'error' |
--help, -h | no | no | Print usage |
--version, -v | no | no | Print version |
--input, -I | no | no | Input mpileup file, default is stdin |
--output, -o | yes | no | output file (required) |
--sample, -s | yes | no | Sample name in the vcf file |
--minDP | no | no | Minimal total depth |
--minAP | no | no | Minimal alternative depth |
--homoFraction | no | no | If the fraction of an allele is above this value, the call is considered homozygous. Default is 0.8 |
--ploidy | no | no | Specify the ploidy as a number: '1' for haploid, '2' for diploid etc. |
--seqError | no | no | Expected sequencing error rate, default is 0.005 |
--refCalls | no | no | If set, reference calls are also written. Warning: this will result in a very large vcf file |
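As a rough illustration of how the depth and fraction options could interact, the following Python sketch classifies one site from its allele counts. It is one possible reading of the option descriptions above, not the tool's actual implementation: the function name `classify_site`, the default values for `min_dp` and `min_ap`, and the returned labels are assumptions made for this example, and the sketch ignores `--ploidy` and `--seqError` entirely.

```python
def classify_site(ref, counts, min_dp=1, min_ap=1, homo_fraction=0.8):
    """Classify a site from allele counts (hypothetical interpretation).

    ref           -- reference base at this position
    counts        -- mapping of allele -> number of supporting reads
    min_dp        -- stand-in for --minDP (default here is an assumption)
    min_ap        -- stand-in for --minAP (default here is an assumption)
    homo_fraction -- stand-in for --homoFraction (documented default 0.8)
    """
    total = sum(counts.values())
    if total < min_dp:
        # Not enough total depth to make any call.
        return "filtered"

    # Keep only alternative alleles with enough supporting reads.
    alts = {a: n for a, n in counts.items() if a != ref and n >= min_ap}
    if not alts:
        return "ref"

    # Take the best supported alternative allele.
    best_n = max(alts.values())
    if best_n / total > homo_fraction:
        # Allele fraction is above the threshold: treat as homozygous.
        return "hom_alt"
    return "het"


if __name__ == "__main__":
    print(classify_site("A", {"A": 4, "G": 2}))
    print(classify_site("A", {"A": 1, "G": 9}))
```

Under this reading, raising `min_ap` removes weakly supported alternative alleles before the fraction test is applied, while `min_dp` discards low-coverage positions altogether.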
MpileupToVcf is part of the BIOPET tool suite, which is developed at LUMC by the SASC team. Each tool in the BIOPET tool suite is meant to offer a standalone function that can be used to perform a dedicated data analysis task or be added as part of BIOPET pipelines.
All tools in the BIOPET tool suite are Free/Libre and Open Source Software.
The source code of MpileupToVcf can be found here. We welcome any contributions. Bug reports, feature requests and feedback can be submitted at our issue tracker.
MpileupToVcf is built using sbt. Before submitting a pull request, make sure all tests pass by running
sbt test
from the project's root. We recommend using an IDE to work on MpileupToVcf. We have had good results with this IDE.
For any questions related to MpileupToVcf, please use the GitHub issue tracker or contact the SASC team directly at: sasc@lumc.nl.