Description

BamStats is a package that contains tools to generate stats from a BAM file, merge those stats for multiple samples, and validate the generated stats files.

Mode - Generate

Generate reports clipping stats, flag stats, insert sizes and mapping qualities for a BAM file. It writes a JSON file and can optionally also write its output in TSV format.

The JSON output is organized as a sample - library - readgroup tree. If readgroups in the BAM file are not annotated with sample (SM) and library (LB) tags, an error is thrown. This can be fixed with samtools addreplacerg or picard AddOrReplaceReadGroups.
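
Whether a BAM file meets this requirement can be checked before running Generate. The following is a minimal sketch and not part of BamStats: the helper name check_rg is hypothetical, and it only inspects the SAM header text it receives on standard input (for example from samtools view -H).

```shell
# Hypothetical helper, not shipped with BamStats.
# Reads a SAM header on stdin and prints every @RG line that lacks an
# ID, SM or LB tag. Exit status is 0 when all @RG lines are complete,
# 1 otherwise.
check_rg() {
    awk -F '\t' '
        $1 == "@RG" {
            has_id = 0; has_sm = 0; has_lb = 0
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^ID:/) has_id = 1
                if ($i ~ /^SM:/) has_sm = 1
                if ($i ~ /^LB:/) has_lb = 1
            }
            if (!(has_id && has_sm && has_lb)) { print; bad = 1 }
        }
        END { exit bad + 0 }
    '
}

# Usage (requires samtools): samtools view -H file.bam | check_rg
```

If a line is reported, the missing tags can be added with, for example, samtools addreplacerg -r 'ID:rg1' -r 'SM:sample1' -r 'LB:lib1' -o fixed.bam file.bam, where the tag values are placeholders. Note that in its default mode this assigns all reads to that single read group.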

Mode - Merge

Merge combines multiple BamStats files while keeping the sample/library/readgroup structure. Values for the same readgroup are summed. The resulting file is also validated.

Mode - Validate

Validates a BamStats file. If the aggregation values cannot be regenerated, the file is considered corrupt. This should only happen when the file has been edited manually.

Installation

BamStats requires Java 8 to be installed on your device. Download Java 8 here or install via your distribution's package manager.

Download the latest version of BamStats here. To print the usage, run:

java -jar <BamStats_jar> --help

Manual

Mode - Generate

Generate requires a BAM file (-b) and an output directory (-o) for its stats. Optionally, a reference FASTA file (-R) can be supplied, against which the BAM file is validated. The --tsvOutputs flag additionally writes the output in TSV format.

Every @RG line in the BAM file must be annotated with ID, SM and LB; otherwise an error is thrown.

Mode - Merge

When merging files, BamStats validates both the input files and the output file. If the aggregation values cannot be regenerated, the file is considered corrupt.

Mode - Validate

See example.

Example

Mode - Generate

To generate stats from file.bam:

java -jar <Generate_jar> \
-b file.bam \
-o output_dir

To generate stats from file.bam, and output the result also as TSV:

java -jar <Generate_jar> \
-o output_dir \
-b file.bam \
--tsvOutputs

To generate stats from certain regions in file.bam, validate the regions and the BAM file against reference.fa, and also include unmapped reads:

java -jar <Generate_jar> \
-R reference.fa \
-o output_dir \
-b file.bam \
--bedFile regions.bed

Mode - Merge

To merge multiple files and write the result to an output file:

java -jar <BamStats_jar> merge \
-i <bamstats file> \
-i <bamstats file> \
-o <output file>
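
When the number of input files is not known in advance, the repeated -i options can be built in a loop. The following is a sketch in POSIX sh, assuming the path to the jar is stored in a BAMSTATS_JAR variable. The bamstats and merge_stats helpers are hypothetical names and not part of BamStats.

```shell
# Hypothetical wrapper around the jar; BAMSTATS_JAR is an assumed variable.
bamstats() {
    java -jar "${BAMSTATS_JAR:?set BAMSTATS_JAR to the BamStats jar}" "$@"
}

# Hypothetical helper: merge_stats <output file> <bamstats file>...
# Turns every input file into an "-i <file>" pair and calls the merge tool.
# Note: plain sh has no local variables, so out, n, i and f are global.
merge_stats() {
    out=$1
    shift
    n=$#
    i=0
    # Move each file from the front of the argument list to the back,
    # prefixed with -i, so paths containing spaces stay intact.
    while [ "$i" -lt "$n" ]; do
        f=$1
        shift
        set -- "$@" -i "$f"
        i=$((i + 1))
    done
    bamstats merge "$@" -o "$out"
}
```

For example, merge_stats merged.json sample1/bamstats.json sample2/bamstats.json runs the merge tool with two -i options; the file paths here are placeholders.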

Mode - Validate

To validate a bamstats.json file:

java -jar <BamStats_jar> validate \
-i <input file>
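
Several files can be validated in one go with a small loop. This sketch assumes that the validate tool exits with a non-zero status when a file is corrupt, which is not stated above and should be verified for your BamStats version. BAMSTATS_JAR and the helper names bamstats and validate_all are hypothetical.

```shell
# Hypothetical wrapper around the jar; BAMSTATS_JAR is an assumed variable.
bamstats() {
    java -jar "${BAMSTATS_JAR:?set BAMSTATS_JAR to the BamStats jar}" "$@"
}

# Hypothetical helper: validate_all <bamstats file>...
# Prints one status line per file and returns 1 if any file failed.
# Assumption: "validate" exits non-zero for a corrupt file.
validate_all() {
    failed=0
    for f in "$@"; do
        if bamstats validate -i "$f"; then
            echo "ok      $f"
        else
            echo "corrupt $f"
            failed=1
        fi
    done
    return "$failed"
}
```

The return status makes the helper usable as a gate in a pipeline script, for example validate_all stats/*.json || exit 1, where the stats/ directory is a placeholder.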

Usage

Usage for BamStats:

Option	Required	Can occur multiple times	Description
--log_level, -l	no	no	Level of log information printed. Possible levels: 'debug', 'info', 'warn', 'error'
--help, -h	no	no	Print usage
--version, -v	no	no	Print version
toolName	no	no	Name of the tool to execute
tool args	no	yes (unlimited)	Arguments for the tool

About

BamStats is part of the BIOPET tool suite, which is developed at LUMC by the SASC team. Each tool in the BIOPET tool suite is meant to offer a standalone function that can be used to perform a dedicated data analysis task or be added as part of a pipeline, for example the SASC team's biowdl pipelines.

All tools in the BIOPET tool suite are Free/Libre and Open Source Software.

Contributing

The source code of BamStats can be found here. We welcome any contributions. Bug reports, feature requests and feedback can be submitted at our issue tracker.

BamStats is built using sbt. Before submitting a pull request, make sure all tests pass by running sbt test from the project's root. We recommend using an IDE to work on BamStats. We have had good results with this IDE.

Contact

For any questions related to BamStats, please use the GitHub issue tracker or contact the SASC team directly at: sasc@lumc.nl.