BamStats is a package that contains tools to generate stats from a BAM file, merge those stats for multiple samples, and validate the generated stats files.
The generate module reports clipping stats, flag stats, insert size and mapping quality for a BAM file. It writes a JSON file and can optionally also write the output in TSV format.
The JSON output is organized in a sample - library - readgroup tree structure.
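As a rough illustration of the sample - library - readgroup tree, the sketch below walks a nested JSON document and lists every readgroup path. The key names (`samples`, `libraries`, `readgroups`) and the example values are assumptions made for illustration only; the actual BamStats schema may differ.

```python
import json

# Hypothetical layout: the key names below are assumptions for illustration,
# not the documented BamStats schema.
example = json.loads("""
{
  "samples": {
    "sample1": {
      "libraries": {
        "lib1": {"readgroups": {"rg1": {}, "rg2": {}}},
        "lib2": {"readgroups": {"rg3": {}}}
      }
    }
  }
}
""")


def iter_readgroups(stats):
    """Yield (sample, library, readgroup) triples from the nested tree."""
    for sample, sample_data in stats.get("samples", {}).items():
        for library, library_data in sample_data.get("libraries", {}).items():
            for readgroup in library_data.get("readgroups", {}):
                yield sample, library, readgroup


paths = list(iter_readgroups(example))
for sample, library, readgroup in paths:
    print(f"{sample}/{library}/{readgroup}")
```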
If readgroups in the BAM file are not annotated with sample (`SM`) and library (`LB`) tags, an error will be thrown.
This can be fixed by using `samtools addreplacerg` or `picard AddOrReplaceReadGroups`.
The merge module merges BamStats files and keeps the sample/library/readgroup structure. Values for the same readgroup are added together. It also validates the resulting file.
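The idea of adding values for matching readgroups can be sketched as a recursive merge of two nested dictionaries. This is a minimal sketch, not the BamStats implementation; the counter names `mapped` and `unmapped` and the flat sample/library/readgroup nesting are assumptions for illustration.

```python
def merge_counts(a, b):
    """Recursively merge two nested dicts, summing numeric leaves."""
    merged = dict(a)
    for key, value in b.items():
        if key not in merged:
            # Readgroup (or sample/library) only present in the second file.
            merged[key] = value
        elif isinstance(value, dict) and isinstance(merged[key], dict):
            merged[key] = merge_counts(merged[key], value)
        else:
            # Same counter in both files: add the values.
            merged[key] = merged[key] + value
    return merged


# Hypothetical stats trees: sample -> library -> readgroup -> counters.
first = {"sample1": {"lib1": {"rg1": {"mapped": 100, "unmapped": 5}}}}
second = {
    "sample1": {
        "lib1": {
            "rg1": {"mapped": 50, "unmapped": 1},
            "rg2": {"mapped": 10, "unmapped": 0},
        }
    }
}

merged = merge_counts(first, second)
```

Readgroups that occur in only one input are copied over unchanged, so the tree structure is preserved.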
The validate module validates a BamStats file. If the aggregation values cannot be regenerated, the file is considered corrupt. This should only happen when the file has been manually edited.
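The check described above can be pictured as recomputing an aggregate from the per-readgroup values and comparing it with the stored aggregate. The sketch below is an assumption about the general idea, not the actual BamStats validation; the keys `readgroups` and `aggregation` and the counter `mapped` are hypothetical.

```python
def regenerate_total(library):
    """Sum the per-readgroup counters of one library."""
    total = {}
    for counters in library["readgroups"].values():
        for name, value in counters.items():
            total[name] = total.get(name, 0) + value
    return total


def is_consistent(library):
    """Return True if the stored aggregate matches the regenerated one."""
    return regenerate_total(library) == library["aggregation"]


# Hypothetical library entries.
good = {
    "readgroups": {"rg1": {"mapped": 100}, "rg2": {"mapped": 20}},
    "aggregation": {"mapped": 120},
}
edited = {
    "readgroups": {"rg1": {"mapped": 100}, "rg2": {"mapped": 20}},
    "aggregation": {"mapped": 999},  # manually changed, no longer matches
}

print(is_consistent(good), is_consistent(edited))
```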
BamStats requires Java 8 to be installed on your device. Download Java 8 here or install via your distribution's package manager.
Download the latest version of BamStats here. To print the usage, run:
java -jar <BamStats_jar> --help
Generate requires a BAM file and an output directory for its stats. Optionally a reference fasta file can be added against which the BAM file will be validated. There is a flag to also output in TSV format.
Generate requires BAM files in which all `@RG` groups are annotated with `ID`, `SM` and `LB`; otherwise an error is thrown.
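To check a BAM header for this requirement before running the tool, a small script can scan the `@RG` lines for the three tags. This is a minimal sketch that works on header text (for example, the output of `samtools view -H file.bam`); the function name and the example header are hypothetical and not part of BamStats.

```python
REQUIRED_TAGS = ("ID", "SM", "LB")


def missing_rg_tags(header_text):
    """Return (line_number, missing_tags) for every incomplete @RG line."""
    problems = []
    for number, line in enumerate(header_text.splitlines(), start=1):
        if not line.startswith("@RG"):
            continue
        # SAM header fields are tab-separated TAG:VALUE pairs.
        fields = line.split("\t")[1:]
        present = {field.split(":", 1)[0] for field in fields}
        missing = [tag for tag in REQUIRED_TAGS if tag not in present]
        if missing:
            problems.append((number, missing))
    return problems


# Hypothetical header: the second @RG line lacks an LB tag.
header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@RG\tID:rg1\tSM:sample1\tLB:lib1\n"
    "@RG\tID:rg2\tSM:sample1\n"
)

problems = missing_rg_tags(header)
for number, missing in problems:
    print(f"header line {number}: missing {', '.join(missing)}")
```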
When merging files, BamStats validates both the input files and the output file. If the aggregation values cannot be regenerated, the file is considered corrupt.
See example.
To generate stats from `file.bam`:
java -jar <Generate_jar> \
-b file.bam \
-o output_dir
To generate stats from `file.bam` and also output the result as TSV:
java -jar <Generate_jar> \
-o output_dir \
-b file.bam \
--tsvOutputs
To generate stats from certain regions in `file.bam`, validate the regions and BAM with `reference.fa` and also include unmapped reads:
java -jar <Generate_jar> \
-R reference.fa \
-o output_dir \
-b file.bam \
--bedFile regions.bed
To merge multiple files and write the result to an output file:
java -jar <BamStats_jar> merge \
-i <bamstats file> \
-i <bamstats file> \
-o <output file>
To validate a `bamstats.json` file:
java -jar <BamStats_jar> validate \
-i <input file>
Usage for BamStats:
Option | Required | Can occur multiple times | Description |
---|---|---|---|
--log_level, -l | no | no | Level of log information printed. Possible levels: 'debug', 'info', 'warn', 'error' |
--help, -h | no | no | Print usage |
--version, -v | no | no | Print version |
toolName | no | no | Name of the tool to execute |
tool args | no | yes (unlimited) | Arguments for the tool |
BamStats is part of the BIOPET tool suite that is developed at LUMC by the SASC team. Each tool in the BIOPET tool suite is meant to offer a standalone function that can be used to perform a dedicated data analysis task or added as part of a pipeline, for example the SASC team's biowdl pipelines.
All tools in the BIOPET tool suite are Free/Libre and Open Source Software.
The source code of BamStats can be found here. We welcome any contributions. Bug reports, feature requests and feedback can be submitted at our issue tracker.
BamStats is built using sbt. Before submitting a pull request, make sure all tests pass by running `sbt test` from the project's root. We recommend using an IDE to work on BamStats. We have had good results with this IDE.
For any questions related to BamStats, please use the GitHub issue tracker or contact the SASC team directly at: sasc@lumc.nl.