This tool extracts reads from a BAM file based on alignment intervals. For example, if you are interested in a specific genomic location, it extracts the full reads that align to that location. The tool is also very useful for creating test data sets.
ExtractAlignedFastq requires Java 8 to be installed on your device. Download Java 8 here or install via your distribution's package manager.
Download the latest version of ExtractAlignedFastq here. To print the usage, run:
java -jar <ExtractAlignedFastq_jar> --help
This tool creates FASTQ file(s) containing reads mapped to the given alignment intervals. The FASTQ files that were used to create the BAM file are also required, since they are used to retrieve the full sequences of the FASTQ records that map to the given region. This is useful because some records may have been modified before alignment, for example by quality trimming. In that case, the aligned SAM records alone would only give the modified sequence.
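The idea of going back to the original FASTQ can be illustrated with a small shell sketch. This is not how ExtractAlignedFastq is implemented; it only shows the principle of selecting whole FASTQ records by read name. The function name `filter_fastq_by_name` and the file `names.txt` (one read name per line) are hypothetical, and the sketch assumes plain four-line FASTQ records.

```shell
# Illustrative only: keep the FASTQ records whose read name is listed in a
# names file. One possible way to obtain such a list, assuming samtools and
# an indexed BAM, would be:
#   samtools view myBam.bam chr5:100-200 | cut -f1 | sort -u > names.txt
filter_fastq_by_name() {
  # $1: file with one read name per line (must not be empty)
  # $2: FASTQ file with four lines per record
  awk 'NR == FNR { keep[$1] = 1; next }      # first file: remember the names
       FNR % 4 == 1 {                         # header line of each record
         name = substr($1, 2)                 # drop the leading "@"
         sub(/\/[12]$/, "", name)             # drop a trailing /1 or /2
         hit = (name in keep)
       }
       hit' "$1" "$2"                         # print all lines of kept records
}
```

Because the records come from the original FASTQ file, they carry the untrimmed sequence and quality strings rather than whatever was stored in the BAM file.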
To extract reads from myBam.bam that originate from myFastq_R1.fastq and align to chr5 at positions 100-200:
java -jar <ExtractAlignedFastq_jar> \
--input_file myBam.bam \
--in1 myFastq_R1.fastq \
--interval chr5:100-200 \
--out1 output.fastq
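For paired-end data and more than one region, a call might look like the sketch below. It only uses options listed in the usage table (`--in2`, `--out2`, and a repeated `--interval`); the file names, the second interval, and the `EXTRACT_ALIGNED_FASTQ_JAR` environment variable are placeholders introduced for illustration.

```shell
# Path to the jar is an assumption; point it at your downloaded file.
JAR="${EXTRACT_ALIGNED_FASTQ_JAR:-ExtractAlignedFastq.jar}"

# Build the argument list (all file names are placeholders).
set -- java -jar "$JAR" \
  --input_file myBam.bam \
  --in1 myFastq_R1.fastq \
  --in2 myFastq_R2.fastq \
  --interval chr5:100-200 \
  --interval chr7:5000-6000 \
  --out1 output_R1.fastq \
  --out2 output_R2.fastq

# Show the command, then run it only if the jar is actually present.
echo "$*"
if [ -f "$JAR" ]; then
  "$@"
fi
```

Printing the command first makes it easy to review the intervals and file pairs before anything is written to disk.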
Usage for ExtractAlignedFastq:
Option | Required | Can occur multiple times | Description |
---|---|---|---|
--log_level, -l | no | no | Level of log information printed. Possible levels: 'debug', 'info', 'warn', 'error' |
--help, -h | no | no | Print usage |
--version, -v | no | no | Print version |
--input_file, -I | yes | no | Input BAM file |
--interval, -r | yes | yes (unlimited) | Interval strings (e.g. chr1:1-100) |
--in1, -i | yes | no | Input FASTQ file 1 |
--in2, -j | no | no | Input FASTQ file 2 (default: none) |
--out1, -o | yes | no | Output FASTQ file 1 |
--out2, -p | no | no | Output FASTQ file 2 (default: none) |
--min_mapq, -Q | no | no | Minimum MAPQ of reads in target region to remove (default: 0) |
--read_suffix_length, -s | no | no | Length of suffix mark from each read pair (default: 0). This is used for distinguishing read pairs with different suffixes. For example, if your FASTQ records end with `/1` for the first pair and `/2` for the second pair, the value of `read_suffix_length` should be 2. |
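To pick a value for `--read_suffix_length`, it can help to look at a read name from the input FASTQ. The helper below is a rough heuristic, not part of ExtractAlignedFastq: the function name `suffix_length` is hypothetical, and it only recognises the `/1` and `/2` suffixes mentioned in the table.

```shell
# Hypothetical helper: suggest a suffix length from one FASTQ header line.
suffix_length() {
  # $1: a FASTQ header line such as "@read42/1"
  name=${1%% *}            # ignore any description after the first space
  case "$name" in
    */[12]) echo 2 ;;      # name ends in /1 or /2: two suffix characters
    *)      echo 0 ;;      # no such suffix found
  esac
}

suffix_length "@read42/1"            # prints 2
suffix_length "@read42 1:N:0:ACGT"   # prints 0
```

With a real file, the first header could be passed in as `suffix_length "$(head -n 1 myFastq_R1.fastq)"`; other suffix conventions would need their own check.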
ExtractAlignedFastq is part of the BIOPET tool suite, which is developed at LUMC by the SASC team. Each tool in the BIOPET tool suite is meant to offer a standalone function that can be used to perform a dedicated data analysis task or added as part of a pipeline, for example the SASC team's biowdl pipelines.
All tools in the BIOPET tool suite are Free/Libre and Open Source Software.
The source code of ExtractAlignedFastq can be found here. We welcome any contributions. Bug reports, feature requests and feedback can be submitted at our issue tracker.
ExtractAlignedFastq is built using sbt. Before submitting a pull request, make sure all tests pass by running `sbt test` from the project's root. We recommend using an IDE to work on ExtractAlignedFastq. We have had good results with this IDE.
For any questions related to ExtractAlignedFastq, please use the GitHub issue tracker or contact the SASC team directly at: sasc@lumc.nl.