This tool downloads an assembly FASTA file from NCBI, given an assembly report file. Contigs can be filtered by requiring that columns of the report match, or do not match, given regular expressions. The contig naming style can also be selected.
DownloadNcbiAssembly requires Java 8 to be installed on your device. Download Java 8 here or install via your distribution's package manager.
Download the latest version of DownloadNcbiAssembly here. To print the usage, run:
java -jar <DownloadNcbiAssembly_jar> --help
DownloadNcbiAssembly requires an assembly report to download an assembly sequence. It will output the assembly in FASTA format. For filtering, check the usage for more details.
To download an assembly using the information from an assembly report:
java -jar <DownloadNcbiAssembly_jar> \
-a assemblyReport \
-o outputFile
To download an assembly and name the contigs in UCSC style:
java -jar <DownloadNcbiAssembly_jar> \
-a assemblyReport \
-o outputFile \
--nameHeader UCSC-style-name
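To illustrate what selecting a name column means, here is a minimal Python sketch. It is not the tool's implementation. It assumes the assembly report is the usual NCBI tab-separated text in which the last `#`-prefixed line holds the column names. The function name `contig_name_map` is hypothetical, and the sample report is shortened to four columns for readability.

```python
def contig_name_map(report_text, name_column="UCSC-style-name", key_column="RefSeq-Accn"):
    """Map each contig's key_column value to its name_column value."""
    header = None
    rows = []
    for line in report_text.splitlines():
        if line.startswith("#"):
            # Every comment line overwrites the previous one, so the last
            # comment line (assumed to hold the column names) wins.
            header = line.lstrip("#").strip().split("\t")
        elif line.strip():
            rows.append(line.split("\t"))
    if header is None:
        raise ValueError("no header line found in report")
    key_idx = header.index(key_column)
    name_idx = header.index(name_column)
    return {row[key_idx]: row[name_idx] for row in rows}


# Shortened, illustrative report (a real report has more columns).
sample = (
    "# Assembly name:  example\n"
    "# Sequence-Name\tSequence-Role\tRefSeq-Accn\tUCSC-style-name\n"
    "1\tassembled-molecule\tNC_000001.10\tchr1\n"
    "MT\tassembled-molecule\tNC_012920.1\tchrM\n"
)
print(contig_name_map(sample))
```

Passing `name_column="Sequence-Name"` instead would map the same accessions to `1` and `MT`.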
Usage for DownloadNcbiAssembly:
Option | Required | Can occur multiple times | Description |
---|---|---|---|
--log_level, -l | no | no | Level of log information printed. Possible levels: 'debug', 'info', 'warn', 'error' |
--help, -h | no | no | Print usage |
--version, -v | no | no | Print version |
--assembly_report, -a | yes | no | RefSeq ID from NCBI |
--output, -o | yes | no | Output FASTA file |
--report | no | no | Where to write the report from NCBI |
--nameHeader | no | no | Which column from the NCBI report to use for the names of the contigs. All columns in the report can be used, but these are the most common fields to choose from: - 'Sequence-Name': Name of the contig within the assembly - 'UCSC-style-name': Name of the contig used by UCSC (like hg19) - 'RefSeq-Accn': Unique name of the contig at RefSeq (default for NCBI) |
--mustHaveOne | no | yes (unlimited) | Filter contigs based on the NCBI report. Multiple conditions can be given; at least one must be true |
--mustNotHave | no | yes (unlimited) | Filter contigs based on the NCBI report. Multiple conditions can be given; all must be false |
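
The way the two filter options combine can be sketched in a few lines of Python. This is an illustration of the semantics described in the table, not the tool's code. It assumes each condition is a (column, regex) pair, that a regex matching anywhere in the value counts as a match, and that giving no `--mustHaveOne` conditions means nothing is required. The function name `keep_contig` is hypothetical; check `--help` for how conditions are actually written on the command line.

```python
import re


def keep_contig(row, must_have_one=(), must_not_have=()):
    """Decide whether a report row passes the filters.

    row: dict mapping column name to value.
    must_have_one / must_not_have: iterables of (column, regex) pairs.
    """
    def matches(condition):
        column, pattern = condition
        return re.search(pattern, row.get(column, "")) is not None

    # At least one mustHaveOne condition must match (if any were given).
    has_one = not must_have_one or any(matches(c) for c in must_have_one)
    # No mustNotHave condition may match.
    has_none = not any(matches(c) for c in must_not_have)
    return has_one and has_none


row = {"Sequence-Role": "assembled-molecule", "UCSC-style-name": "chrM"}
print(keep_contig(row, must_have_one=[("Sequence-Role", "assembled")]))
print(keep_contig(row, must_not_have=[("UCSC-style-name", "^chrM$")]))
```

The first call keeps the row because one required condition matches; the second drops it because a forbidden condition matches.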
DownloadNcbiAssembly is part of the BIOPET tool suite, which is developed at LUMC by the SASC team. Each tool in the BIOPET tool suite is meant to offer a standalone function that can be used to perform a dedicated data analysis task or added as part of BIOPET pipelines.
All tools in the BIOPET tool suite are Free/Libre and Open Source Software.
The source code of DownloadNcbiAssembly can be found here. We welcome any contributions. Bug reports, feature requests and feedback can be submitted at our issue tracker.
DownloadNcbiAssembly is built using sbt. Before submitting a pull request, make sure all tests pass by
running sbt test
from the project's root. We recommend using an IDE to work on DownloadNcbiAssembly. We have had
good results with this IDE.
For any questions related to DownloadNcbiAssembly, please use the GitHub issue tracker or contact the SASC team directly at: sasc@lumc.nl.