Description

Tools - ExtractTsv

This tool can extract samples, libraries and readgroups from a sample config file. It is meant as a supporting tool inside WDL pipelines. It can also output a single layer as a TSV file.

Tools - ReadFromTsv

This tool enables a user to create a full sample sheet in JSON format or YAML format, suitable for all Biopet Queue pipelines, from TSV file(s).

Tools - CromwellArrays

This tool converts sample configs to an array-based format that can be used inside WDL pipelines. It exists only to support biowdl pipelines.

Tools - CaseControl

This tool will extract the case-control pairs from a sample config file. It will read the headers of the bam files to confirm that samples do exist.

Installation

SampleConfig requires Java 8 to be installed on your device. Download Java 8 here or install via your distribution's package manager.

Download the latest version of SampleConfig here. To generate the usage run:

java -jar <SampleConfig_jar> --help

Manual

Tools - ExtractTsv

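This tool supports multiple sample config files; these files are merged into one large config. Which list is returned depends on whether a sample and/or library is given: without either, the samples are listed; with a sample, its libraries; with a sample and a library, the readgroups. See also the examples.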
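The merge-then-list behaviour described above can be sketched in Python. This is an illustration only, not the tool's implementation: the function names `deep_merge` and `list_layer` are hypothetical, and the rule that a later config wins on conflicting keys is an assumption.

```python
from typing import Any, Dict, List, Optional

Config = Dict[str, Any]


def deep_merge(base: Config, extra: Config) -> Config:
    """Recursively merge two nested dicts.

    Assumption: on a conflicting non-dict value, `extra` wins.
    """
    merged = dict(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def list_layer(config: Config,
               sample: Optional[str] = None,
               library: Optional[str] = None) -> List[str]:
    """List samples, the libraries of a sample, or the readgroups of a library."""
    samples = config.get("samples", {})
    if sample is None:
        return sorted(samples)
    libraries = samples[sample].get("libraries", {})
    if library is None:
        return sorted(libraries)
    return sorted(libraries[library].get("readgroups", {}))


# Two small, made-up configs standing in for two input files.
config_a = {"samples": {"s1": {"libraries": {"lib1": {"readgroups": {"rg1": {}}}}}}}
config_b = {"samples": {"s1": {"libraries": {"lib2": {}}}, "s2": {}}}
merged = deep_merge(config_a, config_b)

samples = list_layer(merged)
libraries = list_layer(merged, sample="s1")
readgroups = list_layer(merged, sample="s1", library="lib1")
print(samples, libraries, readgroups)
```

Passing no sample corresponds to the first ExtractTsv example, passing a sample to the second, and passing both a sample and a library to the third.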

Tools - ReadFromTsv

A user provides a tab-separated values (TSV) file with sample-specific properties, which the tool parses into JSON format. For example, a user may want to add certain properties to the description of a sample, such as the treatment a sample received. In that case a TSV file with an extra column called treatment is provided, and the resulting file will contain the 'treatment' property as well. The order of the columns is not relevant to the end result.

Tag files work the same way, except that the values are placed under the key tags.

Tools - CromwellArrays

For this tool to work, the sample YAML/JSON file should look like this:

samples:
  sampleName:
    key1: value1
    libraries:
      libraryName:
        key1: value1
        readgroups:
          readgroupName:
            key1: value1
            key2: value2
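To illustrate why the three nested layers (samples, libraries, readgroups) matter, here is a minimal Python sketch that walks such a config and flattens it into one record per readgroup. The record layout and the function name `flatten_readgroups` are hypothetical; the actual output format of CromwellArrays is not described in this document and may differ.

```python
from typing import Any, Dict, List, Tuple

Record = Tuple[str, str, str, Dict[str, Any]]


def flatten_readgroups(config: Dict[str, Any]) -> List[Record]:
    """Walk samples -> libraries -> readgroups and emit one flat record each.

    Hypothetical layout: (sample, library, readgroup, readgroup properties).
    """
    rows: List[Record] = []
    for sample_name, sample in config.get("samples", {}).items():
        for library_name, library in sample.get("libraries", {}).items():
            for readgroup_name, readgroup in library.get("readgroups", {}).items():
                rows.append((sample_name, library_name, readgroup_name, dict(readgroup)))
    return rows


# The config shape required by this section, written as a Python dict.
example = {
    "samples": {
        "sampleName": {
            "key1": "value1",
            "libraries": {
                "libraryName": {
                    "key1": "value1",
                    "readgroups": {
                        "readgroupName": {"key1": "value1", "key2": "value2"}
                    },
                }
            },
        }
    }
}

rows = flatten_readgroups(example)
print(rows)
```

A config that omits the libraries or readgroups layer yields no records in this sketch, which is one way to see why the nesting has to be complete.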

Tools - CaseControl

Each sample that has a control needs to have the control tag inside the config file. The tool will automatically find the combination of BAM files needed; for this to work, the readgroups should be set up correctly.
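The pairing step can be sketched in Python as follows. This is a simplified illustration, not the tool's code: it assumes the control is stored as a `control` key directly on the sample (the exact location of the tag in a real config may differ), the function name `case_control_pairs` is hypothetical, and the BAM header check is left out.

```python
from typing import Any, Dict, List, Tuple


def case_control_pairs(config: Dict[str, Any],
                       control_key: str = "control") -> List[Tuple[str, str]]:
    """Return (case, control) sample name pairs found in the config.

    Assumption: each case sample names its control under `control_key`.
    """
    samples = config.get("samples", {})
    pairs: List[Tuple[str, str]] = []
    for name, sample in samples.items():
        control = sample.get(control_key)
        if control is None:
            continue  # this sample has no control, so it is not a case
        if control not in samples:
            raise KeyError(f"control sample {control!r} of {name!r} is not in the config")
        pairs.append((name, control))
    return pairs


# Made-up sample names for illustration.
config = {"samples": {"tumor1": {"control": "normal1"}, "normal1": {}}}
pairs = case_control_pairs(config)
print(pairs)
```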

Example

Tools - ExtractTsv

Extracting samples, list goes to stdout:

java -jar <SampleConfig_jar> ExtractTsv \
-i <input config>

Extracting libraries, list goes to stdout:

java -jar <SampleConfig_jar> ExtractTsv \
-i <input config> \
--sample <sample name>

Extracting readgroups, list goes to stdout:

java -jar <SampleConfig_jar> ExtractTsv \
-i <input config> \
--sample <sample name> \
--library <library name>

Tools - ReadFromTsv

Sample definition

To get the below example out of the tool one should provide 2 TSV files as follows:

sample library bam
Sample_ID_1 Lib_ID_1 MyFirst.bam
Sample_ID_2 Lib_ID_2 MySecond.bam

The second TSV file can contain as many properties as you like. Possible options are gender, age and family. Anything you want to pass to your pipeline can be included.

sample treatment
Sample_ID_1 heatshock
Sample_ID_2 heatshock

Example

Yaml
samples:
  Sample_ID_1:
    treatment: heatshock
    libraries:
      Lib_ID_1:
        bam: MyFirst.bam
  Sample_ID_2:
    treatment: heatshock
    libraries:
      Lib_ID_2:
        bam: MySecond.bam
Json
{
  "samples" : {
    "Sample_ID_1" : {
      "treatment" : "heatshock",
      "libraries" : {
        "Lib_ID_1" : {
          "bam" : "MyFirst.bam"
        }
      }
    },
    "Sample_ID_2" : {
      "treatment" : "heatshock",
      "libraries" : {
        "Lib_ID_2" : {
          "bam" : "MySecond.bam"
        }
      }
    }
  }
}
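How the two TSV files map onto the nested output above can be sketched in Python. This is a minimal illustration under stated assumptions, not the tool's implementation: the helper name `add_tsv` is hypothetical, and the sketch assumes that a row with a library column attaches its remaining columns to that library, while a row without one attaches them to the sample.

```python
import csv
import io
import json
from typing import Any, Dict

# The two TSV files from this example, inlined as tab-separated text.
SAMPLE_TSV = (
    "sample\tlibrary\tbam\n"
    "Sample_ID_1\tLib_ID_1\tMyFirst.bam\n"
    "Sample_ID_2\tLib_ID_2\tMySecond.bam\n"
)
TREATMENT_TSV = (
    "sample\ttreatment\n"
    "Sample_ID_1\theatshock\n"
    "Sample_ID_2\theatshock\n"
)


def add_tsv(config: Dict[str, Any], tsv_text: str) -> Dict[str, Any]:
    """Merge the rows of one TSV file into a nested sample config."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    samples = config.setdefault("samples", {})
    for row in reader:
        sample = samples.setdefault(row.pop("sample"), {})
        library_name = row.pop("library", None)
        if library_name:
            # Assumption: remaining columns describe the library.
            target = sample.setdefault("libraries", {}).setdefault(library_name, {})
        else:
            # Assumption: remaining columns describe the sample itself.
            target = sample
        for key, value in row.items():
            if value:
                target[key] = value
    return config


config: Dict[str, Any] = {}
add_tsv(config, SAMPLE_TSV)
add_tsv(config, TREATMENT_TSV)
print(json.dumps(config, indent=2, sort_keys=True))
```

Because columns are looked up by header name, swapping the column order in either file gives the same result in this sketch.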

Tools - CromwellArrays

Default run, output to stdout:

java -jar <SampleConfig_jar> CromwellArrays \
-i <input config>

Default run, output to file:

java -jar <SampleConfig_jar> CromwellArrays \
-i <input config> \
-o <output file>

Multiple configs:

java -jar <SampleConfig_jar> CromwellArrays \
-i <input config> \
-i <input config>

Tools - CaseControl

Default run, 2 bam files:

java -jar <SampleConfig_jar> CaseControl \
-i <bam file> \
-i <bam file> \
-s <sample config file> \
-o <output file>

Usage

Usage for SampleConfig:

| Option | Required | Can occur multiple times | Description |
| --- | --- | --- | --- |
| --log_level, -l | no | no | Level of log information printed. Possible levels: 'debug', 'info', 'warn', 'error' |
| --help, -h | no | no | Print usage |
| --version, -v | no | no | Print version |
| toolName | no | no | Name of the tool to execute |
| tool args | no | yes (unlimited) | Arguments for the tool |

About

SampleConfig is part of the BIOPET tool suite, which is developed at LUMC by the SASC team. Each tool in the BIOPET tool suite is meant to offer a standalone function that can be used to perform a dedicated data analysis task or be added as part of a pipeline, for example the SASC team's biowdl pipelines.

All tools in the BIOPET tool suite are Free/Libre and Open Source Software.

Contributing

The source code of SampleConfig can be found here. We welcome any contributions. Bug reports, feature requests and feedback can be submitted at our issue tracker.

SampleConfig is built using sbt. Before submitting a pull request, make sure all tests pass by running sbt test from the project's root. We recommend using an IDE to work on SampleConfig. We have had good results with this IDE.

Contact

For any questions related to SampleConfig, please use the GitHub issue tracker or contact the SASC team directly at: sasc@lumc.nl.