Description

FindOverlapMatch looks for overlapping samples in a sample matrix. An example of a tool that creates such a matrix is VcfStats.

It compares samples and lists similar samples based on a cutoff point. It can also check if columns in a sample matrix match a certain regex.

Installation

FindOverlapMatch requires Java 8 to be installed on your device. Download Java 8 here or install via your distribution's package manager.

Download the latest version of FindOverlapMatch here. To print the usage information, run:

java -jar <FindOverlapMatch_jar> --help

Manual

The input is a text file containing a sample matrix, for example:

        sample1 sample2 sample3
sample1 1.0     0.5     0.9
sample2 0.5     1.0     0.5
sample3 0.9     0.5     1.0
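To inspect such a matrix outside the tool, a minimal Python sketch for reading it could look like the following. It assumes whitespace-separated fields and a header row that lists only the column IDs; the function name `read_matrix` is hypothetical and not part of FindOverlapMatch.

```python
def read_matrix(text):
    """Parse a whitespace-separated sample matrix into {row_id: {column_id: value}}."""
    lines = [line for line in text.splitlines() if line.strip()]
    # Assumption: the header row holds only the column IDs (no label for the ID column).
    columns = lines[0].split()
    matrix = {}
    for line in lines[1:]:
        fields = line.split()
        row_id, values = fields[0], fields[1:]
        if len(values) != len(columns):
            raise ValueError(
                f"row {row_id!r} has {len(values)} values, expected {len(columns)}"
            )
        matrix[row_id] = {col: float(v) for col, v in zip(columns, values)}
    return matrix


example = """\
        sample1 sample2 sample3
sample1 1.0     0.5     0.9
sample2 0.5     1.0     0.5
sample3 0.9     0.5     1.0
"""

matrix = read_matrix(example)
print(matrix["sample1"]["sample3"])  # prints 0.9
```

Keeping rows and columns as separate key sets allows row IDs and column IDs to differ, which the input format permits.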

Example

To check the example above with a cutoff of 0.9:

java -jar <FindOverlapMatch_jar> \
-i input.txt \
-c 0.9 \
-o output.txt

This yields the following output file:

sample1 (sample3,0.9)
sample2
sample3 (sample1,0.9)

With --use_same_names set, the output is:

sample1 (sample1,1.0)   (sample3,0.9)
sample2 (sample2,1.0)
sample3 (sample1,0.9)   (sample3,1.0)
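The selection rule behind these two outputs can be sketched in Python as follows. This is an illustration of the behaviour shown above, not the tool's actual implementation: the names `find_matches` and `format_matches` are hypothetical, and the tab-separated output layout is an assumption.

```python
def find_matches(matrix, cutoff, use_same_names=False):
    """For every row sample, collect the column samples whose value reaches the cutoff."""
    result = {}
    for row_id, row in matrix.items():
        result[row_id] = [
            (col_id, value)
            for col_id, value in row.items()
            # Same-name pairs are only reported when use_same_names is set.
            if value >= cutoff and (use_same_names or col_id != row_id)
        ]
    return result


def format_matches(matches):
    """Render one line per row sample; the tab separator is an assumption."""
    lines = []
    for row_id, pairs in matches.items():
        cells = [row_id] + [f"({name},{value})" for name, value in pairs]
        lines.append("\t".join(cells))
    return "\n".join(lines)


# The example matrix from the Manual section.
matrix = {
    "sample1": {"sample1": 1.0, "sample2": 0.5, "sample3": 0.9},
    "sample2": {"sample1": 0.5, "sample2": 1.0, "sample3": 0.5},
    "sample3": {"sample1": 0.9, "sample2": 0.5, "sample3": 1.0},
}

default_matches = find_matches(matrix, cutoff=0.9)
same_name_matches = find_matches(matrix, cutoff=0.9, use_same_names=True)

print(format_matches(default_matches))
print(format_matches(same_name_matches))
```

The comparison uses `>=`, so a value exactly equal to the cutoff (0.9 here) is reported, which agrees with the example output.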

Usage

Usage for FindOverlapMatch:

Option                  Required  Can occur multiple times  Description
--log_level, -l         no        no                        Level of log information printed. Possible levels: 'debug', 'info', 'warn', 'error'
--help, -h              no        no                        Print usage
--version, -v           no        no                        Print version
--input, -i             yes       no                        Input table in which the first row and the first column hold the sample IDs; row and column IDs may differ
--shouldMatchRegexFile  no        no                        File with regexes describing the correct matches. The first column is the row sample regex, the second column the column sample regex. When no second column is given, the first column is used for both.
--output, -o            no        no                        Output file; defaults to stdout
--cutoff, -c            yes       no                        Minimum value for two samples to be reported as a pair
--use_same_names        no        no                        Also compare samples that have the same name (by default these pairs are skipped)
--showBestMatch         no        no                        Show the best match, even when it is below the cutoff
--rowSampleRegex        no        no                        Samples in the row should match this regex
--columnSampleRegex     no        no                        Samples in the column should match this regex

About

FindOverlapMatch is part of the BIOPET tool suite, which is developed at LUMC by the SASC team. Each tool in the BIOPET tool suite is meant to offer a standalone function that can be used to perform a dedicated data analysis task or be added as part of a pipeline, for example the SASC team's biowdl pipelines.

All tools in the BIOPET tool suite are Free/Libre and Open Source Software.

Contributing

The source code of FindOverlapMatch can be found here. We welcome any contributions. Bug reports, feature requests and feedback can be submitted at our issue tracker.

FindOverlapMatch is built using sbt. Before submitting a pull request, make sure all tests pass by running sbt test from the project's root. We recommend using an IDE to work on FindOverlapMatch. We have had good results with this IDE.

Contact

For any questions related to FindOverlapMatch, please use the GitHub issue tracker or contact the SASC team directly at: sasc@lumc.nl.