Running the pipeline

Requirements

Recommended hardware

CPU: >10 cores per sample
Memory: 6GB per core
Storage: >75GB per sample

Note: Running the pipeline with fewer resources may work, but this has not been tested.
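The per-sample recommendations scale with the number of samples processed at the same time. Below is a minimal sketch of that arithmetic. It assumes all samples run concurrently; a scheduler may queue them instead, which lowers the peak. Both function names are hypothetical. host_resources works only on Linux because it reads /proc/meminfo.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the per-sample figures come from the recommended hardware list.

# Print the recommended totals for N samples processed concurrently.
recommended_totals() {
    local n_samples="$1"
    local cores=$(( n_samples * 10 ))       # >10 cores per sample
    local memory_gb=$(( cores * 6 ))        # 6 GB per core
    local storage_gb=$(( n_samples * 75 ))  # >75 GB per sample
    echo "cores>${cores} memory>${memory_gb}GB storage>${storage_gb}GB"
}

# Print the core count and total memory (GiB, rounded down) of this machine (Linux only).
host_resources() {
    local mem_kb
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    echo "cores=$(nproc) memory=$(( mem_kb / 1024 / 1024 ))GiB"
}
```

For example, recommended_totals 4 prints cores>40 memory>240GB storage>300GB. Comparing that line with the output of host_resources gives a quick check before a batch is started.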

Software

python, version 3.9 or newer
pip3
virtualenv
singularity

Nice to have

DRMAA compatible scheduler
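Before installing anything, it can help to confirm that the software listed above is present. The following is a sketch with hypothetical helper names. It relies on one assumption: the drmaa Python bindings locate the scheduler's DRMAA library through the DRMAA_LIBRARY_PATH environment variable. virtualenv is not checked here because the installation steps below use Python's built-in venv module.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers for the software listed above.

# Succeed if the given interpreter (default python3) is Python 3.9 or newer.
check_python_version() {
    "${1:-python3}" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 9) else 1)'
}

# Report each command missing from PATH; fail if any of them is missing.
check_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: ${tool}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Optional (assumption: the drmaa bindings read DRMAA_LIBRARY_PATH).
# Warn if the variable is unset or does not point to a file.
check_drmaa() {
    if [ -z "${DRMAA_LIBRARY_PATH:-}" ] || [ ! -f "$DRMAA_LIBRARY_PATH" ]; then
        echo "warning: DRMAA_LIBRARY_PATH is unset or is not a file" >&2
        return 1
    fi
}
```

A typical call is check_python_version && check_tools pip3 singularity. check_drmaa is only relevant when a DRMAA scheduler will be used.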

Installation

A list of releases of Poirot can be found at: Releases.

Clone the Poirot git repo

We recommend cloning the repository into your working directory.

# Set up a working directory path
WORKING_DIRECTORY="/path/to/working_directory"

Fetch pipeline

# Set version
VERSION="v0.5.1"

# Clone selected version
git clone --branch ${VERSION} https://github.com/clinical-genomics-uppsala/poirot_rd_wgs.git ${WORKING_DIRECTORY}
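When git clone --branch is given a tag, it leaves the repository on a detached HEAD at that tag. git describe can therefore confirm which release was fetched. The following is a minimal sketch; check_checkout is a hypothetical helper, not part of Poirot.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the repository in $1 is checked out exactly at tag $2.
check_checkout() {
    local dir="$1" version="$2" actual
    # --exact-match fails unless HEAD itself carries a tag.
    actual=$(git -C "$dir" describe --tags --exact-match 2>/dev/null) || {
        echo "${dir} is not at a tagged commit" >&2
        return 1
    }
    if [ "$actual" != "$version" ]; then
        echo "${dir} is at ${actual}, expected ${version}" >&2
        return 1
    fi
}
```

After the clone step, check_checkout "${WORKING_DIRECTORY}" "${VERSION}" should succeed silently.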

Create python environment

A Python virtual environment is needed to run the Poirot pipeline.

# Enter working directory
cd ${WORKING_DIRECTORY}

# Create a new virtual environment
python3 -m venv ${WORKING_DIRECTORY}/virtual/environment

Install pipeline requirements

Activate the virtual environment and install the pipeline requirements listed in requirements.txt.

# Enter working directory
cd ${WORKING_DIRECTORY}

# Activate python environment
source virtual/environment/bin/activate

# Install requirements
pip install -r requirements.txt
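A venv's activate script exports VIRTUAL_ENV. A guard based on that variable can stop pip from installing into the system Python when activation was skipped. This is a sketch: require_venv is hypothetical, and it compares paths as plain strings, so a path reached through a symlink may be reported as a mismatch.

```shell
#!/usr/bin/env bash
# Hypothetical guard: fail unless a virtual environment is active and,
# if an expected path is given, unless it is that environment.
require_venv() {
    local expected="$1"
    if [ -z "${VIRTUAL_ENV:-}" ]; then
        echo "no virtual environment is active" >&2
        return 1
    fi
    if [ -n "$expected" ] && [ "$VIRTUAL_ENV" != "$expected" ]; then
        echo "active environment is ${VIRTUAL_ENV}, expected ${expected}" >&2
        return 1
    fi
}
```

For example: require_venv "${WORKING_DIRECTORY}/virtual/environment" && pip install -r requirements.txt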

Input sample files

The pipeline uses two sample input files, samples.tsv and units.tsv. These contain sample information, sequencing metadata, and the locations of the fastq files. The specification for the input files can be found at Poirot schemas. Using the Python virtual environment created above, these files can be generated automatically with hydra-genetics create-input-files:

hydra-genetics create-input-files -d path/to/fastq-files/
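Before a run, the generated files can be sanity-checked with a couple of small helpers. The following is a sketch with hypothetical function names. check_fastq_paths assumes the fastq locations in units.tsv are in columns named fastq1 and fastq2; confirm this against Poirot schemas. Relative paths are resolved from the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical sanity checks for samples.tsv and units.tsv.

# Fail if any row has a different number of tab-separated fields than the header.
check_tsv_columns() {
    awk -F'\t' '
        FNR == 1 { n = NF; next }
        NF != n  { printf "%s: line %d has %d fields, header has %d\n", FILENAME, FNR, NF, n; bad = 1 }
        END      { exit bad }
    ' "$1"
}

# Assumption: the fastq path columns are named "fastq1" and "fastq2".
# Report every listed fastq file that does not exist; fail if any is missing.
check_fastq_paths() {
    local path missing=0
    while IFS= read -r path; do
        if [ ! -e "$path" ]; then
            echo "missing fastq: ${path}" >&2
            missing=1
        fi
    done < <(awk -F'\t' '
        FNR == 1 { for (i = 1; i <= NF; i++) if ($i == "fastq1" || $i == "fastq2") cols[i] = 1; next }
        { for (i in cols) if ($i != "") print $i }
    ' "$1")
    return "$missing"
}
```

For example: check_tsv_columns samples.tsv && check_tsv_columns units.tsv && check_fastq_paths units.tsv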

Configuration

All non-default parameter settings for the pipeline are set in the config.yaml file in the config folder. A separate config_refs.yaml is also required to provide the paths to all required reference files. Both config files can be specified in the profile config or on the command line.

Run command

With the Python virtual environment created above activated, a basic command for running the pipeline with a profile is:

snakemake --profile profiles/NAME_OF_PROFILE -s workflow/Snakefile

If the config files are not given in the profile, they can be specified on the command line:

snakemake --profile profiles/NAME_OF_PROFILE -s workflow/Snakefile --configfiles config/config.yaml config/config_refs.yaml

There are many additional Snakemake options, some of which are listed below. Options that are always used, however, should be put in the profile.

--notemp - Saves all intermediate files. Good for development and testing different options.
--until - Runs the pipeline up to and including the specified rule, running only the jobs that rule depends on.
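Combining the run commands above with the options listed here is easier when the command is kept as a bash array. Extra options can then be appended without quoting problems. The following is a sketch only. build_poirot_cmd, the POIROT_CMD array, and the with-configs/no-configs switch are hypothetical and are not part of Poirot.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the run commands shown above.
# Usage: build_poirot_cmd PROFILE {with-configs|no-configs} [extra snakemake options...]
# Fills the global array POIROT_CMD.
build_poirot_cmd() {
    local profile="$1" configs="$2"
    shift 2
    POIROT_CMD=(snakemake --profile "profiles/${profile}" -s workflow/Snakefile)
    # Pass the config files explicitly only when the profile does not set them.
    if [ "$configs" = "with-configs" ]; then
        POIROT_CMD+=(--configfiles config/config.yaml config/config_refs.yaml)
    fi
    # Append any extra Snakemake options unchanged, e.g. --notemp or --until RULE_NAME.
    POIROT_CMD+=("$@")
}
```

As an example, run build_poirot_cmd NAME_OF_PROFILE no-configs --notemp --until RULE_NAME, where RULE_NAME is a placeholder for a rule in the workflow. Then "${POIROT_CMD[@]}" -n performs a Snakemake dry run that only lists the jobs, and "${POIROT_CMD[@]}" on its own starts the real run.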

Note: Remember to have singularity and drmaa available on the system where the pipeline will be run.

Running with VEP annotation

By default, config.yaml is set up not to run VEP annotation of the SV VCF or the SNV and indel VCF. VEP annotation can be activated by setting vep_annotation to True in the config or on the command line:

snakemake --profile profiles/NAME_OF_PROFILE -s workflow/Snakefile --config vep_annotation=True