Rules specific to Poirot

create_cov_excel

Script that creates the gene panel coverage Excel file for Poirot.

🐍 Rule

rule create_cov_excel:
    input:
        bedfile=config["reference"]["coverage_bed"],
        cov_regions="qc/mosdepth_bed/{sample}_{type}.regions.bed.gz",
        cov_thresh="qc/mosdepth_bed/{sample}_{type}.thresholds.bed.gz",
        duplication_file="qc/picard_collect_duplication_metrics/{sample}_{type}.duplication_metrics.txt",
        genepanels=config["reference"]["genepanels"],
        low_cov="qc/mosdepth_bed/{sample}_{type}.mosdepth.lowCov.regions.txt",
        summary="qc/mosdepth_bed/{sample}_{type}.mosdepth.summary.txt",
    output:
        out=temp("qc/create_cov_excel/{sample}_{type}.coverage.xlsx"),
    log:
        "qc/create_cov_excel/{sample}_{type}.log",
    benchmark:
        repeat(
            "qc/create_cov_excel/create_cov_excel_{sample}_{type}.benchmark.tsv",
            config.get("create_cov_excel", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("create_cov_excel", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("create_cov_excel", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("create_cov_excel", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("create_cov_excel", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("create_cov_excel", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("create_cov_excel", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("create_cov_excel", {}).get("container", config["default_container"])
    message:
        "{rule}: Get coverage analysis per gene into excel, with tab for each panel and one for all genes in bed"
    script:
        "../scripts/create_excel.py"

↔ input / output files


deepvariant_add_ref

Add the reference genome path to the deepvariant vcf header
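As a rough illustration of this step, the sketch below inserts a `##reference` line just before the `#CHROM` column header. It is not the content of `ref_vcf.py`: the function name `add_reference_header`, the `file://` prefix, and the example header are assumptions made for illustration.

```python
def add_reference_header(header_lines, fasta_path):
    """Return the header with one ##reference line placed before #CHROM."""
    ref_line = f"##reference=file://{fasta_path}"
    out = []
    for line in header_lines:
        # Drop an existing ##reference line so the header carries only one.
        if line.startswith("##reference="):
            continue
        # The new meta line goes directly above the column header line.
        if line.startswith("#CHROM"):
            out.append(ref_line)
        out.append(line)
    return out


# Hypothetical header, for illustration only.
header = [
    "##fileformat=VCFv4.2",
    "##source=DeepVariant",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]
result = add_reference_header(header, "/ref/genome.fasta")
for line in result:
    print(line)
```

Working on the header lines alone keeps the variant records untouched, which is all this step needs.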

🐍 Rule

rule deepvariant_add_ref:
    input:
        vcf="snv_indels/vcf_final/{sample}_{type}.fix_af.vcf.gz",
        ref=config["reference"]["fasta"],
    output:
        vcf=temp("snv_indels/vcf_final/{sample}_{type}_ref.vcf"),
    log:
        "snv_indels/vcf_final/{sample}_{type}_ref.log",
    benchmark:
        repeat(
            "snv_indels/vcf_final/{sample}_{type}_ref.vcf.benchmark.tsv",
            config.get("deepvariant_add_ref", {}).get("benchmark_repeats", 1),
        )
    resources:
        mem_mb=config.get("deepvariant_add_ref", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("deepvariant_add_ref", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("deepvariant_add_ref", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("deepvariant_add_ref", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("deepvariant_add_ref", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("deepvariant_add_ref", {}).get("container", config["default_container"])
    message:
        "{rule}: Add reference to the header of the deepvariant vcf: {input.vcf}"
    script:
        "../scripts/ref_vcf.py"

↔ input / output files

| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | vcf | "snv_indels/vcf_final/{sample}_{type}.fix_af.vcf.gz" | deepvariant vcf where reference genome version should be added to VCF header |
| | ref | config["reference"]["fasta"] | The fasta reference used. |
| output | vcf | "snv_indels/vcf_final/{sample}_{type}_ref.vcf" | deepvariant vcf to which the reference genome version has been added to vcf header |

🔧 Configuration

Software settings (config.yaml)

| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
| extra | string | parameters that should be forwarded |

Resources settings (resources.yaml)

| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |
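Every rule on this page resolves these keys with the same lookup: the rule's own block is consulted first, and `default_resources` supplies the value when the key is absent. The sketch below isolates that pattern in a helper; `rule_setting` and all values in the example `config` are hypothetical and only serve to show the fallback.

```python
def rule_setting(config, rule, key):
    """Return the rule-specific value for key, else the default_resources value."""
    # Same expression as in the rules: config.get(rule, {}).get(key, default)
    return config.get(rule, {}).get(key, config["default_resources"][key])


# Hypothetical configuration, for illustration only.
config = {
    "default_resources": {"threads": 1, "mem_mb": 6144, "time": "4:00:00"},
    "deepvariant_add_ref": {"mem_mb": 12288},
}

threads = rule_setting(config, "deepvariant_add_ref", "threads")  # falls back
mem_mb = rule_setting(config, "deepvariant_add_ref", "mem_mb")    # overridden
print(threads, mem_mb)
```

Note that the default is evaluated before the lookup, so `default_resources` has to define every key even when a rule overrides it; otherwise the expression raises a `KeyError`.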


filter_par_dups

A custom Python script that filters out DUP calls located in the chrX PAR regions of male samples in cnvpytor VCF files.
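The filtering idea can be sketched as an interval overlap test against the BED regions. This is an assumption about the approach, not the content of `filter_bed_cnvs.py`: the function names, the tuple layout, the "any overlap" criterion, and the coordinates are all hypothetical.

```python
def overlaps(start, end, region_start, region_end):
    """True if half-open intervals [start, end) and [region_start, region_end) overlap."""
    return start < region_end and region_start < end


def filter_par_dups(calls, par_regions):
    """Keep every call except DUPs that overlap one of the given regions."""
    kept = []
    for chrom, start, end, svtype in calls:
        in_region = any(
            chrom == r_chrom and overlaps(start, end, r_start, r_end)
            for r_chrom, r_start, r_end in par_regions
        )
        # Only duplications inside a listed region are dropped.
        if svtype == "DUP" and in_region:
            continue
        kept.append((chrom, start, end, svtype))
    return kept


# Hypothetical coordinates, for illustration only (not real PAR boundaries).
par_regions = [("chrX", 0, 1000)]
calls = [
    ("chrX", 100, 500, "DUP"),    # DUP inside the region: removed
    ("chrX", 100, 500, "DEL"),    # DEL inside the region: kept
    ("chrX", 5000, 6000, "DUP"),  # DUP outside the region: kept
    ("chr1", 100, 500, "DUP"),    # other chromosome: kept
]
filtered = filter_par_dups(calls, par_regions)
print(filtered)
```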

🐍 Rule

rule filter_par_dups:
    input:
        vcf=get_cnvpytor_male_input,
        bed=config["filter_par_dups"]["bed"],
    output:
        vcf="cnv_sv/cnvpytor/{sample}_{type}.par_dups_filtered.vcf.gz",
    params:
        extra=config.get("filter_par_dups", {}).get("extra", ""),
    log:
        "cnv_sv/cnvpytor/{sample}_{type}.par_dups_filtered.vcf.gz.log",
    benchmark:
        repeat(
            "cnv_sv/cnvpytor/{sample}_{type}.par_dups_filtered.vcf.gz.benchmark.tsv",
            config.get("filter_par_dups", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("filter_par_dups", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("filter_par_dups", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("filter_par_dups", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("filter_par_dups", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("filter_par_dups", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("filter_par_dups", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("filter_par_dups", {}).get("container", config["default_container"])
    message:
        "{rule}: filter cnvpytor DUP calls in {input.vcf} located in the regions of {input.bed}"
    script:
        "../scripts/filter_bed_cnvs.py"

↔ input / output files

| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | vcf | get_cnvpytor_male_input | cnvpytor vcf to which the filter for par dups should be applied |
| | bed | config["filter_par_dups"]["bed"] | bed file with par dup regions to be filtered out |
| output | vcf | "cnv_sv/cnvpytor/{sample}_{type}.par_dups_filtered.vcf.gz" | filtered cnvpytor vcf with par dups filtered out |

🔧 Configuration

Software settings (config.yaml)

| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
| extra | string | parameters that should be forwarded |

Resources settings (resources.yaml)

| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |


fix_mt_gt

A script that post-processes the GATK mitochondrial normalised Mutect2 VCF. It looks for GT fields that have more than two entries (e.g. '0/././1' or '0/1/./.') and converts them to '0/1', because some tools cannot parse a VCF in which the GT field has missing alleles and more than two allele fields.
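A minimal sketch of the conversion described above, assuming unphased genotypes separated by `/` and a GT key present in the FORMAT column. It is not the content of `fix_mt_gt.py`; the function names and the example record are hypothetical.

```python
def fix_gt(gt):
    """Collapse a GT with more than two allele fields to '0/1'."""
    if len(gt.split("/")) > 2:
        return "0/1"
    return gt


def fix_record(line):
    """Apply fix_gt to the GT entry of every sample column in one VCF record."""
    fields = line.rstrip("\n").split("\t")
    # Columns 0-7 are fixed, column 8 is FORMAT, samples start at column 9.
    if len(fields) < 10:
        return "\t".join(fields)
    format_keys = fields[8].split(":")
    if "GT" not in format_keys:
        return "\t".join(fields)
    gt_index = format_keys.index("GT")
    for i in range(9, len(fields)):
        sample_values = fields[i].split(":")
        sample_values[gt_index] = fix_gt(sample_values[gt_index])
        fields[i] = ":".join(sample_values)
    return "\t".join(fields)


# Hypothetical record, for illustration only.
record = "chrM\t73\t.\tA\tG\t.\tPASS\t.\tGT:AD\t0/././1:10,5"
fixed = fix_record(record)
print(fixed)
```

Header lines (those starting with `#`) would be written through unchanged before applying `fix_record` to each record.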

🐍 Rule

rule fix_mt_gt:
    input:
        vcf="mitochondrial/gatk_select_variants_final/{sample}_{type}.fix_af.vcf",
    output:
        vcf="mitochondrial/gatk_select_variants_final/{sample}_{type}.fix_gt.vcf",
    params:
        extra=config.get("fix_mt_gt", {}).get("extra", ""),
    log:
        "mitochondrial/gatk_select_variants_final/{sample}_{type}.fix_af.gt_fixed.vcf",
    benchmark:
        repeat(
            "mitochondrial/gatk_select_variants_final/{sample}_{type}.fix_af.gt_fixed.vcf.benchmark.tsv",
            config.get("fix_mt_gt", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("fix_mt_gt", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("fix_mt_gt", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("fix_mt_gt", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("fix_mt_gt", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("fix_mt_gt", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("fix_mt_gt", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("fix_mt_gt", {}).get("container", config["default_container"])
    message:
        "{rule}: fix GT fields with >2 alleles in {input.vcf} after GATK multiallelic splitting (e.g., '0/././1')."
    script:
        "../scripts/fix_mt_gt.py"

↔ input / output files

| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | vcf | "mitochondrial/gatk_select_variants_final/{sample}_{type}.fix_af.vcf" | filtered mutect2 vcf with mitochondrial variant calls |
| output | vcf | "mitochondrial/gatk_select_variants_final/{sample}_{type}.fix_gt.vcf" | VCF with MT genotypes at split multiallelic sites fixed |

🔧 Configuration

Software settings (config.yaml)

| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
| extra | string | parameters that should be forwarded |

Resources settings (resources.yaml)

| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |


svdb_add_ref

Add the reference genome path to svdb merge vcf header

🐍 Rule

rule svdb_add_ref:
    input:
        vcf="cnv_sv/svdb_query/{sample}_{type}.merged.svdb_query.vcf.gz",
        ref=config["reference"]["fasta"],
    output:
        vcf="cnv_sv/svdb_query/{sample}_{type}.merged.svdb_query_ref.vcf",
    params:
        extra=config.get("svdb_add_ref", {}).get("extra", ""),
    log:
        "cnv_sv/svdb_query/{sample}_{type}.add_ref.log",
    benchmark:
        repeat(
            "cnv_sv/svdb_merge/{sample}_{type}.output.benchmark.tsv", config.get("svdb_add_ref", {}).get("benchmark_repeats", 1)
        )
    threads: config.get("svdb_add_ref", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("svdb_add_ref", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("svdb_add_ref", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("svdb_add_ref", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("svdb_add_ref", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("svdb_add_ref", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("svdb_add_ref", {}).get("container", config["default_container"])
    message:
        "{rule}: Add reference to the header of the svdb vcf: {input.vcf}"
    script:
        "../scripts/ref_vcf.py"

↔ input / output files

| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | vcf | "cnv_sv/svdb_query/{sample}_{type}.merged.svdb_query.vcf.gz" | SVDB merged vcf to which the reference genome version should be added to vcf header. |
| | ref | config["reference"]["fasta"] | The fasta reference used. |
| output | vcf | "cnv_sv/svdb_query/{sample}_{type}.merged.svdb_query_ref.vcf" | SVDB merged vcf where the reference genome version has been added to vcf header. |

🔧 Configuration

Software settings (config.yaml)

| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
| extra | string | parameters that should be forwarded |

Resources settings (resources.yaml)

| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |


tiddit_add_ref

Add the reference genome path to tiddit vcf header

🐍 Rule

rule tiddit_add_ref:
    input:
        vcf="cnv_sv/tiddit/{sample}_{type}.vcf.gz",
        ref=config["reference"]["fasta"],
    output:
        vcf="cnv_sv/tiddit/{sample}_{type}_ref.vcf",
    params:
        extra=config.get("tiddit_add_ref", {}).get("extra", ""),
    log:
        "cnv_sv/tiddit/{sample}_{type}.add_ref.log",
    benchmark:
        repeat(
            "cnv_sv/tiddit/{sample}_{type}.add_ref.benchmark.tsv",
            config.get("tiddit_add_ref", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("tiddit_add_ref", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("tiddit_add_ref", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("tiddit_add_ref", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("tiddit_add_ref", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("tiddit_add_ref", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("tiddit_add_ref", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("tiddit_add_ref", {}).get("container", config["default_container"])
    message:
        "{rule}: Add reference to the header of the tiddit vcf: {input.vcf}"
    script:
        "../scripts/ref_vcf.py"

↔ input / output files

| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | vcf | "cnv_sv/tiddit/{sample}_{type}.vcf.gz" | Tiddit vcf to which the reference genome version should be added to vcf header. |
| | ref | config["reference"]["fasta"] | The fasta reference used. |
| output | vcf | "cnv_sv/tiddit/{sample}_{type}_ref.vcf" | Tiddit vcf where the reference genome version has been added to vcf header. |

🔧 Configuration

Software settings (config.yaml)

| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
| extra | string | parameters that should be forwarded |

Resources settings (resources.yaml)

| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |


vcf_to_aed

Conversion of cnvpytor vcf to AED file format. The AED file can be read by Chromosome Analysis Suite (ChAS).

🐍 Rule

rule vcf_to_aed:
    input:
        vcf="cnv_sv/cnvpytor/{sample}_{type}.vcf",
    output:
        aed="cnv_sv/cnvpytor/{sample}_{type}.aed",
    params:
        extra=config.get("vcf_to_aed", {}).get("extra", ""),
    log:
        "cnv_sv/cnvpytor/{sample}_{type}.aed.log",
    benchmark:
        repeat("cnv_sv/cnvpytor/{sample}_{type}.aed.benchmark.tsv", config.get("vcf_to_aed", {}).get("benchmark_repeats", 1))
    threads: config.get("vcf_to_aed", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("vcf_to_aed", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("vcf_to_aed", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("vcf_to_aed", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("vcf_to_aed", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("vcf_to_aed", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("vcf_to_aed", {}).get("container", config["default_container"])
    message:
        "{rule}: convert {input.vcf} to AED format"
    script:
        "../scripts/cnvpytor_vcf_to_aed.py"

↔ input / output files

| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | vcf | "cnv_sv/cnvpytor/{sample}_{type}.vcf" | VCF with CNVpytor calls. |
| output | aed | "cnv_sv/cnvpytor/{sample}_{type}.aed" | CNVpytor calls in Affymetrix Extensible Data format. |

🔧 Configuration

Software settings (config.yaml)

| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
| extra | string | parameters that should be forwarded |

Resources settings (resources.yaml)

| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |


vcf_to_aed_filtered

Conversion of the filtered cnvpytor vcf to AED file format. The AED file can be read by Chromosome Analysis Suite (ChAS).

🐍 Rule

rule vcf_to_aed_filtered:
    input:
        vcf="cnv_sv/cnvpytor/{sample}_{type}.hardfiltered.vcf",
    output:
        aed="cnv_sv/cnvpytor/{sample}_{type}_filtered.aed",
    params:
        extra=config.get("vcf_to_aed", {}).get("extra", ""),
    log:
        "cnv_sv/cnvpytor/{sample}_{type}.aed.log",
    benchmark:
        repeat("cnv_sv/cnvpytor/{sample}_{type}.aed.benchmark.tsv", config.get("vcf_to_aed", {}).get("benchmark_repeats", 1))
    threads: config.get("vcf_to_aed", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("vcf_to_aed", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("vcf_to_aed", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("vcf_to_aed", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("vcf_to_aed", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("vcf_to_aed", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("vcf_to_aed", {}).get("container", config["default_container"])
    message:
        "{rule}: convert {input.vcf} to AED format"
    script:
        "../scripts/cnvpytor_vcf_to_aed.py"

↔ input / output files

| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | vcf | "cnv_sv/cnvpytor/{sample}_{type}.hardfiltered.vcf" | VCF with filtered CNVpytor calls. |
| output | aed | "cnv_sv/cnvpytor/{sample}_{type}_filtered.aed" | Filtered CNVpytor calls in Affymetrix Extensible Data format. |

🔧 Configuration

Software settings (config.yaml)

| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
| extra | string | parameters that should be forwarded |

Resources settings (resources.yaml)

| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |


create_somalier_mqc_tsv

Create MultiQC custom content TSV files from Somalier output. This script processes somalier relatedness and sex check data to create custom tables similar to Peddy tables, with Pass/Fail QC checks.
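To show what a MultiQC custom content TSV with an embedded config and a Pass/Fail column can look like, here is a small sketch for the sex check. It does not reproduce `create_somalier_mqc_config.py`: the dictionary keys, column titles, section id, and the rule "Pass when pedigree sex equals predicted sex" are assumptions for illustration, and the way the real script derives the predicted sex from the somalier output is not shown.

```python
import csv
import io


def sex_check_rows(samples):
    """Build one table row per sample with a Pass/Fail verdict."""
    rows = []
    for sample in samples:
        # Assumed QC rule: the check passes when both sexes agree.
        status = "Pass" if sample["ped_sex"] == sample["predicted_sex"] else "Fail"
        rows.append(
            [sample["sample_id"], sample["ped_sex"], sample["predicted_sex"], status]
        )
    return rows


def render_mqc_tsv(rows):
    """Render rows as TSV preceded by '#'-prefixed MultiQC config lines."""
    out = io.StringIO()
    out.write("# id: 'somalier_sex_check'\n")
    out.write("# section_name: 'Somalier sex check'\n")
    out.write("# plot_type: 'table'\n")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Sample", "Pedigree sex", "Predicted sex", "Sex check"])
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical samples, for illustration only.
samples = [
    {"sample_id": "child", "ped_sex": "male", "predicted_sex": "male"},
    {"sample_id": "mother", "ped_sex": "female", "predicted_sex": "male"},
]
tsv = render_mqc_tsv(sex_check_rows(samples))
print(tsv)
```

Embedding the config in the file header lets MultiQC render the table without a separate entry in its own configuration file.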

🐍 Rule

rule create_somalier_mqc_tsv:
    input:
        pairs="qc/somalier_trio/somalier_relate.pairs.tsv",
        samples="qc/somalier_trio/somalier_relate.samples.tsv",
        ped="qc/somalier_trio/somalier_all.ped",
    output:
        rel_check_mqc=temp("qc/somalier_trio/somalier_rel_check_mqc.tsv"),
        sex_check_mqc=temp("qc/somalier_trio/somalier_sex_check_mqc.tsv"),
        general_stats_mqc=temp("qc/somalier_trio/somalier_general_stats_mqc.tsv"),
    params:
        script=f"{workflow.basedir}/scripts/create_somalier_mqc_config.py",
        mqc_config=config.get("somalier_trio_mqc", {}).get("mqc_config", ""),
        config_arg=lambda w, params: (f"--config {params.mqc_config}" if params.mqc_config else ""),
    log:
        "qc/somalier_trio_mqc/somalier_mqc.log",
    benchmark:
        repeat(
            "qc/somalier_trio/create_somalier_mqc_tsv.benchmark.tsv",
            config.get("create_somalier_mqc_tsv", {}).get("benchmark_repeats", 1),
        )
    resources:
        mem_mb=config.get("create_somalier_mqc_tsv", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("create_somalier_mqc_tsv", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("create_somalier_mqc_tsv", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("create_somalier_mqc_tsv", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("create_somalier_mqc_tsv", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("create_somalier_mqc_tsv", {}).get("container", config["default_container"])
    message:
        "{rule}: Create multiqc custom content embedded config tsv files from somalier sex_check and relatedness files"
    shell:
        """
        exec &> {log}
        set -ex

        echo "Starting Somalier MultiQC TSV creation"

        python3 {params.script} \
            --pairs {input.pairs} \
            --samples {input.samples} \
            --ped {input.ped} \
            {params.config_arg} \
            --rel-check-mqc {output.rel_check_mqc} \
            --sex-check-mqc {output.sex_check_mqc} \
            --general-stats-mqc {output.general_stats_mqc}
        """

↔ input / output files

| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | pairs | "qc/somalier_trio/somalier_relate.pairs.tsv" | somalier pairs.tsv file with relatedness data |
| | samples | "qc/somalier_trio/somalier_relate.samples.tsv" | somalier samples.tsv file with sex check data |
| | ped | "qc/somalier_trio/somalier_all.ped" | pedigree file with family information |
| output | rel_check_mqc | "qc/somalier_trio/somalier_rel_check_mqc.tsv" | MultiQC custom content TSV for relatedness check |
| | sex_check_mqc | "qc/somalier_trio/somalier_sex_check_mqc.tsv" | MultiQC custom content TSV for sex check |
| | general_stats_mqc | "qc/somalier_trio/somalier_general_stats_mqc.tsv" | MultiQC custom content TSV for general statistics table |

🔧 Configuration

Software settings (config.yaml)

| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |

Resources settings (resources.yaml)

| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |