Rules specific to Poirot¶
create_cov_excel¶
Script that creates the gene panel coverage Excel file for Poirot.
Rule¶
rule create_cov_excel:
    input:
        bedfile=config["reference"]["coverage_bed"],
        cov_regions="qc/mosdepth_bed/{sample}_{type}.regions.bed.gz",
        cov_thresh="qc/mosdepth_bed/{sample}_{type}.thresholds.bed.gz",
        duplication_file="qc/picard_collect_duplication_metrics/{sample}_{type}.duplication_metrics.txt",
        genepanels=config["reference"]["genepanels"],
        low_cov="qc/mosdepth_bed/{sample}_{type}.mosdepth.lowCov.regions.txt",
        summary="qc/mosdepth_bed/{sample}_{type}.mosdepth.summary.txt",
    output:
        out=temp("qc/create_cov_excel/{sample}_{type}.coverage.xlsx"),
    log:
        "qc/create_cov_excel/{sample}_{type}.log",
    benchmark:
        repeat(
            "qc/create_cov_excel/create_cov_excel_{sample}_{type}.benchmark.tsv",
            config.get("create_cov_excel", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("create_cov_excel", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("create_cov_excel", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("create_cov_excel", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("create_cov_excel", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("create_cov_excel", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("create_cov_excel", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("create_cov_excel", {}).get("container", config["default_container"])
    message:
        "{rule}: Get coverage analysis per gene into excel, with tab for each panel and one for all genes in bed"
    script:
        "../scripts/create_excel.py"
input / output files¶
deepvariant_add_ref¶
Add the reference genome path to the deepvariant vcf header
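The contents of `ref_vcf.py` are not shown on this page, so the following is only a minimal sketch of what adding a reference to a VCF header can look like. It assumes the reference is recorded as a `##reference=` meta line placed directly before the `#CHROM` line and that a stale `##reference=` line is replaced; the function name `add_reference_header` and the example path are hypothetical.

```python
def add_reference_header(lines, ref_path):
    """Return VCF lines with a ##reference line inserted before #CHROM.

    Assumption: an existing ##reference line is replaced, not duplicated.
    """
    out = []
    for line in lines:
        if line.startswith("##reference="):
            continue  # drop any stale reference entry
        if line.startswith("#CHROM"):
            out.append(f"##reference={ref_path}")
        out.append(line)
    return out


# Tiny in-memory VCF used for illustration only.
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
]
patched = add_reference_header(vcf, "/path/to/reference.fasta")
```

Working on a list of lines keeps the sketch independent of compression; a real script would also need to read the bgzipped input and write the uncompressed output declared by the rule.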
Rule¶
rule deepvariant_add_ref:
    input:
        vcf="snv_indels/vcf_final/{sample}_{type}.fix_af.vcf.gz",
        ref=config["reference"]["fasta"],
    output:
        vcf=temp("snv_indels/vcf_final/{sample}_{type}_ref.vcf"),
    log:
        "snv_indels/vcf_final/{sample}_{type}_ref.log",
    benchmark:
        repeat(
            "snv_indels/vcf_final/{sample}_{type}_ref.vcf.benchmark.tsv",
            config.get("deepvariant_add_ref", {}).get("benchmark_repeats", 1),
        )
    resources:
        mem_mb=config.get("deepvariant_add_ref", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("deepvariant_add_ref", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("deepvariant_add_ref", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("deepvariant_add_ref", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("deepvariant_add_ref", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("deepvariant_add_ref", {}).get("container", config["default_container"])
    message:
        "{rule}: Add reference to the header of the deepvariant vcf: {input.vcf}"
    script:
        "../scripts/ref_vcf.py"
input / output files¶
| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | vcf |"snv_indels/vcf_final/{sample}_{type}.fix_af.vcf.gz"| deepvariant vcf to whose header the reference genome version should be added |
| _ _ | ref |config["reference"]["fasta"]| The fasta reference used. |
| output | vcf |"snv_indels/vcf_final/{sample}_{type}_ref.vcf"| deepvariant vcf with the reference genome version added to the vcf header |
Configuration¶
Software settings (config.yaml)¶
| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
| extra | string | parameters that should be forwarded |
Resources settings (resources.yaml)¶
| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |
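Every rule on this page reads its settings with the same chained lookup, `config.get(<rule>, {}).get(<key>, config["default_resources"][<key>])`. The sketch below isolates that pattern in plain Python to show which value wins; the helper name `resolve` and all configuration values are hypothetical and chosen only for illustration.

```python
def resolve(config, rule, key):
    """Return the per-rule value if set, else the pipeline-wide default."""
    return config.get(rule, {}).get(key, config["default_resources"][key])


# Hypothetical configuration values, for illustration only.
config = {
    "default_resources": {"threads": 1, "mem_mb": 6144, "time": "4:00:00"},
    "deepvariant_add_ref": {"mem_mb": 12288},
}

mem = resolve(config, "deepvariant_add_ref", "mem_mb")      # rule-level override
threads = resolve(config, "deepvariant_add_ref", "threads")  # falls back to default
walltime = resolve(config, "tiddit_add_ref", "time")         # rule absent from config
```

Note that Python evaluates the fallback argument eagerly, so `config["default_resources"][key]` must exist even when the rule overrides the key; a missing default raises `KeyError` at parse time.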
filter_par_dups¶
A custom Python script that filters out DUP calls in the chrX PAR regions of male samples in cnvpytor VCF files.
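The filtering logic of `filter_bed_cnvs.py` is not reproduced here. As a rough sketch under stated assumptions, the idea can be expressed as dropping every record whose `SVTYPE` is `DUP` and whose interval overlaps a BED region. The overlap criterion (any overlap), the use of the `END` INFO key, the function names, and the example coordinates are all assumptions made for illustration.

```python
def parse_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples (0-based, half-open)."""
    regions = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def info_value(record, key):
    """Return the value of an INFO key, or None if the key is absent."""
    for item in record.rstrip("\n").split("\t")[7].split(";"):
        if item.startswith(key + "="):
            return item.split("=", 1)[1]
    return None


def is_region_dup(record, regions):
    """True if the record is a DUP overlapping any region (assumed criterion)."""
    if info_value(record, "SVTYPE") != "DUP":
        return False
    fields = record.split("\t")
    start = int(fields[1]) - 1  # VCF POS is 1-based, BED starts are 0-based
    end_value = info_value(record, "END")
    end = int(end_value) if end_value else int(fields[1])
    return any(fields[0] == c and start < e and end > s for c, s, e in regions)


def filter_region_dups(records, regions):
    """Keep header lines and every record that is not a DUP inside a region."""
    return [r for r in records if r.startswith("#") or not is_region_dup(r, regions)]


# Illustrative region and records, not taken from the pipeline's BED file.
regions = parse_bed(["chrX\t10000\t2781479\n"])
records = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chrX\t500000\t.\tN\t<DUP>\t.\tPASS\tSVTYPE=DUP;END=600000",
    "chrX\t500000\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=600000",
    "chrX\t5000000\t.\tN\t<DUP>\t.\tPASS\tSVTYPE=DUP;END=5100000",
]
kept = filter_region_dups(records, regions)
```

Only the first DUP is removed: the DEL in the same interval and the DUP outside the region are kept.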
Rule¶
rule filter_par_dups:
    input:
        vcf=get_cnvpytor_male_input,
        bed=config["filter_par_dups"]["bed"],
    output:
        vcf="cnv_sv/cnvpytor/{sample}_{type}.par_dups_filtered.vcf.gz",
    params:
        extra=config.get("filter_par_dups", {}).get("extra", ""),
    log:
        "cnv_sv/cnvpytor/{sample}_{type}.par_dups_filtered.vcf.gz.log",
    benchmark:
        repeat(
            "cnv_sv/cnvpytor/{sample}_{type}.par_dups_filtered.vcf.gz.benchmark.tsv",
            config.get("filter_par_dups", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("filter_par_dups", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("filter_par_dups", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("filter_par_dups", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("filter_par_dups", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("filter_par_dups", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("filter_par_dups", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("filter_par_dups", {}).get("container", config["default_container"])
    message:
        "{rule}: filter cnvpytor DUP calls in {input.vcf} located in {input.bed}"
    script:
        "../scripts/filter_bed_cnvs.py"
input / output files¶
| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | vcf |get_cnvpytor_male_input| cnvpytor vcf to which the filter for par dups should be applied |
| _ _ | bed |config["filter_par_dups"]["bed"]| bed file with par dup regions to be filtered out |
| output | vcf |"cnv_sv/cnvpytor/{sample}_{type}.par_dups_filtered.vcf.gz"| filtered cnvpytor vcf with par dups filtered out |
Configuration¶
Software settings (config.yaml)¶
| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
| extra | string | parameters that should be forwarded |
Resources settings (resources.yaml)¶
| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |
fix_mt_gt¶
A script that postprocesses the normalised GATK Mutect2 mitochondrial VCF. It looks for GT fields that have more than two entries (e.g. '0/././1' or '0/1/./.') and converts them to '0/1', because some tools cannot parse a VCF in which the GT field has missing alleles and more than two allele entries.
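The conversion described above can be sketched as follows. This is not the code of `fix_mt_gt.py`; it is a minimal illustration that assumes any genotype with more than two allele entries is rewritten to the literal string '0/1' and that all other genotypes are left untouched. The function names are hypothetical.

```python
import re


def fix_gt(gt):
    """Collapse a genotype with more than two allele entries to '0/1'."""
    alleles = re.split(r"[/|]", gt)
    return "0/1" if len(alleles) > 2 else gt


def fix_record(line):
    """Apply fix_gt to the GT value of every sample column in a VCF line."""
    line = line.rstrip("\n")
    fields = line.split("\t")
    if line.startswith("#") or len(fields) < 10:
        return line  # header line or record without sample columns
    keys = fields[8].split(":")
    if "GT" not in keys:
        return line
    gt_idx = keys.index("GT")
    for i in range(9, len(fields)):
        values = fields[i].split(":")
        if gt_idx < len(values):
            values[gt_idx] = fix_gt(values[gt_idx])
        fields[i] = ":".join(values)
    return "\t".join(fields)


# Illustrative record with a four-entry genotype left by multiallelic splitting.
record = "chrM\t152\t.\tT\tC\t.\tPASS\t.\tGT:AD\t0/././1:10,5"
fixed = fix_record(record)
```

Locating GT through the FORMAT column rather than assuming it is first keeps the sketch valid for records where other keys precede it, although the VCF specification requires GT to come first when present.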
Rule¶
rule fix_mt_gt:
    input:
        vcf="mitochondrial/gatk_select_variants_final/{sample}_{type}.fix_af.vcf",
    output:
        vcf="mitochondrial/gatk_select_variants_final/{sample}_{type}.fix_gt.vcf",
    params:
        extra=config.get("fix_mt_gt", {}).get("extra", ""),
    log:
        "mitochondrial/gatk_select_variants_final/{sample}_{type}.fix_af.gt_fixed.vcf",
    benchmark:
        repeat(
            "mitochondrial/gatk_select_variants_final/{sample}_{type}.fix_af.gt_fixed.vcf.benchmark.tsv",
            config.get("fix_mt_gt", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("fix_mt_gt", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("fix_mt_gt", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("fix_mt_gt", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("fix_mt_gt", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("fix_mt_gt", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("fix_mt_gt", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("fix_mt_gt", {}).get("container", config["default_container"])
    message:
        "{rule}: fix GT fields with >2 alleles in {input.vcf} after GATK multiallelic splitting (e.g., '0/././1')."
    script:
        "../scripts/fix_mt_gt.py"
input / output files¶
| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | vcf |"mitochondrial/gatk_select_variants_final/{sample}_{type}.fix_af.vcf"| filtered mutect2 vcf with mitochondrial variant calls |
| output | vcf |"mitochondrial/gatk_select_variants_final/{sample}_{type}.fix_gt.vcf"| VCF with MT genotypes at split multiallelic sites fixed |
Configuration¶
Software settings (config.yaml)¶
| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
| extra | string | parameters that should be forwarded |
Resources settings (resources.yaml)¶
| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |
svdb_add_ref¶
Add the reference genome path to the SVDB merged vcf header
Rule¶
rule svdb_add_ref:
    input:
        vcf="cnv_sv/svdb_query/{sample}_{type}.merged.svdb_query.vcf.gz",
        ref=config["reference"]["fasta"],
    output:
        vcf="cnv_sv/svdb_query/{sample}_{type}.merged.svdb_query_ref.vcf",
    params:
        extra=config.get("svdb_add_ref", {}).get("extra", ""),
    log:
        "cnv_sv/svdb_query/{sample}_{type}.add_ref.log",
    benchmark:
        repeat(
            "cnv_sv/svdb_merge/{sample}_{type}.output.benchmark.tsv", config.get("svdb_add_ref", {}).get("benchmark_repeats", 1)
        )
    threads: config.get("svdb_add_ref", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("svdb_add_ref", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("svdb_add_ref", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("svdb_add_ref", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("svdb_add_ref", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("svdb_add_ref", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("svdb_add_ref", {}).get("container", config["default_container"])
    message:
        "{rule}: Add reference to the header of the svdb vcf: {input.vcf}"
    script:
        "../scripts/ref_vcf.py"
input / output files¶
| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | vcf |"cnv_sv/svdb_query/{sample}_{type}.merged.svdb_query.vcf.gz"| SVDB merged vcf to whose header the reference genome version should be added. |
| _ _ | ref |config["reference"]["fasta"]| The fasta reference used. |
| output | vcf |"cnv_sv/svdb_query/{sample}_{type}.merged.svdb_query_ref.vcf"| SVDB merged vcf with the reference genome version added to the vcf header. |
Configuration¶
Software settings (config.yaml)¶
| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
| extra | string | parameters that should be forwarded |
Resources settings (resources.yaml)¶
| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |
tiddit_add_ref¶
Add the reference genome path to the tiddit vcf header
Rule¶
rule tiddit_add_ref:
    input:
        vcf="cnv_sv/tiddit/{sample}_{type}.vcf.gz",
        ref=config["reference"]["fasta"],
    output:
        vcf="cnv_sv/tiddit/{sample}_{type}_ref.vcf",
    params:
        extra=config.get("tiddit_add_ref", {}).get("extra", ""),
    log:
        "cnv_sv/tiddit/{sample}_{type}.add_ref.log",
    benchmark:
        repeat(
            "cnv_sv/tiddit/{sample}_{type}.add_ref.benchmark.tsv",
            config.get("tiddit_add_ref", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("tiddit_add_ref", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("tiddit_add_ref", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("tiddit_add_ref", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("tiddit_add_ref", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("tiddit_add_ref", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("tiddit_add_ref", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("tiddit_add_ref", {}).get("container", config["default_container"])
    message:
        "{rule}: Add reference to the header of the tiddit vcf: {input.vcf}"
    script:
        "../scripts/ref_vcf.py"
input / output files¶
| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | vcf |"cnv_sv/tiddit/{sample}_{type}.vcf.gz"| Tiddit vcf to whose header the reference genome version should be added. |
| _ _ | ref |config["reference"]["fasta"]| The fasta reference used. |
| output | vcf |"cnv_sv/tiddit/{sample}_{type}_ref.vcf"| Tiddit vcf with the reference genome version added to the vcf header. |
Configuration¶
Software settings (config.yaml)¶
| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
| extra | string | parameters that should be forwarded |
Resources settings (resources.yaml)¶
| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |
vcf_to_aed¶
Conversion of cnvpytor vcf to AED file format. The AED file can be read by Chromosome Analysis Suite (ChAS).
Rule¶
rule vcf_to_aed:
    input:
        vcf="cnv_sv/cnvpytor/{sample}_{type}.vcf",
    output:
        aed="cnv_sv/cnvpytor/{sample}_{type}.aed",
    params:
        extra=config.get("vcf_to_aed", {}).get("extra", ""),
    log:
        "cnv_sv/cnvpytor/{sample}_{type}.aed.log",
    benchmark:
        repeat("cnv_sv/cnvpytor/{sample}_{type}.aed.benchmark.tsv", config.get("vcf_to_aed", {}).get("benchmark_repeats", 1))
    threads: config.get("vcf_to_aed", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("vcf_to_aed", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("vcf_to_aed", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("vcf_to_aed", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("vcf_to_aed", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("vcf_to_aed", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("vcf_to_aed", {}).get("container", config["default_container"])
    message:
        "{rule}: convert {input.vcf} to AED format"
    script:
        "../scripts/cnvpytor_vcf_to_aed.py"
input / output files¶
| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | vcf |"cnv_sv/cnvpytor/{sample}_{type}.vcf"| VCF with CNVpytor calls. |
| output | aed |"cnv_sv/cnvpytor/{sample}_{type}.aed"| CNVpytor calls in Affymetrix Extensible Data format. |
Configuration¶
Software settings (config.yaml)¶
| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
| extra | string | parameters that should be forwarded |
Resources settings (resources.yaml)¶
| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |
vcf_to_aed_filtered¶
Conversion of the filtered cnvpytor vcf to AED file format. The AED file can be read by Chromosome Analysis Suite (ChAS).
Rule¶
rule vcf_to_aed_filtered:
    input:
        vcf="cnv_sv/cnvpytor/{sample}_{type}.hardfiltered.vcf",
    output:
        aed="cnv_sv/cnvpytor/{sample}_{type}_filtered.aed",
    params:
        extra=config.get("vcf_to_aed", {}).get("extra", ""),
    log:
        "cnv_sv/cnvpytor/{sample}_{type}.aed.log",
    benchmark:
        repeat("cnv_sv/cnvpytor/{sample}_{type}.aed.benchmark.tsv", config.get("vcf_to_aed", {}).get("benchmark_repeats", 1))
    threads: config.get("vcf_to_aed", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("vcf_to_aed", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("vcf_to_aed", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("vcf_to_aed", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("vcf_to_aed", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("vcf_to_aed", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("vcf_to_aed", {}).get("container", config["default_container"])
    message:
        "{rule}: convert {input.vcf} to AED format"
    script:
        "../scripts/cnvpytor_vcf_to_aed.py"
input / output files¶
| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | vcf |"cnv_sv/cnvpytor/{sample}_{type}.hardfiltered.vcf"| VCF with filtered CNVpytor calls. |
| output | aed |"cnv_sv/cnvpytor/{sample}_{type}_filtered.aed"| Filtered CNVpytor calls in Affymetrix Extensible Data format. |
Configuration¶
Software settings (config.yaml)¶
| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
| extra | string | parameters that should be forwarded |
Resources settings (resources.yaml)¶
| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |
create_somalier_mqc_tsv¶
Create MultiQC custom content TSV files from Somalier output. This script processes somalier relatedness and sex check data to create custom tables similar to Peddy tables, with Pass/Fail QC checks.
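To make the shape of such a file concrete, the sketch below builds a small MultiQC custom content table with a Pass/Fail column. It is not the logic of `create_somalier_mqc_config.py`: the input keys (`sample_id`, `ped_sex`, `predicted_sex`), the pass criterion (pedigree sex equals predicted sex), and the section id and name are assumptions for illustration. The `# key: value` header lines follow MultiQC's convention for embedding configuration in a custom content file.

```python
def sex_check_rows(samples):
    """Build table rows with a Pass/Fail status.

    samples: dicts with the hypothetical keys sample_id, ped_sex, predicted_sex.
    Assumed criterion: Pass when pedigree sex matches predicted sex.
    """
    rows = []
    for s in samples:
        status = "Pass" if s["ped_sex"] == s["predicted_sex"] else "Fail"
        rows.append((s["sample_id"], s["ped_sex"], s["predicted_sex"], status))
    return rows


def to_mqc_tsv(section_id, section_name, header, rows):
    """Render rows as a TSV with MultiQC configuration in comment lines."""
    lines = [
        f"# id: '{section_id}'",
        f"# section_name: '{section_name}'",
        "# plot_type: 'table'",
        "\t".join(header),
    ]
    lines += ["\t".join(str(value) for value in row) for row in rows]
    return "\n".join(lines) + "\n"


# Made-up samples, for illustration only.
samples = [
    {"sample_id": "child", "ped_sex": "male", "predicted_sex": "male"},
    {"sample_id": "mother", "ped_sex": "female", "predicted_sex": "male"},
]
rows = sex_check_rows(samples)
tsv = to_mqc_tsv(
    "somalier_sex_check",
    "Somalier sex check",
    ["Sample", "Pedigree sex", "Predicted sex", "Sex check"],
    rows,
)
```

Writing `tsv` to a file whose name ends in `_mqc.tsv`, as the rule's outputs do, lets MultiQC pick it up as custom content.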
Rule¶
rule create_somalier_mqc_tsv:
    input:
        pairs="qc/somalier_trio/somalier_relate.pairs.tsv",
        samples="qc/somalier_trio/somalier_relate.samples.tsv",
        ped="qc/somalier_trio/somalier_all.ped",
    output:
        rel_check_mqc=temp("qc/somalier_trio/somalier_rel_check_mqc.tsv"),
        sex_check_mqc=temp("qc/somalier_trio/somalier_sex_check_mqc.tsv"),
        general_stats_mqc=temp("qc/somalier_trio/somalier_general_stats_mqc.tsv"),
    params:
        script=f"{workflow.basedir}/scripts/create_somalier_mqc_config.py",
        mqc_config=config.get("somalier_trio_mqc", {}).get("mqc_config", ""),
        config_arg=lambda w, params: (f"--config {params.mqc_config}" if params.mqc_config else ""),
    log:
        "qc/somalier_trio_mqc/somalier_mqc.log",
    benchmark:
        repeat(
            "qc/somalier_trio/create_somalier_mqc_tsv.benchmark.tsv",
            config.get("create_somalier_mqc_tsv", {}).get("benchmark_repeats", 1),
        )
    resources:
        mem_mb=config.get("create_somalier_mqc_tsv", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("create_somalier_mqc_tsv", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("create_somalier_mqc_tsv", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("create_somalier_mqc_tsv", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("create_somalier_mqc_tsv", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("create_somalier_mqc_tsv", {}).get("container", config["default_container"])
    message:
        "{rule}: Create multiqc custom content embedded config tsv files from somalier sex_check and relatedness files"
    shell:
        """
        exec &> {log}
        set -ex
        echo "Starting Somalier MultiQC TSV creation"
        python3 {params.script} \
            --pairs {input.pairs} \
            --samples {input.samples} \
            --ped {input.ped} \
            {params.config_arg} \
            --rel-check-mqc {output.rel_check_mqc} \
            --sex-check-mqc {output.sex_check_mqc} \
            --general-stats-mqc {output.general_stats_mqc}
        """
input / output files¶
| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | pairs |"qc/somalier_trio/somalier_relate.pairs.tsv"| somalier pairs.tsv file with relatedness data |
| | samples |"qc/somalier_trio/somalier_relate.samples.tsv"| somalier samples.tsv file with sex check data |
| _ _ | ped |"qc/somalier_trio/somalier_all.ped"| pedigree file with family information |
| output | rel_check_mqc |"qc/somalier_trio/somalier_rel_check_mqc.tsv"| MultiQC custom content TSV for relatedness check |
| | sex_check_mqc |"qc/somalier_trio/somalier_sex_check_mqc.tsv"| MultiQC custom content TSV for sex check |
| _ _ | general_stats_mqc |"qc/somalier_trio/somalier_general_stats_mqc.tsv"| MultiQC custom content TSV for general statistics table |
Configuration¶
Software settings (config.yaml)¶
| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
Resources settings (resources.yaml)¶
| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |