Poirot RD WGS

Mapping reads and BAM file processing

When GPUs are available, Poirot can be configured to use Nvidia's Parabricks for read mapping with the fq2bam tool. This tool performs read mapping with a GPU-accelerated version of BWA-MEM, followed by sorting and duplicate marking. See the alignment hydra-genetics module or parabricks hydra-genetics module documentation for more details on the software. Default hydra-genetics settings/resources are used if no configuration is specified.

When only CPUs are available, Poirot can be configured to perform the read mapping, sorting and duplicate marking on CPU.

Variant Calling

See the snv_indels hydra-genetics module or parabricks hydra-genetics module documentation for more details on the software used for variant calling, the annotation hydra-genetics module for annotation, the filtering hydra-genetics module for filtering, and the cnv hydra-genetics module for CNV calling. Default hydra-genetics settings/resources are used if no configuration is specified.

Annotation of variant calls

Annotation of SNV, indel and SV calls can be performed with Ensembl's VEP tool. However, this is optional and can be set in the config. See the section on running the pipeline for details.

SNV and INDELs

Mitochondrial short variants

CNVs and SVs

  • CNV callers

  • Structural variant callers

  • Mobile elements

    • MELT, which calls ALU, HERVK, LINE1 and SVA mobile elements.
  • Merging and filtering of SV VCF files

Repeat expansions

Regions Of Homozygosity

SMN Copy Number

  • SMNCopyNumberCaller and [Hydra genetics documentation](https://hydra-genetics-cnv-sv.readthedocs.io/en/latest/softwares/#smncopynumbercaller)

UniParental Disomy

  • upd and Hydra genetics documentation for upd

QC

See the qc hydra-genetics module documentation for more details on the software used for quality control. Default hydra-genetics settings/resources are used if no configuration is specified.

Poirot produces a MultiQC report for the entire sequencing run to make QC tracking easier. The report starts with a general statistics table showing the most important QC values, followed by additional QC data and diagrams. The entire MultiQC HTML file is interactive, and you can filter, highlight, hide or export data using the toolbox at the right edge of the report.

Coverage for genes and gene panels.

Results are written to an Excel spreadsheet with one tab per gene panel.



To implement

  • GATK CNV germline caller
  • Continued work on SV calling and filtering
  • Mobile elements
  • Several sex-checks
  • samtools idxstats helps with determining sex: read counts per chromosome can reveal XXY samples and females with a highly homozygous chrX (make a table with predicted sex based on this)
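
The idxstats-based sex check in the last item could look roughly like the sketch below. It is not part of Poirot: the function names, the `chr`-prefixed contig names and both thresholds are assumptions made for illustration, and real thresholds would need to be tuned on known samples. The only format relied on is the standard `samtools idxstats` output (tab-separated reference name, sequence length, mapped reads, unmapped reads).

```python
import io

# Illustrative thresholds only; these are assumptions, not values used by Poirot.
X_TWO_COPIES_MIN = 0.75  # chrX depth relative to autosomes (about 1.0 for two copies)
Y_PRESENT_MIN = 0.10     # chrY depth relative to autosomes (near 0 when chrY is absent)


def parse_idxstats(handle):
    """Parse `samtools idxstats` lines: name, length, mapped, unmapped."""
    stats = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4 or fields[0] == "*":
            continue  # skip malformed lines and the unplaced-reads row
        name, length, mapped, _unmapped = fields
        stats[name] = (int(length), int(mapped))
    return stats


def relative_depth(stats, contig, autosomes):
    """Mapped reads per base on `contig`, divided by the same value over the autosomes."""
    auto_len = sum(stats[c][0] for c in autosomes)
    auto_reads = sum(stats[c][1] for c in autosomes)
    length, mapped = stats[contig]
    return (mapped / length) / (auto_reads / auto_len)


def predict_sex(stats, x="chrX", y="chrY"):
    """Return 'XX', 'XY', 'XXY' or 'unknown' from relative chrX and chrY depth."""
    # Assumes contigs are named chr1..chr22, chrX, chrY.
    autosomes = [f"chr{i}" for i in range(1, 23) if f"chr{i}" in stats]
    two_x = relative_depth(stats, x, autosomes) >= X_TWO_COPIES_MIN
    has_y = relative_depth(stats, y, autosomes) >= Y_PRESENT_MIN
    if two_x and has_y:
        return "XXY"
    if two_x:
        return "XX"
    if has_y:
        return "XY"
    return "unknown"


if __name__ == "__main__":
    # Tiny made-up example in idxstats format, not real data.
    example = "chr1\t1000\t1000\t0\nchrX\t500\t250\t0\nchrY\t100\t40\t0\n*\t0\t0\t10\n"
    print(predict_sex(parse_idxstats(io.StringIO(example))))
```

In practice the input would come from `samtools idxstats sample.bam`, and one predicted-sex row per sample could be collected into the table mentioned above. Because this check uses read depth and not heterozygosity, a female with a highly homozygous chrX would still show two-copy chrX depth.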