theiagen/mycosnp-wdl

MycoSNP-WDL Workflow Series

Quick Facts

| Workflow Type | Applicable Kingdom | MycoSNP-WDL Version | MycoSNP-NF Version | Command-line Compatibility | Workflow Level |
|---|---|---|---|---|---|
| mycosnp_variants | Fungi | v1.6.2-wdl | v1.6.3 | Yes | Sample-level |
| mycosnp_tree | Fungi | v1.6.2-wdl | v1.6.3 | Yes | Set-level |

MycoSNP-WDL

MycoSNP-WDL is a repository of WDL wrappers for CDCGov/mycosnp-nf that enable MycoSNP use on Terra.bio. These workflows perform Candidozyma (Candida) auris variant calling and subsequent single nucleotide polymorphism (SNP) phylogenetic tree reconstruction.

MycoSNP-WDL v. MycoSNP-NF

MycoSNP-NF is the source Nextflow code for analysis. MycoSNP-WDL version numbers are intended to match the wrapped MycoSNP-NF version ONLY through the minor release; patch release numbers may differ. For example, MycoSNP-WDL v1.6.2 wraps MycoSNP-NF v1.6.3. Please see the table above for the current version pairing.
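
The versioning rule can be illustrated with a short check. This is only a sketch: the helper names `parse_version` and `minor_concordant` are hypothetical and not part of this repository, and the tag format is assumed to follow the `vMAJOR.MINOR.PATCH` pattern shown in the table above.

```python
import re


def parse_version(tag: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from tags such as 'v1.6.2-wdl' or 'v1.6.3'."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"unrecognised version tag: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def minor_concordant(wdl_tag: str, nf_tag: str) -> bool:
    """True when both tags share the same major and minor release numbers."""
    return parse_version(wdl_tag)[:2] == parse_version(nf_tag)[:2]


# The pairing from the Quick Facts table: patch numbers differ, minor matches.
print(minor_concordant("v1.6.2-wdl", "v1.6.3"))  # True
```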


wf_mycosnp_variants.wdl

mycosnp_variants calls variants from input reads against the C. auris Clade I B11205 assembly (accession GCA_016772135) by default. Users can optionally supply a different C. auris clade reference directory, a reference FASTA, or a gzipped tar archive, as described below.

Note that mycosnp_tree requires at least 4 genomes that were called against the same reference in mycosnp_variants.

Inputs

Inputting a reference genome is required to run mycosnp_variants. By default, the reference will be set to "Clade1". The reference can be set by one of the following inputs:

  • reference optionally takes the name of a presupplied reference clade directory, listed in the table below. These references are derived from GenBank assemblies:

| Reference Input | Assembly Accession |
|---|---|
| Clade1 | GCA_016772135.1 |
| Clade2 | GCA_003013715.1 |
| Clade3 | GCA_002775015.1 |
| Clade4 | GCA_003014415.1 |
| Clade5 | GCA_016809505.1 |
| Clade6 | GCA_032714025.1 |

  • ref_fasta optionally takes a reference FASTA (requires suffix .fa) that will be indexed via BWA to generate a reference directory.
  • ref_tar optionally takes a gzipped tar archive (.tar.gz) with the same directory structure as the provided reference clades:
data/reference
├── Clade1
│   ├── bwa
│   │   ├── bwa                 # BWA index for alignment
│   │   │   ├── reference.amb
│   │   │   ├── reference.ann
│   │   │   ├── reference.bwt
│   │   │   ├── reference.pac
│   │   │   └── reference.sa
│   ├── dict
│   │   └── reference.dict      # Picard dictionary
│   ├── fai
│   │   └── reference.fa.fai    # FASTA index file
│   ├── masked
│   │   └── reference.fa        # Masked reference sequence
│   └── Clade1.fasta
├── Clade2
├── Clade3
├── Clade4
├── Clade5
├── Clade6
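
Before supplying a custom ref_tar, it can help to confirm that the archive contains the files shown in the tree above. The following is a minimal sketch, not part of the workflow: the function name `missing_reference_files` is hypothetical, the BWA index is assumed to use the standard `.amb` extension, and because the expected archive root is not stated here, members are matched by path suffix rather than by full path.

```python
import tarfile

# Relative paths taken from the directory tree above. Whether the archive root
# is the clade directory itself or a parent directory is an assumption, so
# each entry is matched as a suffix of the member path.
EXPECTED_SUFFIXES = [
    "bwa/bwa/reference.amb",
    "bwa/bwa/reference.ann",
    "bwa/bwa/reference.bwt",
    "bwa/bwa/reference.pac",
    "bwa/bwa/reference.sa",
    "dict/reference.dict",
    "fai/reference.fa.fai",
    "masked/reference.fa",
]


def missing_reference_files(tar_path: str) -> list[str]:
    """Return the expected relative paths that no file in the archive ends with."""
    with tarfile.open(tar_path, "r:gz") as archive:
        names = [member.name for member in archive.getmembers() if member.isfile()]
    return [
        suffix
        for suffix in EXPECTED_SUFFIXES
        if not any(name == suffix or name.endswith("/" + suffix) for name in names)
    ]
```

An empty list from `missing_reference_files("my_reference.tar.gz")` (a hypothetical file name) means every expected file was found; any returned entries name the files that still need to be added.
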
| Terra Task Name | Variable | Type | Description | Default Value | Terra Status |
|---|---|---|---|---|---|
| mycosnp_variants | read1 | File | Illumina forward read file in FASTQ format (compression optional) | | Required |
| mycosnp_variants | read2 | File | Illumina reverse read file in FASTQ format (compression optional) | | Required |
| mycosnp_variants | samplename | String | Name of sample to be analyzed | | Required |
| mycosnp | coverage | Int | Coverage is used to calculate a down-sampling rate that results in the specified coverage. For example, if coverage is 70, then FASTQ files are down-sampled such that, when aligned to the reference, the result is approximately 70x coverage | 0 | Optional |
| mycosnp | cpu | Int | Number of CPUs to allocate to the task | 8 | Optional |
| mycosnp | debug | Boolean | If true, keeps .nextflow/ and work/ directories | false | Optional |
| mycosnp | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
| mycosnp | docker | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/mycosnp-wdl:1.6.2" | Optional |
| mycosnp | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 64 | Optional |
| mycosnp | min_depth | Int | Min depth for a base to be called as the consensus sequence, otherwise it will be called as an N; set to 0 to disable | 10 | Optional |
| mycosnp | reference | String | Reference clade | "Clade1" | Optional |
| mycosnp | ref_fasta | File | Reference FASTA file | | Optional |
| mycosnp | ref_tar | File | Reference gzipped tar archive | | Optional |
| mycosnp | sample_ploidy | Int | Ploidy of sample (GATK) | 1 | Optional |
| version_capture | timezone | String | Alternative timezone | | Optional |
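
For command-line runs, the inputs above are typically collected in a JSON file. The sketch below builds one for a single sample. It rests on assumptions: the key layout `workflow.task.variable` follows the usual Cromwell convention and should be checked against the WDL itself, and the function name `variants_inputs` and the FASTQ file names are hypothetical.

```python
import json


def variants_inputs(samplename, read1, read2, reference="Clade1", coverage=0, min_depth=10):
    """Build an inputs mapping for mycosnp_variants using the table's defaults.

    Key names assume the Cromwell 'workflow.task.variable' convention.
    """
    workflow = "mycosnp_variants"
    return {
        f"{workflow}.samplename": samplename,
        f"{workflow}.read1": read1,
        f"{workflow}.read2": read2,
        f"{workflow}.mycosnp.reference": reference,
        f"{workflow}.mycosnp.coverage": coverage,
        f"{workflow}.mycosnp.min_depth": min_depth,
    }


# Hypothetical sample and file names, for illustration only.
inputs = variants_inputs("sample01", "sample01_R1.fastq.gz", "sample01_R2.fastq.gz")
print(json.dumps(inputs, indent=2))
```

Optional inputs that are left out of the JSON fall back to the defaults listed in the table, so only values that differ from those defaults need to be set.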

Outputs

| Variable | Type | Description |
|---|---|---|
| analysis_date | String | Date of the analysis |
| assembly_size | Int | Size of the assembly |
| average_q_score_after_trimming | Float | Average quality score after trimming |
| average_q_score_before_trimming | Float | Average quality score before trimming |
| consensus_n_variant_min_depth | Int | Minimum depth for consensus N variant |
| full_results | File | Full results file |
| gc_after_trimming | Float | GC content after trimming |
| gc_before_trimming | Float | GC content before trimming |
| mean_coverage_depth | Float | Mean coverage depth |
| multiqc | File | MultiQC report |
| myco_bam | File | BAM file |
| myco_bam_bai | File | BAM index file |
| mycosnp_docker | String | Docker image used for MycoSNP |
| mycosnp_variants_analysis_date | String | Date of the MycoSNP variants analysis |
| mycosnp_variants_version | String | Version of the MycoSNP variants |
| mycosnp_version | String | Version of MycoSNP |
| number_n | Int | Number of N bases |
| paired_reads_after_trimming | Int | Number of paired reads after trimming |
| paired_reads_after_trimming_percent | String | Percentage of paired reads after trimming |
| percent_reference_coverage | Float | Percentage of reference coverage |
| reads_after_trimming | Int | Number of reads after trimming |
| reads_after_trimming_percent | String | Percentage of reads after trimming |
| reads_before_trimming | Int | Number of reads before trimming |
| reads_mapped | Int | Number of reads mapped |
| reference_length_coverage_after_trimming | Float | Reference length coverage after trimming |
| reference_length_coverage_before_trimming | Float | Reference length coverage before trimming |
| reference_name | String | Name of the reference clade/input used |
| unpaired_reads_after_trimming | Int | Number of unpaired reads after trimming |
| unpaired_reads_after_trimming_percent | String | Percentage of unpaired reads after trimming |
| vcf | File | Compressed variant call format (VCF) file depicting SNPs |
| vcf_index | File | Compressed index file for the VCF |

wf_mycosnp_tree.wdl

mycosnp_tree reconstructs an IQ-TREE SNP phylogenetic tree that incorporates representative genomes of Clade1-Clade6 C. auris. VCF data generated from wf_mycosnp_variants.wdl are used as inputs.

NOTE: At least four samples, including reference, are required

Inputs

Inputting a reference genome is required to run mycosnp_tree. By default, the reference will be set to "Clade1". The reference can be set by one of the following inputs:

  • reference optionally takes the name of a presupplied reference clade directory (Clade1 through Clade6).
  • ref_fasta optionally takes a reference FASTA (requires suffix .fa) that will be indexed via BWA to generate a reference directory.
| Terra Task Name | Variable | Type | Description | Default Value | Terra Status |
|---|---|---|---|---|---|
| mycosnp_tree | vcf | Array[File] | VCF files (.vcf.gz) containing SNP data for phylogenetic analysis. These files can be generated from wf_mycosnp_variants.wdl | | Required |
| mycosnp_tree | vcf_index | Array[File] | Index files for the VCF files | | Required |
| mycosnptree | cpu | Int | Number of CPUs to allocate to the task | 8 | Optional |
| mycosnptree | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
| mycosnptree | docker | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/mycosnp-wdl:1.6.2" | Optional |
| mycosnptree | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 64 | Optional |
| mycosnptree | reference | String | Preexisting reference directory | "Clade1" | Optional |
| mycosnptree | ref_fasta | File | Reference FASTA input | | Optional |
| version_capture | timezone | String | Alternative timezone | | Optional |
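
Because all VCFs in one tree must share a reference and a minimum sample count applies, it can be useful to group mycosnp_variants results by reference_name before launching mycosnp_tree. The sketch below is illustrative only: the function name `tree_inputs_by_reference` and the record keys are hypothetical, the JSON key layout assumes the usual Cromwell `workflow.task.variable` convention, and the threshold conservatively requires four sample VCFs because the note above does not say how the reference is counted. It also assumes each reference_name is one of the preset clade names.

```python
from collections import defaultdict

MIN_SAMPLES = 4  # conservative reading of the "at least four samples" note


def tree_inputs_by_reference(samples):
    """Group sample records by reference and build mycosnp_tree inputs per group.

    Each record is a dict with the hypothetical keys 'reference_name', 'vcf',
    and 'vcf_index'. Groups below MIN_SAMPLES are skipped.
    """
    grouped = defaultdict(list)
    for sample in samples:
        grouped[sample["reference_name"]].append(sample)

    inputs = {}
    for reference, members in grouped.items():
        if len(members) < MIN_SAMPLES:
            continue  # too few samples called against this reference
        inputs[reference] = {
            "mycosnp_tree.vcf": [m["vcf"] for m in members],
            "mycosnp_tree.vcf_index": [m["vcf_index"] for m in members],
            "mycosnp_tree.mycosnptree.reference": reference,
        }
    return inputs
```

Keeping the VCF and index lists in the same order, as this sketch does, makes it easy to confirm that every VCF has its matching index.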

Outputs

| Variable | Type | Description |
|---|---|---|
| mycosnp_alignment | File | Concatenated SNP alignment file |
| mycosnp_docker | String | Docker image used for MycoSNP |
| mycosnp_fastree_tree | File | Phylogenetic tree inferred using FastTree (heuristic maximum likelihood) |
| mycosnp_iqtree_tree | File | Phylogenetic tree inferred using IQ-TREE (high quality maximum likelihood) |
| mycosnp_rapidnj_tree | File | Phylogenetic tree inferred using RapidNJ (neighbor-joining method) |
| mycosnp_tree_analysis_date | String | Date of the analysis |
| mycosnp_tree_full_results | File | Full results file |
| mycosnp_tree_vcf_csv | File | SNP variants formatted as a CSV table |
| mycosnp_tree_version | String | Version of the mycosnp_tree WDL workflow |
| mycosnp_version | String | Version of MycoSNP |
| mycosnptree_snpdists | File | SNP distances file |
| reference_name | String | Name of the reference clade/input used |
