| Workflow Type | Applicable Kingdom | MycoSNP-WDL Version | MycoSNP-NF Version | Command-line Compatibility | Workflow Level |
|---|---|---|---|---|---|
| mycosnp_variants | Fungi | v1.6.2-wdl | v1.6.3 | Yes | Sample-level |
| mycosnp_tree | Fungi | v1.6.2-wdl | v1.6.3 | Yes | Set-level |
MycoSNP-WDL is a repository of WDL wrappers for CDCGov/mycosnp-nf that enable MycoSNP to run on Terra.bio. These workflows perform Candidozyma (Candida) auris variant calling and subsequent single nucleotide polymorphism (SNP) phylogenetic tree reconstruction.
MycoSNP-NF is the source Nextflow code for the analysis. MycoSNP-WDL version numbers are intended to match MycoSNP-NF only up to the minor release version; patch release versions may differ. For example, MycoSNP-WDL v1.6.2 contains MycoSNP-NF v1.6.3 source code. See the table above for version coordination.
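The versioning rule above amounts to comparing only the major and minor components of the two tags. A minimal sketch in Python, assuming tags follow the `vMAJOR.MINOR.PATCH` pattern used in the table; `parse_version` and `versions_concordant` are hypothetical helpers, not part of MycoSNP-WDL:

```python
import re


def parse_version(tag: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from tags such as 'v1.6.2-wdl' or 'v1.6.3'."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"unrecognized version tag: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def versions_concordant(wdl_tag: str, nf_tag: str) -> bool:
    """True when the WDL and Nextflow versions agree up to the minor release."""
    return parse_version(wdl_tag)[:2] == parse_version(nf_tag)[:2]


print(versions_concordant("v1.6.2-wdl", "v1.6.3"))  # patch versions may differ
print(versions_concordant("v1.6.2-wdl", "v1.7.0"))  # minor versions differ
```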
By default, mycosnp_variants calls variants for the input reads against the C. auris Clade I B11205 assembly (accession GCA_016772135). Users can optionally supply a different presupplied C. auris clade directory, a reference FASTA, or a reference directory tarchive, as described below.
Note that mycosnp_tree requires at least four genomes that were processed against the same reference in mycosnp_variants.
A reference genome is required to run mycosnp_variants. By default, the reference is set to "Clade1"; it can be changed with one of the following inputs:
- reference optionally takes one of the presupplied reference clade directories listed below. These references are derived from GenBank assemblies:
| Reference Input | Assembly Accession |
|---|---|
| Clade1 | GCA_016772135.1 |
| Clade2 | GCA_003013715.1 |
| Clade3 | GCA_002775015.1 |
| Clade4 | GCA_003014415.1 |
| Clade5 | GCA_016809505.1 |
| Clade6 | GCA_032714025.1 |
- ref_fasta optionally takes a reference FASTA (requires the suffix .fa) that will be indexed via BWA to generate a reference directory.
- ref_tar optionally takes a gzipped tarchive (.tar.gz) with the same directory structure as the provided reference clades:
```
data/reference
├── Clade1
│   ├── bwa
│   │   └── bwa                 # BWA index for alignment
│   │       ├── reference.amb
│   │       ├── reference.ann
│   │       ├── reference.bwt
│   │       ├── reference.pac
│   │       └── reference.sa
│   ├── dict
│   │   └── reference.dict      # Picard dictionary
│   ├── fai
│   │   └── reference.fa.fai    # FASTA index file
│   ├── masked
│   │   └── reference.fa        # Masked reference sequence
│   └── Clade1.fasta
├── Clade2
├── Clade3
├── Clade4
├── Clade5
└── Clade6
```
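Before uploading a custom ref_tar, it can help to confirm that the archive contains the files shown in the reference tree. The following Python sketch uses the standard-library `tarfile` module; `missing_reference_files` is a hypothetical helper, and since it is an assumption whether the archive root is the clade directory itself or a parent folder, paths are matched by suffix:

```python
import tarfile

# Files expected inside one clade directory, following the reference tree.
# Assumption: the archive root may be the clade folder or a parent folder,
# so each expected path is matched as a suffix of the archive member names.
EXPECTED_FILES = [
    "bwa/bwa/reference.amb",
    "bwa/bwa/reference.ann",
    "bwa/bwa/reference.bwt",
    "bwa/bwa/reference.pac",
    "bwa/bwa/reference.sa",
    "dict/reference.dict",
    "fai/reference.fa.fai",
    "masked/reference.fa",
]


def missing_reference_files(tar_path: str) -> list[str]:
    """Return the expected reference files that are absent from a .tar.gz archive."""
    with tarfile.open(tar_path, "r:gz") as archive:
        members = [member.name for member in archive.getmembers() if member.isfile()]
    return [
        expected
        for expected in EXPECTED_FILES
        if not any(name == expected or name.endswith("/" + expected) for name in members)
    ]
```

An empty list means every file from the tree was found; the sketch does not check that the index files themselves are valid.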
| Terra Task Name | Variable | Type | Description | Default Value | Terra Status |
|---|---|---|---|---|---|
| mycosnp_variants | read1 | File | Illumina forward read file in FASTQ format (compression optional) | | Required |
| mycosnp_variants | read2 | File | Illumina reverse read file in FASTQ format (compression optional) | | Required |
| mycosnp_variants | samplename | String | Name of sample to be analyzed | | Required |
| mycosnp | coverage | Int | Target coverage used to calculate a down-sampling rate. For example, if coverage is 70, FASTQ files are down-sampled so that, when aligned to the reference, they yield approximately 70x coverage | 0 | Optional |
| mycosnp | cpu | Int | Number of CPUs to allocate to the task | 8 | Optional |
| mycosnp | debug | Boolean | If true, keeps the .nextflow/ and work/ directories | false | Optional |
| mycosnp | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
| mycosnp | docker | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/mycosnp-wdl:1.6.2" | Optional |
| mycosnp | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 64 | Optional |
| mycosnp | min_depth | Int | Minimum depth for a base to be called in the consensus sequence; positions below this depth are called as N. Set to 0 to disable | 10 | Optional |
| mycosnp | reference | String | Reference clade | "Clade1" | Optional |
| mycosnp | ref_fasta | File | Reference FASTA file (requires the suffix .fa) | | Optional |
| mycosnp | ref_tar | File | Gzip-compressed tarchive (.tar.gz) of a reference directory | | Optional |
| mycosnp | sample_ploidy | Int | Ploidy of sample (GATK) | 1 | Optional |
| version_capture | timezone | String | Alternative timezone | | Optional |
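The coverage input describes a down-sampling rate chosen so that the retained reads reach the target depth. One plausible way to compute such a rate is sketched below; this is an illustration, not MycoSNP-NF's actual implementation, and treating a non-positive coverage as "keep all reads" is an assumption about the default of 0:

```python
def downsampling_fraction(coverage: int, reference_length: int, total_bases: int) -> float:
    """Fraction of reads to keep so that kept bases ~= coverage * reference_length.

    Assumption: coverage <= 0 means no down-sampling, so every read is kept.
    """
    if coverage <= 0 or total_bases <= 0:
        return 1.0
    fraction = coverage * reference_length / total_bases
    # Down-sampling can only remove reads, so the fraction is capped at 1.0.
    return min(fraction, 1.0)


# Hypothetical numbers: a 12 Mb reference and 1.68 Gb of sequenced bases.
print(downsampling_fraction(70, 12_000_000, 1_680_000_000))
```

With these illustrative numbers, keeping half of the reads gives roughly 70x depth; when the reads already fall short of the target, the sketch keeps all of them.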
| Variable | Type | Description |
|---|---|---|
| analysis_date | String | Date of the analysis |
| assembly_size | Int | Size of the assembly |
| average_q_score_after_trimming | Float | Average quality score after trimming |
| average_q_score_before_trimming | Float | Average quality score before trimming |
| consensus_n_variant_min_depth | Int | Minimum depth for consensus N variant |
| full_results | File | Full results file |
| gc_after_trimming | Float | GC content after trimming |
| gc_before_trimming | Float | GC content before trimming |
| mean_coverage_depth | Float | Mean coverage depth |
| multiqc | File | MultiQC report |
| myco_bam | File | BAM file |
| myco_bam_bai | File | BAM index file |
| mycosnp_docker | String | Docker image used for MycoSNP |
| mycosnp_variants_analysis_date | String | Date of the MycoSNP variants analysis |
| mycosnp_variants_version | String | Version of the mycosnp_variants WDL workflow |
| mycosnp_version | String | Version of MycoSNP |
| number_n | Int | Number of N bases |
| paired_reads_after_trimming | Int | Number of paired reads after trimming |
| paired_reads_after_trimming_percent | String | Percentage of paired reads after trimming |
| percent_reference_coverage | Float | Percentage of reference coverage |
| reads_after_trimming | Int | Number of reads after trimming |
| reads_after_trimming_percent | String | Percentage of reads after trimming |
| reads_before_trimming | Int | Number of reads before trimming |
| reads_mapped | Int | Number of reads mapped |
| reference_length_coverage_after_trimming | Float | Reference length coverage after trimming |
| reference_length_coverage_before_trimming | Float | Reference length coverage before trimming |
| reference_name | String | Name of the reference clade/input used |
| unpaired_reads_after_trimming | Int | Number of unpaired reads after trimming |
| unpaired_reads_after_trimming_percent | String | Percentage of unpaired reads after trimming |
| vcf | File | Compressed variant call format (VCF) file containing SNP calls |
| vcf_index | File | Index file for the compressed VCF |
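The number_n output relates to the min_depth input: positions without enough read support are written as N in the consensus. The sketch below illustrates that relationship using a hypothetical list of per-position depths; `mask_low_depth` is not a MycoSNP function, and the pipeline may introduce N bases for other reasons as well:

```python
def mask_low_depth(depths: list[int], min_depth: int = 10) -> tuple[int, float]:
    """Return (N bases, percent of positions called) for per-position read depths.

    Positions with depth below min_depth become N. With min_depth = 0 no depth
    falls below the threshold, so masking is effectively disabled.
    """
    if not depths:
        return 0, 0.0
    n_bases = sum(1 for depth in depths if depth < min_depth)
    percent_called = 100.0 * (len(depths) - n_bases) / len(depths)
    return n_bases, percent_called


print(mask_low_depth([0, 5, 10, 25, 40], min_depth=10))
```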
mycosnp_tree reconstructs SNP phylogenetic trees (RapidNJ, FastTree, and IQ-TREE) that incorporate representative genomes of C. auris Clade1-Clade6. VCF files generated by wf_mycosnp_variants.wdl are used as inputs.
NOTE: At least four samples, including the reference, are required.
A reference genome is required to run mycosnp_tree. By default, the reference is set to "Clade1"; it can be changed with one of the following inputs:
- reference optionally takes one of the presupplied reference clade directories listed in the mycosnp_variants reference table.
- ref_fasta optionally takes a reference FASTA (requires the suffix .fa) that will be indexed via BWA to generate a reference directory.
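Because ref_fasta requires the .fa suffix, a FASTA stored as .fasta or .fna needs renaming before upload. A minimal sketch with `pathlib`; `as_fa_filename` is a hypothetical helper, and rejecting gzipped input reflects an assumption, since compressed FASTA support is not documented here:

```python
from pathlib import Path


def as_fa_filename(fasta_path: str) -> str:
    """Return the file name with its FASTA extension replaced by .fa.

    Assumption: ref_fasta expects an uncompressed FASTA, so .gz input is rejected.
    """
    path = Path(fasta_path)
    if path.suffix == ".gz":
        raise ValueError("decompress the FASTA before renaming it")
    return path.with_suffix(".fa").name


print(as_fa_filename("refs/GCA_016772135.1.fasta"))
```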
| Terra Task Name | Variable | Type | Description | Default Value | Terra Status |
|---|---|---|---|---|---|
| mycosnp_tree | vcf | Array[File] | VCF files (.vcf.gz) containing SNP data for phylogenetic analysis. These files can be generated by wf_mycosnp_variants.wdl | | Required |
| mycosnp_tree | vcf_index | Array[File] | Index files for the VCF files | | Required |
| mycosnptree | cpu | Int | Number of CPUs to allocate to the task | 8 | Optional |
| mycosnptree | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
| mycosnptree | docker | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/mycosnp-wdl:1.6.2" | Optional |
| mycosnptree | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 64 | Optional |
| mycosnptree | reference | String | Presupplied reference clade | "Clade1" | Optional |
| mycosnptree | ref_fasta | File | Reference FASTA file (requires the suffix .fa) | | Optional |
| version_capture | timezone | String | Alternative timezone | | Optional |
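The vcf and vcf_index arrays are supplied separately, so each VCF needs a matching index, and at least four samples are required. The sketch below pairs them by file name before a run; the `.tbi`/`.csi` index suffixes, the `gs://example-bucket` paths, and `pair_vcfs` itself are assumptions for illustration:

```python
from pathlib import PurePosixPath


def pair_vcfs(vcfs: list[str], indexes: list[str], minimum: int = 4) -> list[tuple[str, str]]:
    """Match each .vcf.gz with an index named <vcf>.tbi or <vcf>.csi.

    Raises ValueError for a non-.vcf.gz input, a missing index, or fewer than
    `minimum` samples.
    """
    index_by_name = {PurePosixPath(index).name: index for index in indexes}
    pairs = []
    for vcf in vcfs:
        name = PurePosixPath(vcf).name
        if not name.endswith(".vcf.gz"):
            raise ValueError(f"expected a .vcf.gz file: {vcf}")
        index = index_by_name.get(name + ".tbi") or index_by_name.get(name + ".csi")
        if index is None:
            raise ValueError(f"no index found for {vcf}")
        pairs.append((vcf, index))
    if len(pairs) < minimum:
        raise ValueError(f"mycosnp_tree needs at least {minimum} samples, got {len(pairs)}")
    return pairs


# Hypothetical bucket paths for four samples.
vcfs = [f"gs://example-bucket/sample{i}.vcf.gz" for i in range(4)]
for vcf, index in pair_vcfs(vcfs, [vcf + ".tbi" for vcf in vcfs]):
    print(vcf, index)
```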
| Variable | Type | Description |
|---|---|---|
| mycosnp_alignment | File | Concatenated SNP alignment file |
| mycosnp_docker | String | Docker image used for MycoSNP |
| mycosnp_fastree_tree | File | Phylogenetic tree inferred using FastTree (heuristic maximum likelihood) |
| mycosnp_iqtree_tree | File | Phylogenetic tree inferred using IQ-TREE (high quality maximum likelihood) |
| mycosnp_rapidnj_tree | File | Phylogenetic tree inferred using RapidNJ (neighbor-joining method) |
| mycosnp_tree_analysis_date | String | Date of the analysis |
| mycosnp_tree_full_results | File | Full results file |
| mycosnp_tree_vcf_csv | File | SNP variants formatted as a CSV table |
| mycosnp_tree_version | String | Version of the mycosnp_tree WDL workflow |
| mycosnp_version | String | Version of MycoSNP |
| mycosnptree_snpdists | File | SNP distances file |
| reference_name | String | Name of the reference clade/input used |
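The mycosnptree_snpdists output is a pairwise SNP distance matrix. Assuming it is a tab-separated square matrix whose first header cell is a label rather than a sample name (the layout snp-dists produces by default), it can be loaded for downstream comparisons as follows; `read_snp_matrix` and the sample names are hypothetical:

```python
import csv
import io


def read_snp_matrix(text: str) -> dict[tuple[str, str], int]:
    """Parse a tab-separated pairwise SNP distance matrix into {(a, b): distance}."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    samples = rows[0][1:]  # the first header cell is a label, not a sample
    distances: dict[tuple[str, str], int] = {}
    for row in rows[1:]:
        if not row:
            continue  # skip blank lines
        for other, value in zip(samples, row[1:]):
            distances[(row[0], other)] = int(value)
    return distances


example = "snp-dists 0.8.2\tA\tB\nA\t0\t12\nB\t12\t0\n"
print(read_snp_matrix(example)[("A", "B")])
```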