Tools to create:
- a .png image file showing all variants (obtained from the vardict-java variant caller) along a genome/assembly (to be provided), with their proportions on the y-axis and CDS descriptions (obtained from the vadr annotator). The coverage depth distribution can be displayed at the top of the figure (if the -o option is provided with a coverage depth file).
- a .tsv file describing all details of significant variants (according to the proportion threshold chosen by the user, default: 7 percent)
- [optional] a .vcf file describing all significant variants (according to the proportion threshold)
Python/R scripts and a Galaxy wrapper to use them.
It uses the results of:
- vadr >= 1.4.1 for annotation (of the reference/assembly; also tested with vadr 1.6.4)
- vardict-java 1.8.3 for variant calling (on the BAM alignment of reads against the reference/assembly)
- vvv2_display.py: main script running each step of the analysis. It can be run on its own once the vvv2_display conda environment is installed and activated. Type ./vvv2_display.py and press Enter to get help on how to use it.
- PYTHON_SCRIPTS/convert_tbl2json.py: convert the vadr annotation output .tbl file to JSON.
- PYTHON_SCRIPTS/convert_vcffile_to_readablefile.py: convert the vardict-java variant calling VCF file to a human-readable txt file.
- PYTHON_SCRIPTS/correct_multicontig_vardict_vcf.py: correct the vadr annotation output .tbl file for contig positions when the assembly provided is composed of more than one contig.
- R_SCRIPTS/visualize_snp_v4.R: create a .png file showing on the same figure:
  - the coverage depth distribution along the genome/assembly (if the -o option is provided);
  - variant proportions along the genome/assembly, together with CDS positions.
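As an illustration of the kind of conversion convert_tbl2json.py performs, here is a minimal Python sketch that parses a five-column NCBI feature table (the format of vadr's .tbl output) into JSON. It is not the script's actual code: the JSON layout (features grouped per sequence, each with a type, intervals and qualifiers) is a hypothetical choice, positions are kept as strings so that partial markers such as `<1` or `>200` survive, and free-text lines (for example the notes vadr adds for failing sequences) are simply skipped.

```python
import json


def parse_feature_table(text):
    """Parse a five-column NCBI feature table into
    {sequence: [{"type": ..., "intervals": [[start, stop], ...], "qualifiers": {...}}]}."""
    features = {}
    seq = None
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith(">Feature"):
            # New sequence: ">Feature <sequence name>"
            seq = line.split(maxsplit=1)[1].strip()
            features[seq] = []
            current = None
            continue
        if seq is None:
            continue  # ignore anything before the first >Feature line
        cols = line.split("\t")
        if cols[0].strip():
            if len(cols) < 2:
                continue  # free-text line (e.g. notes), not a location
            start, stop = cols[0].strip(), cols[1].strip()
            if len(cols) > 2 and cols[2].strip():
                # Location line with a feature key: starts a new feature
                current = {"type": cols[2].strip(), "intervals": [], "qualifiers": {}}
                features[seq].append(current)
            if current is not None:
                # Without a key, the line adds an interval to the current feature
                current["intervals"].append([start, stop])
        elif current is not None and len(cols) >= 4:
            # Qualifier line: three empty columns, then key and optional value
            key = cols[3].strip()
            value = cols[4].strip() if len(cols) > 4 else ""
            current["qualifiers"].setdefault(key, []).append(value)
    return features


if __name__ == "__main__":
    example = (
        ">Feature contig_1\n"
        "1\t120\tgene\n"
        "\t\t\tgene\t1a\n"
        "1\t120\tCDS\n"
        "\t\t\tproduct\tORF1a polyprotein\n"
    )
    print(json.dumps(parse_feature_table(example), indent=2))
```

Keeping qualifier values as lists reflects the fact that a feature table may repeat the same qualifier key (several notes, for instance) for one feature.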
Use a conda environment:
conda create -n vvv2_display -y
conda activate vvv2_display
mamba/conda install -c bioconda -c conda-forge vvv2_display
Prefer mamba for installing into a brand-new conda environment (it is faster). Do not mix mamba and conda.
Description:
vvv2_display.py -h
Typical usage:
vvv2_display.py -p res_vadr_pass.tsv -f res_vadr_fail.tsv -s res_vadr_seqstat.txt -n res_vardict_all.vcf -r res_vvv2_display.png -u res_vvv2_display_snp_summary.tsv -o cov_depth_f.txt -y -w 10 -x res_vvv2_display_snp_summary.vcf
where:
- res_vadr_pass.tsv is the 'pass' file of the vadr annotation program run on the genome/assembly (input)
- res_vadr_fail.tsv is the 'fail' file of the vadr annotation program (input)
- res_vadr_seqstat.txt is the 'seqstat' file of the vadr annotation program (input)
- res_vardict_all.vcf is the result of the vardict-java variant caller (input)
- res_vvv2_display.png is the name of the main output file (will be created) (main output)
- res_vvv2_display_snp_summary.tsv is the name of the tsv summary file (always created; this option lets you choose its name) (main output)
- cov_depth_f.txt is the coverage depth by position, produced by samtools depth run on the BAM alignment file (optional input)
- -y displays coverage depth on a linear scale (default: log10 scale) (optional)
- -w 10 sets the variant significance threshold to 10% (default: 7%): the graphics display all variants, while the tsv summary keeps only significant ones (proportion higher than this threshold) (optional)
- res_vvv2_display_snp_summary.vcf is the summary of significant variants in VCF format (optional output)
All other options exist for Galaxy wrapper compatibility: they are intermediate temporary files that must appear as parameters of the Galaxy wrapper but are not used in a usual command-line call.
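The effect of the -w threshold can be pictured with the following Python sketch, which keeps only VCF records whose allele frequency is strictly higher than the threshold (7% by default). It assumes the frequency is read from the AF key of the INFO column written by vardict-java; how vvv2_display actually extracts the frequency is not described here, and the function names and VCF records below are hypothetical.

```python
def parse_info(info):
    """Split a VCF INFO string such as 'DP=21;AF=0.1429' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def significant_variants(vcf_lines, threshold_percent=7.0):
    """Yield (chrom, pos, ref, alt, af) for records whose AF is strictly
    higher than threshold_percent (7 by default, like -w)."""
    cutoff = threshold_percent / 100.0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip header and empty lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt, info = cols[0], cols[1], cols[3], cols[4], cols[7]
        af = float(parse_info(info).get("AF", "0"))
        if af > cutoff:
            yield chrom, int(pos), ref, alt, af


if __name__ == "__main__":
    # Invented records, for illustration only
    vcf = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "contig_1\t6388\t.\tA\tG\t.\tPASS\tDP=21;AF=0.1429",
        "contig_1\t6400\t.\tC\tT\t.\tPASS\tDP=40;AF=0.05",
    ]
    for variant in significant_variants(vcf, threshold_percent=10):
        print(variant)
```

Only the filtered list would go to the tsv/vcf summaries; the figure, as stated above, still shows every variant.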
Minimal usage:
vvv2_display.py -p res_vadr_pass.tsv -f res_vadr_fail.tsv -s res_vadr_seqstat.txt -n res_vardict_all.vcf -r res_vvv2_display.png [-o cov_depth_f.txt]
The example below was obtained on Turkey coronavirus sequencing data, using the first draft assembly as reference.
- png file:
Dotted vertical lines mark contig boundaries.
- tsv summary file:
| indice | position | position_ori | ref | alt | freq | gene | prot | lseq | rseq | isHomo* |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 6388 | 6388 | A | G | 0.1429 | 1a | ORF1a,ORF1ab polyprotein [exception ribosomal slippage],NSP3 putative papain-like protease | GTATGGTCATCAAAATACAT | GTATTGTAGAAATTGTGATG | no |
| 2 | 6622 | 6622 | A | G | 0.0833 | 1a | ORF1a,ORF1ab polyprotein [exception ribosomal slippage],NSP3 putative papain-like protease | GGAAGCATTGAAATGTGAAC | GAAGAAAGCTGTTTTTCTTA | no |
| 3 | 6838 | 6838 | A | G | 0.1429 | 1a | ORF1a,ORF1ab polyprotein [exception ribosomal slippage],NSP3 putative papain-like protease | TATAATTTCTGTAGATACTG | AGTTTGTGACATTTTGTCTA | no |
| 4 | 7014 | 7014 | R | A | 0.8824 | 1a | ORF1a,ORF1ab polyprotein [exception ribosomal slippage],NSP3 putative papain-like protease | CTGATAAATTAACACCTCGT | TACCGTCATATGGTATAGAC | no |
| 5 | 7833 | 7833 | G | A | 0.0909 | 1a | ORF1a,ORF1ab polyprotein [exception ribosomal slippage],NSP4 | ATGCACCTGGAGCTTTACCA | ATTGTTTTAATGGTGATAAT | no |
| 6 | 8110 | 8110 | T | A | 0.0833 | 1a | ORF1a,ORF1ab polyprotein [exception ribosomal slippage],NSP4 | TAGTACATTCTTTACTGGTG | AGAACTTATGTTTAATATGG | no |
| 7 | 9328 | 9328 | A | G | 0.1034 | 1a | ORF1a,ORF1ab polyprotein [exception ribosomal slippage],NSP5 putative 3C-like proteinase | CCTACATGGTGAGTTCTATG | TGCATTACACACTGGAACGG | no |
| 8 | 13404 | 48 | A | C | 0.1429 | intergene | intergene | TTTAGTTGATCTTAGAACGT | GTTAGTGGGAACATCCAATA | no |
| 9 | 15255 | 1358 | A | T | 0.0882 | 1ab | similar to ORF1ab polyprotein,similar to NSP13:GBSEP:putative helicase | GTTGTCAATACCGTTAGTAT | CTGTGGTAATCATAAACCAA | no |
| 10 | 15319 | 1422 | C | T | 0.0769 | 1ab | similar to ORF1ab polyprotein,similar to NSP13:GBSEP:putative helicase | AGCGAAAATGTTGATGATTT | TACAGGGCTAATTGTGCTGG | no |
| 11 | 15326 | 1429 | A | G | 0.08 | 1ab | similar to ORF1ab polyprotein,similar to NSP13:GBSEP:putative helicase | ATGTTGATGATTTTAATCAA | CTAATTGTGCTGGCAGCGAA | no |
| 12 | 19937 | 6040 | G | A | 0.0714 | 1ab | similar to ORF1ab polyprotein,similar to NSP16:GBSEP:putative 2-O-ribose methyltransferase | AAAATTTATATGACATTGCA | TAACAGAGACAAGTTGGCAC | no |
| 13 | 21092 | 7195 | T | C | 0.0811 | S | similar to spike protein | GTTTCTTATGATTATCAGTG | TTACGTGGTGATAACACTGG | no |
| 14 | 25794 | 11897 | TT | AA | 0.0838 | 5b | 5b protein | CTTAACAAAGCAGGACAAGC | AGGATTAGATTGTGTTTACT | no |
*NB: isHomo is set to 'yes' (homopolymer region) if there is a succession of at least 3 identical nucleotides.
This may look restrictive, but Ion Torrent and Nanopore sequencing perform poorly in such regions, so make sure you verify these variants.
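In the table, position appears to be the coordinate along the whole assembly and position_ori the coordinate within the contig carrying the variant (row 8: 13404 versus 48). Assuming, hypothetically, that contigs are simply concatenated in order, the conversion in both directions can be sketched in Python as below. This is not the code of correct_multicontig_vardict_vcf.py; the contig names and lengths are made up (13356 and 541 nt are chosen only so that rows 8 and 9 are reproduced).

```python
from bisect import bisect_right
from itertools import accumulate


def contig_starts(contig_lengths):
    """Number of bases preceding each contig, contigs being concatenated
    in the (insertion) order of the dict."""
    lengths = list(contig_lengths.values())
    offsets = [0] + list(accumulate(lengths))[:-1]
    return dict(zip(contig_lengths, offsets))


def to_assembly_position(contig, position_ori, contig_lengths):
    """1-based position within a contig -> 1-based position along the assembly."""
    return contig_starts(contig_lengths)[contig] + position_ori


def to_contig_position(position, contig_lengths):
    """1-based position along the assembly -> (contig, 1-based position in it)."""
    names = list(contig_lengths)
    bounds = [0] + list(accumulate(contig_lengths.values()))
    if not 1 <= position <= bounds[-1]:
        raise ValueError(f"position {position} outside the assembly")
    # bounds[i] is the number of bases before contig i
    i = bisect_right(bounds, position - 1) - 1
    return names[i], position - bounds[i]


if __name__ == "__main__":
    # Hypothetical contig lengths
    lengths = {"contig_1": 13356, "contig_2": 541, "contig_3": 14000}
    print(to_assembly_position("contig_2", 48, lengths))  # 13404
    print(to_contig_position(15255, lengths))  # ('contig_3', 1358)
```

Using bisect on the cumulative lengths keeps the reverse lookup logarithmic in the number of contigs, which only matters for very fragmented assemblies.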
Input data files to test the program are provided in the test-data directory of the vvv2_display repository, once you have cloned it.
Then run one of the following commands, depending on the graphical output you expect:
- if you don't want coverage depth displayed in the picture, or do not have coverage depth information for your sample:
vvv2_display.py -p test-data/res2_vadr_pass.tbl -f test-data/res2_vadr_fail.tbl -s test-data/res2_vadr.seqstat -n test-data/res2_vardict.vcf -r test-data/res2_vvv2.png -u test-data/res2_vvv2.tsv
- if you want coverage depth displayed in the picture (log10 scale):
vvv2_display.py -p test-data/res2_vadr_pass.tbl -f test-data/res2_vadr_fail.tbl -s test-data/res2_vadr.seqstat -n test-data/res2_vardict.vcf -o test-data/res2_covdepth.txt -r test-data/res2_vvv2.png -u test-data/res2_vvv2.tsv
- if you want coverage depth displayed in the picture (linear scale):
vvv2_display.py -p test-data/res2_vadr_pass.tbl -f test-data/res2_vadr_fail.tbl -s test-data/res2_vadr.seqstat -n test-data/res2_vardict.vcf -o test-data/res2_covdepth.txt -r test-data/res2_vvv2.png -u test-data/res2_vvv2.tsv -y
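The difference between the two coverage display modes can be sketched in Python as follows. samtools depth writes one tab-separated line per position (contig, 1-based position, depth); the log10(depth + 1) transform used here to keep zero-depth positions finite is an assumption for illustration, not necessarily what visualize_snp_v4.R does, and the function names are hypothetical.

```python
import math


def read_samtools_depth(lines):
    """Parse `samtools depth` output: contig, 1-based position, depth per line."""
    records = []
    for line in lines:
        if not line.strip():
            continue
        contig, pos, depth = line.split("\t")[:3]
        records.append((contig, int(pos), int(depth)))
    return records


def depth_for_display(records, linear=False):
    """Values to plot: raw depths when linear=True (like -y),
    otherwise log10(depth + 1) so that zero-depth positions stay finite."""
    if linear:
        return [depth for _, _, depth in records]
    return [math.log10(depth + 1) for _, _, depth in records]


if __name__ == "__main__":
    recs = read_samtools_depth(["contig_1\t1\t0\n", "contig_1\t2\t9\n", "contig_1\t3\t99\n"])
    print(depth_for_display(recs))               # log10 scale (default)
    print(depth_for_display(recs, linear=True))  # linear scale (-y)
```

A log scale keeps low-coverage regions visible next to very deep ones, which is why it is a sensible default for amplicon or uneven viral coverage.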
If you use vvv2_display and publish results, please cite:
- The article: Flageul, Alexandre, Edouard Hirchaud, Céline Courtillon, Flora Carnet, Paul Brown, Béatrice Grasland, and Fabrice Touzain. "vvv2_align_SE, vvv2_align_PE / vvv2_display: Galaxy-Based Workflows and Tool Designed to Perform, Summarize and Visualize Variant Calling and Annotation in Viral Genome Assemblies". Viruses. 2025;17:1385. https://doi.org/10.3390/v17101385.
And for vardict-java and vadr, respectively:
- Lai, Zhongwu, Aleksandra Markovets, Miika Ahdesmaki, Brad Chapman, Oliver Hofmann, Robert McEwen, Justin Johnson, Brian Dougherty, J. Carl Barrett, and Jonathan R. Dry. “VarDict: A Novel and Versatile Variant Caller for next-Generation Sequencing in Cancer Research.” Nucleic Acids Research 44, no. 11 (June 20, 2016): e108–e108. https://doi.org/10.1093/nar/gkw227.
- Schäffer, Alejandro A., Eneida L. Hatcher, Linda Yankie, Lara Shonkwiler, J. Rodney Brister, Ilene Karsch-Mizrachi, and Eric P. Nawrocki. “VADR: Validation and Annotation of Virus Sequence Submissions to GenBank.” BMC Bioinformatics 21, no. 1 (December 2020): 211. https://doi.org/10.1186/s12859-020-3537-3.
vvv2_display.xml: Galaxy wrapper allowing integration of vvv2_display.py.
It can be found in the Galaxy Tool Shed at https://toolshed.g2.bx.psu.edu/repository
vvv2_display can be used in Galaxy workflows:
- with bwa-mem2 alignment of Illumina paired-end sequencing data (MiSeq, NextSeq, NovaSeq, HiSeq, iSeq): https://workflowhub.eu/workflows/1738
- with bwa-mem2 alignment of Illumina or Proton single-end sequencing data: https://workflowhub.eu/workflows/1739
- with bwa-mem2 alignment of Nanopore sequencing data (MinION, PromethION, GridION): https://workflowhub.eu/workflows/1740
- with minimap2 alignment of PacBio sequencing data (high-quality long reads): https://workflowhub.eu/workflows/1741
A poster of the program, accepted at the JOBIM 2025 conference in Bordeaux (France, July 2025), is available at doi: 10.5281/zenodo.16918391 (A0 PDF, 2.7 MB).
Additional vadr databases for specific viruses:
- Porcine circovirus: doi: 10.5281/zenodo.15065124
Acknowledgements:
- EMERGEN/EMERGEN2 ANR project, involving:
- Agence Nationale de Sécurité Sanitaire de l'Alimentation, de l'Environnement et du Travail
- Santé Publique France
- Conseil régional de Bretagne

