ANSES-Ploufragan/vvv2_display

vvv2_display

Description

Tools to create:

  • a .png image file showing all variants (obtained from the vardict-java variant caller) along a genome/assembly (to be provided), with their proportions on the y-axis and with CDS descriptions (obtained from the vadr annotator). The coverage depth distribution can be displayed at the top of the figure (if the -o cov_depth_f option is provided).
  • a .tsv file giving the details of all significant variants (according to the proportion threshold chosen by the user, default: 7 percent)
  • [optional] a .vcf file describing all significant variants (according to the proportion threshold)

Python/R scripts and Galaxy wrapper to use them.

It uses the results of:

  • vadr >= 1.4.1 for annotation of the reference/assembly (also tested with vadr 1.6.4)
  • vardict-java 1.8.3 for variant calling (on the BAM alignment of the reads against the reference/assembly)

Programs

  • vvv2_display.py: main script running each step of the analysis. This script can be run independently once the conda environment (see Installation) is installed and activated. Type ./vvv2_display.py and press Enter to get help on how to use it.

  • PYTHON_SCRIPTS/convert_tbl2json.py: Convert vadr annotation output .tbl file to json

  • PYTHON_SCRIPTS/convert_vcffile_to_readablefile.py: Convert vardict-java variant calling vcf file to human readable txt file

  • PYTHON_SCRIPTS/correct_multicontig_vardict_vcf.py: Correct the contig positions in the vardict-java variant calling vcf file when the assembly provided is composed of more than one contig.

  • R_SCRIPTS/visualize_snp_v4.R: Create a .png file showing on the same figure:
    • the coverage depth distribution along the genome/assembly (if the -o cov_depth_f option is provided)
    • the variant proportions along the genome/assembly, together with CDS positions.

Installation

Use conda environment:

conda create -n vvv2_display -y
conda activate vvv2_display
mamba install -c bioconda -c conda-forge vvv2_display   # or: conda install -c bioconda -c conda-forge vvv2_display

Prefer mamba when installing into a completely new conda environment (it is faster). Do not mix mamba and conda within the same environment.

Help:

vvv2_display.py -h

Typical usage:

vvv2_display.py -p res_vadr_pass.tsv -f res_vadr_fail.tsv -s res_vadr_seqstat.txt -n res_vardict_all.vcf -r res_vvv2_display.png -u res_vvv2_display_snp_summary.tsv -o cov_depth_f.txt -y -w 10 -x res_vvv2_display_snp_summary.vcf

where:

  • res_vadr_pass.tsv is the 'pass' file of vadr annotation program run on the genome/assembly (input)
  • res_vadr_fail.tsv is the 'fail' file of vadr annotation program (input)
  • res_vadr_seqstat.txt is the 'seqstat' file of vadr annotation program (input)
  • res_vardict_all.vcf is the result of vardict-java variant caller (input)
  • res_vvv2_display.png is the name of the output figure (will be created) (main output)
  • res_vvv2_display_snp_summary.tsv is the name of the tsv summary file (it is always created; this option only lets you choose its name) (main output)
  • cov_depth_f.txt is the coverage depth by position, as produced by samtools depth run on the BAM alignment file (optional input)
  • -y displays the coverage depth on a linear scale (default: log10 scale) (optional input)
  • -w 10 sets the variant significance threshold to 10% (default: 7%): the figure displays all variants, while the tsv summary keeps only the significant ones (proportion higher than this threshold) (optional input)
  • res_vvv2_display_snp_summary.vcf is the summary of significant variants in vcf format (optional output)

All other options exist for Galaxy wrapper compatibility: they name intermediate temporary files that must appear as parameters for the Galaxy wrapper but are not used in a usual command-line call.
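Before passing a coverage file with -o, it can help to check what it contains. The sketch below is not part of vvv2_display: it assumes the default samtools depth layout (tab-separated contig name, 1-based position, depth) for a single BAM file. The function names, the made-up sample values and the min_depth cutoff of 10 are illustrative assumptions only.

```python
from typing import Iterable, List, Tuple

DepthRecord = Tuple[str, int, int]  # (contig, 1-based position, depth)

# Made-up sample in the assumed samtools depth layout (not real data).
SAMPLE_DEPTH = "contig_1\t1\t25\ncontig_1\t2\t8\ncontig_2\t1\t0\n"


def read_depth(lines: Iterable[str]) -> List[DepthRecord]:
    """Parse tab-separated 'contig, position, depth' lines."""
    records: List[DepthRecord] = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip empty lines and an optional '#'-prefixed header line.
        if not line or line.startswith("#"):
            continue
        contig, pos, depth = line.split("\t")[:3]
        records.append((contig, int(pos), int(depth)))
    return records


def low_coverage_positions(
    records: Iterable[DepthRecord], min_depth: int = 10
) -> List[Tuple[str, int]]:
    """Return (contig, position) pairs whose depth is below min_depth."""
    return [(contig, pos) for contig, pos, depth in records if depth < min_depth]


records = read_depth(SAMPLE_DEPTH.splitlines())
weak_spots = low_coverage_positions(records, min_depth=10)
```

With a real file, the same function can be fed an open file handle, for instance `with open("cov_depth_f.txt") as fh: records = read_depth(fh)`.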

Minimal usage:

vvv2_display.py -p res_vadr_pass.tsv -f res_vadr_fail.tsv -s res_vadr_seqstat.txt -n res_vardict_all.vcf -r res_vvv2_display.png [-o cov_depth_f.txt]
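When vvv2_display.py is called from another Python script, the command line can be assembled programmatically. The helper below is a hypothetical sketch, not something shipped with the tool: it uses only the options documented above (-p, -f, -s, -n, -r, -u, -o, -y, -w, -x), and the function and parameter names are assumptions chosen for readability.

```python
from typing import List, Optional


def build_vvv2_command(
    pass_f: str,
    fail_f: str,
    seqstat_f: str,
    vcf_f: str,
    png_out: str,
    tsv_out: Optional[str] = None,
    cov_depth_f: Optional[str] = None,
    linear_cov_scale: bool = False,
    threshold_percent: Optional[int] = None,
    vcf_out: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for vvv2_display.py (hypothetical helper)."""
    # Required arguments of the minimal usage.
    cmd = [
        "vvv2_display.py",
        "-p", pass_f,
        "-f", fail_f,
        "-s", seqstat_f,
        "-n", vcf_f,
        "-r", png_out,
    ]
    # Optional arguments are appended only when requested.
    if tsv_out is not None:
        cmd += ["-u", tsv_out]
    if cov_depth_f is not None:
        cmd += ["-o", cov_depth_f]
    if linear_cov_scale:
        cmd.append("-y")
    if threshold_percent is not None:
        cmd += ["-w", str(threshold_percent)]
    if vcf_out is not None:
        cmd += ["-x", vcf_out]
    return cmd


minimal_cmd = build_vvv2_command(
    "res_vadr_pass.tsv",
    "res_vadr_fail.tsv",
    "res_vadr_seqstat.txt",
    "res_vardict_all.vcf",
    "res_vvv2_display.png",
)
```

The resulting list can then be passed to `subprocess.run(minimal_cmd, check=True)` inside the activated conda environment. Building a list rather than a single string avoids shell quoting problems with file names.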

Output example

This example was obtained from Turkey Coronavirus sequencing data, using the first draft assembly as reference.

  • png file:

img/res_vvv2.png

Dotted vertical lines mark contig boundaries.

  • tsv summary file:
indice	position	position_ori	ref	alt	freq	gene	prot	lseq	rseq	isHomo*
1	6388	6388	A	G	0.1429	1a	ORF1a,ORF1ab polyprotein [exception ribosomal slippage],NSP3  putative papain-like protease	GTATGGTCATCAAAATACAT	GTATTGTAGAAATTGTGATG	no
2	6622	6622	A	G	0.0833	1a	ORF1a,ORF1ab polyprotein [exception ribosomal slippage],NSP3  putative papain-like protease	GGAAGCATTGAAATGTGAAC	GAAGAAAGCTGTTTTTCTTA	no
3	6838	6838	A	G	0.1429	1a	ORF1a,ORF1ab polyprotein [exception ribosomal slippage],NSP3  putative papain-like protease	TATAATTTCTGTAGATACTG	AGTTTGTGACATTTTGTCTA	no
4	7014	7014	R	A	0.8824	1a	ORF1a,ORF1ab polyprotein [exception ribosomal slippage],NSP3  putative papain-like protease	CTGATAAATTAACACCTCGT	TACCGTCATATGGTATAGAC	no
5	7833	7833	G	A	0.0909	1a	ORF1a,ORF1ab polyprotein [exception ribosomal slippage],NSP4	ATGCACCTGGAGCTTTACCA	ATTGTTTTAATGGTGATAAT	no
6	8110	8110	T	A	0.0833	1a	ORF1a,ORF1ab polyprotein [exception ribosomal slippage],NSP4	TAGTACATTCTTTACTGGTG	AGAACTTATGTTTAATATGG	no
7	9328	9328	A	G	0.1034	1a	ORF1a,ORF1ab polyprotein [exception ribosomal slippage],NSP5  putative 3C-like proteinase	CCTACATGGTGAGTTCTATG	TGCATTACACACTGGAACGG	no
8	13404	48	A	C	0.1429	intergene	intergene	TTTAGTTGATCTTAGAACGT	GTTAGTGGGAACATCCAATA	no
9	15255	1358	A	T	0.0882	1ab	similar to ORF1ab polyprotein,similar to NSP13:GBSEP:putative helicase	GTTGTCAATACCGTTAGTAT	CTGTGGTAATCATAAACCAA	no
10	15319	1422	C	T	0.0769	1ab	similar to ORF1ab polyprotein,similar to NSP13:GBSEP:putative helicase	AGCGAAAATGTTGATGATTT	TACAGGGCTAATTGTGCTGG	no
11	15326	1429	A	G	0.08	1ab	similar to ORF1ab polyprotein,similar to NSP13:GBSEP:putative helicase	ATGTTGATGATTTTAATCAA	CTAATTGTGCTGGCAGCGAA	no
12	19937	6040	G	A	0.0714	1ab	similar to ORF1ab polyprotein,similar to NSP16:GBSEP:putative 2-O-ribose methyltransferase	AAAATTTATATGACATTGCA	TAACAGAGACAAGTTGGCAC	no
13	21092	7195	T	C	0.0811	S	similar to spike protein	GTTTCTTATGATTATCAGTG	TTACGTGGTGATAACACTGG	no
14	25794	11897	TT	AA	0.0838	5b	5b protein	CTTAACAAAGCAGGACAAGC	AGGATTAGATTGTGTTTACT	no

*NB: a homopolymer region is flagged 'yes' if there is a run of at least 3 identical nucleotides.
     This may look restrictive, but Ion Torrent and Nanopore sequencing perform poorly in such regions, so make sure you verify these variants.
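The "at least 3 identical nucleotides" criterion can be illustrated on an arbitrary sequence string. This is a minimal sketch and not the tool's own code: the window around the variant that vvv2_display actually inspects is not specified here. In the example table, flanking sequences containing longer runs away from the variant are still flagged 'no', so the check is presumably local to the variant position. The function names are hypothetical.

```python
from itertools import groupby


def longest_run(seq: str) -> int:
    """Length of the longest run of identical characters in seq (0 if empty)."""
    return max((sum(1 for _ in group) for _, group in groupby(seq.upper())), default=0)


def has_homopolymer(seq: str, min_run: int = 3) -> bool:
    """True if seq contains at least min_run identical consecutive nucleotides."""
    return longest_run(seq) >= min_run


no_run = has_homopolymer("ACGTAC")      # no nucleotide repeated three times in a row
with_run = has_homopolymer("ACGTTTAC")  # contains the run TTT
```

To mimic a position-specific check, the function could be applied to a short window centred on the variant rather than to the full lseq and rseq columns. The window size would be an additional assumption.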

Test set

Input data files for testing the program are provided in the test-data directory, which you get when cloning the vvv2_display repository.

You can then run one of the following commands, depending on the graphical output you expect.

  • if you do not want the coverage depth displayed in the picture, or do not have coverage depth information for your sample:
vvv2_display.py -p test-data/res2_vadr_pass.tbl -f test-data/res2_vadr_fail.tbl -s test-data/res2_vadr.seqstat -n test-data/res2_vardict.vcf -r test-data/res2_vvv2.png -u test-data/res2_vvv2.tsv
  • if you want the coverage depth displayed in the picture (log scale):
vvv2_display.py -p test-data/res2_vadr_pass.tbl -f test-data/res2_vadr_fail.tbl -s test-data/res2_vadr.seqstat -n test-data/res2_vardict.vcf -o test-data/res2_covdepth.txt -r test-data/res2_vvv2.png -u test-data/res2_vvv2.tsv
  • if you want the coverage depth displayed in the picture (linear scale):
vvv2_display.py -p test-data/res2_vadr_pass.tbl -f test-data/res2_vadr_fail.tbl -s test-data/res2_vadr.seqstat -n test-data/res2_vardict.vcf -o test-data/res2_covdepth.txt -r test-data/res2_vvv2.png -u test-data/res2_vvv2.tsv -y
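After a test run, the tsv summary can be inspected with a few lines of Python. The sketch below is illustrative only: it assumes the file is tab-separated with a header row containing the position and freq columns shown in the output example. The sample text reuses three rows of that example, trimmed to their first seven columns for brevity, and the function name is hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List

# Rows 1, 2 and 12 of the output example, trimmed to the first seven columns.
SAMPLE_TSV = (
    "indice\tposition\tposition_ori\tref\talt\tfreq\tgene\n"
    "1\t6388\t6388\tA\tG\t0.1429\t1a\n"
    "2\t6622\t6622\tA\tG\t0.0833\t1a\n"
    "12\t19937\t6040\tG\tA\t0.0714\t1ab\n"
)


def variants_above(handle: Iterable[str], threshold_percent: float = 7.0) -> List[Dict[str, str]]:
    """Return the summary rows whose 'freq' is strictly above the threshold."""
    cutoff = threshold_percent / 100.0
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if float(row["freq"]) > cutoff]


default_rows = variants_above(io.StringIO(SAMPLE_TSV))
strict_rows = variants_above(io.StringIO(SAMPLE_TSV), threshold_percent=10)
strict_positions = [int(row["position"]) for row in strict_rows]
```

On a real run, the same function can read the generated file, for instance `with open("test-data/res2_vvv2.tsv", newline="") as fh: rows = variants_above(fh, 10)`. This only re-filters rows that vvv2_display already kept, so a threshold lower than the one given with -w cannot recover variants that were dropped from the summary.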

Citation

If you use vvv2_display in published results, please cite:

  • The article: Flageul, Alexandre, Edouard Hirchaud, Céline Courtillon, Flora Carnet, Paul Brown, Béatrice Grasland, and Fabrice Touzain. "vvv2_align_SE, vvv2_align_PE / vvv2_display: Galaxy-Based Workflows and Tool Designed to Perform, Summarize and Visualize Variant Calling and Annotation in Viral Genome Assemblies". Viruses. 2025;17:1385. https://doi.org/10.3390/v17101385.

And for vardict-java and vadr, respectively:

  • Lai, Zhongwu, Aleksandra Markovets, Miika Ahdesmaki, Brad Chapman, Oliver Hofmann, Robert McEwen, Justin Johnson, Brian Dougherty, J. Carl Barrett, and Jonathan R. Dry. “VarDict: A Novel and Versatile Variant Caller for next-Generation Sequencing in Cancer Research.” Nucleic Acids Research 44, no. 11 (June 20, 2016): e108–e108. https://doi.org/10.1093/nar/gkw227.
  • Schäffer, Alejandro A., Eneida L. Hatcher, Linda Yankie, Lara Shonkwiler, J. Rodney Brister, Ilene Karsch-Mizrachi, and Eric P. Nawrocki. “VADR: Validation and Annotation of Virus Sequence Submissions to GenBank.” BMC Bioinformatics 21, no. 1 (December 2020): 211. https://doi.org/10.1186/s12859-020-3537-3.

Galaxy wrapper

  • vvv2_display.xml: allows Galaxy integration of vvv2_display.py, so that vvv2_display can be used in Galaxy pipelines.

It can be found in the Galaxy toolshed at https://toolshed.g2.bx.psu.edu/repository

Related Galaxy workflows on workflowhub

Additional information / data for upstream programs

  • The poster presenting the program, accepted at the JOBIM 2025 conference in Bordeaux (France, July 2025), can be found at doi: 10.5281/zenodo.16918391 or accessed using this QR code (A0 PDF, 2.7 MB):

    QRcode_poster

  • Additional vadr database for specific viruses:

Funding

  • EMERGEN/EMERGEN2 ANR project involving:
    • Agence Nationale de Sécurité Sanitaire de l'Alimentation, de l'Environnement et du Travail
    • Santé Publique France
  • Conseil régional de Bretagne

About

Create png file of variants proportions in an assembly/reference including CDS positions using output of vadr 1.5.1 (annotation) and vardict-java 1.8.3 (variant calling)
