Hello!
I tried SCEVAN on paired samples and got several errors when running multiSampleComparisonClonalCN.
First, I set up the paths, then loaded and subset my SCE:
library(SingleCellExperiment)
library(SCEVAN)
library(tidyverse)
sce_input_path <- "/mnt/data/SCE/annotated_tumor.sce"
path_output <- "/mnt/data/SCEVAN"
tumor_cells_label <- "MCL cells"
patient <- "P009"
path_output_dg <- file.path(path_output, "output", patient, "DG")
path_output_rel <- file.path(path_output, "output", patient, "REL")
path_output_comparison <- file.path(path_output, "output", patient, "DGandREL")
sce <- readRDS(sce_input_path)
sce <- sce[, sce$Patient == patient]
sce
class: SingleCellExperiment
dim: 36601 5364
metadata(91): Samples Sample ... Replicate scDblFinder.threshold
assays(2): counts logcounts
rownames(36601): MIR1302-2HG FAM138A ... AC007325.4 AC007325.2
rowData names(3): ID Symbol Type
colnames(5364): P009_REL-AAACCCAAGAGCCTGA P009_REL-AAACGAACATGGGCAA ... P009_DG-TTTGTTGTCGCCTTGT P009_DG-TTTGTTGTCGGAATGG
colData names(40): Sample Barcode ... cell_type_main cell_type_manual
reducedDimNames(3): PCA UMAP tricycleEmbedding
mainExpName: NULL
altExpNames(0):
Then I prepared the SCEVAN inputs for both diagnosis and relapse cells:
sce_dg <- sce[, sce$Timepoint == "DG"]
sample_dg <- unique(sce_dg$Sample)
norm_cells_dg <- colnames(sce_dg[, sce_dg$cell_type_manual != tumor_cells_label])
rownames(sce_dg) <- rowData(sce_dg)[["ID"]]
count_matrix_dg <- counts(sce_dg)
sce_rel <- sce[, sce$Timepoint == "REL"]
sample_rel <- unique(sce_rel$Sample)
norm_cells_rel <- colnames(sce_rel[, sce_rel$cell_type_manual != tumor_cells_label])
rownames(sce_rel) <- rowData(sce_rel)[["ID"]]
count_matrix_rel <- counts(sce_rel)
When I run pipelineCNA for a single timepoint it works as expected:
diagnosis_results <- pipelineCNA(
  count_mtx = count_matrix_dg,
  sample = sample_dg,
  FIXED_NORMAL_CELLS = TRUE,
  norm_cell = norm_cells_dg,
  organism = "human",
  SUBCLONES = TRUE,
  ClonalCN = TRUE,
  output_dir = path_output_dg,
  par_cores = 16
)
However, running multiSampleComparisonClonalCN results in a file read error:
results <- multiSampleComparisonClonalCN(
  listCountMtx = list("Diagnosis" = count_matrix_dg, "Relapse" = count_matrix_rel),
  listNormCells = list("Diagnosis" = norm_cells_dg, "Relapse" = norm_cells_rel),
  analysisName = patient,
  organism = "human",
  par_cores = 16,
  output_dir = path_output_comparison
)
[1] " raw data - genes: 36601 cells: 2780"
[1] "1) Filter: cells > 200 genes"
[1] "2) Filter: genes > 10% of cells"
[1] "7923 genes past filtering"
[1] "3) Annotations gene coordinates"
[1] "7636 genes annotated"
[1] "4) Filter: genes involved in the cell cycle"
[1] "7178 genes past filtering "
[1] "5) Filter: cells > 5genes per chromosome "
[1] "6) Log Freeman Turkey transformation"
[1] "A total of 2779 cells, 7178 genes after preprocessing"
[1] "7) Measuring baselines (confident normal cells)"
[1] "8) Smoothing data"
[1] "9) Segmentation (VegaMC)"
[1] "10) Adjust baseline"
[1] "11) plot heatmap"
[1] "found 2089 tumor cells"
[1] "time classify tumor cells: 1.13240625858307"
Error in file(file, "rt") : cannot open the connection
8. file(file, "rt")
7. read.table(file.path(path, paste0(sample, "_Clonal_CN.seg")),
sep = "\t", header = TRUE, as.is = TRUE)
6. getScevanCNVfinal(sample)
5. plotCNclonal(sample, ClonalCN, organism, output_dir = output_dir)
4. pipelineCNA(listCountMtx[[x]], norm_cell = listNormCells[[x]],
sample = x, SUBCLONES = FALSE, ClonalCN = TRUE, par_cores = par_cores,
organism = organism, output_dir = output_dir)
3. FUN(X[[i]], ...)
2. lapply(names(listCountMtx), function(x) {
pipelineCNA(listCountMtx[[x]], norm_cell = listNormCells[[x]],
sample = x, SUBCLONES = FALSE, ClonalCN = TRUE, par_cores = par_cores,
organism = organism, output_dir = output_dir) ...
1. multiSampleComparisonClonalCN(listCountMtx = list(Diagnosis = count_matrix_dg,
Relapse = count_matrix_rel), listNormCells = list(Diagnosis = norm_cells_dg,
Relapse = norm_cells_rel), analysisName = patient, organism = "human",
par_cores = 16, output_dir = path_output_comparison)
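To see where the file named in the traceback actually ends up, a quick diagnostic along these lines may help. This is only a sketch: `find_seg_files` is a hypothetical helper, not part of SCEVAN, and the only thing taken from the traceback is the `<sample>_Clonal_CN.seg` file name pattern. The demo runs against temporary directories rather than real output.

```r
# Hypothetical helper (not part of SCEVAN): look for "<sample>_Clonal_CN.seg"
# in several candidate directories and report where it exists.
find_seg_files <- function(sample, dirs) {
  candidates <- file.path(dirs, paste0(sample, "_Clonal_CN.seg"))
  data.frame(
    path = candidates,
    exists = file.exists(candidates),
    stringsAsFactors = FALSE
  )
}

# Toy demonstration in a temporary directory, so no real output is touched.
demo_root <- file.path(tempdir(), "scevan_demo")
custom_dir <- file.path(demo_root, "custom_output")
default_dir <- file.path(demo_root, "output")
dir.create(custom_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(default_dir, recursive = TRUE, showWarnings = FALSE)

# Simulate a .seg file that was written to only one of the two directories.
file.create(file.path(custom_dir, "Diagnosis_Clonal_CN.seg"))

seg_report <- find_seg_files("Diagnosis", c(custom_dir, default_dir))
print(seg_report)
```

For the real run, the call would be something like `find_seg_files("Diagnosis", c(path_output_comparison, "./output"))`, which shows whether the file was written to one directory while being read from the other.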
The same error occurs when I run pipelineCNA directly with SUBCLONES = FALSE. I tried forking the package and setting SUBCLONES = TRUE in the pipelineCNA calls inside multiSampleComparisonClonalCN, but that did not fix the issue, and I'm not sure whether it is supposed to be enabled there anyway.
However, looking through the code, I found hard-coded "./output" directories in some functions. Running multiSampleComparisonClonalCN from inside the comparison directory with output_dir = "./output" lets me get further, but I ultimately hit another error:
setwd(path_output_comparison)
multiSampleComparisonClonalCN(
  listCountMtx = list("Diagnosis" = count_matrix_dg, "Relapse" = count_matrix_rel),
  listNormCells = list("Diagnosis" = norm_cells_dg, "Relapse" = norm_cells_rel),
  analysisName = patient,
  organism = "human",
  par_cores = 16,
  output_dir = "./output"
)
Error in mtx[!Dupl_GeneName, ] : incorrect number of dimensions
2. annotateGenes(genesMtx)
1. multiSampleComparisonClonalCN(listCountMtx = list(Diagnosis = count_matrix_dg,
Relapse = count_matrix_rel), listNormCells = list(Diagnosis = norm_cells_dg,
Relapse = norm_cells_rel), analysisName = patient, organism = "human",
par_cores = 16, output_dir = "./output")
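A slightly safer variant of the setwd() workaround above restores the previous working directory even when the call fails, so the session is not left inside the output directory. This is a minimal sketch: `run_in_dir` is a hypothetical helper, not a SCEVAN function, and the demo uses a temporary directory and a simulated failure instead of a real SCEVAN call.

```r
# Hypothetical helper (not part of SCEVAN): evaluate `fun` inside `dir`
# and always restore the previous working directory, even on error.
run_in_dir <- function(dir, fun) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  old_wd <- setwd(dir)               # setwd() returns the previous directory
  on.exit(setwd(old_wd), add = TRUE) # restored on normal exit and on error
  fun()
}

wd_before <- getwd()
demo_dir <- file.path(tempdir(), "scevan_wd_demo")

# The function sees `demo_dir` as its working directory ...
wd_inside <- run_in_dir(demo_dir, function() getwd())

# ... and an error inside does not leave the session stranded there.
failed <- tryCatch(
  run_in_dir(demo_dir, function() stop("simulated failure")),
  error = function(e) TRUE
)
wd_after <- getwd()
```

In practice the multiSampleComparisonClonalCN call with output_dir = "./output" would go inside the `function() ...` passed to `run_in_dir(path_output_comparison, ...)`.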
I tried inspecting the annotateGenes code in the hope of patching it, but got confused about where the EnsDB_Hsapiens_v86 and EnsDb_Mmusculus_v79 data come from. Oddly enough, annotateGenes is also called in pipelineCNA, where it works normally.
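For context, "incorrect number of dimensions" is the message base R raises when two-index subsetting (`x[i, ]`) is applied to an object without a `dim` attribute, such as a plain vector. I have not confirmed that this is what happens to genesMtx inside multiSampleComparisonClonalCN. The toy example below only reproduces the same class of error and shows how `drop = FALSE` avoids it; all object names in it are made up.

```r
# Toy data: a 3 x 2 matrix and a logical vector marking duplicated gene names.
mtx <- matrix(
  1:6,
  nrow = 3,
  dimnames = list(c("g1", "g2", "g2"), c("c1", "c2"))
)
dupl <- duplicated(rownames(mtx))

# Two-index subsetting works on a matrix ...
ok <- mtx[!dupl, ]

# ... but selecting a single column drops to a plain vector by default,
# and the same two-index syntax then fails.
vec <- mtx[, 1]
err_msg <- tryCatch(
  {
    vec[!dupl, ]
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(err_msg)

# Keeping the matrix shape with drop = FALSE avoids the error.
kept <- mtx[, 1, drop = FALSE]
ok_single <- kept[!dupl, , drop = FALSE]
```

If something similar happens in the multi-sample code path, checking `class()` and `dim()` of the object passed to annotateGenes would confirm or rule it out.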
Could you please look into it? I really like the speed that SCEVAN provides for CNV inference, and, aside from these multisample issues, I think it works great and I'd really love to use it in my research.
Unrelated to these technical issues: do I understand correctly that running multiSampleComparisonClonalCN would let me see which subclones were identified at both timepoints and plot a comprehensive phylogeny for the relapsed patient? Or would it just compare the samples without intersecting their subclonal structure (hence SUBCLONES = FALSE in the internal pipelineCNA calls)?
Best regards
Dmitrij
My session info just in case:
R version 4.4.3 (2025-02-28)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] fgsea_1.32.4 ggrepel_0.9.6 ggtree_3.14.0 ape_5.8-1 tidytree_0.4.6
[6] Rtsne_0.17 igraph_2.1.4 scran_1.34.0 scuttle_1.16.0 lubridate_1.9.4
[11] forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4 purrr_1.0.4 readr_2.1.5
[16] tidyr_1.3.1 tibble_3.2.1 ggplot2_3.5.2 tidyverse_2.0.0 SCEVAN_1.0.3
[21] SingleCellExperiment_1.28.1 SummarizedExperiment_1.36.0 Biobase_2.66.0 GenomicRanges_1.58.0 GenomeInfoDb_1.42.3
[26] IRanges_2.40.1 S4Vectors_0.44.0 BiocGenerics_0.52.0 MatrixGenerics_1.18.1 matrixStats_1.5.0
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 rstudioapi_0.17.1 jsonlite_2.0.0 magrittr_2.0.3 farver_2.1.2 rmarkdown_2.29
[7] fs_1.6.5 zlibbioc_1.52.0 ragg_1.3.3 vctrs_0.6.5 memoise_2.0.1 htmltools_0.5.8.1
[13] S4Arrays_1.6.0 usethis_3.1.0 curl_6.2.2 BiocNeighbors_2.0.1 gridGraphics_0.5-1 SparseArray_1.6.2
[19] sass_0.4.9 bslib_0.9.0 htmlwidgets_1.6.4 desc_1.4.3 cachem_1.1.0 mime_0.13
[25] lifecycle_1.0.4 pkgconfig_2.0.3 rsvd_1.0.5 Matrix_1.7-2 R6_2.6.1 fastmap_1.2.0
[31] GenomeInfoDbData_1.2.13 shiny_1.10.0 aplot_0.2.5 digest_0.6.37 colorspace_2.1-1 patchwork_1.3.0
[37] ps_1.9.0 dqrng_0.4.1 irlba_2.3.5.1 pkgload_1.4.0 textshaping_1.0.0 beachmat_2.22.0
[43] labeling_0.4.3 timechange_0.3.0 httr_1.4.7 abind_1.4-8 compiler_4.4.3 remotes_2.5.0
[49] withr_3.0.2 BiocParallel_1.40.2 pkgbuild_1.4.7 DelayedArray_0.32.0 sessioninfo_1.2.3 bluster_1.16.0
[55] tools_4.4.3 httpuv_1.6.15 glue_1.8.0 callr_3.7.6 nlme_3.1-167 promises_1.3.2
[61] grid_4.4.3 cluster_2.1.8 generics_0.1.3 gtable_0.3.6 tzdb_0.5.0 data.table_1.17.0
[67] hms_1.1.3 BiocSingular_1.22.0 ScaledMatrix_1.14.0 metapod_1.14.0 XVector_0.46.0 pillar_1.10.2
[73] yulab.utils_0.2.0 limma_3.62.2 later_1.4.2 treeio_1.30.0 lattice_0.22-6 tidyselect_1.2.1
[79] locfit_1.5-9.12 miniUI_0.1.1.1 knitr_1.50 edgeR_4.4.2 xfun_0.52 statmod_1.5.0
[85] devtools_2.4.5 pheatmap_1.0.12 stringi_1.8.7 UCSC.utils_1.2.0 ggfun_0.1.8 lazyeval_0.2.2
[91] yaml_2.3.10 evaluate_1.0.3 codetools_0.2-20 ggplotify_0.1.2 cli_3.6.4 RcppParallel_5.1.10
[97] xtable_1.8-4 systemfonts_1.2.2 munsell_0.5.1 processx_3.8.6 jquerylib_0.1.4 Rcpp_1.0.14
[103] ellipsis_0.3.2 profvis_0.4.0 urlchecker_1.0.1 parallelDist_0.2.6 scales_1.3.0 crayon_1.5.3
[109] rlang_1.1.5 fastmatch_1.1-6 cowplot_1.1.3