Errors in multiSampleComparisonClonalCN

Hello!

I was trying SCEVAN on paired samples and hit several errors while running `multiSampleComparisonClonalCN`.

First, I set up the paths, then load and subset my SCE:
```r
library(SingleCellExperiment)
library(SCEVAN)
library(tidyverse)

sce_input_path <- "/mnt/data/SCE/annotated_tumor.sce"
path_output <- "/mnt/data/SCEVAN"

tumor_cells_label <- "MCL cells"
patient <- "P009"

path_output_dg <- file.path(path_output, "output", patient, "DG")
path_output_rel <- file.path(path_output, "output", patient, "REL")
path_output_comparison <- file.path(path_output, "output", patient, "DGandREL")

sce <- readRDS(sce_input_path)

sce <- sce[, sce$Patient == patient]
sce
```
```
class: SingleCellExperiment 
dim: 36601 5364 
metadata(91): Samples Sample ... Replicate scDblFinder.threshold
assays(2): counts logcounts
rownames(36601): MIR1302-2HG FAM138A ... AC007325.4 AC007325.2
rowData names(3): ID Symbol Type
colnames(5364): P009_REL-AAACCCAAGAGCCTGA P009_REL-AAACGAACATGGGCAA ... P009_DG-TTTGTTGTCGCCTTGT P009_DG-TTTGTTGTCGGAATGG
colData names(40): Sample Barcode ... cell_type_main cell_type_manual
reducedDimNames(3): PCA UMAP tricycleEmbedding
mainExpName: NULL
altExpNames(0):
```

Then I prepare the SCEVAN inputs for both diagnosis and relapse cells:
```r
sce_dg <- sce[, sce$Timepoint == "DG"]
sample_dg <- unique(sce_dg$Sample)
norm_cells_dg <- colnames(sce_dg[, sce_dg$cell_type_manual != tumor_cells_label])
rownames(sce_dg) <- rowData(sce_dg)[["ID"]]
count_matrix_dg <- counts(sce_dg)

sce_rel <- sce[, sce$Timepoint == "REL"]
sample_rel <- unique(sce_rel$Sample)
norm_cells_rel <- colnames(sce_rel[, sce_rel$cell_type_manual != tumor_cells_label])
rownames(sce_rel) <- rowData(sce_rel)[["ID"]]
count_matrix_rel <- counts(sce_rel)
```

When I run `pipelineCNA` for a single timepoint it works as expected:
```r
diagnosis_results <- pipelineCNA(
    count_mtx = count_matrix_dg,
    sample = sample_dg, 
    FIXED_NORMAL_CELLS = TRUE,
    norm_cell = norm_cells_dg,
    organism = "human",
    SUBCLONES = TRUE,
    ClonalCN = TRUE,
    output_dir = path_output_dg,
    par_cores = 16
)
```

However, running `multiSampleComparisonClonalCN` results in a file read error:
```r
results <- multiSampleComparisonClonalCN(
  listCountMtx = list("Diagnosis" = count_matrix_dg, "Relapse" = count_matrix_rel),
  listNormCells = list("Diagnosis" = norm_cells_dg, "Relapse" = norm_cells_rel),
  analysisName = patient,
  organism = "human",
  par_cores = 16,
  output_dir = path_output_comparison
)
```
```
[1] " raw data - genes: 36601 cells: 2780"
[1] "1) Filter: cells > 200 genes"
[1] "2) Filter: genes > 10% of cells"
[1] "7923 genes past filtering"
[1] "3) Annotations gene coordinates"
[1] "7636 genes annotated"
[1] "4) Filter: genes involved in the cell cycle"
[1] "7178 genes past filtering "
[1] "5)  Filter: cells > 5genes per chromosome "
[1] "6) Log Freeman Turkey transformation"
[1] "A total of 2779 cells, 7178 genes after preprocessing"
[1] "7) Measuring baselines (confident normal cells)"
[1] "8) Smoothing data"
[1] "9) Segmentation (VegaMC)"
[1] "10) Adjust baseline"
[1] "11) plot heatmap"
[1] "found 2089 tumor cells"
[1] "time classify tumor cells:  1.13240625858307"
Error in file(file, "rt") : cannot open the connection
8. file(file, "rt")
7. read.table(file.path(path, paste0(sample, "_Clonal_CN.seg")), 
    sep = "\t", header = TRUE, as.is = TRUE)
6. getScevanCNVfinal(sample)
5. plotCNclonal(sample, ClonalCN, organism, output_dir = output_dir)
4. pipelineCNA(listCountMtx[[x]], norm_cell = listNormCells[[x]], 
    sample = x, SUBCLONES = FALSE, ClonalCN = TRUE, par_cores = par_cores, 
    organism = organism, output_dir = output_dir)
3. FUN(X[[i]], ...)
2. lapply(names(listCountMtx), function(x) {
    pipelineCNA(listCountMtx[[x]], norm_cell = listNormCells[[x]], 
        sample = x, SUBCLONES = FALSE, ClonalCN = TRUE, par_cores = par_cores, 
        organism = organism, output_dir = output_dir) ...
1. multiSampleComparisonClonalCN(listCountMtx = list(Diagnosis = count_matrix_dg, 
    Relapse = count_matrix_rel), listNormCells = list(Diagnosis = norm_cells_dg, 
    Relapse = norm_cells_rel), analysisName = patient, organism = "human", 
    par_cores = 16, output_dir = path_output_comparison)
```

The same error occurs when I run `pipelineCNA` directly with `SUBCLONES = FALSE`. I tried forking the package and changing `SUBCLONES` to `TRUE` in the `pipelineCNA` calls inside `multiSampleComparisonClonalCN`, but that did not fix the issue, and I'm not sure whether it is supposed to be enabled there.
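
To narrow down where the `.seg` file actually ends up, a check along these lines might help. It is only a sketch: `find_seg_files` is a hypothetical helper (not part of SCEVAN), and the `_Clonal_CN.seg` suffix is taken from the `read.table` call in the traceback above. The demonstration runs on a temporary directory so it is self-contained.

```r
# Hypothetical helper, not part of SCEVAN: list every "*_Clonal_CN.seg" file
# found under the given directories, so the actual write location can be
# compared with the path that read.table() tries to open.
find_seg_files <- function(dirs) {
  dirs <- dirs[dir.exists(dirs)]  # skip directories that do not exist
  if (length(dirs) == 0) return(character(0))
  list.files(dirs, pattern = "_Clonal_CN\\.seg$",
             recursive = TRUE, full.names = TRUE)
}

# Self-contained demonstration on a temporary directory
demo_dir <- file.path(tempdir(), "scevan_seg_demo")
dir.create(file.path(demo_dir, "output"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_dir, "output", "Diagnosis_Clonal_CN.seg"))

found <- find_seg_files(c(demo_dir, file.path(demo_dir, "missing")))
print(basename(found))
```

With the variables from this report, the call would presumably be `find_seg_files(c(path_output_comparison, file.path(getwd(), "output")))`, which should show whether the file was written under the working directory instead of `output_dir`.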

However, looking through the code, I found hard-coded `./output` directories in some functions. Running `multiSampleComparisonClonalCN` from inside the comparison directory with `output_dir = "./output"` lets me get further, but I ultimately hit another error:
```r
setwd(path_output_comparison)

multiSampleComparisonClonalCN(
  listCountMtx = list("Diagnosis" = count_matrix_dg, "Relapse" = count_matrix_rel),
  listNormCells = list("Diagnosis" = norm_cells_dg, "Relapse" = norm_cells_rel),
  analysisName = patient,
  organism = "human",
  par_cores = 16,
  output_dir = "./output"
)
```
```
 Error in mtx[!Dupl_GeneName, ] : incorrect number of dimensions
2. annotateGenes(genesMtx)
1. multiSampleComparisonClonalCN(listCountMtx = list(Diagnosis = count_matrix_dg, 
    Relapse = count_matrix_rel), listNormCells = list(Diagnosis = norm_cells_dg, 
    Relapse = norm_cells_rel), analysisName = patient, organism = "human", 
    par_cores = 16, output_dir = "./output")
```
I tried inspecting the `annotateGenes` code in the hope of patching it, but got confused about where the `EnsDB_Hsapiens_v86` and `EnsDb_Mmusculus_v79` data come from. Oddly enough, `annotateGenes` is also called in `pipelineCNA`, where it works normally.
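
For context, `incorrect number of dimensions` is the message base R raises when matrix-style indexing (`x[i, ]`) is applied to an object that does not have two dimensions. The snippet below only illustrates that mechanism with made-up names (`genes`, `dupl`, `m`); it is a guess at what might be happening, not a confirmed diagnosis of what `annotateGenes` receives.

```r
# Base R only: a plain vector has no dim attribute, so matrix-style
# indexing with a comma fails with the same message as in the traceback.
genes <- c("geneA", "geneB", "geneA")
dupl <- duplicated(genes)

msg <- tryCatch(
  genes[!dupl, ],
  error = function(e) conditionMessage(e)
)
print(msg)

# A one-column matrix silently drops to a vector unless drop = FALSE is used,
# which is one common way a matrix loses its dimensions before such a call.
m <- matrix(1:3, ncol = 1, dimnames = list(genes, "cell1"))
is.null(dim(m[!dupl, ]))                # TRUE: dimensions were dropped
is.null(dim(m[!dupl, , drop = FALSE]))  # FALSE: still a matrix
```

If something similar happens inside `multiSampleComparisonClonalCN`, checking `dim()` of the object passed to `annotateGenes` would be a reasonable first step.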

Could you please look into it? I really like the speed that SCEVAN provides for CNV inference, and, aside from these multisample issues, I think it works great and I'd really love to use it in my research.

Unrelated to these technical issues: do I understand correctly that running `multiSampleComparisonClonalCN` would let me see which subclones were identified in both timepoints and plot a comprehensive phylogeny for the relapsed patient? Or would it just compare the samples without intersecting their subclonal structure (hence `SUBCLONES = FALSE` in the internal `pipelineCNA` calls)?

Best regards,
Dmitrij



My session info just in case:
```
R version 4.4.3 (2025-02-28)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] fgsea_1.32.4                ggrepel_0.9.6               ggtree_3.14.0               ape_5.8-1                   tidytree_0.4.6             
 [6] Rtsne_0.17                  igraph_2.1.4                scran_1.34.0                scuttle_1.16.0              lubridate_1.9.4            
[11] forcats_1.0.0               stringr_1.5.1               dplyr_1.1.4                 purrr_1.0.4                 readr_2.1.5                
[16] tidyr_1.3.1                 tibble_3.2.1                ggplot2_3.5.2               tidyverse_2.0.0             SCEVAN_1.0.3               
[21] SingleCellExperiment_1.28.1 SummarizedExperiment_1.36.0 Biobase_2.66.0              GenomicRanges_1.58.0        GenomeInfoDb_1.42.3        
[26] IRanges_2.40.1              S4Vectors_0.44.0            BiocGenerics_0.52.0         MatrixGenerics_1.18.1       matrixStats_1.5.0          

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3      rstudioapi_0.17.1       jsonlite_2.0.0          magrittr_2.0.3          farver_2.1.2            rmarkdown_2.29         
  [7] fs_1.6.5                zlibbioc_1.52.0         ragg_1.3.3              vctrs_0.6.5             memoise_2.0.1           htmltools_0.5.8.1      
 [13] S4Arrays_1.6.0          usethis_3.1.0           curl_6.2.2              BiocNeighbors_2.0.1     gridGraphics_0.5-1      SparseArray_1.6.2      
 [19] sass_0.4.9              bslib_0.9.0             htmlwidgets_1.6.4       desc_1.4.3              cachem_1.1.0            mime_0.13              
 [25] lifecycle_1.0.4         pkgconfig_2.0.3         rsvd_1.0.5              Matrix_1.7-2            R6_2.6.1                fastmap_1.2.0          
 [31] GenomeInfoDbData_1.2.13 shiny_1.10.0            aplot_0.2.5             digest_0.6.37           colorspace_2.1-1        patchwork_1.3.0        
 [37] ps_1.9.0                dqrng_0.4.1             irlba_2.3.5.1           pkgload_1.4.0           textshaping_1.0.0       beachmat_2.22.0        
 [43] labeling_0.4.3          timechange_0.3.0        httr_1.4.7              abind_1.4-8             compiler_4.4.3          remotes_2.5.0          
 [49] withr_3.0.2             BiocParallel_1.40.2     pkgbuild_1.4.7          DelayedArray_0.32.0     sessioninfo_1.2.3       bluster_1.16.0         
 [55] tools_4.4.3             httpuv_1.6.15           glue_1.8.0              callr_3.7.6             nlme_3.1-167            promises_1.3.2         
 [61] grid_4.4.3              cluster_2.1.8           generics_0.1.3          gtable_0.3.6            tzdb_0.5.0              data.table_1.17.0      
 [67] hms_1.1.3               BiocSingular_1.22.0     ScaledMatrix_1.14.0     metapod_1.14.0          XVector_0.46.0          pillar_1.10.2          
 [73] yulab.utils_0.2.0       limma_3.62.2            later_1.4.2             treeio_1.30.0           lattice_0.22-6          tidyselect_1.2.1       
 [79] locfit_1.5-9.12         miniUI_0.1.1.1          knitr_1.50              edgeR_4.4.2             xfun_0.52               statmod_1.5.0          
 [85] devtools_2.4.5          pheatmap_1.0.12         stringi_1.8.7           UCSC.utils_1.2.0        ggfun_0.1.8             lazyeval_0.2.2         
 [91] yaml_2.3.10             evaluate_1.0.3          codetools_0.2-20        ggplotify_0.1.2         cli_3.6.4               RcppParallel_5.1.10    
 [97] xtable_1.8-4            systemfonts_1.2.2       munsell_0.5.1           processx_3.8.6          jquerylib_0.1.4         Rcpp_1.0.14            
[103] ellipsis_0.3.2          profvis_0.4.0           urlchecker_1.0.1        parallelDist_0.2.6      scales_1.3.0            crayon_1.5.3           
[109] rlang_1.1.5             fastmatch_1.1-6         cowplot_1.1.3          
```
