
Thoughts on algorithmic improvements? (comparison to latest RSpectra) #79

@LTLA

I was toying around with RSpectra, and damn, its truncated SVD got pretty good since you did your comparison a decade-ish ago.

y <- Matrix::rsparsematrix(10000, 5000, density=0.1)

system.time(iout <- irlba::irlba(y, nv=20, nu=20))
##    user  system elapsed 
##   7.862   0.004   7.866 

system.time(sout <- RSpectra::svds(y, k=20))
##    user  system elapsed 
##   4.656   0.000   4.657 

str(iout)
## List of 5
##  $ d    : num [1:20] 53.9 53.8 53.8 53.8 53.7 ...
##  $ u    : num [1:10000, 1:20] -0.02029 0.00435 0.01116 -0.01035 -0.0011 ...
##  $ v    : num [1:5000, 1:20] -0.02178 -0.0053 -0.0031 -0.00887 0.00627 ...
##  $ iter : int 172
##  $ mprod: int 1200

str(sout)
## List of 5
##  $ d    : num [1:20] 53.9 53.8 53.8 53.8 53.7 ...
##  $ u    : num [1:10000, 1:20] -0.02029 0.00435 0.01116 -0.01035 -0.0011 ...
##  $ v    : num [1:5000, 1:20] -0.02178 -0.0053 -0.0031 -0.00887 0.00627 ...
##  $ niter: num 26
##  $ nops : num 834

Almost twice as fast, and pretty much the same results once you account for the arbitrary sign of each singular vector. AFAICT this isn't a quality-of-implementation issue: my C++ port is about the same speed as your original R package, and it uses different code/libraries for every step (matrix multiplication, the internal SVD, etc.). So there is some algorithmic improvement in Spectra's svds() that makes it converge much faster, as evidenced by the lower niter and nops (26 iterations and 834 operations, vs. 172 iterations and 1200 matrix products for irlba).
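
A minimal base-R sketch for checking the "same results up to sign" claim. `compare_svd()` is a hypothetical helper, not part of irlba or RSpectra. It assumes both inputs are lists with `d`, `u` and `v` components (like `iout` and `sout` above) whose singular values are in the same decreasing order. Flipping a left singular vector requires flipping the matching right vector, so the sign taken from `u` is applied to both. With near-tied singular values (53.8, 53.8, 53.8 here) the individual vectors are less well determined, so a larger difference in those columns would not by itself indicate an error.

```r
# Hypothetical helper: largest absolute differences between two truncated
# SVDs after aligning the arbitrary sign of each singular vector pair.
compare_svd <- function(x, y) {
    # +1 where matching left singular vectors point the same way,
    # -1 where one of them is flipped.
    s <- sign(colSums(x$u * y$u))
    c(
        d = max(abs(x$d - y$d)),
        # Apply the same per-column flip to both u and v of 'y'.
        u = max(abs(x$u - sweep(y$u, 2, s, `*`))),
        v = max(abs(x$v - sweep(y$v, 2, s, `*`)))
    )
}
```

With the objects above, `compare_svd(iout, sout)` would report the three maximum differences.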

Is there any scope for similar improvements to IRLBA? It's not a big deal, but I'd like to try to squeeze a bit more performance out of my C++ library if I can. I don't know enough math to have any good ideas here (my best guess would be to cap the number of iterations in the Lanczos process to avoid a quadratic increase in the number of matrix-vector multiplications as nv or nu increases), but if you know of any relevant advancements, I'd be happy to implement them and try them out.
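
One simpler difference may be worth ruling out before looking for deeper algorithmic changes: the default working subspace size. As far as I can tell from the package documentation, irlba defaults to `work = nv + 7` (27 for `nv=20`), while `svds()` defaults to `ncv = min(nrow, ncol, max(2*k+1, 20))` (41 for `k=20`). A larger Krylov subspace generally means fewer restarts. The default convergence tolerances also differ between the packages, so timings are not strictly equal-accuracy comparisons. The sketch below sweeps both settings together. `compare_subspace()` is a hypothetical helper, the sizes are arbitrary, and timings will depend on the BLAS and the machine.

```r
library(Matrix)
library(irlba)
library(RSpectra)

# Hypothetical helper: run irlba::irlba() and RSpectra::svds() with the same
# number of Lanczos vectors ('work' in irlba, 'ncv' in svds) and record
# elapsed time, matrix-vector product counts, and singular value agreement.
compare_subspace <- function(y, k, sizes) {
    rows <- lapply(sizes, function(s) {
        t_irlba <- system.time(
            iout <- irlba::irlba(y, nv = k, nu = k, work = s)
        )[["elapsed"]]
        t_svds <- system.time(
            sout <- RSpectra::svds(y, k = k, opts = list(ncv = s))
        )[["elapsed"]]
        data.frame(
            size = s,
            irlba_time = t_irlba, irlba_mprod = iout$mprod,
            svds_time = t_svds, svds_nops = sout$nops,
            max_d_diff = max(abs(iout$d - sout$d))
        )
    })
    do.call(rbind, rows)
}
```

With the `y` from above, `compare_subspace(y, k = 20, sizes = c(27, 41, 60))` would show whether irlba's `mprod` drops toward svds' `nops` once both use the same subspace size.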

Session information
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.5 LTS

Matrix products: default
BLAS:   /home/luna/Software/R/trunk/lib/libRblas.so 
LAPACK: /home/luna/Software/R/trunk/lib/libRlapack.so;  LAPACK version 3.12.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Australia/Sydney
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.6.0  Matrix_1.7-5    tools_4.6.0     Rcpp_1.1.1     
[5] RSpectra_0.16-2 grid_4.6.0      irlba_2.3.7     lattice_0.22-9 
