singlet: Fast, scalable, interpretable, in-core analysis of big single-cell data


Author(s): Zachary DeBruine, Tim Triche

Affiliation(s): Grand Valley State University



Singlet is an R package for fast analysis of both big and small single-cell data. It provides fast dimension reduction for large datasets using non-negative matrix factorization (NMF) and PCA. Both methods run faster than the PCA implementations widely used in Seurat and SingleCellExperiment (SCE). As a result, Singlet can process millions of single-cell transcriptomes in a few hours, a task that is infeasible for other software because of runtime and data-structure limitations. Beyond standard dimension reduction, Singlet provides spatially aware dimension reduction using Graph-Convolutional NMF (GCNMF). All operations are fully parallelized and implemented with the Eigen C++ linear algebra library, using zero-copy Rcpp bindings. Singlet stores data in a dynamic sparse matrix structure that occupies 2 to 8 times less memory than R's dgCMatrix, and Singlet objects can be converted to Seurat or SingleCellExperiment objects at any time. Many of its machine learning routines build on lower-level methods from the RcppML R package, which is also on CRAN. Singlet also ships with a pre-trained NMF model of the Chan Zuckerberg Initiative's CELLxGENE Census, which contains 15 million 10x 3' transcriptomes. This makes it easy, for example, to annotate cells using the built-in transfer learning utilities (L2-regularized non-negative least squares, NNLS). Singlet v1.0 will soon be available on CRAN and makes in-core analysis of even very large collections of single-cell datasets straightforward.
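The transfer learning step described here projects new cells onto a fixed, pre-trained factor matrix W by solving an L2-regularized NNLS problem per cell: minimize ||W h - a||^2 + l2 * ||h||^2 subject to h >= 0, where a is one cell's expression vector. The block below is a minimal pure-Python sketch of that projection using coordinate descent on the normal equations, included only to illustrate the technique. It is not Singlet's or RcppML's implementation (those are C++/Eigen); the function name `nnls_l2` and its parameters are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # row-major: genes x factors


def nnls_l2(w: Matrix, a: List[float], l2: float = 0.0,
            max_iter: int = 100, tol: float = 1e-10) -> List[float]:
    """Hypothetical sketch: solve min_{h >= 0} ||W h - a||^2 + l2 * ||h||^2.

    w: genes x k factor matrix (fixed, e.g. from a pre-trained NMF model).
    a: expression vector of one cell (length = number of genes).
    Returns the non-negative factor loadings h (length k).
    """
    n_genes, k = len(w), len(w[0])
    # Gram matrix G = W^T W + l2 * I; the L2 penalty adds to the diagonal.
    g = [[sum(w[i][p] * w[i][q] for i in range(n_genes)) for q in range(k)]
         for p in range(k)]
    for p in range(k):
        g[p][p] += l2
    # Right-hand side b = W^T a.
    b = [sum(w[i][p] * a[i] for i in range(n_genes)) for p in range(k)]

    h = [0.0] * k
    for _ in range(max_iter):
        max_change = 0.0
        for p in range(k):
            if g[p][p] <= 0.0:
                continue  # all-zero factor with no penalty: leave h[p] at 0
            # Partial derivative of 0.5 h^T G h - b^T h with respect to h[p].
            grad = sum(g[p][q] * h[q] for q in range(k)) - b[p]
            # Exact 1-D minimizer, clipped at zero to enforce non-negativity.
            new_hp = max(0.0, h[p] - grad / g[p][p])
            max_change = max(max_change, abs(new_hp - h[p]))
            h[p] = new_hp
        if max_change < tol:
            break
    return h


if __name__ == "__main__":
    # Toy model: 3 genes, 2 factors; the cell is built as 2*factor1 + 3*factor2.
    w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(nnls_l2(w, [2.0, 3.0, 5.0]))
```

In this toy case the recovered loadings are approximately [2, 3]. Increasing `l2` shrinks the loadings toward zero, which stabilizes projections when factors are correlated; the resulting per-cell loadings could then feed a downstream annotation step.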