Model-based Dimensionality Reduction for Single-cell RNA-seq with Generalized Bilinear Models

Author(s): Phillip Nicol, Jeffrey W. Miller

Affiliation(s): Department of Biostatistics, Harvard University

Social media: https://twitter.com/PhillipNicol

Dimensionality reduction is a critical step in the analysis of single-cell RNA-seq (scRNA-seq) data. The standard approach is to transform the count matrix and then apply principal component analysis (PCA). However, this approach can induce spurious heterogeneity and mask true biological variability. An alternative is to model the counts directly, but existing methods tend to be computationally intractable on large datasets and do not quantify uncertainty in the low-dimensional representation. To address these shortcomings, we develop scGBM, a novel method for model-based dimensionality reduction of scRNA-seq data. scGBM uses a scalable algorithm based on weighted low-rank approximations to fit a Poisson bilinear model to datasets with millions of cells. Furthermore, scGBM quantifies the uncertainty in each cell's latent position and uses these uncertainties to assess the confidence associated with a given cell clustering. On real and simulated single-cell data, we find that scGBM produces low-dimensional embeddings that better capture relevant biological information while removing unwanted variation. scGBM is implemented as an R package and will be made publicly available shortly.
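To make the model class concrete, here is a minimal NumPy sketch written under our own assumptions; it is not the scGBM package or its algorithm. It fits log E[Y_ij] = alpha_i + beta_j + (U D V^T)_ij to a genes-by-cells count matrix Y by alternating exact Poisson updates of the gene offsets alpha and cell offsets beta with an EM-style weighted low-rank step on the IRLS working response (weights equal to the fitted means). The names `fit_poisson_bilinear` and `poisson_deviance`, the step rule, and the iteration counts are hypothetical, the sketch assumes no all-zero genes or cells, and it omits scGBM's uncertainty quantification and its scaling to millions of cells.

```python
import numpy as np


def poisson_deviance(Y, mu):
    """Total Poisson deviance; entries with Y == 0 contribute 2 * mu."""
    Y = np.asarray(Y, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        term = np.where(Y > 0, Y * np.log(Y / mu), 0.0)
    return 2.0 * np.sum(term - (Y - mu))


def fit_poisson_bilinear(Y, M=2, n_iter=200):
    """Hypothetical sketch: fit log E[Y_ij] = alpha_i + beta_j + X_ij, rank(X) <= M.

    Y is genes x cells. Assumes every gene and every cell has a nonzero count.
    Returns U (genes x M), d (M,), V (cells x M), alpha, beta, fitted means mu.
    """
    Y = np.asarray(Y, dtype=float)
    row_sums, col_sums = Y.sum(axis=1), Y.sum(axis=0)
    alpha = np.log(Y.mean(axis=1))              # gene offsets
    beta = np.log(col_sums / col_sums.mean())   # cell size factors
    X = np.zeros_like(Y)                        # low-rank interaction term
    for _ in range(n_iter):
        # Exact coordinate-wise Poisson MLE for alpha, then for beta.
        mu = np.exp(alpha[:, None] + beta[None, :] + X)
        alpha += np.log(row_sums / mu.sum(axis=1))
        mu = np.exp(alpha[:, None] + beta[None, :] + X)
        beta += np.log(col_sums / mu.sum(axis=0))
        mu = np.exp(alpha[:, None] + beta[None, :] + X)
        # EM step for weighted low-rank approximation with weights W = mu and
        # working response Z = X + (Y - mu) / mu:
        #   (W / W.max()) * Z + (1 - W / W.max()) * X == X + (Y - mu) / mu.max()
        R = X + (Y - mu) / mu.max()
        # Double-center so the offsets, not X, carry row and column means.
        R = R - R.mean(axis=1, keepdims=True) - R.mean(axis=0, keepdims=True) + R.mean()
        # Best rank-M approximation via truncated SVD.
        U, d, Vt = np.linalg.svd(R, full_matrices=False)
        X = (U[:, :M] * d[:M]) @ Vt[:M]
    mu = np.exp(alpha[:, None] + beta[None, :] + X)
    return U[:, :M], d[:M], Vt[:M].T, alpha, beta, mu
```

Because the weights equal the fitted means, the EM update collapses to a gradient step on the Poisson log-likelihood with step size 1 / max(mu), followed by projection to rank M. This keeps the sketch short and stable but converges slowly when a few means are large; a dedicated weighted low-rank solver would be needed for data at the scale described above. Rows of U times d give gene loadings, and rows of V give each cell's latent position.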