demuxSNP: supervised demultiplexing of scRNAseq data using cell hashing and SNPs

Author(s): Michael P Lynch, Laurent Gatto, Aedin C Culhane

Affiliation(s): University of Limerick

Social media: https://twitter.com/AedinCulhane

Sequencing at single-cell resolution reveals biologically relevant differences between individual cells that previous bulk methods could not resolve. Though the cost of sequencing has dropped considerably, multiplexing, that is, loading multiple biological samples into each sequencing lane, is widely used to reduce costs further. The resulting sequencing reads must then be demultiplexed, or assigned to their respective biological samples. Both experimental and computational methods have been proposed to facilitate demultiplexing. We present our approach and its corresponding Bioconductor package, ‘demuxSNP’, which overcomes current challenges in demultiplexing scRNAseq reads from genetically distinct biological samples.

Demultiplexing is usually done either through cell tagging or by exploiting genetic differences between donor groups using single nucleotide polymorphisms (SNPs). Tagging methods label the cells of each biological sample with a distinct HTO (hashtag oligonucleotide) or LMO (lipid-modified oligonucleotide) tag prior to sequencing. These tags are then sequenced to form a counts matrix. Due to non-specific binding, the counts of a given tag form a bimodal distribution consisting of a higher signal distribution and a lower background distribution. The performance of algorithms that assign cells from these counts is highly dependent on the tagging quality: lower tagging quality is associated with greater overlap between the two modes and poorer demultiplexing performance.

Alternatively, SNP variation between biological samples can be used to perform computational demultiplexing, either with genotype information (Demuxlet) or without it (Vireo, Souporcell). SNP-based methods require no cell tagging but do require genetically distinct samples. Supervised (genotype-based) methods perform well but require that the genotype be known, which has an associated cost.
To address this, genotype-free methods were developed, which do not require prior knowledge of the SNPs in a biological sample but struggle to identify rare samples. The performance of SNP-based methods depends on sequencing depth and decreases when ambient RNA is abundant.

We propose a method utilising data from both tags and SNPs to increase the performance of cell tagging methods. Using cell tagging methods, we can confidently demultiplex some but not all cells due to issues with tagging quality. We can then train a classifier on the SNP profiles of singlet cells that were assigned with high confidence to the genetically distinct biological samples. In addition to high-confidence singlets, demuxSNP combines singlet SNP profiles from different singlet groups to simulate doublets and includes these in the training data. The trained classifier then assigns the low-confidence cells (doublets, or singlets which could not be confidently called using cell tagging methods). demuxSNP uses a subset of SNPs in a computationally efficient and cell-type-unbiased algorithm.

This method overcomes several limitations of current methods. Unlike Demuxlet, it requires no additional genotyping. Unlike genotype-free SNP methods, it can assign rare samples provided a proportion of the group has adequate tagging quality. Using a subset of SNPs reduces computational cost relative to other SNP-based methods. The demuxSNP package has been submitted to Bioconductor.
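
The workflow described above (train on high-confidence singlets, add simulated doublets, then classify the remaining cells) can be illustrated with a small toy sketch. This is not the demuxSNP implementation or its API. Several things are assumptions made purely for illustration: SNP profiles are encoded as binary vectors (1 = alternative allele observed), doublets are simulated as the element-wise union of two singlet profiles, and the classifier is a k-nearest-neighbour vote on Hamming distance. The function names `simulate_doublets` and `knn_predict` are hypothetical.

```python
import numpy as np


def simulate_doublets(snps, labels, n_doublets, rng):
    """Combine SNP profiles of singlets from *different* groups.

    Assumption: a doublet shows an alternative allele wherever either
    contributing cell does (element-wise maximum of binary profiles).
    """
    doublets = []
    while len(doublets) < n_doublets:
        i, j = rng.choice(len(labels), size=2, replace=False)
        if labels[i] != labels[j]:  # only mix cells from distinct samples
            doublets.append(np.maximum(snps[i], snps[j]))
    return np.array(doublets)


def knn_predict(train_x, train_y, query_x, k=5):
    """Majority vote among the k training cells closest in Hamming distance."""
    dists = (query_x[:, None, :] != train_x[None, :, :]).sum(axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    preds = []
    for row in nearest:
        values, counts = np.unique(train_y[row], return_counts=True)
        preds.append(values[np.argmax(counts)])
    return np.array(preds)


# Toy data: 20 SNPs, two genetically distinct samples "A" and "B".
profile_a = np.array([1] * 10 + [0] * 10)
profile_b = np.array([0] * 10 + [1] * 10)

# High-confidence singlets, as they might be called from cell tagging.
singlet_snps = np.vstack([profile_a] * 6 + [profile_b] * 6)
singlet_labels = np.array(["A"] * 6 + ["B"] * 6)

# Augment the training data with simulated doublets.
rng = np.random.default_rng(0)
doublet_snps = simulate_doublets(singlet_snps, singlet_labels, 6, rng)
train_x = np.vstack([singlet_snps, doublet_snps])
train_y = np.concatenate([singlet_labels, np.full(len(doublet_snps), "doublet")])

# Low-confidence cells, each with some SNP dropout.
cell_a = profile_a.copy()
cell_a[0] = 0                      # sample A cell missing one SNP
cell_b = profile_b.copy()
cell_b[19] = 0                     # sample B cell missing one SNP
cell_d = np.maximum(profile_a, profile_b)
cell_d[[0, 19]] = 0                # doublet missing two SNPs
low_confidence = np.vstack([cell_a, cell_b, cell_d])

predictions = knn_predict(train_x, train_y, low_confidence, k=5)
print(predictions)
```

In this toy setting the sample A cell, the sample B cell, and the mixed cell are each closest to training cells of their own class despite the dropout. Real data are sparser and noisier, so the choice of SNP subset, distance measure, and classifier matters far more than it does here.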