Injecting rigor and reproducibility into CITE-seq workflows: decontamination and in silico gating approaches


Author(s): Jae Min Park, Ava Jensen, Santiago Carmona, Joshua Campbell, Tim Triche

Affiliation(s): Van Andel Institute



Single-cell transcriptomic and proteomic assays have added substantial breadth and depth to our understanding of cellular phenotypes and interactions. Particularly in the study of cellular immunity, the recent CITE-seq and REAP-seq protocols (which simultaneously assay hundreds of cell surface proteins alongside thousands of mRNA transcripts) have provided a robust and scalable means to dissect tissue- and condition-specific roles of individual cells. However, the most appropriate way to preprocess these assays remains an open research question, with substantial implications for harmonized atlases of cell states and fates. Moreover, the majority of single-cell transcriptomic discoveries are evaluated against flow cytometric and functional characterization. Here we present a comparative evaluation of in silico and flow cytometric gating approaches for analyzing CITE-seq data. We investigate the relative strengths of DecontPro and dsb as decontamination tools, and employ the scGate package to perform in silico gating so that the downstream consequences of each choice can be interpreted. Importantly, when isotype controls and mRNA UMI counts are available, conclusions can be substantially affected by decisions to use or ignore these modalities in normalization, decontamination, and clustering. decontX, a Bioconductor package that implements DecontPro, estimates and removes contamination from single-cell data, with DecontPro targeting ambient and background signal in antibody-derived tag (ADT) counts. dsb, hosted on CRAN, is another package that normalizes and denoises ADT data from CITE-seq datasets, and pioneered the use of isotype controls for background normalization. scGate (hosted on CRAN) employs the UCell Bioconductor package to enable a reproducible, semi-supervised, in silico gating approach akin to more traditional flow cytometric gating. 
In conjunction with contemporary preprocessing and clustering-based workflows for CITE-seq data, scGate allows us to compare the outcomes of in silico gating on decontaminated, normalized CITE-seq data against flow cytometric counts of cells prepared via enrichment protocols. This comparison provides a lens through which to judge the relative merits of decontamination workflows. Finally, we apply our findings from these benchmarking experiments to a primary dataset of human bone marrow samples from healthy donors and pediatric leukemia patients. The results hold implications for the clinical translation of multimodal single-cell profiling, including its extension to patient care in high-risk settings that lack a standard of care.
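
To make the two ideas above concrete, the following is a minimal Python sketch of background-based rescaling of ADT counts followed by a marker-based in silico gate. It is an illustration under stated assumptions, not the dsb, DecontPro, or scGate implementation: the function names `rescale_by_background` and `gate`, the pseudocount of 10, the threshold of 3, and the toy counts are all hypothetical, and scGate itself gates on UCell signature scores rather than on a fixed cutoff like the one used here.

```python
import numpy as np

def rescale_by_background(cell_counts, empty_counts, pseudocount=10.0):
    """Log-transform ADT counts, then center and scale each protein by the
    mean and SD of the same protein in empty (background) droplets.
    The pseudocount is an illustrative assumption."""
    cells = np.log(np.asarray(cell_counts, dtype=float) + pseudocount)
    empty = np.log(np.asarray(empty_counts, dtype=float) + pseudocount)
    mu = empty.mean(axis=0)
    sd = empty.std(axis=0, ddof=1)
    sd = np.where(sd > 0, sd, 1.0)  # guard against zero variance
    return (cells - mu) / sd

def gate(scores, protein_index, positive, negative, threshold=3.0):
    """Return a boolean mask of cells that exceed the threshold for every
    positive marker and do not exceed it for any negative marker.
    The threshold is a hypothetical cutoff in background-SD units."""
    keep = np.ones(scores.shape[0], dtype=bool)
    for name in positive:
        keep &= scores[:, protein_index[name]] > threshold
    for name in negative:
        keep &= scores[:, protein_index[name]] <= threshold
    return keep

# Toy data (fabricated for illustration only): columns are CD3 and CD19.
protein_index = {"CD3": 0, "CD19": 1}
empty_droplets = np.array([[2, 1], [4, 3], [3, 2], [3, 2]])
cells = np.array([[500, 3], [2, 400], [3, 2]])

scores = rescale_by_background(cells, empty_droplets)
t_cell_mask = gate(scores, protein_index, positive=["CD3"], negative=["CD19"])
```

The fraction of cells passing such a gate is the kind of quantity that can be set against flow cytometric counts, which is why the choice of background model (empty droplets, isotype controls, or neither) can change the downstream conclusion.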