DESpace: a novel analysis framework to discover spatially variable genes
Author(s): Peiying Cai, Mark Robinson, Simone Tiberi
Affiliation(s): University of Zurich
Background Spatially resolved transcriptomics (SRT) technologies measure gene expression profiles while retaining the spatial location of each measurement within the tissue. They have prompted the development of novel methods that exploit the joint availability of mRNA abundance and spatial information. Notably, several computational tools have been developed to identify spatially variable genes (SVGs), i.e., genes whose expression profiles vary across the tissue. Nonetheless, current approaches for SVG detection have some limitations; in particular: i) most methods are computationally intensive; ii) biological replicates are not supported; iii) information about known spatial structures (usually) cannot be incorporated; iv) testing cannot be performed on specific regions of interest (e.g., white matter in brain cortex).
Methodology We propose DESpace, an intuitive framework for identifying SVGs based on differential testing across spatial clusters. These clusters represent spatially neighbouring cells with similar expression profiles, and can be obtained via spatial clustering tools (e.g., BayesSpace, stLearn, Giotto and PRECAST), or via pathologists’ annotations. We use these clusters as a proxy for the actual spatial information. We then employ edgeR, a popular tool for differential expression analyses, to perform differential testing across spatial clusters. Intuitively, if the mRNA abundance of a gene is significantly associated with the spatial clusters, then its expression varies across the tissue, which indicates an SVG. Clearly, our framework relies on spatial clusters being available and summarizing the main spatial features of the data. Nonetheless, even in the absence of pre-computed annotations, spatially resolved clustering tools can generate clusters that accurately summarize the spatial structure of gene expression.
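The cluster-based testing idea described above can be illustrated with a minimal sketch. It is not the DESpace implementation: DESpace relies on edgeR, whereas this toy example swaps in a permutation test on a one-way ANOVA F statistic computed on log-transformed counts, followed by a Benjamini-Hochberg adjustment. The gene names, cluster labels, counts, and all function names are hypothetical and chosen only for illustration; the sketch handles a single sample.

```python
import math
import random
from collections import defaultdict


def f_statistic(values, labels):
    """One-way ANOVA F statistic for `values` grouped by `labels`."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    between = sum(len(g) * (means[label] - grand_mean) ** 2
                  for label, g in groups.items())
    within = sum((v - means[label]) ** 2
                 for label, g in groups.items() for v in g)
    if within == 0.0:
        return math.inf if between > 0.0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def permutation_pvalue(values, labels, n_perm=999, seed=0):
    """P-value for association between expression and cluster labels.

    Cluster labels are shuffled across spots; the p-value is the share of
    shuffles whose F statistic is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = f_statistic(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if f_statistic(values, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy data: 12 spots assigned to 3 spatial clusters.
clusters = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
counts = {
    "gene_svg": [1, 2, 1, 2, 10, 11, 9, 10, 20, 21, 19, 22],  # varies by cluster
    "gene_flat": [5, 6, 4, 5, 6, 4, 5, 5, 4, 5, 6, 5],        # no spatial pattern
}

genes = list(counts)
pvalues = [
    permutation_pvalue([math.log1p(c) for c in counts[gene]], clusters)
    for gene in genes
]
adjusted = dict(zip(genes, benjamini_hochberg(pvalues)))

for gene in genes:
    print(gene, adjusted[gene])
```

In this toy setting, a gene whose abundance tracks the cluster labels receives a small adjusted p-value, while a gene with the same average expression in every cluster does not. A count-based model such as the one edgeR provides would be the natural choice for real SRT data; the permutation test is used here only to keep the sketch dependency-free.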
Additionally, DESpace offers some unique features compared to currently available SVG tools; in particular, our framework: i) can model multiple samples, which reduces the uncertainty that characterizes inference performed on individual samples, and identifies genes with coherent spatial patterns across biological replicates; ii) identifies the key areas of the tissue affected by each SVG, by testing whether the average expression in a particular region of interest (e.g., cancer tissue) is significantly higher or lower than the average expression in the remaining tissue (e.g., non-cancer tissue), hence enabling scientists to investigate changes in mRNA abundance in specific areas of particular interest. Finally, our method is flexible, and accepts any type of SRT data as input.
Benchmarking We performed extensive benchmarks of our approach and various competitors (MERINGUE, nnSVG, SpaGCN, SPARK, SPARK-X, SpatialDE, SpatialDE2, and trendsceek). In particular, starting from three real spatial omics datasets as anchor data, we generated various semi-simulated datasets with a wide variety of spatial patterns. Our approach displays well-calibrated false discovery rates and a higher true positive rate than all competitors considered. Furthermore, when analyzing real data, the genes identified by DESpace are more coherent across replicates than those detected by other SVG methods.
Availability DESpace is implemented as an R package, currently available on GitHub: https://github.com/peicai/DESpace. DESpace should appear on Bioconductor in a few weeks. A pre-print (in preparation) will also follow in the coming weeks.