**Bridges from Python to Bioconductor: applications in genetics and single-cell genomics**
Author(s): Vincent James Carey
Affiliation(s): Channing Division of Network Medicine, Harvard Medical School
Multilingual data science can make discovery more efficient by combining the data management and analysis strengths of different languages. In this workshop we will examine the interplay between R, Python, and Apache Spark in genetic and single-cell applications. CITE-seq studies simultaneously quantify surface protein and mRNA abundance in single cells. We will use scviR to compare interpretations based on deep learning with those based on sequential component-specific methods. The UK Biobank is the foundation of thousands of genome-wide association studies. The Telomere-to-Telomere project produced the first gapless human reference genome. Both of these resources will be explored using BiocHail. Workshop attendees will acquire an understanding of Aaron Lun's basilisk package and its use in isolating specific collections of Python modules, the anndata representations and scvi-tools analyses of CITE-seq data, and the hail.is approach to structuring and analyzing massive genetics data resources using Spark Resilient Distributed Datasets. All programming will be carried out in R; Quarto documents that mix R and Python will also be illustrated.
Source code