Short talk

Visualizing genomic characteristics across an RNA-Seq based reference landscape of normal and neoplastic brain

Author(s): Sonali Arora
Affiliation(s): Fred Hutch Cancer Center
Social media: https://twitter.com/Sonali_bioc

In order to better understand the relationship between normal and neoplastic brain, we combined five publicly available large-scale datasets, correcting for batch effects and applying Uniform Manifold Approximation and Projection (UMAP) to RNA-Seq data. We assembled a reference Brain-UMAP including 702 adult gliomas, 802 pediatric tumors and 1409 healthy normal brain samples, which can be utilized to investigate the wealth of information obtained from combining several publicly available datasets to study a single organ site.

Unraveling Immunogenomic Diversity in Single-Cell Data

Author(s): Ahmad Al Ajami, Annekathrin Silvia Ludt, Federico Marini, Katharina Imkeller
Affiliation(s): Neurological Institute (Edinger Institute), University Hospital Frankfurt, Goethe University, Germany

Immune molecules such as B and T cell receptors, human leukocyte antigens (HLAs), or killer Ig-like receptors (KIRs) are encoded in the most genetically diverse loci of the human genome. Many of these immune genes are hyperpolymorphic, showing high allelic diversity across human populations.

Tree-based differential testing using inferential replicate counts for RNASeq

Author(s): Noor Pratap Singh, Michael I Love, Rob Patro
Affiliation(s): University of Maryland - College Park

The discovery of differentially expressed transcripts is an important but challenging problem in transcriptomics. There is substantial uncertainty associated with the abundance estimates of transcripts which, if ignored, can lead to exaggeration of false positives and, if included, may lead to reduced power for transcripts that have high uncertainty.

The Mutational Signature Comprehensive Analysis Toolkit (musicatk) for the Discovery, Prediction, and Exploration of Mutational Signatures

Author(s): Natasha Gurevich, Aaron Chevalier, Joshua Campbell
Affiliation(s): Boston University

Mutational signatures are patterns of somatic alterations in the genome caused by carcinogenic exposures or aberrant cellular processes. To provide a comprehensive workflow for preprocessing, analysis, and visualization of mutational signatures, we created the Mutational Signature Comprehensive Analysis Toolkit (musicatk) package. Musicatk enables users to count and combine multiple mutation types, including SBS, DBS, and indels.
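As a rough illustration of what counting single-base substitutions (SBS) involves, the sketch below collapses each substitution onto the pyrimidine strand, which yields the six conventional classes (C>A, C>G, C>T, T>A, T>C, T>G). It is plain Python written for this listing and is not musicatk code; the function name `sbs_class` and the toy mutation list are hypothetical.

```python
from collections import Counter

# Watson-Crick complements, used to flip purine-reference substitutions.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sbs_class(ref: str, alt: str) -> str:
    """Return the pyrimidine-centred class (e.g. 'C>T') for one substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("A", "G"):
        # Purine reference: report the equivalent change on the opposite strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


# Toy (ref, alt) pairs, invented for illustration; real input would be
# parsed from variant calls.
mutations = [("C", "T"), ("G", "A"), ("T", "C"), ("A", "G"), ("C", "A")]
counts = Counter(sbs_class(ref, alt) for ref, alt in mutations)
```

Here `G>A` is tallied together with `C>T` because both describe the same event seen from opposite strands. Signature analyses usually extend these six classes with the flanking bases, which this sketch leaves out.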

The crisprVerse: a comprehensive Bioconductor ecosystem for the design of CRISPR guide RNAs across nucleases and technologies

Author(s): Jean-Philippe Fortin, Luke Hoberecht, Pirunthan Perampalam, Aaron Lun
Affiliation(s): Genentech
Social media: https://twitter.com/Jaypykw

The success of CRISPR-mediated gene perturbation studies is highly dependent on the quality of gRNAs, and several tools have been developed to enable optimal gRNA design. However, these tools are not all adaptable to the latest CRISPR modalities or nucleases, nor do they offer comprehensive annotation methods for advanced CRISPR applications.

Statistical methods for spatial Multiplexed Ion-Beam Imaging data analysis

Author(s): Shiheng Huang, Pratheepa Jeganathan, Jamie McNicol
Affiliation(s): McMaster University

This workshop aims to enable users to configure and follow the reproducible research workflow of statistical methods in characterizing the microenvironment of pathological tissues at the molecular level. In this context, an important goal is to explain the heterogeneity observed in disease specimens through investigating the spatial compartments present in the tissue. We will demonstrate the use of latent Dirichlet allocation (LDA) to identify latent cell phenotype communities as well as tessellation to investigate cell phenotype spatial dependence.

Statistical method to rank spatially variable genes adjusted for mean-variance relationship

Author(s): Kinnary Shah, Boyi Guo, Stephanie Hicks
Affiliation(s): Johns Hopkins Bloomberg School of Public Health
Social media: https://twitter.com/kinnaryhshah

Recently, spatially resolved transcriptomics technologies have emerged that allow us to measure full transcriptome-wide expression in two-dimensional space. Current approaches rank spatially variable genes based on either p-values or some effect size, such as the proportion of spatially variable genes. However, previous work in RNA-seq has shown that a technical bias, referred to as the “mean-variance relationship”, exists in these data in that the gene-level variance is correlated with mean RNA expression.
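To make the mean-variance relationship concrete, the sketch below computes the per-gene mean and sample variance of a tiny count matrix and then correlates the two. The counts are invented for illustration and the helper `pearson` is hypothetical; this is not the ranking method described in the talk.

```python
from math import sqrt
from statistics import mean, variance

# Toy counts (one list of per-spot counts per gene); invented values.
counts = {
    "geneA": [0, 1, 0, 2, 1],
    "geneB": [4, 6, 5, 9, 6],
    "geneC": [20, 35, 28, 41, 26],
}

gene_means = [mean(values) for values in counts.values()]
gene_vars = [variance(values) for values in counts.values()]  # sample variance


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(sxx * syy)


# Correlation between per-gene mean and per-gene variance.
r = pearson(gene_means, gene_vars)
```

In this toy matrix the more highly expressed a gene is, the larger its variance, so ranking genes by raw variance alone would favour highly expressed genes regardless of any spatial pattern.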

Standardization of cell-free methylated DNA immunoprecipitation (cfMeDIP) results for population-scale inference

Author(s): Nicholas Cheng, Althaf Singhawansa, Sasha Main, Ming Han, Tim Triche, Michael M. Hoffman, Samantha L Wilson, Daniel de Carvalho, Emma Bell
Affiliation(s): University Healthcare Network, Toronto, ON, CA

Minimally invasive diagnostic procedures using small quantities of biofluids have rapidly gained attention due to their low risk and high information yield. Traditional tissue biopsies not only present higher risk of infection, pain, and bleeding, but also assess a mixture of dormant and active diseased tissue which may or may not represent cells of interest.

Spatial Multi-omic Profiling of Alzheimer’s Disease in the Human Inferior Temporal Cortex

Author(s): Sowmya Parthiban, Sang Ho Kwon, Madhavi Tippani, Heena R Divecha, Jashandeep S Lobana, Stephen Williams, Michelle Mark, Guixia Yu, Julianna Avalos-Gracia, Rahul A Bharadwaj, Joel E Kleinman, Thomas M Hyde, Stephanie C Page, Stephanie Hicks, Keri Martinowich, Kristen R Maynard, Leonardo Collado Torres
Affiliation(s): Johns Hopkins School of Public Health, Department of Biostatistics
Social media: https://twitter.com/sowmyapartybun

The Visium SPG (Spatial Proteogenomics) platform, namely 10x Visium coupled with IF (immunofluorescence) protein detection, answers biological questions about spatial association between RNA and protein expression within identical tissue sections.

SPAMMER: Spatial Analysis of Multi-omics Measurements in R

Author(s): Harkirat Kaur Sohi, Jason E. McDermott, Tong Zhang, Tujin Shi, Sara Jane Gosline
Affiliation(s): Pacific Northwest National Labs

Most omics-level technologies fail to properly characterize intra-tissue heterogeneity, as sample processing steps require homogenization that confounds spatial signatures. While there are existing tools for general multi-omics data processing, tools for spatially resolved omics data are limited. Further, such tools are almost non-existent for spatial proteomics data.

Slicing and dicing aligned genomic and transcriptomic reads for genetic epidemiology

Author(s): Peter Yizhou Huang, Lauren Marie Harmon, Xiaotu Ma, Tim Triche
Affiliation(s): Van Andel Institute

Rare diseases and conditions present special difficulties for genetic epidemiology. For example, all childhood cancers are rare diseases, as are almost all cancer predisposition syndromes, the majority of primary immunodeficiency conditions, and most chromosomal birth defects. In aggregate, however, rare diseases are not rare; approximately 20% of medical consultations for an identifiable syndromic condition will eventually resolve to a rare genetic condition.

singlet: Fast, scalable, interpretable, in-core analysis of big single-cell data

Author(s): Zachary DeBruine, Tim Triche
Affiliation(s): Grand Valley State University

Singlet is an R package designed for fast analysis of big and small single-cell data. It provides very fast dimension reduction methods for large datasets using non-negative matrix factorization (NMF) and PCA, both methods faster than the widely used PCA method in Seurat and SCE. This means that Singlet can process millions of single-cell transcriptomes in just a few hours, a task that would be impossible for other software due to limitations in runtime and data structure.

Simplify your BSgenome workflow: Creating BSgenome Data Packages With BSgenomeForge

Author(s): Atuhurira Kirabo Kakopo

BSgenome is a Bioconductor package that is used primarily to “forge”, or create, BSgenome data packages. BSgenome data packages have many real-world applications, such as studying the COVID-19 genomic sequence. However, the current process of forging a BSgenome data package is quite long and confusing, with a few redundant processes like creating a seed file before calling the `forgeBSgenomeDataPkg` function on the seed file to create the package.

SCArray.sat – Large-scale single-cell RNA-seq data analysis using GDS files and Seurat

Author(s): Xiuwen Zheng, Damian Stichel, Alice Wan
Affiliation(s): Genomics Research Center, AbbVie Inc., 1 North Waukegan Rd., North Chicago, IL 60064, US

Single-cell RNA sequencing (scRNA-seq) has revolutionized our understanding of gene expression heterogeneity within complex biological systems. As scRNA-seq technology becomes increasingly accessible and cost-effective, experiments are generating data from larger and larger numbers of cells. However, the analysis of large scRNA-seq data remains a challenge, particularly in terms of scalability.

rworkflows: taming the Wild West of R packages

Author(s): Brian Schilder, Alan E Murphy
Affiliation(s): Imperial College London
Social media: https://twitter.com/BMSchilder

Despite calls to improve reproducibility in science, achieving this goal remains elusive even within computational fields. More than 50% of R packages are currently distributed exclusively through GitHub, a code repository that does not require software to adhere to any coding standards or even run at all. This has contributed to the scientific landscape becoming the “Wild West” in terms of code usability and reproducibility.

Preprocessing and analysis of microRNA-seq data

Author(s): Matthew Nicholson McCall, Sami Leon, Andrea Baran
Affiliation(s): University of Rochester Medical Center
Social media: https://twitter.com/matthewnmccall

MicroRNAs are a class of small (18-24 nucleotide) RNAs that are essential regulators of gene expression, which act within the RNA-induced silencing complex (RISC) to bind mRNAs and suppress translation (Valencia-Sanchez et al., 2006). Alterations in microRNA expression have been shown to disrupt entire cellular pathways, substantially contributing to a variety of human diseases (Mendell and Olson, 2012).

Practical tools for quantitative deidentification and return of results

Author(s): Lauren Marie Harmon, Wanding Zhou, Samantha Lent, Tim Triche
Affiliation(s): Van Andel Institute

Epigenomic data, whether from chromatin immunoprecipitation assays, DNA methylation microarrays, or protocols such as ATAC-seq, produce results that inhabit a continuum from completely deidentified (and completely opaque, often to study participants as well as investigators) to completely identifiable (as with release of raw sequencing reads). DNA methylation arrays are particularly nettlesome; manufacturers typically include high minor allele frequency SNP probes to help detect sample swaps, but the arrays are implemented as a specific type of genotyping array, which incidentally detects genetic variation at cytosines assayed for methylation.

Outreachy overview

Author(s): Outreachy team

In December 2022, the Bioconductor community began offering internships through Outreachy (outreachy.org), an organization that works with open source and open science communities to provide internships to individuals impacted by systemic bias and underrepresentation in technology. In this session, Bioconductor's interns from the December 2022 - May 2023 cohort will share their work and experiences. Prospective mentors within the Bioconductor community who are interested in participating in Outreachy can also learn how to submit open source projects.

Orchestrating microbiome multi-omics with R/Bioconductor

Author(s): Leo M Lahti, Tuomas Borman, Sudarshan A Shetty
Affiliation(s): Department of Computing, University of Turku, Finland
Social media: https://twitter.com/antagomir

A number of R/Bioconductor packages for microbiome data science have been released in recent years. The majority of the existing frameworks and packages focus on the analysis of taxonomic profiling data generated by phylogenetic microarrays, 16S amplicon sequencing, or metagenome analysis. However, there is an increasing need to integrate taxonomic profiles with other measurement types, such as transcriptomics, metabolomics, host genomics, cross-kingdom analysis, and hierarchical side information of the features and samples.

Optimizing signal and correcting for between-cell-type biases in heterogenous spatial and single-cell RNA-seq

Author(s): Jared T Brown, Lingxin Cheng, Dylan Cable, Zijian Ni, Chitrasen Mohanty, Matthew Bernstein, Christina Kendziorski Newton, Rafael Irizarry
Affiliation(s): Department of Data Science, Dana Farber Cancer Institute

Proper normalization is an integral step in any RNA-seq preprocessing or analysis pipeline. While methods are well studied for older sequencing technologies, more recent developments still present significant challenges. Of note, the cell-type and tissue heterogeneity represented by these datasets has increased dramatically, both in single-cell and especially in spatial sequencing.

ontoProc - Ontology interfaces for Bioconductor

Author(s): Sara H Stankiewicz, Vincent James Carey
Affiliation(s): Brigham and Women's Hospital, Channing Division of Network Medicine
Social media: https://www.linkedin.com/in/sara-stankiewicz-bb992135/

The goal of the ontoProc package is to make progress in the adoption and application of ontological discipline in Bioconductor-oriented data analysis. The ontoProc package currently provides 14 formal ontologies in RDA format. The ontologies include vocabularies and concept relationships in the domains of human anatomy, proteins, human diseases, chemical ontologies, and more.

On the Dependency Heaviness of CRAN/Bioconductor Ecosystem

Author(s): Zuguang Gu
Affiliation(s): German Cancer Research Center

The R package ecosystem is expanding fast and dependencies among packages are becoming more complex in the ecosystem. I explored the package dependencies from a new aspect with a new metric named “dependency heaviness”, which measures the number of additional strong dependencies that a package uniquely contributes to its child or downstream packages. I systematically studied how the dependency heaviness spreads from parent to child packages, and how it further spreads to remote downstream packages in the CRAN/Bioconductor ecosystem.
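One way to read the metric is sketched below, under the assumption that the heaviness of a parent on a child is the number of strong dependencies the child would lose if that single parent were dropped from its direct dependency list. The function names, the dictionary layout, and the toy package graph are all hypothetical; this is not the author's implementation.

```python
def all_strong_deps(pkg, direct, skip=None):
    """Collect every package reachable from `pkg` through strong dependencies.

    `direct` maps a package name to its direct strong dependencies.
    If `skip` is given, the direct edge pkg -> skip is ignored.
    """
    seen = set()
    stack = [dep for dep in direct.get(pkg, ()) if dep != skip]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(direct.get(current, ()))
    return seen


def heaviness(parent, child, direct):
    """Strong dependencies that `parent` alone brings into `child`."""
    with_parent = all_strong_deps(child, direct)
    without_parent = all_strong_deps(child, direct, skip=parent)
    return len(with_parent) - len(without_parent)


# Toy dependency graph, invented for illustration.
direct = {
    "child": ["heavy", "light"],
    "heavy": ["a", "b", "shared"],
    "light": ["shared"],
}

heavy_score = heaviness("heavy", "child", direct)
light_score = heaviness("light", "child", direct)
```

In the toy graph, `heavy` contributes itself plus `a` and `b`, while `shared` does not count because `light` brings it in anyway; `light` contributes only itself.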

Noninvasive, low-cost RNA-sequencing enhances discovery potential of transcriptome studies

Author(s): Molly Martorella, Renee Garcia-Flores, Tuuli Lappalainen
Affiliation(s): New York Genome Center
Social media: https://twitter.com/SubmarineGene

The study of large-scale transcriptomics has great potential in elucidating the effects of genetic variation and has a vital role in achieving the objectives of precision medicine. It can help to identify biomarkers that can indicate disease risk, onset, prognosis, and treatment response, discover new therapies, and assess the effect of environmental or pharmacological exposures.

Non-negative spatial matrix factorization for multi-sample spatial transcriptomics data

Author(s): Yi Wang, Kasper Daniel Hansen
Affiliation(s): Department of Biostatistics, Johns Hopkins University
Social media: https://twitter.com/Yi_Wang___

Spatially resolved transcriptomics opens the door to analyzing gene expression data in the context of spatial position. Recently, Townes and Engelhardt developed NSF, a non-negative spatial matrix factorization method. NSF can be used to identify spatially dependent latent factors that are associated with functional anatomical regions. However, the model formulation is currently limited to a single sample.

MultimodalExperiment: Integrative Bulk and Single-Cell Experiment Container

Author(s): Lucas Schiffer
Affiliation(s): Center for Data Science, Rutgers New Jersey Medical School, Newark, NJ, U.S.A.

MultimodalExperiment is an S4 class that integrates bulk and single-cell experiment data; it is optimally storage-efficient, and its methods are exceptionally fast. It effortlessly represents multimodal data of any nature and features normalized experiment, subject, sample, and cell annotations which are related to underlying biological experiments through maps.

MolEvolvR: A web-app for characterizing proteins using molecular evolution and phylogeny

Author(s): Janani Ravi, Jacob Dennis Krol
Affiliation(s): University of Colorado Anschutz

Background: The landscape of protein analysis software and databases is distributed and siloed, e.g., the BLAST suite for homology searches, InterPro for domain scans, and several more packages for individual sequence feature analysis. Often, biologists create in-house pipelines for data cleanup, wrangling, and interoperability, and for summarizing and visualizing the disparate outputs.

Modeling the effects of nicotine and smoking exposures on the developing brain

Author(s): Daianna Gonzalez-Padilla, Leonardo Collado Torres, Keri Martinowich, Kristen R. Maynard, Andrew E. Jaffe
Affiliation(s): Lieber Institute for Brain Development
Social media: https://twitter.com/daianna_glez

Maternal smoking during pregnancy (MSDP) is a major health concern with significant implications for offspring health and well-being, including poor cognitive and behavioral outcomes that could be explained by the influence of prenatal tobacco exposure on brain development. Although prenatal smoking has been studied extensively, very little is known about the nicotine-specific effects on the developing brain.

Model-based Dimensionality Reduction for Single-cell RNA-seq with Generalized Bilinear Models

Author(s): Phillip Nicol, Jeffrey W. Miller
Affiliation(s): Department of Biostatistics, Harvard University
Social media: https://twitter.com/PhillipNicol

Dimensionality reduction is a critical step in the analysis of single-cell RNA-seq (scRNA-seq) data. The standard approach is to apply a transformation to the count matrix followed by principal component analysis (PCA). However, this approach can induce spurious heterogeneity and mask true biological variability. An alternative approach is to directly model the counts, but existing methods tend to be computationally intractable on large datasets and do not quantify uncertainty in the low-dimensional representation.

Mentoring Opportunities with Outreachy

Author(s): Jen Wokaty
Affiliation(s): Bioconductor Community Coordinator with Outreachy

Bioconductor's commitment to building a diverse and welcoming open source community makes it a great environment for newcomers to participate in open source software. In 2022, the Bioconductor community was accepted to participate in Outreachy, which provides internships for individuals impacted by bias and underrepresentation in technology. In Bioconductor's first cohort, two community members mentored interns on the BSgenomeForge and Sweave2Rmd projects.

lute, a new framework for bulk transcriptomics deconvolution experiments

Author(s): Sean Maden, Stephanie Hicks
Affiliation(s): Johns Hopkins Bloomberg School of Public Health
Social media: https://twitter.com/MadenSean

Each year sees the publication of many new and innovative algorithms to deconvolve cell type amounts from bulk tissues using single-cell RNA-seq reference datasets. Many of these new algorithms are published to GitHub and Bioconductor, have several of their own dependencies, and are written in different programming languages.

Linear models and empirical Bayes methods for proteome-wide label-free quantification and differential expression in mass spectrometry-based proteomics experiments

Author(s): Mengbo Li, Gordon K Smyth
Affiliation(s): Walter and Eliza Hall Institute of Medical Research

Mass spectrometry-based proteomics is a powerful tool in biomedical research, but its usefulness is limited by the frequent occurrence of missing values in peptides that cannot be reliably quantified. Many analysis strategies have been proposed for missing values where the discussion often focuses on distinguishing whether values are missing completely at random (MCAR), missing at random (MAR) or missing not at random (MNAR).
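The missingness labels can be made concrete with a toy simulation. The sketch below masks the same hypothetical peptide log-intensities in two ways: MCAR, where the missing positions are drawn without looking at the data, and a deliberately simplified MNAR, where every value below an assumed detection limit is lost. MAR, which depends on other observed variables, is not shown. All values, the detection limit, and the variable names are invented for illustration and do not come from the talk.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the MCAR mask is reproducible

# Hypothetical log-intensities for ten peptides in one sample.
intensities = [18.2, 21.5, 19.9, 25.1, 17.4, 23.3, 20.8, 16.9, 22.7, 24.0]

# MCAR: the positions that go missing are chosen independently of any value.
mcar_positions = set(rng.sample(range(len(intensities)), k=3))
mcar = [None if i in mcar_positions else x for i, x in enumerate(intensities)]

# MNAR (simplified): missingness depends on the unobserved value itself,
# modelled here as a hard detection limit.
detection_limit = 19.0
mnar = [None if x < detection_limit else x for x in intensities]

true_mean = mean(intensities)
mnar_observed_mean = mean(x for x in mnar if x is not None)
```

Because the MNAR mask removes only low intensities, the mean of the remaining values overestimates the true mean, which is why the mechanism matters for downstream quantification.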

Interactive analysis of single-cell data using flexible workflows with SCTK2.0

Author(s): Yichen Wang, Irzam Sarfraz, Nida Pervaiz, W Evan Johnson, Joshua Campbell
Affiliation(s): Boston University School of Medicine
Social media: https://twitter.com/camplab1

Analysis of single-cell RNA-seq (scRNA-seq) data can reveal novel insights into heterogeneity of complex biological systems. Many tools and workflows have been developed to perform different types of analysis. However, these tools are spread across different packages or programming environments, rely on different underlying data structures, and can only be utilized by people with knowledge of programming languages.

Injecting rigor and reproducibility into CITE-seq workflows: decontamination and in silico gating approaches

Author(s): Jae Min Park, Ava Jensen, Santiago Carmona, Joshua Campbell, Tim Triche
Affiliation(s): Van Andel Institute

Single-cell transcriptomic and proteomic assays have added substantial breadth and depth to our understanding of cellular phenotypes and interactions. Particularly in the study of cellular immunity, the recent CITE-seq and REAP-seq protocols (which simultaneously assay hundreds of cell surface proteins alongside thousands of mRNA transcripts) have provided a robust and scalable means to dissect tissue- and condition-specific roles of individual cells.

GHA-Built: Building binaries for R/Bioconductor packages via GitHub Actions

Author(s): Alexandru Mahmoud
Affiliation(s): Bioconductor Core Team / BWH / Channing at HMS

Since their introduction, the Bioconductor Docker containers have gained significant popularity as a convenient solution for running an R or RStudio session in an environment pre-configured to work out-of-the-box with little to no user configuration. In order to improve the user experience in this pre-configured environment, the Bioconductor team started building package binaries for the Docker containers, greatly reducing the installation time for Bioconductor and their dependent R packages.

Genotype calling from Recount3 RNA-seq data

Author(s): Afrooz Razi, Christopher C. Lo, Sirou Wang, Jeffrey T. Leek, Kasper Daniel Hansen
Affiliation(s): Department of Human Genetics, Johns Hopkins University School of Medicine; Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health; Biostatistics Program, Division of Public Health Sciences, Fred Hutchinson Cancer Research Center

Advances in high throughput sequencing technologies have enabled gene expression studies of human health and disease.

Forensics for Multi-Omics Data Generation: Diagnosing and Resolving Mislabeled Samples by Integrating Multiple Data Sources

Author(s): Ryan Conrad Thompson, Matt Johnson, Diane Marie Del Valle, Edgar Gonzalez-Kozlova, Seunghee Kim-Schulze, Kai Nie, Eric Vornholt, Lora Liharska, Brian Kopell, The Mount Sinai COVID-19 Biobank Team, Eric E Schadt, Miriam Merad, Sacha Gnjatic, Carlos Estevez, Alex W Charney, Noam D Beckmann
Affiliation(s): Icahn School of Medicine at Mount Sinai
Social media: https://twitter.com/DarwinAwdWinner

Mislabeling events, which inevitably occur at varying rates in any study involving human handling of samples, pose a serious multi-layered threat to the integrity of biological findings.

Enabling Reusable and Reproducible Genomic Data Management and Analysis in R

Author(s): Qian Liu
Affiliation(s): Roswell Park Comprehensive Cancer Center
Social media: https://twitter.com/QianLiu28878838

Efficient management and analysis of genomic data is becoming increasingly challenging due to the growing volume and complexity of these data and public resources, especially with the widespread adoption of FAIR (findability, accessibility, interoperability, and reusability) data principles and organizational requirements for Data Management and Sharing Plans.

DESpace: a novel analysis framework to discover spatially variable genes

Author(s): Peiying Cai, Mark Robinson, Simone Tiberi
Affiliation(s): University of Zurich

Background: Spatially resolved transcriptomics (SRT) technologies allow measuring gene expression profiles while also retaining spatial information about the tissue. SRT technologies have led to the release of novel methods that take advantage of the joint availability of mRNA abundance and spatial information. Notably, several computational tools have been developed to identify spatially variable genes (SVGs), i.

cytofQC: A better way to clean cytof data

Author(s): Jill Lundell
Affiliation(s): Dana-Farber Cancer Institute
Social media: https://twitter.com/JillLundell

Cytometry by time of flight, or CyTOF, is a powerful alternative to flow cytometry for quantifying targets on the surface and interior of cells. CyTOF data requires considerable cleaning because many observations are debris, doublets, or calibration beads. As with any technology, the data analysis is only as good as the data itself, so careful data cleaning is essential.

CuratedAtlasQueryR: a query API for the CELLxGENE human cell atlas enables defining a body map of immune composition through ageing

Author(s): Stefano Mangiola
Affiliation(s): WEHI
Social media: https://twitter.com/steman_research

The Human Cell Atlas, a large-scale single-cell sequencing initiative, has the potential to revolutionise our understanding of human cellular biology and the immune system. Data harmonisation, curation and effective data query are essential to extract knowledge from these complex atlases. Here, we present a harmonised and curated version of the CELLxGENE human cell atlas.

Comparative analysis of annotation tools in grouping cell types from single cell RNA sequencing data

Author(s): Meghana Kshirsagar, Gauri Vaidya
Affiliation(s): University of Limerick
Social media: https://www.linkedin.com/in/meghana-kshirsagar-0843415/

The identification of differentially expressed immune-related genes in immune cells and subtypes holds huge potential in predicting prognosis and survival of immunotherapy treatment outcomes. Gene expression profiling can help to identify the patterns of genes expressed in major immune cells amongst cohorts of patients at different stages of cancer, which can generate new biological hypotheses.

catchSalmon/catchKallisto: Dividing out quantification uncertainty allows efficient assessment of differential transcript expression

Author(s): Pedro Baldoni, Yunshun Chen, Soroor Hediyeh-zadeh, Yang Liao, Wei Shi, Gordon K. Smyth
Affiliation(s): Walter and Eliza Hall Institute of Medical Research
Social media: https://twitter.com/plbaldoni

A major challenge in transcript-level RNA-seq data analysis is the inherent variability introduced during quantification of RNA sequencing reads due to the high level of sequence similarity among transcripts annotated to the same genomic locus. The quantification uncertainty of transcript-level counts, which is intractable to measure analytically, introduces an extra level of technical variation that is difficult to estimate and compromises differential transcript expression (DTE) analyses with standard methods developed for gene-level analyses.

bugphyzz: a harmonized data resource and software for enrichment analysis of microbial physiologies

Author(s): Samuel David Gamboa-Tuz, Kelly Eckenrode, Jonathan Xi-Yao Ye, Jennifer Wokaty, Ben Nachod, Eric Franzosa, Nicola Segata, Curtis Huttenhower, Levi Waldron
Affiliation(s): CUNY ISPH
Social media: https://twitter.com/samueldgamboa

Microbiome sequencing allows the study of the abundance and composition of uncultured microbial communities. Tools such as PICRUSt and HUMAnN allow the analysis of microbial molecular functions and metabolic pathways using gene and genome information from highly curated databases such as the NCBI, KEGG, and the Gene Ontology.

Bringing Sweave Vignettes Into The Modern Age With R

Author(s): Beryl Kanali

Bioconductor has been around for over 20 years and has many older Sweave vignettes, which embed R code within LaTeX. However, Sweave is hard to maintain and compiles to PDF. Sweave2Rmd is an open source project for converting vignettes from Sweave to R Markdown; the converted vignettes render as HTML files and are easier to maintain.

BiocPy: enabling Bioconductor workflows in Python

Author(s): Jayaram Kancherla, Aaron Lun
Affiliation(s): Genentech
Social media: https://twitter.com/jayaram

Analysts today use a variety of languages in their workflows, including R/Bioconductor for statistical analysis and Python for imaging or machine learning tasks. Currently, Python lacks an ecosystem that supports genomic interval-based analyses and data structures for managing genomic experiments. Although single-cell representations have become a de facto standard in Python, they are not appropriate for all types of genomic experiments, nor do they fully support genomic analysis.

Benchmarking Spot Deconvolution Methods in the Human Dorsolateral Prefrontal Cortex

Author(s): Nicholas J Eagles, Louise A. Huuki-Myers, Abby Spangler, Kelsey D. Montgomery, Sang Ho Kwon, Boyi Guo, Melissa Grant-Peters, Heena R. Divecha, Madhavi Tippani, Chaichontat Sriworarat, Annie B. Nguyen, Prashanti Ravichandran, Matthew N. Tran, Arta Seyedian, Thomas M. Hyde, Joel E. Kleinman, Alexis Battle, Stephanie C. Page, Mina Ryten, Stephanie Hicks, Keri Martinowich, Leonardo Collado Torres, Kristen R. Maynard
Affiliation(s): Lieber Institute for Brain Development

Spatial transcriptomics is an increasingly popular field of study that allows the measurement of gene-expression information along with spatial coordinates.

AnVILWorkflow: A runnable workflow package for Cloud-implemented analysis pipelines

Author(s): Sehyun Oh, Martin Morgan, Levi Waldron
Affiliation(s): City University of New York
Social media: https://twitter.com/drsehyun

Advancements in sequencing technologies and the development of new data collection methods produce large volumes of biological data. However, the computational infrastructure and skills currently required to work with data at this scale put such analyses out of reach for many basic, translational, and clinical researchers. We developed the AnVILWorkflow software package, which provides an R-user-friendly working environment for running Cloud-implemented workflows.

A phylogenetic method linking nucleotide substitution rates to continuous trait evolution

Author(s): Patrick Gemmell
Affiliation(s): Harvard University

We present an R/C++ software method that relates nucleotide substitution rates to changes in a continuous trait of interest. The method takes as input a multiple sequence alignment of conserved elements, continuous trait data observed in extant species, and a background phylogeny and substitution process. Gibbs sampling is used to assign conservation states (background, conserved, accelerated) to lineages and to explore whether the assigned states are associated with increases or decreases in the rate of trait evolution.

A novel statistical method for single isoform proteogenomics inference

Author(s): Jordy Bollon, Michael Shortreed, Ben T Jordan, Rachel Miller, Colin Dewey, Gloria M Sheynkman, Simone Tiberi
Affiliation(s): Department of Statistical Sciences, University of Bologna, Bologna, Italy
Social media: https://twitter.com/tiberi_simone

Background: Currently, the main strategy to infer proteins is via “bottom-up” proteomics, where proteins are only measured indirectly via peptides. However, most peptides (called shared peptides) map to multiple proteins in the database. This results in ambiguous protein identifications, where various protein isoforms cannot be distinguished, and protein inference is typically abstracted to the gene level (NB: most genes are associated with multiple isoforms).
