A Bioconductor-style differential expression analysis powered by SPEAQeasy
Author(s): Daianna Gonzalez-Padilla, Renee Garcia-Flores, Nicholas J Eagles, Leonardo Collado Torres
Affiliation(s): Lieber Institute for Brain Development
Social media: https://twitter.com/daianna_glez
As more research projects rely on RNA sequencing (RNA-seq) data, more software has become available for the preparation steps that precede statistical analyses such as differential expression. Many of these programs perform only one of the required pre-analysis steps, so a single pipeline has to combine several of them. Integrating these programs can require substantial time and computational experience. In addition, much of the existing software is no longer maintained by its creators, which forces users to modify established pipelines and spend time finding replacement tools.
SPEAQeasy (doi: 10.1186/s12859-021-04142-3), which stands for Scalable Pipeline for Expression Analysis and Quantification, was created to overcome these challenges when analyzing RNA-seq data. It is easy to install and can run in multiple computational frameworks (SGE, SLURM, local execution, Docker/Singularity integration). This reduces the computational background needed to use it and makes it accessible to a wider audience of researchers. One of its outstanding features is that it generates RangedSummarizedExperiment (RSE) objects, which allow a smoother transition to downstream R/Bioconductor analyses.
The aim of this workshop is to show entry-level users how to use the RSE objects generated by SPEAQeasy for differential expression analysis, among other downstream analyses. We will briefly explain what SPEAQeasy is, focus mostly on the specific sample-level information it provides in colData(RSE), and then run hands-on practical exercises so that attendees become familiar with these data.
The exercises follow an example differential expression analysis (DEA) of a real dataset produced by this pipeline. The example lets us describe the main outputs for the features and the samples of each experiment, and illustrate the kinds of analyses these data support with Bioconductor tools such as limma and edgeR. The dataset is an RSE object with expression counts for 55401 mouse genes across 95 samples from the frontal cortices of P0 pups born to pregnant mice that were exposed to cigarette smoke (n=46) or were healthy (n=49). The goal is to find differentially expressed genes (DEGs) in the brains of these pups in order to evaluate the effects of smoking during pregnancy on the developing brain. The dataset will be available in the smokingMouse Bioconductor package by the time Bioc2023 is held.
Our presenting team consists of Nick, the author of SPEAQeasy; Daianna, who has analyzed data processed by SPEAQeasy at https://github.com/LieberInstitute/smokingMouse_Indirects; and Renee, who along with Daianna helped teach https://lcolladotor.github.io/rnaseq_LCG-UNAM_2023/. All three of us are junior scientists: Nick is a Research Associate, Renee graduated from undergrad in 2022, and Daianna will graduate from undergrad in 2024.