catchSalmon/catchKallisto: Dividing out quantification uncertainty allows efficient assessment of differential transcript expression


Author(s): Pedro Baldoni, Yunshun Chen, Soroor Hediyeh-zadeh, Yang Liao, Wei Shi, Gordon K. Smyth

Affiliation(s): Walter and Eliza Hall Institute of Medical Research

Social media: https://twitter.com/plbaldoni

A major challenge in transcript-level RNA-seq data analysis is the variability introduced when sequencing reads are quantified, which arises from the high sequence similarity among transcripts annotated to the same genomic locus. This quantification uncertainty is intractable to measure analytically. It adds an extra level of technical variation to transcript-level counts that is difficult to estimate and that compromises differential transcript expression (DTE) analyses based on standard methods developed for gene-level data. Bootstrap counts, as provided by the popular lightweight quantification tools Salmon and kallisto, allow us to estimate this extra technical variability and account for it in DTE analyses. In this talk, I will present catchSalmon and catchKallisto, two functions included in the Bioconductor package edgeR that estimate the extra technical overdispersion of transcript-level counts using bootstrap samples from these quantification tools. I will discuss how the technical overdispersion can be effectively removed from the data by count scaling, that is, by dividing each transcript's counts by its estimated overdispersion, which reduces transcript-level counts to effective count sizes that reflect their true precision. With this approach, catchSalmon and catchKallisto give users an efficient DTE analysis within the fast and well-established edgeR framework. Applications of the proposed DTE pipeline to real experiments, together with extensive simulation studies, will be presented to illustrate the benefits of accounting for quantification uncertainty via count scaling over standard methods.
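
To make the count-scaling idea concrete, the sketch below estimates a per-transcript overdispersion from bootstrap counts and divides the observed counts by it. It is a simplified illustration in plain Python, not a reproduction of the estimator implemented in edgeR: the function names, the nested-list data layout, the use of a bootstrap variance-to-mean ratio averaged across samples, and the floor at 1 are all assumptions made for this example.

```python
from statistics import mean, variance


def estimate_overdispersion(boot):
    """Estimate one overdispersion value per transcript (illustrative only).

    boot[t][s] is a list of bootstrap counts for transcript t in sample s.
    Assumption: overdispersion is approximated by the bootstrap
    variance-to-mean ratio, averaged over samples and floored at 1.
    """
    overdisp = []
    for per_sample in boot:
        ratios = []
        for reps in per_sample:
            m = mean(reps)
            if m > 0:  # skip samples where the transcript has no bootstrap support
                ratios.append(variance(reps) / m)
        ratio = mean(ratios) if ratios else 1.0
        # Never inflate counts: a ratio below 1 is treated as no extra variation.
        overdisp.append(max(1.0, ratio))
    return overdisp


def scale_counts(counts, overdisp):
    """Divide each transcript's counts (one row per transcript) by its overdispersion."""
    return [[c / d for c in row] for row, d in zip(counts, overdisp)]


# Hypothetical toy data: two transcripts, two samples, three bootstrap draws each.
boot = [
    [[50, 50, 50], [60, 60, 60]],        # unambiguous transcript: no bootstrap spread
    [[80, 120, 100], [80, 120, 100]],    # ambiguous transcript: large bootstrap spread
]
counts = [
    [50, 60],
    [100, 100],
]

overdisp = estimate_overdispersion(boot)
scaled = scale_counts(counts, overdisp)
```

In this toy example the first transcript keeps its counts unchanged, while the second transcript, whose bootstrap draws vary widely, has its counts reduced by a factor of four. The scaled counts would then be passed to a standard count-based differential expression analysis in place of the raw estimates.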