Tree-based differential testing using inferential replicate counts for RNASeq
Author(s): Noor Pratap Singh, Michael I Love, Rob Patro
Affiliation(s): University of Maryland - College Park
The discovery of differentially expressed transcripts is an important but challenging problem in transcriptomics. There is substantial uncertainty associated with transcript abundance estimates which, if ignored, can inflate the number of false positives and, if included, may reduce power for transcripts that have high uncertainty. Given a set of samples, TreeTerminus arranges transcripts in a tree structure that encodes different layers of resolution for interpreting the abundance of transcriptional groups, with uncertainty generally decreasing as one ascends the tree from the leaves. For the subsets of transcripts that exhibit high uncertainty, the tree can be leveraged to find differentially expressed nodes between conditions, rather than individual differentially expressed transcripts. Existing differential testing methods that incorporate tree structures have certain limitations with respect to this important use case in transcriptomics. Some methods assume that, when an inner node is selected, all of its underlying transcripts are differentially expressed; others propose climbing to the highest level in the tree such that no node above it is differentially expressed. Another proposed approach has been to descend from the top to the bottom of the tree and stop searching as soon as an inner node is found not to be differentially expressed. However, this approach is not reasonable when an inner node aggregates nodes that change expression in opposite directions across conditions. Further, none of these methods incorporates uncertainty itself. We introduce a differential testing method that overcomes the above limitations. It uses the tree structure from TreeTerminus together with the Swish method for nonparametric hypothesis testing, which accounts for inferential uncertainty.
Our method outputs a set of nodes that are deemed significant at a given FDR by traversing the tree using a set of rules that takes into account inferential uncertainty and the direction of expression change. The reported entities can consist of both transcripts and inner nodes, with the inner nodes determined in a data-dependent manner to maximize the resolution of analysis while controlling for uncertainty in inference. We ran our method on both experimental and simulated datasets and compared its performance with other tree-based differential testing methods as well as with uncertainty-aware differential transcript estimation methods run on just the transcripts themselves. Our method shows improved sensitivity compared to other approaches. It finds inner nodes that show a strong signal for differential expression between the conditions, which would have been missed when looking at their underlying transcripts alone.
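The limitation of top-down descent described above, where an inner node aggregates children that change in opposite directions, can be illustrated with a toy example. The sketch below is not the method introduced here and does not use TreeTerminus or Swish; the transcript names, the abundance values, and the helper functions `mean_difference` and `aggregate` are all hypothetical, and a plain difference of means stands in for a real test statistic.

```python
from statistics import mean

# Hypothetical toy abundances (invented for illustration): two sibling
# transcripts under one inner node, three samples per condition.
counts = {
    "tx_A": {"control": [100, 110, 90], "treated": [50, 55, 45]},
    "tx_B": {"control": [50, 45, 55], "treated": [100, 90, 110]},
}


def mean_difference(values):
    """Treated mean minus control mean; a stand-in for a real test statistic."""
    return mean(values["treated"]) - mean(values["control"])


def aggregate(children):
    """Sum the children sample by sample to obtain the inner-node abundance."""
    return {
        cond: [sum(sample) for sample in zip(*(c[cond] for c in children))]
        for cond in ("control", "treated")
    }


parent = aggregate(list(counts.values()))
leaf_diffs = {name: mean_difference(v) for name, v in counts.items()}
parent_diff = mean_difference(parent)

print("leaf differences:", leaf_diffs)
print("inner-node difference:", parent_diff)
```

In this toy setting the two leaves change by -50 and +50, while their aggregated parent shows no change at all. A traversal that stops descending at the first non-significant inner node would therefore never reach either transcript, which is why the direction of change has to be considered while traversing the tree.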