Isoformic: Isoform level biological interpretation from transcriptomic data
Author(s): Izabela Mamede Conceição, Lucio Rezende Queiroz, Thomaz Luscher Dias, Julia Raspante Martins, Nayara Evelin de Toledo, Gloria Regina Franco
Affiliation(s): Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
Social media: https://twitter.com/Izabela_M_C_A_C
Transcriptome analysis is one of the most widely used methods in the biological sciences today. Multiple software tools cover retrieval of the reference transcriptome, alignment or pseudo-alignment, quality control, differential expression and biological interpretation of the data. The step with the greatest group-to-group variability, and the one that takes the most time when the staff are proficient, is the downstream interpretation of the data. Transcriptomes are mostly analyzed at the gene level; however, gene-level analysis alone can be misleading, since over 80% of all mammalian genes undergo alternative splicing or have more than one transcription start or end site. These processes produce multiple transcript types from the same gene template, which can be productive, generating different isoforms of the same protein, or unproductive, meaning they cannot be translated into protein. For mouse and human transcriptomes there are comprehensive and widely available transcript annotations from the Gencode, Chess, and Entrez databases which, for humans, amount to over 200,000 transcripts, over 70% of them being products of aberrant splicing events. Splicing event calling software for short reads is known to perform poorly, require extremely deep sequencing, and output transcripts that do not correspond to the annotated ones. Nevertheless, short-read sequencing is still the default for most bioinformatics analyses. To date, there is no pipeline for short-read transcript-level analysis that is customizable, usable for any type of transcript, and that yields ready-to-use plots for downstream analyses or infers biologically relevant pathways from the data. Here we present Isoformic, a pipeline implemented entirely in R for the biological interpretation of transcript-level results from short-read sequencing data, which can be used with any well-annotated transcriptome.
Isoformic is available on GitHub, currently at version 0.0.9 (https://github.com/luciorq/isoformic). The package takes as input your differential expression results (gene and transcript level) and your reference transcriptome, and from those lets you: 1. extract the transcript types, 2. find genes whose transcripts are differentially expressed although the gene itself is not, 3. plot in different ways the relative counts and fold change of those transcripts between conditions, 4. visualize differences in intron and exon composition between the transcripts of the same gene, and 5. functionally enrich the transcripts with any chosen GMT file, separated by transcript type, and plot the results. Isoformic combines types of visualization that are commonly used in the field but not yet aggregated in a single package, provides functions to easily explore transcript differential expression, and assigns biologically relevant information to the transcripts. The package provides a customizable and easy-to-use workflow for transcript-level findings from bulk transcriptomic data.
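To make step 2 of the workflow concrete, the following is a minimal, language-neutral sketch in Python of the underlying idea: flag genes that are not differentially expressed at the gene level but have at least one differentially expressed transcript. It is not Isoformic's API (Isoformic is an R package); the function name, the `log2FC` and `padj` field names, the gene and transcript identifiers, the toy values and the cutoffs are all assumptions made for illustration.

```python
from collections import defaultdict


def genes_with_hidden_transcript_de(gene_de, transcript_de, tx_to_gene,
                                    padj_cutoff=0.05, lfc_cutoff=1.0):
    """Return {gene: [transcripts]} for genes that are NOT differentially
    expressed themselves but have differentially expressed transcripts.

    Hypothetical helper: each entry of gene_de / transcript_de is assumed
    to be a dict with 'log2FC' and 'padj' keys.
    """
    def is_de(row):
        # A feature counts as DE if it passes both cutoffs (assumed rule).
        return row["padj"] < padj_cutoff and abs(row["log2FC"]) >= lfc_cutoff

    hits = defaultdict(list)
    for tx, row in transcript_de.items():
        gene = tx_to_gene.get(tx)
        # Skip transcripts without a parent gene in the gene-level table.
        if gene is None or gene not in gene_de:
            continue
        if is_de(row) and not is_de(gene_de[gene]):
            hits[gene].append(tx)
    return {gene: sorted(txs) for gene, txs in hits.items()}


# Toy input (invented values): GENE_A looks unchanged at the gene level
# because its two transcripts move in opposite directions.
gene_de = {
    "GENE_A": {"log2FC": 0.1, "padj": 0.80},
    "GENE_B": {"log2FC": 2.3, "padj": 0.001},
}
transcript_de = {
    "GENE_A-201": {"log2FC": 2.0, "padj": 0.01},
    "GENE_A-202": {"log2FC": -1.8, "padj": 0.02},
    "GENE_B-201": {"log2FC": 2.5, "padj": 0.001},
}
tx_to_gene = {
    "GENE_A-201": "GENE_A",
    "GENE_A-202": "GENE_A",
    "GENE_B-201": "GENE_B",
}

result = genes_with_hidden_transcript_de(gene_de, transcript_de, tx_to_gene)
print(result)
```

In this toy case only GENE_A is reported: its transcripts change in opposite directions and cancel out at the gene level, which is the kind of signal a gene-level analysis alone would miss. GENE_B is excluded because it is already differentially expressed as a gene.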