mTim: Rapid and accurate transcript reconstruction from RNA-Seq data
This work addresses the need for accurate computational tools in transcriptome studies and genome annotation, though it appears incremental as it builds on existing methods like Cufflinks.
The authors tackled the problem of transcript reconstruction from RNA-Seq data by developing mTim, a machine-learning method based on hidden Markov support vector machines, which achieved highly accurate and robust transcript predictions compared to the widely used Cufflinks method.
Recent advances in high-throughput cDNA sequencing (RNA-Seq) technology have revolutionized transcriptome studies. A major motivation for RNA-Seq is to map the structure of expressed transcripts at nucleotide resolution. With accurate computational tools for transcript reconstruction, this technology may also become useful for genome (re-)annotation, which has mostly relied on de novo gene finding where gene structures are primarily inferred from the genome sequence. We developed a machine-learning method, called mTim (margin-based transcript inference method) for transcript reconstruction from RNA-Seq read alignments that is based on discriminatively trained hidden Markov support vector machines. In addition to features derived from read alignments, it utilizes characteristic genomic sequences, e.g. around splice sites, to improve transcript predictions. mTim inferred transcripts that were highly accurate and relatively robust to alignment errors in comparison to those from Cufflinks, a widely used transcript assembly method.