GN · LG · ML · Oct 8, 2018

Towards the Latent Transcriptome

arXiv:1810.03442v2 · 2 citations
Originality: Incremental advance
AI Analysis

This work offers genomics researchers a k-mer embedding method for analyzing RNA-seq data. The advance appears incremental, since it builds on existing RNN techniques for sequence representation.

The authors address RNA-seq analysis without genome alignment by computing continuous embeddings for k-mers that capture both sequence similarity and abundance. They demonstrate the method's utility by recovering exon information and detecting genomic abnormalities in acute myeloid leukemia patients.

In this work, we propose a method to compute continuous embeddings for k-mers from raw RNA-seq data, without the need for alignment to a reference genome. The approach uses an RNN to transform k-mers of the RNA-seq reads into a two-dimensional representation that is used to predict the abundance of each k-mer. We report that our model captures information about both DNA sequence similarity and DNA sequence abundance in the embedding latent space, which we call the Latent Transcriptome. We confirm the quality of these vectors by comparing them to known gene sub-structures and report that the latent space recovers exon information from raw RNA-seq data from acute myeloid leukemia patients. Furthermore, we show that this latent space allows the detection of genomic abnormalities such as translocations, as well as patient-specific mutations, making this representation space useful for both visualization and analysis.
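The pipeline described in the abstract (split reads into k-mers, count their abundance, and pass each k-mer through an RNN to obtain a two-dimensional vector that feeds an abundance prediction) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the helper names (`kmers`, `kmer_abundance`, `TinyRNN`), the toy reads, the choice of k = 4, the one-hot encoding, and the plain Elman-style recurrence with random, untrained weights. It is not the authors' architecture or training procedure, and it omits the training step that would fit predictions to the counts.

```python
import math
import random
from collections import Counter

ALPHABET = "ACGT"


def kmers(read, k):
    """Yield every overlapping k-mer of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def kmer_abundance(reads, k):
    """Count how often each k-mer occurs across all reads (the prediction target)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def one_hot(base):
    """Encode a nucleotide as a 4-element one-hot vector (assumed encoding)."""
    return [1.0 if base == b else 0.0 for b in ALPHABET]


class TinyRNN:
    """Hypothetical, untrained Elman RNN: k-mer -> 2-D embedding -> scalar prediction."""

    def __init__(self, hidden=2, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        # Input-to-hidden, hidden-to-hidden, and hidden-to-output weights.
        self.w_in = [[rng.uniform(-1, 1) for _ in ALPHABET] for _ in range(hidden)]
        self.w_rec = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(hidden)]
        self.w_out = [rng.uniform(-1, 1) for _ in range(hidden)]

    def embed(self, kmer):
        """Read the k-mer base by base; the final hidden state is the embedding."""
        h = [0.0] * self.hidden
        for base in kmer:
            x = one_hot(base)
            h = [
                math.tanh(
                    sum(w * xi for w, xi in zip(self.w_in[j], x))
                    + sum(w * hi for w, hi in zip(self.w_rec[j], h))
                )
                for j in range(self.hidden)
            ]
        return h

    def predict(self, kmer):
        """Linear readout of the embedding; training would fit this to the counts."""
        return sum(w * hi for w, hi in zip(self.w_out, self.embed(kmer)))


# Toy reads, invented for illustration only.
reads = ["ACGTAC", "CGTACG"]
counts = kmer_abundance(reads, k=4)
model = TinyRNN()
embedding = model.embed("ACGT")
prediction = model.predict("ACGT")
```

In this sketch the hidden size is fixed at 2 so that the final hidden state can be plotted directly as a point in the plane, which matches the abstract's emphasis on visualization. A trained model would adjust the weights so that `predict` approximates the observed counts.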

Code Implementations: 1 repo