QM · ML · Oct 16, 2017

Convolutional neural networks for structured omics: OmicsCNN and the OmicsConv layer

arXiv:1710.05918v1 · 4 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of applying CNNs to structured omics data, which could benefit researchers in bioinformatics and medicine. It is incremental in that it adapts existing CNN concepts to new data types.

The authors tackle the problem of applying convolutional neural networks to omics data, which lack a natural distance metric. They introduce OmicsConv, a novel Keras layer that enables convolution based on derived metrics, and demonstrate it on gut microbiota data for Inflammatory Bowel Disease prediction, reporting competitive performance on a dataset of 222 patients.

Convolutional Neural Networks (CNNs) are a popular deep learning architecture widely applied in different domains, in particular image classification, where the concept of convolution with a filter comes naturally. Unfortunately, the requirement of a distance (or, at least, of a neighbourhood function) in the input feature space has so far prevented their direct use on data types such as omics data. However, a number of omics data types are metrizable, i.e., they can be endowed with a metric structure, which makes it possible to adopt a convolution-based deep learning framework, e.g., for prediction. We propose a generalized solution for CNNs on omics data, implemented through a dedicated Keras layer. In particular, for metagenomics data, a metric can be derived from the patristic distance on the phylogenetic tree. For transcriptomics data, we combine Gene Ontology (GO) semantic similarity and gene co-expression to define a distance; the function is defined through a multilayer network in which three layers are defined by the GO mutual semantic similarity and the fourth by gene co-expression. As a general tool, feature distance on omics data is enabled by OmicsConv, a novel Keras layer, yielding OmicsCNN, a dedicated deep learning framework. Here we demonstrate OmicsCNN on 16S gut microbiota sequencing data for Inflammatory Bowel Disease (IBD), first on synthetic data and then on a metagenomics collection of gut microbiota from 222 IBD patients.
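To make the idea of convolving over a tree-derived neighbourhood concrete, here is a minimal pure-Python sketch. It is not the authors' OmicsConv layer and does not use Keras. The toy tree, the taxon names, the choice of k nearest neighbours, and the function names (`patristic`, `nearest_neighbours`, `tree_convolve`) are all assumptions made for illustration. It computes the patristic distance (the sum of branch lengths on the path between two leaves), picks each feature's closest leaves, and applies one filter as a weighted sum over that neighbourhood.

```python
# Hypothetical toy phylogeny: child -> (parent, branch length). Not from the paper.
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "C": ("n2", 2.0),
    "D": ("n2", 0.5),
    "n1": ("root", 0.5),
    "n2": ("root", 1.0),
}
LEAVES = ["A", "B", "C", "D"]


def path_to_root(node, tree):
    """Map each ancestor of `node` (and node itself) to its cumulative distance from `node`."""
    dist, out = 0.0, {node: 0.0}
    while node in tree:
        parent, length = tree[node]
        dist += length
        out[parent] = dist
        node = parent
    return out


def patristic(a, b, tree):
    """Sum of branch lengths on the tree path between leaves a and b."""
    pa, pb = path_to_root(a, tree), path_to_root(b, tree)
    # Among shared ancestors, the lowest common one gives the shortest combined path.
    return min(pa[n] + pb[n] for n in pa if n in pb)


def nearest_neighbours(leaves, tree, k):
    """For each leaf, its k closest leaves (itself first), ordered by patristic distance."""
    return {
        leaf: sorted(leaves, key=lambda o: (patristic(leaf, o, tree), o))[:k]
        for leaf in leaves
    }


def tree_convolve(sample, neighbours, weights):
    """Apply one filter: weighted sum of the abundances in each feature's neighbourhood."""
    return {
        leaf: sum(w * sample[n] for w, n in zip(weights, nbrs))
        for leaf, nbrs in neighbours.items()
    }


# Illustrative abundances for one sample and a filter of width k = 2 (assumed values).
NEIGHBOURS = nearest_neighbours(LEAVES, TREE, k=2)
sample = {"A": 0.4, "B": 0.1, "C": 0.3, "D": 0.2}
out = tree_convolve(sample, NEIGHBOURS, weights=[1.0, 0.5])
```

In this sketch the filter weights are fixed by hand. In a trainable layer they would be learned, and the neighbour lookup would typically be precomputed once from the distance matrix so that the same filter is shared across all features.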

