QM, LG, NE, GN · Sep 6, 2017

Phylogenetic Convolutional Neural Networks in Metagenomics

arXiv:1709.02268v1 · 104 citations
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of applying CNNs to non-Euclidean metagenomics data for disease classification, an incremental advance in domain-specific deep learning methods.

The paper tackles the problem of classifying metagenomics data by introducing Ph-CNN, a deep learning architecture that uses patristic distance on phylogenetic trees as a proximity measure, showing promising classification performance compared to classical algorithms like SVM and Random Forest on a dataset of 38 healthy subjects and 222 IBD patients.

Background: Convolutional Neural Networks can be effectively used only when data are endowed with an intrinsic concept of neighbourhood in the input space, as is the case of pixels in images. We introduce here Ph-CNN, a novel deep learning architecture for the classification of metagenomics data based on the Convolutional Neural Networks, with the patristic distance defined on the phylogenetic tree being used as the proximity measure. The patristic distance between variables is used together with a sparsified version of MultiDimensional Scaling to embed the phylogenetic tree in a Euclidean space.

Results: Ph-CNN is tested with a domain adaptation approach on synthetic data and on a metagenomics collection of gut microbiota of 38 healthy subjects and 222 Inflammatory Bowel Disease patients, divided in 6 subclasses. Classification performance is promising when compared to classical algorithms like Support Vector Machines and Random Forest and a baseline fully connected neural network, e.g. the Multi-Layer Perceptron.

Conclusion: Ph-CNN represents a novel deep learning approach for the classification of metagenomics data. Operatively, the algorithm has been implemented as a custom Keras layer taking care of passing to the following convolutional layer not only the data but also the ranked list of neighbourhood of each sample, thus mimicking the case of image data, transparently to the user.

Keywords: Metagenomics; Deep learning; Convolutional Neural Networks; Phylogenetic trees
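
The pipeline described in the abstract (patristic distance, Euclidean embedding, ranked neighbourhoods fed to a convolution) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the toy tree, the helper names (`path_to_root`, `patristic_matrix`, `classical_mds`, `neighbour_ranking`) and the abundance vector are all hypothetical. Plain classical MDS stands in for the sparsified variant the paper uses, and neighbours are ranked directly on the patristic matrix for simplicity.

```python
import numpy as np

# Hypothetical toy phylogeny: child -> (parent, branch length).
# The root has no entry. Leaves A-D play the role of taxa (features).
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "C": ("n2", 0.5),
    "D": ("n2", 0.5),
    "n1": ("root", 2.0),
    "n2": ("root", 1.0),
}
LEAVES = ["A", "B", "C", "D"]


def path_to_root(node):
    """Map each ancestor of `node` (and the node itself) to its distance from `node`."""
    dist, out = 0.0, {node: 0.0}
    while node in TREE:
        parent, length = TREE[node]
        dist += length
        out[parent] = dist
        node = parent
    return out


def patristic_matrix(leaves):
    """Pairwise patristic distance: summed branch lengths on the path between two leaves."""
    paths = [path_to_root(leaf) for leaf in leaves]
    n = len(leaves)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            common = set(paths[i]) & set(paths[j])
            # The lowest common ancestor gives the shortest combined path.
            dist[i, j] = dist[j, i] = min(paths[i][a] + paths[j][a] for a in common)
    return dist


def classical_mds(dist, dims=2):
    """Classical (Torgerson) MDS; a stand-in for the paper's sparsified MDS."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    vals, vecs = np.linalg.eigh(gram)
    order = np.argsort(vals)[::-1][:dims]
    # Tree metrics need not be exactly Euclidean, so clip negative eigenvalues.
    vals = np.clip(vals[order], 0.0, None)
    return vecs[:, order] * np.sqrt(vals)


def neighbour_ranking(dist, k):
    """Indices of the k closest features per feature (each feature ranks itself first)."""
    return np.argsort(dist, axis=1, kind="stable")[:, :k]


D = patristic_matrix(LEAVES)
coords = classical_mds(D, dims=2)      # Euclidean coordinates of the taxa
neighbours = neighbour_ranking(D, k=2)

# One sample of (made-up) relative abundances, ordered as LEAVES.
x = np.array([0.4, 0.3, 0.2, 0.1])
# Each row is a feature together with its nearest neighbour: a patch that a
# 1D convolution with kernel size k and stride k could consume.
patches = x[neighbours]
```

Gathering `x[neighbours]` is what gives the convolution a notion of locality: each filter sees taxa that are close on the tree rather than taxa that happen to be adjacent in the input table.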

