QM LG AP CO ML | Oct 30, 2017

Contextual Regression: An Accurate and Conveniently Interpretable Nonlinear Model for Mining Discovery from Scientific Data

arXiv:1710.10728v1 | 7 citations
Originality: Incremental advance
AI Analysis

The paper addresses the need for nonlinear models that are both interpretable and accurate in scientific discovery, which is relevant to researchers in fields such as genomics. The advance appears incremental, since it builds on existing hybrid architectures.

The paper argues that existing methods for nonlinear scientific data mining are not both interpretable and accurate, and introduces contextual regression, a hybrid method that combines a neural network embedding with a dot product layer. In simulations, it recovered feature contributions with high fidelity under random noise levels up to 200%. In predicting open chromatin sites in the human genome, it outperformed the state-of-the-art method in accuracy and identified two histone marks not previously associated with open chromatin.

Machine learning algorithms such as linear regression, SVMs, and neural networks have played an increasingly important role in the process of scientific discovery. However, none of them is both interpretable and accurate on nonlinear datasets. Here we present contextual regression, a method that joins these two desirable properties using a hybrid architecture of a neural network embedding and a dot product layer. We demonstrate its high prediction accuracy and sensitivity through the task of predictive feature selection on a simulated dataset and the application of predicting open chromatin sites in the human genome. On the simulated data, our method achieved high-fidelity recovery of feature contributions under random noise levels up to 200%. On the open chromatin dataset, our method not only outperformed the state-of-the-art method in accuracy, but also revealed two previously unreported histone marks related to open chromatin. Our method can fill the gap in accurate and interpretable nonlinear modeling for scientific data mining tasks.
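The abstract describes the architecture only as a neural network embedding followed by a dot product layer. One plausible reading, sketched here under stated assumptions rather than taken from the paper's code, is that a small network maps each input to a vector of per-feature weights, and the prediction is the dot product of those weights with the input itself. The layer sizes, the tanh activation, and the names `init_params`, `context_weights`, and `predict` are all hypothetical choices for illustration. The sketch covers the forward pass only, with no training.

```python
import numpy as np


def init_params(n_features, n_hidden, seed=0):
    """Randomly initialise a one-hidden-layer embedding network (illustrative sizes)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_features, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=(n_hidden, n_features)),
        "b2": np.zeros(n_features),
    }


def context_weights(params, X):
    """Embedding step: map each sample to one weight per input feature."""
    hidden = np.tanh(X @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]


def predict(params, X):
    """Dot product step: combine the sample-specific weights with the raw features."""
    weights = context_weights(params, X)
    contributions = weights * X  # per-sample, per-feature contribution
    return contributions.sum(axis=1), contributions


# Hypothetical data: 4 samples with 5 features each.
params = init_params(n_features=5, n_hidden=8)
X = np.random.default_rng(1).normal(size=(4, 5))
y_hat, contributions = predict(params, X)
```

Under this reading, the interpretability comes from the fact that each prediction is an exact sum of per-feature contributions, so `contributions[i, j]` can be read as the share of sample `i`'s prediction attributed to feature `j`, even though the weights themselves depend nonlinearly on the input.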

Code Implementations: 1 repo