LGNCMLJun 21, 2016

An artificial neural network to find correlation patterns in an arbitrary number of variables

arXiv:1606.06564v21 citations
Originality Synthesis-oriented
AI Analysis

This addresses a need in statistics, machine learning, and data mining for multi-variable correlation analysis, but appears incremental as it builds on existing neural network and optimization concepts.

The authors tackled the problem of measuring correlation among an arbitrary number of variables, which is limited by existing two-variable methods, by proposing a criterion based on optimizing parameters to maximize differences between statistics on real and scrambled data, and tested it on a cancer gene dataset.

Methods to find correlation among variables are of interest to many disciplines, including statistics, machine learning, (big) data mining and neurosciences. Parameters that measure correlation between two variables are of limited utility when used with multiple variables. In this work, I propose a simple criterion to measure correlation among an arbitrary number of variables, based on a data set. The central idea is to i) design a function of the variables that can take different forms depending on a set of parameters, ii) calculate the difference between a statistics associated to the function computed on the data set and the same statistics computed on a randomised version of the data set, called "scrambled" data set, and iii) optimise the parameters to maximise this difference. Many such functions can be organised in layers, which can in turn be stacked one on top of the other, forming a neural network. The function parameters are searched with an enhanced genetic algortihm called POET and the resulting method is tested on a cancer gene data set. The method may have potential implications for some issues that affect the field of neural networks, such as overfitting, the need to process huge amounts of data for training and the presence of "adversarial examples".

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes