LG · ML · Sep 19, 2018

Using Eigencentrality to Estimate Joint, Conditional and Marginal Probabilities from Mixed-Variable Data: Method and Applications

arXiv:1809.07006v1
AI Analysis

This addresses a fundamental machine learning problem for practitioners working with heterogeneous data, though the contribution appears incremental, since it builds on existing graph and eigenvector techniques.

The paper tackles the challenge of estimating joint, conditional, and marginal probability distributions from mixed discrete and continuous data by proposing a non-parametric graph-based method using eigencentrality, and demonstrates its application to tasks like classification, regression, and clustering.

The ability to estimate joint, conditional and marginal probability distributions over some set of variables is of great utility for many common machine learning tasks. However, estimating these distributions can be challenging, particularly in the case of data containing a mix of discrete and continuous variables. This paper presents a non-parametric method for estimating these distributions directly from a dataset. The data are first represented as a graph consisting of object nodes and attribute value nodes. Depending on the distribution to be estimated, an appropriate eigenvector equation is then constructed. This equation is then solved to find the corresponding stationary distribution of the graph, from which the required distributions can then be estimated and sampled from. The paper demonstrates how the method can be applied to many common machine learning tasks including classification, regression, missing value imputation, outlier detection, random vector generation, and clustering.
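The abstract's pipeline has three steps: build a graph of object and attribute-value nodes, solve an eigenvector equation for the stationary distribution, and read probabilities off that distribution. The minimal Python sketch below illustrates those steps under explicit assumptions. All variables are discrete. The eigenvector is the stationary distribution of a lazy random walk, which is the principal left eigenvector of its transition matrix. Conditioning works by restricting the graph to the object nodes that satisfy the condition. These choices, and every function name below (`build_graph`, `stationary_distribution`, `marginal`, `conditional`, `joint`), are hypothetical illustrations, not the paper's exact eigenvector equations or its treatment of continuous variables.

```python
from collections import defaultdict

# Hypothetical sketch: only discrete variables are handled. The paper also
# covers continuous variables, which would need binning or kernel weights here.


def build_graph(rows):
    """Bipartite graph: object node ("obj", i) <-> value node ("val", col, v)."""
    adj = defaultdict(set)
    for i, row in enumerate(rows):
        obj = ("obj", i)
        for col, value in row.items():
            val = ("val", col, value)
            adj[obj].add(val)
            adj[val].add(obj)
    return dict(adj)


def stationary_distribution(adj, iters=10_000, tol=1e-12):
    """Power iteration for the lazy random walk (I + D^-1 A) / 2.

    The lazy step removes the period-2 oscillation of a bipartite graph
    without changing the stationary distribution.
    """
    nodes = list(adj)
    pi = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: 0.5 * pi[n] for n in nodes}  # lazy half: stay put
        for n in nodes:
            share = 0.5 * pi[n] / len(adj[n])  # other half: uniform neighbour
            for m in adj[n]:
                nxt[m] += share
        diff = sum(abs(nxt[n] - pi[n]) for n in nodes)
        pi = nxt
        if diff < tol:
            break
    return pi


def marginal(rows, col):
    """Estimate P(col = v) from the stationary mass of col's value nodes."""
    pi = stationary_distribution(build_graph(rows))
    mass = {k[2]: p for k, p in pi.items() if k[0] == "val" and k[1] == col}
    total = sum(mass.values())
    return {v: p / total for v, p in mass.items()}


def conditional(rows, col, given_col, given_value):
    """Estimate P(col | given_col = given_value) on the matching objects only."""
    subset = [r for r in rows if r.get(given_col) == given_value]
    if not subset:
        raise ValueError("no rows satisfy the condition")
    return marginal(subset, col)


def joint(rows, col_a, a, col_b, b):
    """Estimate P(col_a = a, col_b = b) via the chain rule P(a) * P(b | a)."""
    p_a = marginal(rows, col_a).get(a, 0.0)
    if p_a == 0.0:
        return 0.0
    return p_a * conditional(rows, col_b, col_a, a).get(b, 0.0)


if __name__ == "__main__":
    rows = [
        {"colour": "red", "size": "S"},
        {"colour": "red", "size": "L"},
        {"colour": "blue", "size": "L"},
        {"colour": "blue", "size": "L"},
    ]
    print(marginal(rows, "size"))
    print(conditional(rows, "size", "colour", "red"))
    print(joint(rows, "colour", "red", "size", "S"))
```

Because this graph is unweighted and undirected, each node's stationary mass is its degree divided by twice the number of edges. The estimates therefore reduce to empirical frequencies. On the toy table, size gets 1/4 for S and 3/4 for L, and conditioning on colour = red gives 1/2 each. This makes the sketch a useful sanity check on the graph formulation. Any smoothing, edge weighting, or handling of continuous attributes in the paper's own eigenvector equations is not reproduced here.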
