LG · DS · ML · Oct 4, 2018

Monte Carlo Dependency Estimation

arXiv:1810.02112v1 · 4 citations
Originality: Incremental advance
AI Analysis

This work addresses two related problems: monitoring dependency in data streams and identifying relevant attributes in databases. Both matter for data understanding and for the runtime and quality of learning algorithms. The contribution appears incremental, since it builds on existing dependency estimation methods.

The paper tackles the estimation of multivariate dependency in static and dynamic data. It proposes Monte Carlo Dependency Estimation (MCDE), a theoretical framework that quantifies dependency via Monte Carlo simulations, and introduces Mann-Whitney P (MWP), a novel estimator that the authors report outperforms state-of-the-art multivariate dependency measures.
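One way to write the MCDE idea as a formula, using notation introduced here as an assumption rather than taken from the paper: for a set of attributes S, run M Monte Carlo iterations. In iteration m, pick a reference attribute s_m and a random condition c_m on the remaining attributes, then measure a discrepancy D between the estimated marginal distribution of s_m and its estimated distribution under c_m. The dependency score is the average of these discrepancies:

$$
\mathcal{C}(S) \;\approx\; \frac{1}{M} \sum_{m=1}^{M} \mathcal{D}\!\left(\hat{p}_{s_m},\; \hat{p}_{s_m \mid c_m}\right)
$$

When the attributes in S are independent, the conditional distributions match the marginals, so D reflects only sampling noise; dependency shifts the conditional distribution away from the marginal and raises the average.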

Estimating the dependency of variables is a fundamental task in data analysis. Identifying the relevant attributes in databases leads to better data understanding and also improves the performance of learning algorithms, both in terms of runtime and quality. In data streams, dependency monitoring provides key insights into the underlying process, but is challenging. In this paper, we propose Monte Carlo Dependency Estimation (MCDE), a theoretical framework to estimate multivariate dependency in static and dynamic data. MCDE quantifies dependency as the average discrepancy between marginal and conditional distributions via Monte Carlo simulations. Based on this framework, we present Mann-Whitney P (MWP), a novel dependency estimator. We show that MWP satisfies a number of desirable properties and can accommodate any kind of numerical data. We demonstrate the superiority of our estimator by comparing it to the state-of-the-art multivariate dependency measures.
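A minimal Python sketch of an MCDE-style estimator with a Mann-Whitney discrepancy, using only the standard library. It is a simplification under stated assumptions, not the authors' MWP implementation. We assume (1) the condition on the non-reference attributes is a block of consecutive ranks per attribute, sized so that about a fraction `alpha` of rows would fall inside under independence (a HiCS-style slicing choice); and (2) the discrepancy is 1 - p, where p is the two-sided Mann-Whitney U p-value (normal approximation with tie correction) comparing the reference attribute inside versus outside the slice. The names `mann_whitney_p`, `mcde_mwp_sketch`, `alpha` and `iterations` are hypothetical.

```python
import math
import random


def _average_ranks(values):
    """Return 1-based ranks of `values`, averaging ranks over ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        return 1.0  # no evidence of a difference
    pooled = list(x) + list(y)
    ranks = _average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mu = n1 * n2 / 2.0
    # Tie correction: sum of (t^3 - t) over groups of t tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return 1.0
    z = abs(u - mu) / math.sqrt(var)
    # 2 * (1 - Phi(z)) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


def mcde_mwp_sketch(data, iterations=100, alpha=0.5, seed=0):
    """Average of (1 - p) over random reference attributes and slices.

    `data` is a list of rows, each holding d >= 2 numbers (hypothetical API).
    """
    n, d = len(data), len(data[0])
    if d < 2:
        raise ValueError("need at least two attributes")
    rng = random.Random(seed)
    columns = [[row[j] for row in data] for j in range(d)]
    # Row indices sorted by value, one list per attribute.
    index = [sorted(range(n), key=col.__getitem__) for col in columns]
    # Per-attribute block width so that about alpha * n rows fall inside
    # the slice when the conditioning attributes are independent.
    width = max(1, int(round(n * alpha ** (1.0 / (d - 1)))))
    total = 0.0
    for _ in range(iterations):
        ref = rng.randrange(d)
        in_slice = set(range(n))
        for j in range(d):
            if j == ref:
                continue
            start = rng.randrange(n - width + 1)
            in_slice &= set(index[j][start:start + width])
        inside = [columns[ref][i] for i in in_slice]
        outside = [columns[ref][i] for i in range(n) if i not in in_slice]
        total += 1.0 - mann_whitney_p(inside, outside)
    return total / iterations


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [rng.random() for _ in range(200)]
    copied = [[v, v] for v in xs]  # y = x: strong dependency
    unrelated = [[rng.random(), rng.random()] for _ in range(200)]
    print("dependent:", round(mcde_mwp_sketch(copied), 3))
    print("independent:", round(mcde_mwp_sketch(unrelated), 3))
```

Under independence, 1 - p is roughly uniform on [0, 1], so this simplified score tends to sit near 0.5 for unrelated attributes rather than near 0, while a copied attribute scores close to 1.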

Code Implementations: 1 repo
