LG, ML | Jan 23, 2013

Fast Learning from Sparse Data

David Maxwell Chickering, David Heckerman

arXiv:1301.6685v2 | 22 citations

AI Analysis

This work addresses efficiency issues for machine learning practitioners dealing with sparse data, though it is incremental as it builds on existing algorithms.

The paper tackles the problem of slow learning algorithms on sparse data by introducing two techniques for efficient count extraction and EM inference, resulting in a dramatic decrease in running time on real-world datasets.

We describe two techniques that significantly improve the running time of several standard machine-learning algorithms when data is sparse. The first technique is an algorithm that efficiently extracts one-way and two-way counts, either real or expected, from discrete data. Extracting such counts is a fundamental step in learning algorithms for constructing a variety of models including decision trees, decision graphs, Bayesian networks, and naive-Bayes clustering models. The second technique is an algorithm that efficiently performs the E-step of the EM algorithm (i.e., inference) when applied to a naive-Bayes clustering model. Using real-world data sets, we demonstrate a dramatic decrease in running time for algorithms that incorporate these techniques.
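
To make the count-extraction idea concrete, here is a minimal Python sketch of one common way to exploit sparsity. It is an illustration under stated assumptions, not the authors' algorithm: it assumes each variable has a single default state (coded 0 here), that each case is stored as a dict holding only its non-default values, and that counts involving the default state can be recovered by subtraction. The names `SparseCounts`, `one_way`, `two_way`, and `DEFAULT` are hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import combinations

DEFAULT = 0  # assumption: every variable's default (most common) state is coded 0


class SparseCounts:
    """One-way and two-way counts from cases that list only non-default values."""

    def __init__(self, cases):
        # cases: list of dicts {variable_index: non_default_state}
        self.n = len(cases)
        self.one = Counter()  # (i, a) -> count, a non-default
        self.two = Counter()  # (i, a, j, b) -> count, i < j, a and b non-default
        for case in cases:
            items = sorted(case.items())
            for i, a in items:
                self.one[(i, a)] += 1
            # Work per case is quadratic in the number of non-default entries,
            # not in the total number of variables.
            for (i, a), (j, b) in combinations(items, 2):
                self.two[(i, a, j, b)] += 1

    def one_way(self, i, a):
        if a != DEFAULT:
            return self.one[(i, a)]
        # Default-state count = all cases minus cases where i is non-default.
        seen = sum(c for (v, _), c in self.one.items() if v == i)
        return self.n - seen

    def two_way(self, i, a, j, b):
        if i > j:  # stored keys always have the smaller variable index first
            i, a, j, b = j, b, i, a
        if a != DEFAULT and b != DEFAULT:
            return self.two[(i, a, j, b)]
        if a != DEFAULT:  # b is the default state
            joint = sum(c for (v, s, w, _), c in self.two.items()
                        if (v, s, w) == (i, a, j))
            return self.one[(i, a)] - joint
        if b != DEFAULT:  # a is the default state
            joint = sum(c for (v, _, w, t), c in self.two.items()
                        if (v, w, t) == (i, j, b))
            return self.one[(j, b)] - joint
        # Both default: inclusion-exclusion over the non-default counts.
        both = sum(c for (v, _, w, _), c in self.two.items() if (v, w) == (i, j))
        return self.one_way(i, DEFAULT) + self.one_way(j, DEFAULT) - self.n + both


# Five cases over three variables; variable 1 is always in its default state.
cases = [{0: 1, 2: 1}, {0: 1}, {}, {2: 2}, {0: 1, 2: 1}]
sc = SparseCounts(cases)
```

The linear scans inside `one_way` and `two_way` keep the sketch short; a practical version would likely cache per-variable and per-pair totals so that each lookup is constant time.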
