MLDec 25, 2015

Histogram Meets Topic Model: Density Estimation by Mixture of Histograms

arXiv:1512.07960v11 citations
Originality Incremental advance
AI Analysis

This addresses the sparsity issue in non-parametric density estimation for units with small data sizes, representing an incremental improvement over conventional histogram methods.

The paper tackles the problem of histogram density estimation for sparse datasets by developing a Bayesian approach that uses a mixture of basis histograms, automatically determining bin numbers and heights, and demonstrates good performance on synthetic data.

The histogram method is a powerful non-parametric approach for estimating the probability density function of a continuous variable. But the construction of a histogram, compared to the parametric approaches, demands a large number of observations to capture the underlying density function. Thus it is not suitable for analyzing a sparse data set, a collection of units with a small size of data. In this paper, by employing the probabilistic topic model, we develop a novel Bayesian approach to alleviating the sparsity problem in the conventional histogram estimation. Our method estimates a unit's density function as a mixture of basis histograms, in which the number of bins for each basis, as well as their heights, is determined automatically. The estimation procedure is performed by using the fast and easy-to-implement collapsed Gibbs sampling. We apply the proposed method to synthetic data, showing that it performs well.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes