ML · Apr 16, 2016

Smoothed Hierarchical Dirichlet Process: A Non-Parametric Approach to Constraint Measures

arXiv:1604.04741v11.3

Originality: Incremental advance

AI Analysis

This work addresses a domain-specific need for non-parametric models in scenarios like evolving keyword distributions or video features, but it appears incremental as it builds upon the Hierarchical Dirichlet Process.

The paper tackles the problem of modeling time-varying mixture densities with an unknown number of mixtures and smoothness constraints, proposing a Smoothed Hierarchical Dirichlet Process (sHDP) that introduces a temporal constraint using symmetric KL divergence. Experiments on NIPS keywords demonstrated the model's desirable effects, though no concrete numerical results were provided.

Time-varying mixture densities occur in many scenarios. For example, the distributions of keywords that appear in publications may evolve from year to year, and video frame features associated with multiple targets may evolve over a sequence. Any model that realistically caters to this phenomenon must exhibit two important properties: the underlying mixture densities must have an unknown number of mixtures, and there must be some "smoothness" constraints in place for adjacent mixture densities. The traditional Hierarchical Dirichlet Process (HDP) may be suited to the first property, but not the second, because each random measure in the lower hierarchies is sampled independently of the others and hence does not facilitate any temporal correlations. To overcome this shortcoming, we propose a new Smoothed Hierarchical Dirichlet Process (sHDP). The key novelty of this model is that we place a temporal constraint among nearby discrete measures $\{G_j\}$ in the form of a symmetric Kullback-Leibler (KL) divergence with a fixed bound $B$. Although the constraint involves only a single scalar value, it nonetheless allows for flexibility in the corresponding successive measures. Remarkably, it also led us to infer the model within the stick-breaking process, where the traditional Beta distribution used in stick-breaking is replaced by a new constraint calculated from $B$. We present the inference algorithm and elaborate on its solutions. Our experiment using NIPS keywords shows the desirable effect of the model.
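To make the smoothness constraint concrete, the following is a minimal sketch, not the paper's inference algorithm. It assumes two adjacent measures share the same atoms, so each can be represented by its weight vector alone, and it takes the symmetric KL divergence to be the sum of the two directed divergences (some authors use half that sum). The function names `stick_breaking`, `symmetric_kl`, and `is_smooth` are hypothetical and introduced only for illustration.

```python
import math


def stick_breaking(fractions):
    """Convert break fractions v_k in (0, 1) into mixture weights.

    The final weight is whatever remains of the unit stick, so the
    returned list has len(fractions) + 1 entries that sum to 1.
    """
    weights, remaining = [], 1.0
    for v in fractions:
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


def kl(p, q):
    """Directed KL divergence KL(p || q) for weights on shared atoms."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # 0 * log(0 / q) is taken to be 0
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


def symmetric_kl(p, q):
    """Assumed definition: KL(p || q) + KL(q || p)."""
    return kl(p, q) + kl(q, p)


def is_smooth(p, q, bound):
    """Check whether adjacent weight vectors satisfy the bound B."""
    return symmetric_kl(p, q) <= bound


# Hypothetical weights for two adjacent time steps.
g_prev = stick_breaking([0.5, 0.5])  # [0.5, 0.25, 0.25]
g_next = stick_breaking([0.6, 0.5])
print(symmetric_kl(g_prev, g_next), is_smooth(g_prev, g_next, bound=0.1))
```

A smaller bound forces successive weight vectors to stay closer together, while a large bound recovers behaviour closer to independently sampled measures.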
