ML IR LG

Mar 12, 2015

Hierarchical learning of grids of microtopics

Nebojsa Jojic, Alessandro Perina, Dongwoo Kim

arXiv:1503.03701v41.5

Originality: Incremental advance

AI Analysis

This work addresses topic modeling and feature extraction and is aimed at researchers in machine learning and data analysis. It is an incremental advance, since it builds upon the existing counting grid model.

The paper tackles the problem of learning coherent microtopics from small datasets using a hierarchical extension of the counting grid model. The authors report improved classification accuracy and the extraction of large numbers of coherent microtopics, evaluated through consistency, diversity, and clarity metrics and a user study on word intrusion tasks.

The counting grid is a grid of microtopics, sparse word/feature distributions. The generative model associated with the grid does not use these microtopics individually. Rather, it groups them in overlapping rectangular windows and uses these grouped microtopics as either mixture or admixture components. This paper builds upon the basic counting grid model and it shows that hierarchical reasoning helps avoid bad local minima, produces better classification accuracy and, most interestingly, allows for extraction of large numbers of coherent microtopics even from small datasets. We evaluate this in terms of consistency, diversity and clarity of the indexed content, as well as in a user study on word intrusion tasks. We demonstrate that these models work well as a technique for embedding raw images and discuss interesting parallels between hierarchical CG models and other deep architectures.
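The windowing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the grid size, the window size, the wrap-around at the grid edges, the random initialization, and the helper names `window_distributions` and `window_log_likelihoods` are all assumptions made for illustration. Each grid cell holds one microtopic (a distribution over features), each window averages the microtopics it covers, and a bag of words is scored against every window as it would be for mixture components.

```python
import numpy as np

def window_distributions(pi, window):
    """Average the microtopics inside every window of the grid.

    pi has shape (E1, E2, Z): one distribution over Z features per cell.
    Windows wrap around the grid edges (an assumption of this sketch).
    """
    w1, w2 = window
    h = np.zeros_like(pi)
    for di in range(w1):
        for dj in range(w2):
            # A negative shift moves pi[i + di, j + dj] to position (i, j).
            h += np.roll(pi, shift=(-di, -dj), axis=(0, 1))
    return h / (w1 * w2)

def window_log_likelihoods(counts, h, eps=1e-12):
    """Log-likelihood of a bag of words under each window distribution."""
    return np.tensordot(np.log(h + eps), counts, axes=([2], [0]))

# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
grid_shape, vocab_size, window = (8, 8), 20, (3, 3)

# Random microtopics; a small Dirichlet parameter makes them fairly peaked.
pi = rng.dirichlet(np.full(vocab_size, 0.5), size=grid_shape)
h = window_distributions(pi, window)

# Sample a toy document from one window, then score it against all windows.
counts = rng.multinomial(50, h[2, 3])
loglik = window_log_likelihoods(counts, h)
best = np.unravel_index(np.argmax(loglik), loglik.shape)
print("highest-scoring window position:", best)
```

Because neighboring windows share most of their cells, their distributions differ only slightly, so nearby grid positions tend to receive similar scores for the same document.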
