CV · Mar 6, 2018

Categorical Mixture Models on VGGNet activations

arXiv:1803.02446v1
Originality: Synthesis-oriented
AI Analysis

The paper is an incremental improvement in unsupervised image clustering for domain-specific applications such as restaurant photo analysis.

The paper clusters Yelp restaurant photos into meaningful topics using VGGNet activations and LDA. Object-based extraction produced archetypes such as restaurant, food, and drinks that matched human intuition and aligned with the Yelp photo labels.

In this project, I use unsupervised learning techniques to cluster a set of Yelp restaurant photos under meaningful topics. To do this, I extract layer activations from a pre-trained implementation of the popular VGGNet convolutional neural network. First, I explore using latent Dirichlet allocation (LDA) with the activations of convolutional layers as features. Second, I use the object-recognition capability of VGGNet trained on ImageNet to extract meaningful objects from the photos, and then perform LDA to group the photos under topic archetypes. I find that this second approach yields meaningful archetypes that match human intuition for photo topics such as restaurant, food, and drinks. Furthermore, these clusters align well and distinctly with the actual Yelp photo labels.
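
The second approach can be sketched roughly as follows, assuming each photo's VGGNet output is already available as a list of class probabilities. Everything named here is an illustrative assumption rather than the paper's implementation: the helpers `bag_of_objects` and `lda_gibbs`, the `top_k` and `scale` pseudo-count scheme, the example labels, and the use of a small collapsed Gibbs sampler in place of whatever LDA routine the author used.

```python
import random
from collections import Counter


def bag_of_objects(probs, labels, top_k=3, scale=10):
    """Convert one photo's class probabilities into pseudo word counts.

    Assumption: keep the top_k most probable classes and give each a count
    of round(scale * probability); classes that round to zero are dropped.
    """
    ranked = sorted(zip(labels, probs), key=lambda lp: lp[1], reverse=True)
    counts = Counter()
    for label, p in ranked[:top_k]:
        n = round(scale * p)
        if n > 0:
            counts[label] = n
    return counts


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling on docs (lists of word tokens).

    Returns (theta, topic_word): per-document topic proportions and
    per-topic word counts.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]         # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                       # tokens per topic

    def bump(d, w, k, delta):
        doc_topic[d][k] += delta
        topic_word[k][w] += delta
        topic_total[k] += delta

    # Random initial topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        zs = [rng.randrange(n_topics) for _ in doc]
        for w, k in zip(doc, zs):
            bump(d, w, k, 1)
        assign.append(zs)

    # Resample each token's topic conditioned on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                bump(d, w, assign[d][i], -1)
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                assign[d][i] = rng.choices(range(n_topics), weights=weights)[0]
                bump(d, w, assign[d][i], 1)

    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    return theta, topic_word


# Hypothetical class labels and probabilities standing in for VGGNet output.
labels = ["pizza", "plate", "wine_glass", "beer_bottle"]
photos = [
    [0.62, 0.23, 0.10, 0.05],
    [0.55, 0.30, 0.05, 0.10],
    [0.05, 0.10, 0.55, 0.30],
    [0.10, 0.05, 0.35, 0.50],
]
docs = [list(bag_of_objects(p, labels).elements()) for p in photos]
theta, topic_word = lda_gibbs(docs, n_topics=2)
```

Each row of `theta` gives one photo's mixture over topics, and each entry of `topic_word` lists the objects a topic favors. Treating detected objects as words is what lets a text topic model be applied to images here; a real pipeline would replace the hard-coded probabilities with the network's softmax output.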

