LG IR ML | Oct 16, 2012

Factorized Multi-Modal Topic Model

Seppo Virtanen, Yangqing Jia, Arto Klami, Trevor Darrell

arXiv:1210.4920v1 | 43 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for better analysis of multi-modal data collections, such as image-text pairs, by providing a method that avoids forcing dependencies between minimally correlating modalities, though it is incremental in combining existing approaches.

The authors tackled the problem of analyzing multi-modal data, specifically paired images and text, by developing a novel topic model that learns both shared and private topics, enabling more accurate cross-modal querying.

Multi-modal data collections, such as corpora of paired images and text snippets, require analysis methods beyond single-view component and topic models. For continuous observations the current dominant approach is based on extensions of canonical correlation analysis, factorizing the variation into components shared by the different modalities and those private to each of them. For count data, multiple variants of topic models attempting to tie the modalities together have been presented. All of these, however, lack the ability to learn components private to one modality, and consequently will try to force dependencies even between minimally correlating modalities. In this work we combine the two approaches by presenting a novel HDP-based topic model that automatically learns both shared and private topics. The model is shown to be especially useful for querying the contents of one domain given samples of the other.
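The shared/private split described in the abstract can be illustrated with a small generative sketch. This is not the authors' HDP-based model: it is a finite toy version with a fixed number of topics, and every name in it (`sample_paired_corpus`, `mask`, the Dirichlet hyperparameters, the vocabulary sizes) is a hypothetical choice made only for illustration. It shows one way a binary activity mask could let some topics generate words in both modalities while others generate words in only one.

```python
import numpy as np

def sample_paired_corpus(n_docs=200, n_shared=2, n_private=2,
                         vocab_sizes=(30, 40), doc_len=50, seed=0):
    """Toy finite stand-in for a shared/private topic model (assumed setup,
    not the paper's HDP construction)."""
    rng = np.random.default_rng(seed)
    n_topics = n_shared + 2 * n_private

    # Activity mask: rows are modalities, columns are topics.
    mask = np.zeros((2, n_topics), dtype=bool)
    mask[:, :n_shared] = True                        # active in both modalities
    mask[0, n_shared:n_shared + n_private] = True    # private to modality 0
    mask[1, n_shared + n_private:] = True            # private to modality 1

    # One set of topic-word distributions per modality (separate vocabularies).
    topics = [rng.dirichlet(np.full(v, 0.5), size=n_topics) for v in vocab_sizes]

    # A single topic-weight vector per document ties the two modalities together.
    theta = rng.dirichlet(np.full(n_topics, 0.5), size=n_docs)

    counts = []
    for m in range(2):
        # Zero out topics that are inactive in this modality, then renormalize.
        w = theta * mask[m]
        w = w / w.sum(axis=1, keepdims=True)
        word_probs = w @ topics[m]                   # per-document word distribution
        counts.append(np.stack([rng.multinomial(doc_len, p) for p in word_probs]))
    return counts, theta, mask

counts, theta, mask = sample_paired_corpus()
print(counts[0].shape, counts[1].shape)  # one count matrix per modality
```

In this sketch only the shared topics induce statistical dependence between the two count matrices; the private topics absorb variation that a model without them would have to explain through cross-modal structure.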

