ML | CV | LG | Mar 4, 2016

Learning deep representation of multityped objects and tasks

arXiv:1603.01359v1 | 3 citations
Originality: Incremental advance
AI Analysis

This work addresses the handling of diverse data types in machine learning, for applications such as image analysis. It is incremental: it builds on existing multitask and multimodal approaches rather than introducing a fundamentally new paradigm.

The paper tackles the problem of integrating multityped representations of multimodal objects, such as images with visual views and social tags, by introducing a deep multitask architecture that learns a high-level homogeneous representation and supports heterogeneously typed tasks. The result is a model that produces more compact representations, integrates multiviews and multimodalities, and performs competitively against baselines in applications like social image retrieval and multiple concept prediction.

We introduce a deep multitask architecture to integrate multityped representations of multimodal objects. This multitype exposition is less abstract than the multimodal characterization, but more machine-friendly, and is thus more precise to model. For example, an image can be described by multiple visual views, which can take the form of bag-of-words (counts) or color/texture histograms (real-valued). At the same time, the image may have several social tags, which are best described using a sparse binary vector. Our deep model takes as input multiple type-specific features, narrows the cross-modality semantic gaps, learns cross-type correlation, and produces a high-level homogeneous representation. At the same time, the model supports heterogeneously typed tasks. We demonstrate the capacity of the model on two applications: social image retrieval and multiple concept prediction. The deep architecture produces a more compact representation, naturally integrates multiviews and multimodalities, better exploits side information, and most importantly, performs competitively against baselines.
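
As a rough illustration of the data flow described in the abstract, the NumPy sketch below maps three differently typed inputs (counts, real-valued, sparse binary) into one shared representation and then feeds two differently typed task heads. It is not the authors' model: the feed-forward layers, the log transform for counts, the layer sizes, the random untrained weights, and all names (`INPUT_DIMS`, `shared_representation`, `predict`, the `concepts` and `score` tasks) are assumptions chosen only to make the idea concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(n_in, n_out):
    """Random weights and zero biases for one layer (untrained, illustrative only)."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical input types and sizes for one image collection.
INPUT_DIMS = {"bow_counts": 50, "color_hist": 16, "tags_binary": 30}
HIDDEN, SHARED = 12, 8

# One encoder per input type, one fusion layer, one head per task.
encoders = {name: dense(dim, HIDDEN) for name, dim in INPUT_DIMS.items()}
fusion = dense(HIDDEN * len(INPUT_DIMS), SHARED)
heads = {"concepts": dense(SHARED, 5), "score": dense(SHARED, 1)}

def preprocess(name, x):
    # Assumed type-specific transform: compress counts, pass other types through.
    return np.log1p(x) if name == "bow_counts" else x

def shared_representation(inputs):
    """Encode each typed input separately, then fuse into one homogeneous vector."""
    parts = []
    for name in INPUT_DIMS:
        W, b = encoders[name]
        parts.append(np.tanh(preprocess(name, inputs[name]) @ W + b))
    W, b = fusion
    return np.tanh(np.concatenate(parts, axis=1) @ W + b)

def predict(inputs):
    """Heterogeneously typed outputs: binary concept probabilities and a real-valued score."""
    h = shared_representation(inputs)
    Wc, bc = heads["concepts"]
    Ws, bs = heads["score"]
    return {"concepts": sigmoid(h @ Wc + bc), "score": h @ Ws + bs}

# A synthetic batch of 4 images with the three input types.
batch = {
    "bow_counts": rng.poisson(1.0, size=(4, 50)).astype(float),
    "color_hist": rng.random((4, 16)),
    "tags_binary": (rng.random((4, 30)) < 0.1).astype(float),
}
h = shared_representation(batch)
out = predict(batch)
```

In this sketch the shared vector `h` is what a retrieval application would compare between images, while the `concepts` head stands in for multiple concept prediction; a trained model would learn all weights jointly across the tasks.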

