LG · Oct 8, 2013

Fast Multi-Instance Multi-Label Learning

arXiv:1310.2049v1 · 135 citations
Originality: Incremental advance
AI Analysis

This work addresses scalability issues in MIML for applications like image and text analysis, offering a significant speed improvement over existing methods, though it is incremental in nature.

The paper tackles the problem of efficiently handling large data sets in multi-instance multi-label learning (MIML). It proposes MIMLfast, which constructs a low-dimensional subspace shared by all labels and trains label-specific linear models via stochastic gradient descent. The method is more than 100 times faster than existing MIML approaches on a data set with 20K bags and 180K instances, and it takes only 12 minutes on a larger data set where no existing approach returns results within 24 hours.

In many real-world tasks, particularly those involving data objects with complicated semantics such as images and texts, one object can be represented by multiple instances and simultaneously be associated with multiple labels. Such tasks can be formulated as multi-instance multi-label learning (MIML) problems, and have been extensively studied during the past few years. Existing MIML approaches have been found useful in many applications; however, most of them can only handle moderate-sized data. To efficiently handle large data sets, in this paper we propose the MIMLfast approach, which first constructs a low-dimensional subspace shared by all labels, and then trains label-specific linear models to optimize an approximated ranking loss via stochastic gradient descent. Although the MIML problem is complicated, MIMLfast is able to achieve excellent performance by exploiting label relations with a shared space and discovering sub-concepts for complicated labels. Experiments show that the performance of MIMLfast is highly competitive with state-of-the-art techniques, whereas its time cost is much lower; in particular, on a data set with 20K bags and 180K instances, MIMLfast is more than 100 times faster than existing MIML approaches. On a larger data set where none of the existing approaches can return results in 24 hours, MIMLfast takes only 12 minutes. Moreover, our approach is able to identify the most representative instance for each label, thus providing a chance to understand the relation between input patterns and output label semantics.
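
The pipeline described in the abstract (shared subspace, label-specific linear models with sub-concepts, ranking loss, SGD) can be illustrated with a minimal sketch. This is not the authors' implementation. The names `project`, `bag_score`, and `sgd_step` are hypothetical, the max over instances and sub-concepts is an assumed scoring rule, and a plain pairwise hinge loss stands in for the approximated ranking loss, whose exact form the abstract does not give.

```python
import random

def project(W0, x):
    """Map an instance x (length d) into the shared subspace (length m)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W0]

def bag_score(W0, w_label, bag):
    """Score a bag for one label as the max over instances and sub-concepts.

    Returns (score, instance index, sub-concept index). The instance index
    marks the instance that is most representative for this label.
    (Assumed scoring rule, for illustration only.)
    """
    best = (float("-inf"), -1, -1)
    for i, x in enumerate(bag):
        z = project(W0, x)
        for k, wk in enumerate(w_label):
            s = sum(a * b for a, b in zip(wk, z))
            if s > best[0]:
                best = (s, i, k)
    return best

def sgd_step(W0, w, bag, relevant, lr=0.1, rng=random):
    """One SGD step on a pairwise hinge ranking loss (a simplification).

    w[l][k] is the weight vector of sub-concept k of label l.
    Updates W0 and w in place and returns the loss before the update.
    """
    irrelevant = [l for l in range(len(w)) if l not in relevant]
    if not relevant or not irrelevant:
        return 0.0
    y = rng.choice(sorted(relevant))   # a label the bag should have
    v = rng.choice(irrelevant)         # a label it should not have
    sy, iy, ky = bag_score(W0, w[y], bag)
    sv, iv, kv = bag_score(W0, w[v], bag)
    loss = max(0.0, 1.0 + sv - sy)     # want sy to exceed sv by a margin of 1
    if loss == 0.0:
        return 0.0
    xy, xv = bag[iy], bag[iv]
    zy, zv = project(W0, xy), project(W0, xv)
    wy, wv = list(w[y][ky]), list(w[v][kv])  # pre-update copies for the W0 gradient
    # Label-specific updates: raise the relevant score, lower the irrelevant one.
    w[y][ky] = [a + lr * b for a, b in zip(wy, zy)]
    w[v][kv] = [a - lr * b for a, b in zip(wv, zv)]
    # Shared-subspace update: gradient of (sv - sy) is outer(wv, xv) - outer(wy, xy).
    for r in range(len(W0)):
        W0[r] = [W0[r][c] - lr * (wv[r] * xv[c] - wy[r] * xy[c])
                 for c in range(len(xy))]
    return loss
```

Because every label is scored through the same `W0`, an update triggered by one label also moves the representation used by all the others, which is one way to read "exploiting label relations with a shared space". Each step touches a single bag and two labels, so the per-step cost does not grow with the number of bags.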

