CV · Oct 2, 2020

Taking Modality-free Human Identification as Zero-shot Learning

arXiv:2010.00975v2
Originality: Incremental advance
AI Analysis

This paper addresses a scalable identification problem in video surveillance applications where only textual descriptions or attributes may be available, and it represents an incremental advance over existing modality-specific methods.

The paper tackles modality-free human identification, in which queries and gallery sets can be in different modalities (e.g., image or text). It formulates the problem as a zero-shot learning model that bridges the visual and semantic modalities with discriminative identity prototypes and semantics-guided attention. The results show that the method outperforms state-of-the-art methods on face identification and person re-identification tasks.
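To make the prototype idea concrete, here is a minimal sketch (not the authors' implementation) of how a shared prototype space lets a single scoring function handle I2I, I2A and A2I queries. It assumes a hypothetical linear map `W` from attribute vectors to visual features (hand-set here; in a zero-shot model it would be learned on seen identities), cosine similarity as the matching score, and toy two-attribute identities. The names `project`, `embed` and `rank_gallery` are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple

# An item is ("img", visual_feature) or ("attr", attribute_vector).
Item = Tuple[str, Sequence[float]]


def project(W: Sequence[Sequence[float]], a: Sequence[float]) -> List[float]:
    """Map an attribute vector into visual space: p = W a (hypothetical linear map)."""
    return [sum(w * x for w, x in zip(row, a)) for row in W]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def embed(item: Item, W: Sequence[Sequence[float]]) -> List[float]:
    """Put images and attribute-derived prototypes into the same visual space."""
    kind, vec = item
    return project(W, vec) if kind == "attr" else list(vec)


def rank_gallery(query: Item, gallery: Dict[str, Item],
                 W: Sequence[Sequence[float]]) -> List[str]:
    """Rank gallery entries by cosine similarity to the query, whatever their modality."""
    q = embed(query, W)
    scores = {gid: cosine(q, embed(item, W)) for gid, item in gallery.items()}
    return sorted(scores, key=lambda gid: scores[gid], reverse=True)


# Toy setup: 2 attributes -> 3-dim visual space (all values are made up).
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attr_gallery = {"alice": ("attr", [0.0, 1.0]), "bob": ("attr", [1.0, 0.0])}
img_gallery = {"img_a": ("img", [0.1, 0.9, 1.0]), "img_b": ("img", [0.9, 0.1, 1.1])}

print(rank_gallery(("img", [0.1, 0.9, 1.0]), attr_gallery, W))  # I2A: ['alice', 'bob']
print(rank_gallery(("attr", [1.0, 0.0]), img_gallery, W))       # A2I: ['img_b', 'img_a']
print(rank_gallery(("img", [0.1, 0.9, 1.0]), img_gallery, W))   # I2I: ['img_a', 'img_b']
```

In this sketch an identity never seen during training only needs an attribute vector to obtain a prototype, so the gallery can grow without retraining. That is the zero-shot property the summary alludes to, under the stated assumptions.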

Human identification is an important topic in event detection, person tracking, and public security. Numerous methods have been proposed for human identification, such as face identification, person re-identification, and gait identification. Existing methods predominantly classify a query image to a specific identity in an image gallery set (I2I). This is a serious limitation in the many video surveillance applications where only a textual description of the query or an attribute gallery set is available (A2I or I2A). However, very few efforts have been devoted to modality-free identification, i.e., identifying a query in a gallery set of any modality in a scalable way. In this work, we make an initial attempt and formulate this novel Modality-Free Human Identification (MFHI) task as a generic zero-shot learning model. The model bridges the visual and semantic modalities by learning a discriminative prototype of each identity. In addition, semantics-guided spatial attention is enforced on the visual modality to obtain representations with high discrimination at both the global category level and the local attribute level. Finally, we design and conduct an extensive set of experiments on two common and challenging identification tasks, face identification and person re-identification, demonstrating that our method outperforms a wide variety of state-of-the-art methods on modality-free human identification.
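The semantics-guided spatial attention mentioned in the abstract can be illustrated with generic dot-product attention. The sketch below is an assumption, not the paper's module: it scores each spatial location of a flattened feature map against a semantic vector that is assumed to be already expressed in the same channel space, normalises the scores with a softmax, and pools the features with those weights. `semantic_attention_pool` and its `temperature` parameter are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def semantic_attention_pool(
    feature_map: Sequence[Sequence[float]],  # H*W locations, each a C-dim visual feature
    semantic: Sequence[float],               # C-dim semantic vector (assumed already projected)
    temperature: float = 1.0,                # hypothetical sharpness knob
) -> Tuple[List[float], List[float]]:
    """Weight each location by its agreement with the semantic vector, then pool."""
    logits = [sum(f * s for f, s in zip(feat, semantic)) / temperature
              for feat in feature_map]
    weights = softmax(logits)
    channels = len(feature_map[0])
    pooled = [sum(w * feat[c] for w, feat in zip(weights, feature_map))
              for c in range(channels)]
    return pooled, weights


# A 2x2 map flattened to 4 locations with 2 channels; the semantic vector favours channel 0.
fmap = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [1.0, 0.0]]
pooled, weights = semantic_attention_pool(fmap, [2.0, 0.0])
print([round(w, 3) for w in weights])  # [0.44, 0.06, 0.06, 0.44]
print([round(p, 3) for p in pooled])   # [0.881, 0.06]
```

Plain average pooling of the same map would give `[0.5, 0.25]`. With the semantic weighting, locations that agree with the semantic vector dominate the pooled representation, which is one plausible way a semantic signal can sharpen local, attribute-level discrimination.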
