CV · Jun 13, 2019

Semantics to Space(S2S): Embedding semantics into spatial space for zero-shot verb-object query inferencing

arXiv:1906.05894v2 · 3 citations
Originality: Incremental advance
AI Analysis

This paper addresses zero-shot inference of verb-object queries in computer vision, offering an incremental improvement over existing two-stream approaches.

The paper tackles zero-shot human-object-interaction inference with verb-object queries by embedding semantics into the visual representation stream as well as the query stream. The authors report that this outperforms state-of-the-art methods and consistently improves performance across different baseline architectures.

We present a novel deep zero-shot learning (ZSL) model for inferring human-object interactions from verb-object (VO) queries. While previous two-stream ZSL approaches feed semantic/textual information only into the query stream, we seek to embed the semantics into the visual representation stream as well. Our approach is powered by the Semantics-to-Space (S2S) architecture, in which semantics derived from the objects present in the image are embedded into a spatial space of the visual stream. This architecture allows the semantic attributes of the human and the objects to be captured jointly with their location, size, and silhouette information. To validate the approach, we have constructed a new dataset, Verb-Transferability 60 (VT60). VT60 provides 60 different VO pairs with overlapping verbs, tailored for testing two-stream ZSL approaches with VO queries. Experimental evaluations show that our approach not only outperforms the state of the art but also consistently improves performance regardless of which ZSL baseline architecture is used.
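The idea of embedding semantics into a spatial space can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `semantic_map`, the toy 3-dimensional embeddings, the coarse grid, and the choice to sum overlapping regions are all assumptions made for illustration. It uses bounding boxes, so it reflects location and size only; capturing silhouettes would presumably require masks instead of boxes.

```python
# Hypothetical sketch, not the S2S architecture itself: paint each detected
# entity's word embedding into the grid cells covered by its bounding box.

def semantic_map(boxes, embeddings, height, width):
    """Build a height x width x dim map of semantic vectors.

    boxes: list of (x0, y0, x1, y1) in grid cells, with x1 and y1 exclusive.
    embeddings: list of equal-length float lists, one per box.
    Overlapping boxes are summed (an assumption; other merges are possible).
    """
    dim = len(embeddings[0])
    grid = [[[0.0] * dim for _ in range(width)] for _ in range(height)]
    for (x0, y0, x1, y1), vec in zip(boxes, embeddings):
        # Clip the box to the grid so out-of-range boxes are handled safely.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                cell = grid[y][x]
                for k in range(dim):
                    cell[k] += vec[k]
    return grid


# Toy embeddings standing in for real word vectors (assumed values).
toy_embeddings = {
    "person": [1.0, 0.0, 0.0],
    "bicycle": [0.0, 1.0, 0.0],
}

boxes = [(0, 0, 2, 4), (1, 2, 4, 4)]  # person, bicycle
vectors = [toy_embeddings["person"], toy_embeddings["bicycle"]]
grid = semantic_map(boxes, vectors, height=4, width=4)
```

In a map like this, a cell covered only by the person carries the person vector, a cell where the two boxes overlap carries both, and background cells stay zero. A visual stream could then consume the map alongside image features, so that what an entity is and where it sits are encoded together.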

