CV, Jan 20, 2021

Video Relation Detection with Trajectory-aware Multi-modal Features

arXiv:2101.08165v1, 24 citations
Originality: Incremental advance
AI Analysis

The work improves video understanding for applications such as surveillance and robotics, but the advance is incremental because it builds on existing object detection and multi-modal methods.

The paper tackled video relation detection by decomposing it into object detection, trajectory proposal, and relation prediction, using trajectory-aware multi-modal features, achieving 11.74% mAP and first place in the ACM Multimedia 2020 challenge.

The video relation detection problem refers to detecting relationships between objects in videos, such as spatial and action relationships. In this paper, we present video relation detection with trajectory-aware multi-modal features to solve this task. Given the complexity of visual relation detection in videos, we decompose the task into three sub-tasks: object detection, trajectory proposal, and relation prediction. We use a state-of-the-art object detection method to ensure accurate object trajectory detection, and a multi-modal feature representation to help predict relations between objects. Our method won first place in the video relation detection task of the Video Relation Understanding Grand Challenge at ACM Multimedia 2020 with 11.74% mAP, surpassing other methods by a large margin.
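The three-stage decomposition described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: per-frame detections are assumed to be given, trajectories are linked with a simple greedy IoU rule, and the learned multi-modal relation classifier is replaced by a hand-written rule over two motion features. All function names, the threshold, and the predicates `toward` and `away` are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from itertools import permutations
from math import hypot
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[str, Box]              # (class label, box); stage 1 output, assumed given


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Trajectory:
    label: str
    boxes: Dict[int, Box] = field(default_factory=dict)  # frame index -> box


def link_trajectories(frames: List[List[Detection]], iou_thresh: float = 0.5) -> List[Trajectory]:
    """Stage 2 (hypothetical): greedily extend same-class trajectories frame by frame."""
    trajectories: List[Trajectory] = []
    for t, detections in enumerate(frames):
        for label, box in detections:
            best, best_iou = None, iou_thresh
            for traj in trajectories:
                # Candidate must share the class, exist at t-1, and be unclaimed at t.
                if traj.label != label or (t - 1) not in traj.boxes or t in traj.boxes:
                    continue
                overlap = iou(traj.boxes[t - 1], box)
                if overlap >= best_iou:
                    best, best_iou = traj, overlap
            if best is None:  # no match: start a new trajectory
                best = Trajectory(label)
                trajectories.append(best)
            best.boxes[t] = box
    return trajectories


def center(box: Box) -> Tuple[float, float]:
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def pair_features(subj: Trajectory, obj: Trajectory) -> Optional[Dict[str, float]]:
    """Motion features over the frames both trajectories share (motion modality only)."""
    shared = sorted(set(subj.boxes) & set(obj.boxes))
    if len(shared) < 2:
        return None
    t0, t1 = shared[0], shared[-1]
    s0, s1 = center(subj.boxes[t0]), center(subj.boxes[t1])
    o0, o1 = center(obj.boxes[t0]), center(obj.boxes[t1])
    return {
        "subject_motion": hypot(s1[0] - s0[0], s1[1] - s0[1]),
        "gap_change": hypot(s1[0] - o1[0], s1[1] - o1[1]) - hypot(s0[0] - o0[0], s0[1] - o0[1]),
    }


def predict_relation(feats: Dict[str, float], eps: float = 1e-6) -> Optional[str]:
    """Stage 3 placeholder: a rule standing in for a learned relation classifier."""
    if feats["subject_motion"] <= eps:
        return None  # a static subject gets no motion predicate
    if feats["gap_change"] < -eps:
        return "toward"
    if feats["gap_change"] > eps:
        return "away"
    return None


def detect_relations(frames: List[List[Detection]]) -> List[Tuple[str, str, str]]:
    """Run stages 2 and 3 and return (subject, predicate, object) triplets."""
    triplets = []
    for subj, obj in permutations(link_trajectories(frames), 2):
        feats = pair_features(subj, obj)
        predicate = predict_relation(feats) if feats is not None else None
        if predicate is not None:
            triplets.append((subj.label, predicate, obj.label))
    return triplets
```

For example, given three frames in which a `person` box shifts right toward a stationary `dog` box, `detect_relations` links two trajectories and returns the single triplet `("person", "toward", "dog")`. A real system would replace the rule in `predict_relation` with a classifier over visual, motion, and language features.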

