CV | May 22, 2015

Joint Inference of Groups, Events and Human Roles in Aerial Videos

arXiv:1505.05957v1 | 184 citations
Originality: Incremental advance
AI Analysis

This paper addresses the understudied problem of aerial video analysis for applications such as surveillance and monitoring, but the advance is incremental because it builds on existing inference methods.

The paper tackles the problem of parsing low-resolution aerial videos to jointly infer groups, events, and human roles, using a novel framework based on spatiotemporal AND-OR graphs and templates, and demonstrates successful inference on a new dataset of picnic areas.

With the advent of drones, aerial video analysis becomes increasingly important; yet, it has received scant attention in the literature. This paper addresses a new problem of parsing low-resolution aerial videos of large spatial areas, in terms of 1) grouping, 2) recognizing events and 3) assigning roles to people engaged in events. We propose a novel framework aimed at conducting joint inference of the above tasks, as reasoning about each in isolation typically fails in our setting. Given noisy tracklets of people and detections of large objects and scene surfaces (e.g., building, grass), we use a spatiotemporal AND-OR graph to drive our joint inference, using Markov Chain Monte Carlo and dynamic programming. We also introduce a new formalism of spatiotemporal templates characterizing latent sub-events. For evaluation, we have collected and released a new aerial video dataset using a hex-rotor flying over picnic areas rich with group events. Our results demonstrate that we successfully address the above inference tasks under challenging conditions.
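To make the AND-OR graph idea concrete, the following is a minimal sketch of generic best-parse scoring by dynamic programming: an AND node sums the scores of all its children, an OR node picks its best-scoring child, and memoization ensures each node is scored once. This is not the paper's model. The node names (`event`, `picnic`, `queue`, and the leaves), the leaf scores, and the `best_parse` helper are hypothetical, and the spatiotemporal templates and MCMC steps described in the abstract are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    kind: str                                   # "AND", "OR", or "LEAF"
    children: List[str] = field(default_factory=list)


def best_parse(graph: Dict[str, Node],
               leaf_scores: Dict[str, float],
               root: str) -> Tuple[float, List[str]]:
    """Return (score, selected leaves) of the best parse rooted at `root`."""
    memo: Dict[str, Tuple[float, List[str]]] = {}

    def visit(name: str) -> Tuple[float, List[str]]:
        if name in memo:                        # reuse already-scored nodes
            return memo[name]
        node = graph[name]
        if node.kind == "LEAF":
            result = (leaf_scores[name], [name])
        elif node.kind == "AND":                # all children must be explained
            total, leaves = 0.0, []
            for child in node.children:
                score, child_leaves = visit(child)
                total += score
                leaves = leaves + child_leaves
            result = (total, leaves)
        elif node.kind == "OR":                 # choose one alternative
            result = max((visit(c) for c in node.children), key=lambda r: r[0])
        else:
            raise ValueError(f"unknown node kind: {node.kind}")
        memo[name] = result
        return result

    return visit(root)


# Hypothetical toy grammar: an event is either a picnic or a queue.
graph = {
    "event": Node("OR", ["picnic", "queue"]),
    "picnic": Node("AND", ["sit_on_grass", "share_food"]),
    "queue": Node("AND", ["stand_in_line", "near_building"]),
    "sit_on_grass": Node("LEAF"),
    "share_food": Node("LEAF"),
    "stand_in_line": Node("LEAF"),
    "near_building": Node("LEAF"),
}

# Made-up log-likelihood-style scores for each leaf observation.
leaf_scores = {
    "sit_on_grass": -0.5,
    "share_food": -1.0,
    "stand_in_line": -2.0,
    "near_building": -0.25,
}

score, leaves = best_parse(graph, leaf_scores, "event")
print(score, leaves)
```

With these made-up scores, the picnic branch totals -1.5 and the queue branch totals -2.25, so the OR node at the root selects the picnic parse. In a joint-inference setting like the one the abstract describes, the leaf scores would come from noisy tracklets and detections rather than fixed constants.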

