CV CL IR · Aug 9, 2014

Video In Sentences Out

Andrei Barbu, Alexander Bridge, Zachary Burchill, Dan Coroian, Sven Dickinson, Sanja Fidler, Aaron Michaux, Sam Mussman, Siddharth Narayanaswamy, Dhaval Salvi, Lara Schmidt, Jiangnan Shangguan

arXiv:1408.6418v1 · 158 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses automated video captioning, with applications such as accessibility and surveillance, though it appears incremental because it builds on existing event recognition techniques.

The authors tackled the problem of generating detailed sentential descriptions from video by recognizing events, objects, and their relationships, resulting in a system that outputs structured sentences describing actions, participants, and spatial details.

We present a system that produces sentential descriptions of video: who did what to whom, and where and how they did it. Action class is rendered as a verb, participant objects as noun phrases, properties of those objects as adjectival modifiers in those noun phrases, spatial relations between those participants as prepositional phrases, and characteristics of the event as prepositional-phrase adjuncts and adverbial modifiers. Extracting the information needed to render these linguistic entities requires an approach to event recognition that recovers object tracks, the track-to-role assignments, and changing body posture.
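
The mapping from recognized event structure to parts of speech described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Participant` and `Event` classes, their field names, and the `render_sentence` function are all hypothetical, and the fixed agent-adverb-verb-patient-adjunct word order is a simplifying assumption. It only shows how a verb, noun phrases, adjectives, spatial prepositional phrases, and adverbial modifiers might be assembled once the vision pipeline has supplied them.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Participant:
    """Hypothetical container for one tracked object filling an event role."""
    noun: str                                             # object class -> head noun
    adjectives: List[str] = field(default_factory=list)   # object properties
    spatial_relation: Optional[str] = None                # e.g. "to the right of the chair"

    def noun_phrase(self) -> str:
        # Determiner + adjectival modifiers + head noun, then an optional
        # prepositional phrase for a spatial relation to another participant.
        phrase = " ".join(["the", *self.adjectives, self.noun])
        if self.spatial_relation:
            phrase += " " + self.spatial_relation
        return phrase


@dataclass
class Event:
    """Hypothetical container for a recognized event (assumed, not from the paper)."""
    verb: str                              # action class, already inflected
    agent: Participant
    patient: Optional[Participant] = None
    adverb: Optional[str] = None           # characteristic of the event
    adjunct: Optional[str] = None          # prepositional-phrase adjunct


def render_sentence(event: Event) -> str:
    # Assumed fixed word order: agent, adverb, verb, patient, adjunct.
    parts = [event.agent.noun_phrase()]
    if event.adverb:
        parts.append(event.adverb)
    parts.append(event.verb)
    if event.patient:
        parts.append(event.patient.noun_phrase())
    if event.adjunct:
        parts.append(event.adjunct)
    sentence = " ".join(parts)
    return sentence[0].upper() + sentence[1:] + "."


# Illustrative input only; the values are made up, not system output.
example = Event(
    verb="picked up",
    agent=Participant("person", spatial_relation="to the right of the chair"),
    patient=Participant("ball", adjectives=["red"]),
    adverb="slowly",
    adjunct="from the left",
)
print(render_sentence(example))
```

In this sketch the hard part of the problem, recovering object tracks, track-to-role assignments, and body posture from video, is assumed to have already happened; the code covers only the final rendering step.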
