CV MM Aug 10, 2022

Automatic Camera Control and Directing with an Ultra-High-Definition Collaborative Recording System

arXiv:2208.05213v1 1 citation h-index: 30
Originality: Incremental advance
AI Analysis

The paper addresses the cumbersome task that human directors face in broadcasting by automating camera control. The advance is incremental, since it builds on existing object detection and cinematic techniques.

The paper tackles the problem of automatically directing multi-camera video streams for broadcasting. It develops a system that generates visually pleasing shot sequences from ultra-high-resolution inputs, using object detection and cinematic rules. A user study indicates that the system captures events with aesthetically pleasing compositions and human-like shot switching behavior.

Capturing an event from multiple camera angles can give a viewer the most complete and interesting picture of that event. For the footage to be suitable for broadcasting, a human director needs to decide what to show at each point in time. This becomes cumbersome as the number of camera angles increases. The introduction of omnidirectional or wide-angle cameras has allowed events to be captured more completely, making it even more difficult for the director to pick a good shot. This paper presents a system that, given multiple ultra-high-resolution video streams of an event, generates a visually pleasing sequence of shots that follows the relevant action of the event. Because the algorithm is general purpose, it can be applied to most scenarios that feature humans. The proposed method allows for online processing when real-time broadcasting is required, as well as offline processing when the quality of the camera operation is the priority. Object detection is used to detect humans and other objects of interest in the input streams. Detected persons of interest, along with a set of rules based on cinematic conventions, determine which video stream to show and which part of that stream is virtually framed. The user can provide a number of settings that determine how these rules are interpreted. The system handles input from different wide-angle video streams by removing lens distortions. A user study shows, for a number of different scenarios, that the proposed automated director is able to capture an event with aesthetically pleasing video compositions and human-like shot switching behavior.
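
The two decisions the abstract describes, which stream to show and which part of it to frame, can be illustrated with a minimal sketch. This is not the paper's algorithm. The names `Box`, `virtual_frame`, and `pick_stream`, the margin, the person-count score, and the minimum shot length are all hypothetical stand-ins for the paper's cinematic rules and user settings, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Box:
    """Axis-aligned detection box in pixel coordinates (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float


def virtual_frame(boxes: List[Box], frame_w: float, frame_h: float,
                  aspect: float = 16 / 9,
                  margin: float = 0.2) -> Optional[Tuple[float, float, float, float]]:
    """Return a crop (x0, y0, w, h) with the target aspect ratio around all boxes.

    Assumed rule, not from the paper: take the union of the detections,
    pad it by `margin`, then grow one side until the aspect ratio matches.
    """
    if not boxes:
        return None
    # Union of all detections.
    ux0 = min(b.x0 for b in boxes)
    uy0 = min(b.y0 for b in boxes)
    ux1 = max(b.x1 for b in boxes)
    uy1 = max(b.y1 for b in boxes)
    cx, cy = (ux0 + ux1) / 2, (uy0 + uy1) / 2
    # Padded size, with a 1-pixel floor to avoid division by zero.
    w = max((ux1 - ux0) * (1 + margin), 1.0)
    h = max((uy1 - uy0) * (1 + margin), 1.0)
    # Grow the side that is too short for the target aspect ratio.
    if w / h < aspect:
        w = h * aspect
    else:
        h = w / aspect
    # Shrink uniformly if the crop would exceed the source frame.
    scale = min(1.0, frame_w / w, frame_h / h)
    w, h = w * scale, h * scale
    # Keep the crop inside the source frame.
    x0 = min(max(cx - w / 2, 0.0), frame_w - w)
    y0 = min(max(cy - h / 2, 0.0), frame_h - h)
    return (x0, y0, w, h)


def pick_stream(person_counts: List[int], current: Optional[int],
                frames_on_current: int, min_shot_frames: int = 75) -> int:
    """Choose which stream to show, given the person count per stream.

    Assumed rule, not from the paper: prefer the stream with the most
    detected persons, but never cut before `min_shot_frames` have elapsed.
    """
    best = max(range(len(person_counts)), key=lambda i: person_counts[i])
    if current is None:
        return best
    if frames_on_current < min_shot_frames:
        return current  # Hold the shot to avoid rapid switching.
    # Switch only when another stream is strictly better.
    return best if person_counts[best] > person_counts[current] else current
```

In an online setting, a loop would call `pick_stream` once per frame and then `virtual_frame` on the chosen stream's detections. An offline variant could additionally smooth the crop over time, since future frames are available.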

