Vega-Video: Integrating Video into the Grammar of Graphics
For visualization researchers and practitioners, this work provides a principled, high-performance framework for combining video with conventional data in interactive visualizations.
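The paper integrates video data visualization into the Vega declarative grammar by identifying three classes of video visualization (synchronization, annotation, and transformation) and introducing a split-signal architecture that masks video update delays. Compile-time, encoding-aware optimizations for scrubbing improve responsiveness by up to 4x, and real-time transformation over VOD protocols delivers sub-200ms updates on multi-hour compilations.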
Video data is increasingly used alongside conventional data for interactive data exploration, necessitating interfaces for exploring and presenting mixed-modality data. However, integrating video into visualizations remains difficult due to its distinct paradigms and inherent performance challenges. We identify three classes of video data visualization (synchronization, annotation, and transformation) and integrate them into the Vega declarative grammar. We show that these abstractions enable high-performance implementation. To reconcile Vega's instantaneous dataflow with video player state, which updates with a delay, we introduce a split-signal architecture that preserves declarative semantics while masking video update delays. We detect continuous scrubbing interactions at compile time and apply encoding-aware optimizations that improve responsiveness by up to 4x. We also repurpose video-on-demand (VOD) protocols to transform videos in real time, delivering sub-200ms updates even on multi-hour compilations. These contributions enable seamless integration of conventional and video data visualization.
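
To make the split-signal idea concrete, the following is a minimal sketch of one possible reading of it, not the paper's implementation. It assumes the single "current time" value is split into a requested half that the dataflow updates instantly and a confirmed half that the video player reports once a seek completes. The class `SplitTimeSignal` and all of its methods are hypothetical names introduced for illustration; none of them are Vega APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SplitTimeSignal:
    """Hypothetical split of one 'current time' signal into two halves."""
    requested: float = 0.0            # what the dataflow sees; updates instantly
    confirmed: float = 0.0            # what the player last reported; may lag
    _pending: Optional[float] = None  # latest seek target not yet handed to the player
    listeners: List[Callable[[float], None]] = field(default_factory=list)

    def set(self, t: float) -> None:
        """Called by the dataflow, e.g. on each step of a scrub interaction."""
        self.requested = t
        self._pending = t             # older unserved targets are overwritten
        for fn in self.listeners:
            fn(t)                     # dependent charts update right away

    def next_seek(self) -> Optional[float]:
        """Called by the player when it is ready to start another seek."""
        target, self._pending = self._pending, None
        return target

    def on_player_seeked(self, t: float) -> None:
        """Called when the player finishes seeking to time t."""
        self.confirmed = t

    @property
    def in_sync(self) -> bool:
        return self._pending is None and self.confirmed == self.requested


# Simulated scrub: three rapid updates arrive before the player can seek once.
signal = SplitTimeSignal()
seen: List[float] = []
signal.listeners.append(seen.append)
for t in (1.0, 2.0, 3.0):
    signal.set(t)

target = signal.next_seek()           # the player only seeks to the latest target
signal.on_player_seeked(target)
```

In this sketch the listeners observe every scrub position immediately, while the simulated player performs a single seek to the most recent target. That is the sense in which the delay is masked here; how the actual system schedules seeks or exposes the two halves to a specification is not stated in the abstract.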