Exploiting Inductive Biases in Video Modeling through Neural CDEs
This work addresses video processing problems in computer vision, offering an incremental improvement: a novel method for a known bottleneck.
The authors tackled video modeling challenges like interpolation and mask propagation by introducing a continuous-time U-Net architecture based on controlled differential equations (CDEs), which avoids explicit optical flow learning. They demonstrated competitive performance against state-of-the-art models on these tasks.
We introduce a novel approach to video modeling that leverages controlled differential equations (CDEs) to address key challenges in video tasks, notably video interpolation and mask propagation. We apply CDEs at varying resolutions, leading to a continuous-time U-Net architecture. Unlike traditional methods, our approach does not require explicit optical flow learning; instead, it uses the inherent continuous-time features of CDEs to produce a highly expressive video model. We demonstrate competitive performance against state-of-the-art models for video interpolation and mask propagation tasks.
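To make the CDE idea concrete, the following is a minimal sketch of the standard neural CDE form dz = f(z) dX, integrated with explicit Euler steps along a sampled control path X. It is not the continuous-time U-Net described above: the multi-resolution structure is omitted, and the names `make_vector_field` and `cde_euler`, the random tanh layer standing in for a learned vector field, and the toy control path are all assumptions made for illustration.

```python
import math
import random


def make_vector_field(d_z, d_x, seed=0):
    """Hypothetical stand-in for a learned vector field f.

    Maps a hidden state of size d_z to a (d_z x d_x) matrix using a single
    randomly initialised tanh layer (an assumption, not the paper's model).
    """
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 0.1) for _ in range(d_z)] for _ in range(d_z * d_x)]
    b = [rng.gauss(0.0, 0.1) for _ in range(d_z * d_x)]

    def f(z):
        flat = [
            math.tanh(sum(w * zi for w, zi in zip(row, z)) + bi)
            for row, bi in zip(W, b)
        ]
        # Reshape the flat output into d_z rows of d_x entries.
        return [flat[i * d_x:(i + 1) * d_x] for i in range(d_z)]

    return f


def cde_euler(z0, X, vector_field):
    """Explicit Euler scheme for dz = f(z) dX along the sampled path X.

    z0: initial hidden state (list of d_z floats)
    X: control path samples (list of T lists of d_x floats)
    Returns the list of T hidden states, starting with a copy of z0.
    """
    z = list(z0)
    states = [z]
    for x_prev, x_next in zip(X, X[1:]):
        # The hidden state is driven by increments of the control path,
        # not by the time step directly.
        dX = [b - a for a, b in zip(x_prev, x_next)]
        F = vector_field(z)
        z = [
            zi + sum(fij * dxj for fij, dxj in zip(row, dX))
            for zi, row in zip(z, F)
        ]
        states.append(z)
    return states


# Toy control path: time plus two made-up "frame features" per sample.
X = [[t / 4, math.sin(t / 4), math.cos(t / 4)] for t in range(5)]
states = cde_euler([0.0, 0.0], X, make_vector_field(d_z=2, d_x=3))
```

In this sketch the hidden state changes only when the control path changes, which is one way to read the continuous-time inductive bias the abstract refers to. A practical implementation would likely replace the Euler loop with an adaptive ODE solver and an interpolated control path.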