Generative Video Bi-flow
This work targets video generation for applications that require efficient streaming. It appears incremental, however, since it builds on existing neural ODE and diffusion methods.
The authors tackle generative video modeling by proposing a neural ODE flow with a bilinear objective that learns temporal change robustly. They report competitive quality at higher speed (fewer ODE solver steps) than a conditional diffusion baseline.
We propose a novel generative video model that robustly learns temporal change as a neural Ordinary Differential Equation (ODE) flow with a bilinear objective combining two aspects. The first is to map past video frames directly to future frames; previous work instead maps noise to new frames, a more computationally expensive process. Unfortunately, starting from the previous frame rather than from noise is more prone to drifting errors. The second aspect, therefore, is to learn how to remove the accumulated errors, as a joint objective, by adding noise during training. We demonstrate unconditional video generation in a streaming manner on various video datasets, all at competitive quality compared to a conditional diffusion baseline but with higher speed, i.e., fewer ODE solver steps.
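
The two aspects described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the linear interpolation path, the noise schedule, and the names `make_training_sample`, `euler_rollout`, `noise_scale`, and `velocity_fn` are all assumptions made for illustration. The sketch builds one training sample with a frame-to-frame velocity target and a noise-removal target, then rolls a velocity field forward with a few Euler steps, starting from the previous frame rather than from noise.

```python
import numpy as np


def make_training_sample(prev_frame, next_frame, rng, noise_scale=0.1):
    """Build one hypothetical training sample (illustrative assumptions only).

    Returns a noisy point on the straight path from prev_frame to next_frame,
    together with two regression targets: one along video time, one that
    points back toward the clean path.
    """
    t = rng.uniform(0.0, 1.0)   # position along the path prev -> next
    s = rng.uniform(0.0, 1.0)   # noise level injected during training
    eps = rng.standard_normal(prev_frame.shape)

    clean = (1.0 - t) * prev_frame + t * next_frame
    noisy = clean + noise_scale * s * eps

    video_target = next_frame - prev_frame   # moves the past frame to the future frame
    denoise_target = -noise_scale * eps      # removes the injected perturbation
    return noisy, t, s, video_target, denoise_target


def euler_rollout(velocity_fn, frame, num_steps=4):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = frame.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prev_frame = rng.standard_normal((8, 8, 3))
    next_frame = rng.standard_normal((8, 8, 3))

    # Stand-in for a trained network: the exact straight-line velocity.
    oracle = lambda x, t: next_frame - prev_frame
    predicted = euler_rollout(oracle, prev_frame, num_steps=4)
    print(np.allclose(predicted, next_frame))
```

In a streaming setting, the output of `euler_rollout` would be fed back in as the next starting frame. That feedback is where drift can accumulate, which is why the abstract pairs the frame-to-frame mapping with a noise-removal objective.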