Complex Valued Gated Auto-encoder for Video Frame Prediction
This work addresses video prediction for computer graphics applications, but it is incremental as it builds on existing gated auto-encoders by incorporating complex values and convolutions.
The paper tackles video frame prediction with a complex-valued gated auto-encoder, motivated by the view of the Fourier transform as a basis for translational operations, and reports better prediction performance and parameter efficiency than a real-valued version, with further gains from convolutional units.
In recent years, complex-valued artificial neural networks have attracted increasing interest because they allow networks to learn richer representations while potentially using fewer parameters. In computer graphics especially, many traditional operations rely heavily on computations in the complex domain, so complex-valued neural networks apply naturally. In this paper, we predict frames in video sequences using a complex-valued gated auto-encoder. First, we motivate our method by showing how the Fourier transform can be seen as the basis for translational operations. Then, we present how a complex-valued neural network can learn such transformations and compare its performance and parameter efficiency to those of a real-valued gated auto-encoder. Furthermore, we show that extending both the real-valued and the complex-valued networks with convolutional units can significantly improve prediction performance and parameter efficiency. The networks are assessed on a moving-noise dataset and a bouncing-ball dataset.
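To make the Fourier motivation concrete, the following minimal sketch illustrates the standard shift theorem on a one-dimensional "frame": a circular translation in the signal domain corresponds to multiplying each Fourier coefficient by a unit-modulus complex phase. This is not the gated auto-encoder described in the paper. The helper names (`dft`, `idft`, `translate_via_phase`) and the toy signal are assumptions made for illustration only.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a 1-D sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [
        sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def translate_via_phase(x, shift):
    """Circularly shift x by `shift` samples using only phase rotations.

    Each Fourier coefficient X[k] is multiplied by exp(-2*pi*i*k*shift/n),
    a complex number of modulus 1, so the translation is a purely
    multiplicative operation in the complex domain.
    """
    n = len(x)
    X = dft(x)
    rotated = [X[k] * cmath.exp(-2j * cmath.pi * k * shift / n) for k in range(n)]
    # The input is real, so the imaginary parts are only round-off error.
    return [v.real for v in idft(rotated)]


# Toy "frame" (hypothetical data): a small bump on a zero background.
frame = [0.0, 1.0, 3.0, 2.0, 0.0, 0.0, 0.0, 0.0]

# Predict the "next frame" as the same bump moved two samples to the right.
next_frame = translate_via_phase(frame, 2)

# Reference result obtained by directly rolling the list.
expected = frame[-2:] + frame[:-2]
max_err = max(abs(a - b) for a, b in zip(next_frame, expected))
```

In this sketch `max_err` is at the level of floating-point round-off, which shows that the translation is fully described by per-frequency complex multiplications. That is the kind of multiplicative interaction a complex-valued gating unit could plausibly represent.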