TransNet: A deep network for fast detection of common shot transitions
This work addresses a key preprocessing step in video analysis and offers a fast, accurate solution, although the contribution appears incremental because it applies existing deep learning methods to a specific domain.
The paper tackles shot boundary detection in videos by introducing a modular convolutional neural network architecture that achieves state-of-the-art results on the RAI dataset with inference speeds well above real-time on a single mediocre GPU.
Shot boundary detection (SBD) is an important first step in many video processing applications. This paper presents a simple modular convolutional neural network architecture that achieves state-of-the-art results on the RAI dataset with inference speed well above real time, even on a single mediocre GPU. The network employs dilated convolutions and operates only on small, resized frames. It is trained on randomly generated transitions between selected shots from the TRECVID IACC.3 dataset. The code and a selected trained network will be available at https://github.com/soCzech/TransNet.
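The idea of training on randomly generated transitions can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `make_transition`, the linear cross-fade used for dissolves, the labelling scheme, and the frame sizes are all assumptions made for illustration. Two shots are joined at a random position, either by a hard cut or by a short dissolve, and each frame is labelled as transition or non-transition.

```python
import numpy as np


def make_transition(shot_a, shot_b, length, dissolve_len, rng):
    """Join two shots into one synthetic training clip (hypothetical helper).

    shot_a, shot_b: float arrays of shape (frames, height, width, 3),
                    each with at least `length` frames.
    dissolve_len:   0 gives a hard cut; k > 0 gives a k-frame linear cross-fade
                    (an assumed transition model, not taken from the paper).
    Returns (clip, labels); labels[i] == 1 marks a transition frame.
    """
    if len(shot_a) < length or len(shot_b) < length:
        raise ValueError("both shots need at least `length` frames")
    if not 0 <= dissolve_len <= length - 2:
        raise ValueError("dissolve_len must leave one clean frame per shot")

    # Random start so that at least one pure frame of each shot survives.
    start = int(rng.integers(1, length - dissolve_len))

    # alpha is the per-frame weight of shot_b: 0 before, 1 after the transition.
    alpha = np.zeros(length, dtype=np.float32)
    alpha[start + dissolve_len:] = 1.0
    if dissolve_len > 0:
        ramp = np.arange(1, dissolve_len + 1, dtype=np.float32)
        alpha[start:start + dissolve_len] = ramp / (dissolve_len + 1)

    # Broadcast the weights over height, width and channels.
    w = alpha[:, None, None, None]
    clip = (1.0 - w) * shot_a[:length] + w * shot_b[:length]

    # For a hard cut, label the first frame of the new shot.
    labels = np.zeros(length, dtype=np.int64)
    labels[start:start + max(dissolve_len, 1)] = 1
    return clip.astype(np.float32), labels


# Usage with placeholder data: a black shot followed by a white shot.
rng = np.random.default_rng(0)
shot_a = np.zeros((10, 4, 4, 3), dtype=np.float32)
shot_b = np.ones((10, 4, 4, 3), dtype=np.float32)
clip, labels = make_transition(shot_a, shot_b, length=10, dissolve_len=3, rng=rng)
```

Because the transition position and type are sampled at generation time, labelled examples can be produced on demand from any pool of single-shot clips, which is what makes this style of synthetic training data attractive.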