Unsupervised Multi-stream Highlight Detection for the Game "Honor of Kings"
The work addresses the need for efficient highlight generation on esports live streaming platforms, though its contribution is incremental because it builds on existing multi-stream fusion techniques.
The paper tackles the problem of automatically generating highlight clips from long esports live videos without manual annotation, and it reports satisfactory performance using a multi-stream framework that fuses spatial, temporal, and audio features.
With the increasing popularity of esports live streaming, Highlight Flashback, which aggregates the most exciting fight scenes into a few seconds, has become a critical feature of live platforms. In this paper, we introduce a novel training strategy that automatically generates highlights for live game videos without any additional annotation. Because existing manually edited clips contain more highlights than long live game videos, we impose pairwise ranking constraints between clips drawn from edited videos and clips drawn from long live videos. We also propose a multi-stream framework that fuses spatial, temporal, and audio features extracted from the videos. To evaluate our method, we test on long live game videos with an average length of about 15 minutes. Extensive experimental results on these videos demonstrate satisfactory performance on highlight generation and the effectiveness of fusing the three streams.
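The pairwise ranking idea and the three-stream fusion can be illustrated with a minimal sketch. The abstract does not state the exact loss or fusion rule, so the hinge form with a `margin`, the weighted-average late fusion, and the function names `fuse_scores` and `pairwise_ranking_loss` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Sequence


def fuse_scores(spatial: float, temporal: float, audio: float,
                weights: Sequence[float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Combine per-stream highlight scores for one clip.

    Assumption: late fusion by weighted average. The paper may fuse
    the streams differently (e.g. at the feature level).
    """
    w_s, w_t, w_a = weights
    return w_s * spatial + w_t * temporal + w_a * audio


def pairwise_ranking_loss(edited_scores: Sequence[float],
                          live_scores: Sequence[float],
                          margin: float = 1.0) -> float:
    """Mean hinge loss over every (edited clip, live clip) pair.

    A pair contributes zero loss once the edited clip scores at least
    `margin` higher than the live clip. The hinge form and the margin
    value are assumptions.
    """
    total = 0.0
    count = 0
    for edited in edited_scores:
        for live in live_scores:
            total += max(0.0, margin - edited + live)
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # Hypothetical per-stream scores for one edited clip and one live clip.
    edited = [fuse_scores(0.9, 0.8, 0.7)]
    live = [fuse_scores(0.2, 0.3, 0.1)]
    print(pairwise_ranking_loss(edited, live, margin=0.5))
```

No labels are needed in this setup: the only supervision is the origin of each clip (edited video or long live video), which is what allows training without additional annotation.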