Scene-driven Retrieval in Edited Videos using Aesthetic and Semantic Deep Features
The work addresses video retrieval and visualization for users who need efficient access to edited content, but it is incremental because it builds on existing deep learning and segmentation techniques.
The paper tackles retrieving the most significant scenes of edited videos for a textual query and selecting aesthetically pleasing thumbnails to represent them. Qualitative and quantitative experiments on a collection of edited videos are reported to demonstrate the effectiveness of the approach.
This paper presents a novel retrieval pipeline for video collections, which aims to retrieve the most significant parts of an edited video for a given query, and represent them with thumbnails which are at the same time semantically meaningful and aesthetically remarkable. Videos are first segmented into coherent and story-telling scenes, then a retrieval algorithm based on deep learning is proposed to retrieve the most significant scenes for a textual query. A ranking strategy based on deep features is finally used to tackle the problem of visualizing the best thumbnail. Qualitative and quantitative experiments are conducted on a collection of edited videos to demonstrate the effectiveness of our approach.
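The last two stages of the pipeline described above (scene retrieval for a query, then thumbnail ranking) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that scenes have already been segmented, that every keyframe carries a precomputed semantic feature vector in the same space as the query vector, and that every keyframe has a precomputed aesthetic score in [0, 1]. The names `Keyframe`, `Scene`, `retrieve`, `best_thumbnail` and the weight `alpha` are hypothetical, and scoring a scene by its best-matching keyframe and mixing the two scores linearly are simplifying assumptions rather than details taken from the paper.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence, Tuple


@dataclass
class Keyframe:
    frame_id: str
    embedding: Sequence[float]  # assumed: semantic feature in the query's space
    aesthetic: float            # assumed: aesthetic score in [0, 1]


@dataclass
class Scene:
    scene_id: str
    keyframes: List[Keyframe]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_thumbnail(scene: Scene, query: Sequence[float],
                   alpha: float = 0.7) -> Keyframe:
    """Pick the keyframe with the highest weighted mix of semantic
    similarity to the query and aesthetic score (assumed linear mix)."""
    def combined(kf: Keyframe) -> float:
        return alpha * cosine(kf.embedding, query) + (1.0 - alpha) * kf.aesthetic
    return max(scene.keyframes, key=combined)


def retrieve(scenes: List[Scene], query: Sequence[float],
             top_k: int = 3, alpha: float = 0.7) -> List[Tuple[str, str, float]]:
    """Rank scenes by their best keyframe-to-query similarity and return
    (scene_id, thumbnail_frame_id, relevance) for the top_k scenes."""
    results = []
    for scene in scenes:
        if not scene.keyframes:
            continue  # a scene without keyframes cannot be scored
        relevance = max(cosine(kf.embedding, query) for kf in scene.keyframes)
        thumb = best_thumbnail(scene, query, alpha)
        results.append((scene.scene_id, thumb.frame_id, relevance))
    results.sort(key=lambda r: r[2], reverse=True)
    return results[:top_k]
```

In this sketch, `alpha` trades semantic relevance against visual appeal: with `alpha = 1.0` the thumbnail is simply the frame closest to the query, while lower values let a slightly less relevant but more pleasing frame win.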