The complementarity of a diverse range of deep learning features extracted from video content for video recommendation
This paper offers an incremental improvement in video recommendation for streaming services by better handling new item cold-start through the use of diverse deep learning features.
This paper addresses the new item cold-start problem in video recommendation by exploring various deep learning features (visual, audio, motion) extracted from video content. The authors found that deep learning features outperform hand-crafted features, with audio and action-centric deep learning features being superior to MFCC and iDT features. Combining diverse deep learning features with hand-crafted features and textual metadata significantly improved recommendations compared to using only deep learning features.
Following the popularisation of media streaming, a number of video streaming services are continuously buying new video content to mine the potential profit from it. As such, the newly added content has to be handled well so that it is recommended to suitable users. In this paper, we address the new item cold-start problem by exploring the potential of various deep learning features to provide video recommendations. The deep learning features investigated include features that capture the visual appearance, audio and motion information from video content. We also explore different fusion methods to evaluate how well these feature modalities can be combined to fully exploit the complementary information they capture. Experiments on a real-world video dataset for movie recommendations show that deep learning features outperform hand-crafted features. In particular, recommendations generated with deep learning audio features and action-centric deep learning features are superior to those generated with MFCC and state-of-the-art iDT features. In addition, the combination of various deep learning features with hand-crafted features and textual metadata yields a significant improvement in recommendations compared to combining only the deep learning features.
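The abstract does not specify which fusion methods were evaluated, so the following is only an illustrative sketch of two common options, not the authors' method. It assumes each modality (visual, audio, motion) is available as a per-item feature matrix, uses random toy data in place of real features, and scores a cold-start item by its mean cosine similarity to items a user liked. All function names and dimensions are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so cosine similarity is a dot product."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)

def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between items (one item per row)."""
    unit = l2_normalize(features)
    return unit @ unit.T

def early_fusion_similarity(modalities):
    """Early fusion (assumed variant): normalize each modality, concatenate, compare."""
    fused = np.hstack([l2_normalize(m) for m in modalities])
    return cosine_similarity_matrix(fused)

def late_fusion_similarity(modalities, weights=None):
    """Late fusion (assumed variant): weighted mean of per-modality similarities."""
    sims = [cosine_similarity_matrix(m) for m in modalities]
    if weights is None:
        weights = np.ones(len(sims)) / len(sims)
    return sum(w * s for w, s in zip(weights, sims))

def score_new_item(similarity, new_item, liked_items):
    """Score a cold-start item by its mean similarity to items the user liked."""
    return float(similarity[new_item, liked_items].mean())

# Toy stand-ins for extracted features; real features would come from trained networks.
rng = np.random.default_rng(0)
n_items = 5
visual = rng.normal(size=(n_items, 8))
audio = rng.normal(size=(n_items, 4))
motion = rng.normal(size=(n_items, 6))
modalities = [visual, audio, motion]

early = early_fusion_similarity(modalities)
late = late_fusion_similarity(modalities)
weighted = late_fusion_similarity(modalities, weights=[0.6, 0.2, 0.2])

liked, new_item = [0, 1], 4  # item 4 has no interactions; only its content is used
print("early fusion score:   ", score_new_item(early, new_item, liked))
print("late fusion score:    ", score_new_item(late, new_item, liked))
print("weighted fusion score:", score_new_item(weighted, new_item, liked))
```

With per-modality L2 normalization and equal weights, these two variants produce the same similarity matrix, because the cosine of the concatenated unit vectors equals the mean of the per-modality cosines. They diverge only when the weights differ or when a different normalization is applied before concatenation, which is where a choice of fusion method starts to matter.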