IR CV MM

Apr 20, 2017

Using Mise-En-Scène Visual Features based on MPEG-7 and Deep Learning for Movie Recommendation

Yashar Deldjoo, Massimo Quadrana, Mehdi Elahi, Paolo Cremonesi

arXiv:1704.06109v15.614 citations

Originality: Incremental advance

AI Analysis

This paper addresses the cold-start and scalability issues in movie recommendation systems by automating feature extraction. The advance is incremental, since it builds on existing visual descriptors and deep learning methods.

The paper tackles the problem of noisy and expensive human-generated movie features in recommender systems by proposing mise-en-scène visual features, showing that recommendations based on these features consistently outperform traditional features like genre and tag on a dataset of 4K movies.

Item features play an important role in movie recommender systems, where recommendations can be generated by using explicit or implicit preferences of users on traditional features (attributes) such as tag, genre, and cast. Typically, movie features are human-generated, either editorially (e.g., genre and cast) or by leveraging the wisdom of the crowd (e.g., tag), and as such, they are prone to noise and are expensive to collect. Moreover, these features are often rare or absent for new items, making it difficult or even impossible to provide good quality recommendations. In this paper, we show that users' preferences on movies can be better described in terms of the mise-en-scène features, i.e., the visual aspects of a movie that characterize design, aesthetics and style (e.g., colors, textures). We use both MPEG-7 visual descriptors and Deep Learning hidden layers as examples of mise-en-scène features that can visually describe movies. Interestingly, mise-en-scène features can be computed automatically from video files or even from trailers, offering more flexibility in handling new items, avoiding the need for costly and error-prone human-based tagging, and providing good scalability. We have conducted a set of experiments on a large catalogue of 4K movies. Results show that recommendations based on mise-en-scène features consistently provide the best performance with respect to richer sets of more traditional features, such as genre and tag.
