IRLGJun 9, 2020

I know why you like this movie: Interpretable Efficient Multimodal Recommender

arXiv:2006.09979v12 citations
Originality Incremental advance
AI Analysis

This provides interpretability for a state-of-the-art multimodal recommender system, addressing a bottleneck in understanding model decisions for users and developers.

The paper tackles the problem of interpreting the Efficient Manifold Density Estimator (EMDE) model for multimodal movie recommendations, proving that white-box interpretation is possible and showing the influence of text, categorical features, and images on recommendations.

Recently, the Efficient Manifold Density Estimator (EMDE) model has been introduced. The model exploits Local Sensitive Hashing and Count-Min Sketch algorithms, combining them with a neural network to achieve state-of-the-art results on multiple recommender datasets. However, this model ingests a compressed joint representation of all input items for each user/session, so calculating attributions for separate items via gradient-based methods seems not applicable. We prove that interpreting this model in a white-box setting is possible thanks to the properties of EMDE item retrieval method. By exploiting multimodal flexibility of this model, we obtain meaningful results showing the influence of multiple modalities: text, categorical features, and images, on movie recommendation output.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes