IR · Apr 15, 2021

MM-Rec: Multimodal News Recommendation

arXiv:2104.07407v2 · 19 citations
AI Analysis

This paper addresses the need for more accurate news recommendation by leveraging multimodal data. The contribution is incremental: it builds on existing text-based methods by adding visual components.

The paper tackles the problem of news recommendation by incorporating both textual and visual information to learn multimodal news representations, resulting in improved recommendation accuracy as validated by experiments.

Accurate news representation is critical for news recommendation. Most existing news representation methods learn news representations only from news texts while ignoring visual information in news, such as images. In fact, users may click news not only because of their interest in news titles but also because of the attraction of news images. Thus, images are useful for representing news and predicting user behaviors. In this paper, we propose a multimodal news recommendation method that incorporates both textual and visual information of news to learn multimodal news representations. We first extract regions of interest (ROIs) from news images via object detection. Then we use a pre-trained visiolinguistic model to encode both news texts and news image ROIs and model their inherent relatedness using co-attentional Transformers. In addition, we propose a crossmodal candidate-aware attention network to select relevant historical clicked news for accurate user modeling by measuring the crossmodal relatedness between clicked news and candidate news. Experiments validate that incorporating multimodal news information can effectively improve news recommendation.
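The candidate-aware attention idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's exact formulation: the function name, the scaled dot-product scoring, the unweighted sum of the four text/image pairings, and the additive fusion of text and image vectors are all assumptions made for illustration. The sketch scores each clicked news item against the candidate across modalities, normalizes the scores with a softmax, and pools the clicked news into a user vector.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def candidate_aware_attention(clicked_text, clicked_image, cand_text, cand_image):
    """Pool clicked news into a user vector, weighted by relatedness to a candidate.

    clicked_text, clicked_image: arrays of shape (n_clicked, d)
    cand_text, cand_image: arrays of shape (d,)
    Hypothetical scoring (assumption): sum of the four crossmodal dot products
    (text-text, text-image, image-text, image-image), scaled by sqrt(d).
    """
    d = clicked_text.shape[1]
    scores = (
        clicked_text @ cand_text
        + clicked_text @ cand_image
        + clicked_image @ cand_text
        + clicked_image @ cand_image
    ) / np.sqrt(d)
    weights = softmax(scores)

    # Assumed fusion: add the text and image vectors of each clicked item.
    clicked_repr = clicked_text + clicked_image
    user_vec = weights @ clicked_repr
    return user_vec, weights


# Toy example: two clicked items, the first aligned with the candidate.
clicked_text = np.array([[1.0, 0.0], [0.0, 1.0]])
clicked_image = np.array([[1.0, 0.0], [0.0, 1.0]])
cand_text = np.array([1.0, 0.0])
cand_image = np.array([1.0, 0.0])

user_vec, weights = candidate_aware_attention(
    clicked_text, clicked_image, cand_text, cand_image
)
# Click score as a dot product between user and candidate vectors (assumption).
click_score = float(user_vec @ (cand_text + cand_image))
```

In this toy setup the clicked item that matches the candidate receives the larger attention weight, which is the behavior the abstract describes as selecting relevant historical clicked news.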

Code Implementations: 1 repo