Yashar Deldjoo

IR
h-index: 42
38 papers
1,525 citations
Novelty: 32%
AI Score: 50

38 Papers

IR · May 23, 2022
Fairness in Recommender Systems: Research Landscape and Future Directions

Yashar Deldjoo, Dietmar Jannach, Alejandro Bellogin et al.

Recommender systems can strongly influence which information we see online, e.g., on social media, and thus impact our beliefs, decisions, and actions. At the same time, these systems can create substantial business value for different stakeholders. Given the growing potential impact of such AI-based systems on individuals, organizations, and society, questions of fairness have gained increased attention in recent years. However, research on fairness in recommender systems is still a developing area. In this survey, we first review the fundamental concepts and notions of fairness that were put forward in the area in the recent past. Afterward, through a review of more than 160 scholarly publications, we present an overview of how research in this field is currently operationalized, e.g., in terms of general research methodology, fairness measures, and algorithmic approaches. Overall, our analysis of recent works points to certain research gaps. In particular, we find that in many research works in computer science, very abstract problem operationalizations are prevalent and questions of the underlying normative claims and what represents a fair recommendation in the context of a given application are often not discussed in depth. These observations call for more interdisciplinary research to address fairness in recommendation in a more comprehensive and impactful manner.

IR · Apr 17, 2022
CPFair: Personalized Consumer and Producer Fairness Re-ranking for Recommender Systems

Mohammadmehdi Naghiaei, Hossein A. Rahmani, Yashar Deldjoo

Recently, there has been a rising awareness that when machine learning (ML) algorithms are used to automate choices, they may treat/affect individuals unfairly, with legal, ethical, or economic consequences. Recommender systems are prominent examples of such ML systems that assist users in making high-stakes judgments. A common trend in the previous literature on fairness in recommender systems is that the majority of works treat user and item fairness concerns separately, ignoring the fact that recommender systems operate in a two-sided marketplace. In this work, we present an optimization-based re-ranking approach that seamlessly integrates fairness constraints from both the consumer and producer sides in a joint objective framework. We demonstrate through large-scale experiments on 8 datasets that our proposed method is capable of improving both consumer and producer fairness without reducing overall recommendation quality, demonstrating the role algorithms may play in minimizing data biases.

CL · Sep 4, 2022
Interactive Question Answering Systems: Literature Review

Giovanni Maria Biancofiore, Yashar Deldjoo, Tommaso Di Noia et al.

Question answering systems are recognized as popular and frequently effective means of information seeking on the web. In such systems, information seekers can receive a concise response to their query by presenting their questions in natural language. Interactive question answering is a recently proposed and increasingly popular solution that resides at the intersection of question answering and dialogue systems. On the one hand, the user can ask questions in natural language and locate the actual response to her inquiry; on the other hand, the system can prolong the question-answering session into a dialogue if there are multiple probable replies, very few replies, or ambiguities in the initial request. By permitting the user to ask more questions, interactive question answering enables users to dynamically interact with the system and receive more precise results. This survey offers a detailed overview of the interactive question-answering methods that are prevalent in current literature. It begins by explaining the foundational principles of question-answering systems, then defines new notations and taxonomies to combine all identified works inside a unified framework. The reviewed published work on interactive question-answering systems is then presented and examined in terms of its proposed methodology, evaluation approaches, and dataset/application domain. We also describe trends surrounding specific tasks and issues raised by the community, thereby shedding light on the future interests of scholars. Our work is further supported by a GitHub page with a synthesis of all the major topics covered in this literature study. https://sisinflab.github.io/interactive-question-answering-systems-survey/

AI · Aug 17, 2023
ChatGPT-HealthPrompt. Harnessing the Power of XAI in Prompt-Based Healthcare Decision Support using ChatGPT

Fatemeh Nazary, Yashar Deldjoo, Tommaso Di Noia

This study presents an innovative approach to the application of large language models (LLMs) in clinical decision-making, focusing on OpenAI's ChatGPT. Our approach introduces the use of contextual prompts, strategically designed to include task description, feature description, and, crucially, integration of domain knowledge, for high-quality binary classification tasks even in data-scarce scenarios. The novelty of our work lies in the utilization of domain knowledge, obtained from high-performing interpretable ML models, and its seamless incorporation into prompt design. By viewing these ML models as medical experts, we extract key insights on feature importance to aid in decision-making processes. This interplay of domain knowledge and AI holds significant promise in creating a more insightful diagnostic tool. Additionally, our research explores the dynamics of zero-shot and few-shot prompt learning based on LLMs. By comparing the performance of OpenAI's ChatGPT with traditional supervised ML models in different data conditions, we aim to provide insights into the effectiveness of prompt engineering strategies under varied data availability. In essence, this paper bridges the gap between AI and healthcare, proposing a novel methodology for LLM application in clinical decision support systems. It highlights the transformative potential of effective prompt design, domain knowledge integration, and flexible learning approaches in enhancing automated decision-making.

IR · Sep 18, 2024
Recommendation with Generative Models

Yashar Deldjoo, Zhankui He, Julian McAuley et al.

Generative models are a class of AI models capable of creating new instances of data by learning and sampling from their statistical distributions. In recent years, these models have gained prominence in machine learning due to the development of approaches such as generative adversarial networks (GANs), variational autoencoders (VAEs), and transformer-based architectures such as GPT. These models have applications across various domains, such as image generation, text synthesis, and music composition. In recommender systems, generative models, referred to as Gen-RecSys, improve the accuracy and diversity of recommendations by generating structured outputs, text-based interactions, and multimedia content. By leveraging these capabilities, Gen-RecSys can produce more personalized, engaging, and dynamic user experiences, expanding the role of AI in eCommerce, media, and beyond. Our book goes beyond existing literature by offering a comprehensive understanding of generative models and their applications, with a special focus on deep generative models (DGMs) and their classification. We introduce a taxonomy that categorizes DGMs into three types: ID-driven models, large language models (LLMs), and multimodal models. Each category addresses unique technical and architectural advancements within its respective research area. This taxonomy allows researchers to easily navigate developments in Gen-RecSys across domains such as conversational AI and multimodal content generation. Additionally, we examine the impact and potential risks of generative models, emphasizing the importance of robust evaluation frameworks.

CL · Jul 14, 2023
Fairness of ChatGPT and the Role Of Explainable-Guided Prompts

Yashar Deldjoo

Our research investigates the potential of Large-scale Language Models (LLMs), specifically OpenAI's GPT, in credit risk assessment, a binary classification task. Our findings suggest that LLMs, when directed by judiciously designed prompts and supplemented with domain-specific knowledge, can parallel the performance of traditional Machine Learning (ML) models. Intriguingly, they achieve this with significantly less data: 40 times less, utilizing merely 20 data points compared to the 800 used by the ML models. LLMs particularly excel in minimizing false positives and enhancing fairness, both being vital aspects of risk analysis. While our results did not surpass those of classical ML models, they underscore the potential of LLMs in analogous tasks, laying a groundwork for future explorations into harnessing the capabilities of LLMs in diverse ML tasks.

CR · Mar 28, 2023
Machine-learned Adversarial Attacks against Fault Prediction Systems in Smart Electrical Grids

Carmelo Ardito, Yashar Deldjoo, Tommaso Di Noia et al.

In smart electrical grids, fault detection tasks may have a high impact on society due to their economic and critical implications. In recent years, numerous smart grid applications, such as defect detection and load forecasting, have embraced data-driven methodologies. The purpose of this study is to investigate the challenges associated with the security of machine learning (ML) applications in the smart grid scenario. Indeed, the robustness and security of these data-driven algorithms have not been extensively studied in relation to all power grid applications. We demonstrate first that the deep neural network method used in the smart grid is susceptible to adversarial perturbation. Then, we highlight how studies on fault localization and type classification illustrate the weaknesses of present ML algorithms in smart grids to various adversarial attacks.

AI · Aug 20, 2024
Large Language Model Driven Recommendation

Anton Korikov, Scott Sanner, Yashar Deldjoo et al.

While previous chapters focused on recommendation systems (RSs) based on standardized, non-verbal user feedback such as purchases, views, and clicks, the advent of LLMs has unlocked the use of natural language (NL) interactions for recommendation. This chapter discusses how LLMs' abilities for general NL reasoning present novel opportunities to build highly personalized RSs, which can effectively connect nuanced and diverse user preferences to items, potentially via interactive dialogues. To begin this discussion, we first present a taxonomy of the key data sources for language-driven recommendation, covering item descriptions, user-system interactions, and user profiles. We then proceed to fundamental techniques for LLM recommendation, reviewing the use of encoder-only and autoregressive LLM recommendation in both tuned and untuned settings. Afterwards, we move to multi-module recommendation architectures in which LLMs interact with components such as retrievers and RSs in multi-stage pipelines. This brings us to architectures for conversational recommender systems (CRSs), in which LLMs facilitate multi-turn dialogues where each turn presents an opportunity not only to make recommendations, but also to engage with the user in interactive preference elicitation, critiquing, and question-answering.

49.5 · AI · Apr 27 · Code
Multi-Dimensional Evaluation of Sustainable City Trips with LLM-as-a-Judge and Human-in-the-Loop

Ashmi Banerjee, Adithi Satish, Wolfgang Wörndl et al.

Evaluating nuanced conversational travel recommendations is challenging when human annotations are costly and standard metrics ignore stakeholder-centric goals. We study LLMs-as-Judges for sustainable city-trip lists across four dimensions (relevance, diversity, sustainability, and popularity balance) and propose a three-phase calibration framework: (1) baseline judging with multiple LLMs, (2) expert evaluation to identify systematic misalignment, and (3) dimension-specific calibration via rules and few-shot examples. Across two recommendation settings, we observe model-specific biases and high dimension-level variance, even when judges agree on overall rankings. Calibration clarifies reasoning per dimension but exposes divergent interpretations of sustainability, highlighting the need for transparent, bias-aware LLM evaluation. Prompts and code are released for reproducibility: https://github.com/ashmibanerjee/trs-llm-calibration.

32.2 · IR · Apr 14
TRACE: A Conversational Framework for Sustainable Tourism Recommendation with Agentic Counterfactual Explanations

Ashmi Banerjee, Adithi Satish, Wolfgang Wörndl et al.

Traditional conversational travel recommender systems primarily optimize for user relevance and convenience, often reinforcing popular, overcrowded destinations and carbon-intensive travel choices. To address this, we present TRACE (Tourism Recommendation with Agentic Counterfactual Explanations), a multi-agent, LLM-based framework that promotes sustainable tourism through interactive nudging. TRACE uses a modular orchestrator-worker architecture where specialized agents elicit latent sustainability preferences, construct structured user personas, and generate recommendations that balance relevance with environmental impact. A key innovation lies in its use of agentic counterfactual explanations and LLM-driven clarifying questions, which together surface greener alternatives and refine understanding of intent, fostering user reflection without coercion. User studies and semantic alignment analyses demonstrate that TRACE effectively supports sustainable decision-making while preserving recommendation quality and interactive responsiveness. TRACE is implemented on Google's Agent Development Kit, with full code, Docker setup, prompts, and a publicly available demo video to ensure reproducibility. A project summary, including all resources, prompts, and demo access, is available at https://ashmibanerjee.github.io/trace-chatbot.

IR · Mar 31, 2024
A Review of Modern Recommender Systems Using Generative Models (Gen-RecSys)

Yashar Deldjoo, Zhankui He, Julian McAuley et al.

Traditional recommender systems (RS) typically use user-item rating histories as their main data source. However, deep generative models now have the capability to model and sample from complex data distributions, including user-item interactions, text, images, and videos, enabling novel recommendation tasks. This comprehensive, multidisciplinary survey connects key advancements in RS using Generative Models (Gen-RecSys), covering: interaction-driven generative models; the use of large language models (LLM) and textual data for natural language recommendation; and the integration of multimodal models for generating and processing images/videos in RS. Our work highlights necessary paradigms for evaluating the impact and harm of Gen-RecSys and identifies open challenges. This survey accompanies a tutorial presented at ACM KDD'24, with supporting materials provided at: https://encr.pw/vDhLq.

IR · Jan 19, 2024 · Code
Understanding Biases in ChatGPT-based Recommender Systems: Provider Fairness, Temporal Stability, and Recency

Yashar Deldjoo

This paper explores the biases in ChatGPT-based recommender systems, focusing on provider fairness (item-side fairness). Through extensive experiments and over a thousand API calls, we investigate the impact of prompt design strategies (including structure, system role, and intent) on evaluation metrics such as provider fairness, catalog coverage, temporal stability, and recency. The first experiment examines these strategies in classical top-K recommendations, while the second evaluates sequential in-context learning (ICL). In the first experiment, we assess seven distinct prompt scenarios on top-K recommendation accuracy and fairness. Accuracy-oriented prompts, like Simple and Chain-of-Thought (COT), outperform diversification prompts, which, despite enhancing temporal freshness, reduce accuracy by up to 50%. Embedding fairness into system roles, such as "act as a fair recommender," proved more effective than fairness directives within prompts. Diversification prompts led to recommending newer movies, offering broader genre distribution compared to traditional collaborative filtering (CF) models. The second experiment explores sequential ICL, comparing zero-shot and few-shot ICL. Results indicate that including user demographic information in prompts affects model biases and stereotypes. However, ICL did not consistently improve item fairness and catalog coverage over zero-shot learning. Zero-shot learning achieved higher NDCG and coverage, while ICL-2 showed slight improvements in hit rate (HR) when age-group context was included. Our study provides insights into biases of RecLLMs, particularly in provider fairness and catalog coverage. By examining prompt design, learning strategies, and system roles, we highlight the potential and challenges of integrating LLMs into recommendation systems. Further details can be found at https://github.com/yasdel/Benchmark_RecLLM_Fairness.

LG · Aug 17, 2020 · Code
How to Put Users in Control of their Data in Federated Top-N Recommendation with Learning to Rank

Vito Walter Anelli, Yashar Deldjoo, Tommaso Di Noia et al.

Recommendation services are extensively adopted in several user-centered applications as a tool to alleviate the information overload problem and help users orient themselves in a vast space of possible choices. In such scenarios, data ownership is a crucial concern since users may not be willing to share their sensitive preferences (e.g., visited locations) with a central server. Unfortunately, data harvesting and collection are at the basis of modern, state-of-the-art approaches to recommendation. To address this issue, we present FPL, an architecture in which users collaborate in training a central factorization model while controlling the amount of sensitive data leaving their devices. The proposed approach implements pair-wise learning-to-rank optimization by following the Federated Learning principles, originally conceived to mitigate the privacy risks of traditional machine learning. The public implementation is available at https://split.to/sisinflab-fpl.

IR · Apr 9, 2025
Toward Holistic Evaluation of Recommender Systems Powered by Generative Models

Yashar Deldjoo, Nikhil Mehta, Maheswaran Sathiamoorthy et al. · amazon-science

Recommender systems powered by generative models (Gen-RecSys) extend beyond classical item ranking by producing open-ended content, which simultaneously unlocks richer user experiences and introduces new risks. On one hand, these systems can enhance personalization and appeal through dynamic explanations and multi-turn dialogues. On the other hand, they might venture into unknown territory: hallucinating nonexistent items, amplifying bias, or leaking private information. Traditional accuracy metrics cannot fully capture these challenges, as they fail to measure factual correctness, content safety, or alignment with user intent. This paper makes two main contributions. First, we categorize the evaluation challenges of Gen-RecSys into two groups: (i) existing concerns that are exacerbated by generative outputs (e.g., bias, privacy) and (ii) entirely new risks (e.g., item hallucinations, contradictory explanations). Second, we propose a holistic evaluation approach that includes scenario-based assessments and multi-metric checks incorporating relevance, factual grounding, bias detection, and policy compliance. Our goal is to provide a guiding framework so researchers and practitioners can thoroughly assess Gen-RecSys, ensuring effective personalization and responsible deployment.

AI · Feb 1, 2024
A Personalized Framework for Consumer and Producer Group Fairness Optimization in Recommender Systems

Hossein A. Rahmani, Mohammadmehdi Naghiaei, Yashar Deldjoo

In recent years, there has been an increasing recognition that when machine learning (ML) algorithms are used to automate decisions, they may mistreat individuals or groups, with legal, ethical, or economic implications. Recommender systems (RS) are prominent examples of these ML systems that aid users in making decisions. The majority of past research on RS fairness treats user and item fairness concerns independently, ignoring the fact that recommender systems function in a two-sided marketplace. In this paper, we propose CP-FairRank, an optimization-based re-ranking algorithm that seamlessly integrates fairness constraints from both the consumer and producer side in a joint objective framework. One of the framework's key characteristics is that it is generalizable and may take into account varied fairness settings based on group segmentation, recommendation model selection, and domain. For instance, we demonstrate that the system may jointly increase consumer and producer fairness when (un)protected consumer groups are defined on the basis of their activity level and mainstreamness, while producer groups are defined according to their popularity level. For empirical validation, through large-scale experiments on eight datasets and four mainstream collaborative filtering (CF) recommendation models, we demonstrate that our proposed strategy is able to improve both consumer and producer fairness with little or no loss in overall recommendation quality, demonstrating the role algorithms may play in avoiding data biases.
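To make the idea of a joint consumer-producer objective concrete, here is a minimal Python sketch. It is not the CP-FairRank formulation from the paper: the function name `joint_objective`, the gap definitions (largest difference between group means on the consumer side, largest difference between exposure shares on the producer side), and the weights `lam_c` and `lam_p` are all assumptions chosen for illustration. The sketch only scores a given set of recommendation lists; it does not solve the re-ranking optimization.

```python
def joint_objective(rec_lists, relevance, user_group, item_group,
                    lam_c=1.0, lam_p=1.0):
    """Score recommendation lists by relevance minus two fairness penalties.

    Hypothetical illustration only; assumes rec_lists is non-empty and every
    recommended item has a relevance score and a group label.
    """
    # Per-user quality: summed relevance of the items each user receives.
    user_quality = {u: sum(relevance[u][i] for i in items)
                    for u, items in rec_lists.items()}
    total_relevance = sum(user_quality.values())

    # Consumer side: gap between the mean quality of the user groups.
    by_group = {}
    for u, quality in user_quality.items():
        by_group.setdefault(user_group[u], []).append(quality)
    group_means = [sum(v) / len(v) for v in by_group.values()]
    consumer_gap = max(group_means) - min(group_means)

    # Producer side: gap between the exposure shares of the item groups.
    slots = {g: 0 for g in set(item_group.values())}
    for items in rec_lists.values():
        for i in items:
            slots[item_group[i]] += 1
    total_slots = sum(slots.values())
    shares = [n / total_slots for n in slots.values()]
    producer_gap = max(shares) - min(shares)

    return total_relevance - lam_c * consumer_gap - lam_p * producer_gap


# Toy data (made up): two users, three items.
relevance = {"u1": {"a": 1.0, "b": 0.5, "c": 0.4},
             "u2": {"a": 0.6, "b": 0.2, "c": 0.5}}
user_group = {"u1": "active", "u2": "inactive"}
item_group = {"a": "popular", "b": "popular", "c": "unpopular"}

accuracy_only = {"u1": ["a", "b"], "u2": ["a", "c"]}
more_balanced = {"u1": ["a", "c"], "u2": ["a", "c"]}

for name, recs in [("accuracy_only", accuracy_only),
                   ("more_balanced", more_balanced)]:
    print(name, joint_objective(recs, relevance, user_group, item_group))
```

With both weights set to 1, the second list scores higher in this toy setup even though its raw relevance is slightly lower, which is the kind of trade-off a re-ranker operating on such an objective would exploit.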

IR · Apr 12, 2025
SynthTRIPs: A Knowledge-Grounded Framework for Benchmark Query Generation for Personalized Tourism Recommenders

Ashmi Banerjee, Adithi Satish, Fitri Nur Aisyah et al.

Tourism Recommender Systems (TRS) are crucial in personalizing travel experiences by tailoring recommendations to users' preferences, constraints, and contextual factors. However, publicly available travel datasets often lack sufficient breadth and depth, limiting their ability to support advanced personalization strategies -- particularly for sustainable travel and off-peak tourism. In this work, we explore using Large Language Models (LLMs) to generate synthetic travel queries that emulate diverse user personas and incorporate structured filters such as budget constraints and sustainability preferences. This paper introduces a novel SynthTRIPs framework for generating synthetic travel queries using LLMs grounded in a curated knowledge base (KB). Our approach combines persona-based preferences (e.g., budget, travel style) with explicit sustainability filters (e.g., walkability, air quality) to produce realistic and diverse queries. We mitigate hallucination and ensure factual correctness by grounding the LLM responses in the KB. We formalize the query generation process and introduce evaluation metrics for assessing realism and alignment. Both human expert evaluations and automatic LLM-based assessments demonstrate the effectiveness of our synthetic dataset in capturing complex personalization aspects underrepresented in existing datasets. While our framework was developed and tested for personalized city trip recommendations, the methodology applies to other recommender system domains. Code and dataset are made public at https://bit.ly/synthTRIPs

LG · May 10, 2024
XAI4LLM. Let Machine Learning Models and LLMs Collaborate for Enhanced In-Context Learning in Healthcare

Fatemeh Nazary, Yashar Deldjoo, Tommaso Di Noia et al.

Clinical decision support systems require models that are not only highly accurate but also equitable and sensitive to the implications of missed diagnoses. In this study, we introduce a knowledge-guided in-context learning (ICL) framework designed to enable large language models (LLMs) to effectively process structured clinical data. Our approach integrates domain-specific feature groupings, carefully balanced few-shot examples, and task-specific prompting strategies. We systematically evaluate this method across seventy distinct ICL designs, spanning various prompt variations and two different communication styles (natural-language narrative and numeric conversational), and compare its performance to robust classical machine learning (ML) benchmarks on tasks involving heart disease and diabetes prediction. Our findings indicate that while traditional ML models maintain superior performance in balanced precision-recall scenarios, LLMs employing narrative prompts with integrated domain knowledge achieve higher recall and significantly reduce gender bias, effectively narrowing fairness disparities by an order of magnitude. Despite the current limitation of increased inference latency, LLMs provide notable advantages, including the capacity for zero-shot deployment and enhanced equity. This research offers the first comprehensive analysis of ICL design considerations for applying LLMs to tabular clinical tasks and highlights distillation and multimodal extensions as promising directions for future research.

IR · May 3, 2024
A Normative Framework for Benchmarking Consumer Fairness in Large Language Model Recommender System

Yashar Deldjoo, Fatemeh Nazary

The rapid adoption of large language models (LLMs) in recommender systems (RS) presents new challenges in understanding and evaluating their biases, which can result in unfairness or the amplification of stereotypes. Traditional fairness evaluations in RS primarily focus on collaborative filtering (CF) settings, which may not fully capture the complexities of LLMs, as these models often inherit biases from large, unregulated data. This paper proposes a normative framework to benchmark consumer fairness in LLM-powered recommender systems (RecLLMs). We critically examine how fairness norms in classical RS fall short in addressing the challenges posed by LLMs. We argue that this gap can lead to arbitrary conclusions about fairness, and we propose a more structured, formal approach to evaluate fairness in such systems. Our experiments on the MovieLens dataset on consumer fairness, using in-context learning (zero-shot vs. few-shot) reveal fairness deviations in age-based recommendations, particularly when additional contextual examples are introduced (ICL-2). Statistical significance tests confirm that these deviations are not random, highlighting the need for robust evaluation methods. While this work offers a preliminary discussion on a proposed normative framework, our hope is that it could provide a formal, principled approach for auditing and mitigating bias in RecLLMs. The code and dataset used for this work will be shared at "gihub-anonymized".

SI · Jan 29, 2025
Towards Recommender Systems LLMs Playground (RecSysLLMsP): Exploring Polarization and Engagement in Simulated Social Networks

Ljubisa Bojic, Zorica Dodevska, Yashar Deldjoo et al.

Given the exponential advancement in AI technologies and the potential escalation of harmful effects from recommendation systems, it is crucial to simulate and evaluate these effects early on. Doing so can help prevent possible damage to both societies and technology companies. This paper introduces the Recommender Systems LLMs Playground (RecSysLLMsP), a novel simulation framework leveraging Large Language Models (LLMs) to explore the impacts of different content recommendation setups on user engagement and polarization in social networks. By creating diverse AI agents (AgentPrompts) with descriptive, static, and dynamic attributes, we assess their autonomous behaviour across three scenarios: Plurality, Balanced, and Similarity. Our findings reveal that the Similarity Scenario, which aligns content with user preferences, maximizes engagement while potentially fostering echo chambers. Conversely, the Plurality Scenario promotes diverse interactions but produces mixed engagement results. Our study emphasizes the need for a careful balance in recommender system designs to enhance user satisfaction while mitigating societal polarization. It underscores the unique value and challenges of incorporating LLMs into simulation environments. The benefits of RecSysLLMsP lie in its potential to calculate polarization effects, which is crucial for assessing societal impacts and determining user engagement levels with diverse recommender system setups. This advantage is essential for developing and maintaining a successful business model for social media companies. However, the study's limitations revolve around accurately emulating reality. Future efforts should validate the similarity in behaviour between real humans and AgentPrompts and establish metrics for measuring polarization scores.

IR · Nov 20, 2025
Music Recommendation with Large Language Models: Challenges, Opportunities, and Evaluation

Elena V. Epure, Yashar Deldjoo, Bruno Sguerra et al.

Music Recommender Systems (MRS) have long relied on an information-retrieval framing, where progress is measured mainly through accuracy on retrieval-oriented subtasks. While effective, this reductionist paradigm struggles to address the deeper question of what makes a good recommendation, and attempts to broaden evaluation, through user studies or fairness analyses, have had limited impact. The emergence of Large Language Models (LLMs) disrupts this framework: LLMs are generative rather than ranking-based, making standard accuracy metrics questionable. They also introduce challenges such as hallucinations, knowledge cutoffs, non-determinism, and opaque training data, rendering traditional train/test protocols difficult to interpret. At the same time, LLMs create new opportunities, enabling natural-language interaction and even allowing models to act as evaluators. This work argues that the shift toward LLM-driven MRS requires rethinking evaluation. We first review how LLMs reshape user modeling, item modeling, and natural-language recommendation in music. We then examine evaluation practices from NLP, highlighting methodologies and open challenges relevant to MRS. Finally, we synthesize insights, focusing on how LLM prompting applies to MRS, to outline a structured set of success and risk dimensions. Our goal is to provide the MRS community with an updated, pedagogical, and cross-disciplinary perspective on evaluation.

AI · Aug 20, 2025
Collab-REC: An LLM-based Agentic Framework for Balancing Recommendations in Tourism

Ashmi Banerjee, Adithi Satish, Fitri Nur Aisyah et al.

We propose Collab-REC, a multi-agent framework designed to counteract popularity bias and enhance diversity in tourism recommendations. In our setting, three LLM-based agents (Personalization, Popularity, and Sustainability) generate city suggestions from complementary perspectives. A non-LLM moderator then merges and refines these proposals via multi-round negotiation, ensuring each agent's viewpoint is incorporated while penalizing spurious or repeated responses. Experiments on European city queries show that Collab-REC improves diversity and overall relevance compared to a single-agent baseline, surfacing lesser-visited locales that often remain overlooked. This balanced, context-aware approach addresses over-tourism and better aligns with constraints provided by the user, highlighting the promise of multi-stakeholder collaboration in LLM-driven recommender systems.

IR · Feb 27, 2022
The Unfairness of Active Users and Popularity Bias in Point-of-Interest Recommendation

Hossein A. Rahmani, Yashar Deldjoo, Ali Tourani et al.

Point-of-Interest (POI) recommender systems provide personalized recommendations to users and help businesses attract potential customers. Despite their success, recent studies suggest that highly data-driven recommendations could be impacted by data biases, resulting in unfair outcomes for different stakeholders, mainly consumers (users) and providers (items). Most existing fairness-related research works in recommender systems treat user fairness and item fairness issues individually, disregarding that RS work in a two-sided marketplace. This paper studies the interplay between (i) the unfairness of active users, (ii) the unfairness of popular items, and (iii) the accuracy (personalization) of recommendation as three angles of our study triangle. We group users into advantaged and disadvantaged levels to measure user fairness based on their activity level. For item fairness, we divide items into short-head, mid-tail, and long-tail groups and study the exposure of these item groups in the top-k recommendation lists of users. Experimental validation of eight different recommendation models commonly used for POI recommendation (e.g., contextual, CF) on two publicly available POI recommendation datasets, Gowalla and Yelp, indicates that most well-performing models suffer seriously from the unfairness of popularity bias (provider unfairness). Furthermore, our study shows that most recommendation models cannot satisfy both consumer and producer fairness, indicating a trade-off between these variables possibly due to natural biases in data. We choose the POI recommendation as our test scenario; however, the insights should be trivially extendable to other domains.
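The item-side measurement described in this abstract (splitting items into short-head, mid-tail, and long-tail groups and looking at their exposure in top-k lists) can be sketched in Python as follows. This is a minimal illustration, not the paper's code: the function names `popularity_groups` and `exposure_by_group` are hypothetical, and the 20% head and 20% tail cut-offs are assumed values, since the abstract does not state the thresholds used.

```python
from collections import Counter


def popularity_groups(interactions, head_share=0.2, tail_share=0.2):
    """Label each item short-head, mid-tail, or long-tail by interaction count.

    interactions: iterable of (user, item) pairs.
    head_share / tail_share: assumed fractions of the catalog (not from the paper).
    """
    counts = Counter(item for _user, item in interactions)
    ranked = [item for item, _count in counts.most_common()]  # most popular first
    n_head = max(1, round(len(ranked) * head_share))
    n_tail = max(1, round(len(ranked) * tail_share))
    groups = {}
    for position, item in enumerate(ranked):
        if position < n_head:
            groups[item] = "short-head"
        elif position >= len(ranked) - n_tail:
            groups[item] = "long-tail"
        else:
            groups[item] = "mid-tail"
    return groups


def exposure_by_group(top_k_lists, groups):
    """Share of all recommendation slots that goes to each popularity group.

    Assumes every recommended item appears in `groups`.
    """
    slots = Counter(groups[item]
                    for items in top_k_lists.values() for item in items)
    total = sum(slots.values())
    if total == 0:
        return {g: 0.0 for g in ("short-head", "mid-tail", "long-tail")}
    return {g: slots[g] / total
            for g in ("short-head", "mid-tail", "long-tail")}


# Toy check-in log (made up): item "a" is the most visited, "e" the least.
log = ([("u", "a")] * 5 + [("u", "b")] * 4 + [("u", "c")] * 3
       + [("u", "d")] * 2 + [("u", "e")])
groups = popularity_groups(log)
print(exposure_by_group({"u1": ["a", "b"], "u2": ["a", "e"]}, groups))
```

Comparing these exposure shares against each group's share of the catalog gives a rough sense of how strongly a model favours popular items.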

IRFeb 6, 2022
A Review of Modern Fashion Recommender Systems

Yashar Deldjoo, Fatemeh Nazary, Arnau Ramisa et al.

The textile and apparel industries have grown tremendously over the last few years. Customers no longer have to visit many stores, stand in long queues, or try on garments in dressing rooms as millions of products are now available in online catalogs. However, given the plethora of options available, an effective recommendation system is necessary to properly sort, order, and communicate relevant product material or information to users. Effective fashion RS can have a noticeable impact on billions of customers' shopping experiences and increase sales and revenues on the provider side. The goal of this survey is to provide a review of recommender systems that operate in the specific vertical domain of garment and fashion products. We have identified the most pressing challenges in fashion RS research and created a taxonomy that categorizes the literature according to the objective they are trying to accomplish (e.g., item or outfit recommendation, size recommendation, explainability, among others) and type of side-information (users, items, context). We have also identified the most important evaluation goals and perspectives (outfit generation, outfit recommendation, pairing recommendation, and fill-in-the-blank outfit compatibility prediction) and the most commonly used datasets and evaluation metrics.

IROct 8, 2021
Simulations for novel problems in recommendation: analyzing misinformation and data characteristics

Alejandro Bellogín, Yashar Deldjoo

In this position paper, we discuss recent applications of simulation approaches for recommender systems tasks. In particular, we describe how they were used to analyze the problem of misinformation spreading and understand which data characteristics affect the performance of recommendation algorithms more significantly. We also present potential lines of future work where simulation methods could advance the work in the recommendation community.

IRJul 29, 2021
Understanding the Effects of Adversarial Personalized Ranking Optimization Method on Recommendation Quality

Vito Walter Anelli, Yashar Deldjoo, Tommaso Di Noia et al.

Recommender systems (RSs) employ user-item feedback, e.g., ratings, to match customers to personalized lists of products. Approaches to top-k recommendation mainly rely on Learning-To-Rank algorithms and, among them, the most widely adopted is Bayesian Personalized Ranking (BPR), which is based on a pair-wise optimization approach. Recently, BPR has been found vulnerable to adversarial perturbations of its model parameters. Adversarial Personalized Ranking (APR) mitigates this issue by robustifying BPR via an adversarial training procedure. The empirical improvements of APR's accuracy performance over BPR have led to its wide use in several recommender models. However, a key overlooked aspect has been the beyond-accuracy performance of APR, i.e., novelty, coverage, and amplification of popularity bias, considering that recent results suggest that BPR, the building block of APR, is sensitive to the intensification of biases and reduction of recommendation novelty. In this work, we model the learning characteristics of the BPR and APR optimization frameworks to give mathematical evidence that, when the feedback data have a tailed distribution, APR amplifies the popularity bias more than BPR due to an unbalanced number of received positive updates from short-head items. Using matrix factorization (MF), we empirically validate the theoretical results by performing preliminary experiments on two public datasets to compare BPR-MF and APR-MF performance on accuracy and beyond-accuracy metrics. The experimental results consistently show the degradation of novelty and coverage measures and a worrying amplification of bias.
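For readers unfamiliar with the pair-wise objective mentioned above, the following is a minimal sketch of the standard BPR loss for a matrix factorization model and one gradient step on it. It omits regularization and the adversarial term that APR adds; the function names and learning rate are illustrative assumptions, not the authors' code.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def bpr_loss(p_u, q_i, q_j):
    """BPR loss for one (user, positive item, negative item) triple:
    -ln sigmoid(x_ui - x_uj), with MF scores x_ui = p_u . q_i."""
    diff = dot(p_u, q_i) - dot(p_u, q_j)
    return -math.log(sigmoid(diff))

def bpr_sgd_step(p_u, q_i, q_j, lr=0.1):
    """One plain gradient-descent step on the loss above (no regularization)."""
    diff = dot(p_u, q_i) - dot(p_u, q_j)
    g = 1.0 - sigmoid(diff)  # magnitude of the update signal
    new_p = [p + lr * g * (qi - qj) for p, qi, qj in zip(p_u, q_i, q_j)]
    new_qi = [qi + lr * g * p for qi, p in zip(q_i, p_u)]  # positive item is pushed up
    new_qj = [qj - lr * g * p for qj, p in zip(q_j, p_u)]  # negative item is pushed down
    return new_p, new_qi, new_qj

p_u, q_i, q_j = [1.0, 0.0], [0.5, 0.0], [0.5, 0.0]
loss_before = bpr_loss(p_u, q_i, q_j)
loss_after = bpr_loss(*bpr_sgd_step(p_u, q_i, q_j))
```

Because only the positive item in each triple is pushed up, items that appear as positives more often (typically short-head items) accumulate more of these updates, which is the imbalance the abstract refers to.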

IRJul 25, 2021
Content-driven Music Recommendation: Evolution, State of the Art, and Challenges

Yashar Deldjoo, Markus Schedl, Peter Knees

The music domain is among the most important ones for adopting recommender systems technology. In contrast to most other recommendation domains, which predominantly rely on collaborative filtering (CF) techniques, music recommenders have traditionally embraced content-based (CB) approaches. In the past years, music recommendation models that leverage collaborative and content data -- which we refer to as content-driven models -- have been replacing pure CF or CB models. In this survey, we review 55 articles on content-driven music recommendation. Based on a thorough literature analysis, we first propose an onion model comprising five layers, each of which corresponds to a category of music content we identified: signal, embedded metadata, expert-generated content, user-generated content, and derivative content. We provide a detailed characterization of each category along several dimensions. Second, we identify six overarching challenges, according to which we organize our main discussion: increasing recommendation diversity and novelty, providing transparency and explanations, accomplishing context-awareness, recommending sequences of music, improving scalability and efficiency, and alleviating cold start. Each article addressing one or more of these challenges is categorized according to the content layers of our onion model, the article's goal(s), and main methodological choices. Furthermore, articles are discussed in temporal order to shed light on the evolution of content-driven music recommendation strategies. Finally, we provide our personal selection of the persisting grand challenges which are still waiting to be solved in future research endeavors.

IRDec 15, 2020
FedeRank: User Controlled Feedback with Federated Recommender Systems

Vito Walter Anelli, Yashar Deldjoo, Tommaso Di Noia et al.

Recommender systems have been shown to be a successful representative of how data availability can ease our everyday digital life. However, data privacy is one of the most prominent concerns in the digital era. After several data breaches and privacy scandals, users are now worried about sharing their data. In the last decade, Federated Learning has emerged as a new privacy-preserving distributed machine learning paradigm. It works by processing data on the user device without collecting data in a central repository. We present FedeRank (https://split.to/federank), a federated recommendation algorithm. The system learns a personal factorization model on every device. The training of the model is a synchronous process between the central server and the federated clients. FedeRank takes care of computing recommendations in a distributed fashion and allows users to control the portion of data they want to share. By comparing with state-of-the-art algorithms, extensive experiments show the effectiveness of FedeRank in terms of recommendation accuracy, even with a small portion of shared user data. Further analysis of the recommendation lists' diversity and novelty guarantees the suitability of the algorithm in real production environments.

IROct 3, 2020
Multi-Step Adversarial Perturbations on Recommender Systems Embeddings

Vito Walter Anelli, Alejandro Bellogín, Yashar Deldjoo et al.

Recommender systems (RSs) have attained exceptional performance in learning users' preferences and helping them find the most suitable products. Recent advances in adversarial machine learning (AML) in the computer vision (CV) domain have raised interest in the security of state-of-the-art model-based recommenders. Recently, a worrying deterioration of recommendation accuracy has been observed on several state-of-the-art model-based recommenders (e.g., BPR-MF) when machine-learned adversarial perturbations contaminate model parameters. However, while the single-step fast gradient sign method (FGSM) is the most explored perturbation strategy, multi-step (iterative) perturbation strategies, which demonstrated higher efficacy in the computer vision domain, have been highly under-researched in recommendation tasks. In this work, inspired by the basic iterative method (BIM) and the projected gradient descent (PGD) strategies proposed in the CV domain, we adapt the multi-step strategies for the item recommendation task to study the possible weaknesses of embedding-based recommender models under minimal adversarial perturbations. Letting the magnitude of the perturbation be fixed, we illustrate the higher efficacy of the multi-step perturbation compared to the single-step one with extensive empirical evaluation on two widely adopted recommender datasets. Furthermore, we study the impact of structural dataset characteristics, i.e., sparsity, density, and size, on the performance degradation caused by the presented perturbations to support RS designers in interpreting recommendation performance variation due to minimal variations of model parameters. Our implementation and datasets are available at https://anonymous.4open.science/r/9f27f909-93d5-4016-b01c-8976b8c14bc5/.
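The difference between a single-step and a multi-step perturbation can be sketched on a toy BPR-MF triple. This is an illustration under stated assumptions, not the authors' implementation: only the user embedding is perturbed, the budget is an L-infinity bound `eps`, and the step size and step count are arbitrary.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def bpr_loss(p_u, q_i, q_j):
    return -math.log(sigmoid(dot(p_u, q_i) - dot(p_u, q_j)))

def grad_wrt_user(p_u, q_i, q_j):
    """Gradient of the BPR loss with respect to the user embedding."""
    g = -(1.0 - sigmoid(dot(p_u, q_i) - dot(p_u, q_j)))
    return [g * (qi - qj) for qi, qj in zip(q_i, q_j)]

def sign(x):
    return (x > 0) - (x < 0)

def fgsm_perturbation(p_u, q_i, q_j, eps):
    """Single step: move eps along the sign of the loss gradient."""
    return [eps * sign(g) for g in grad_wrt_user(p_u, q_i, q_j)]

def iterative_perturbation(p_u, q_i, q_j, eps, step, n_steps):
    """Multi-step (BIM-style): small signed steps, clipped to [-eps, eps] each time."""
    delta = [0.0] * len(p_u)
    for _ in range(n_steps):
        perturbed = [p + d for p, d in zip(p_u, delta)]
        grad = grad_wrt_user(perturbed, q_i, q_j)
        delta = [max(-eps, min(eps, d + step * sign(g)))
                 for d, g in zip(delta, grad)]
    return delta

p_u, q_i, q_j = [1.0, 0.5], [0.8, 0.2], [0.1, 0.4]
eps = 0.1
d_single = fgsm_perturbation(p_u, q_i, q_j, eps)
d_multi = iterative_perturbation(p_u, q_i, q_j, eps, step=0.04, n_steps=5)
clean = bpr_loss(p_u, q_i, q_j)
attacked = bpr_loss([p + d for p, d in zip(p_u, d_multi)], q_i, q_j)
```

In this toy case the score is linear in the user embedding, so the gradient sign never changes and both strategies end at the same corner of the budget. The gap reported in the abstract would only appear when the gradient direction changes along the path, for example when user and item embeddings are perturbed jointly.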

LGJul 17, 2020
Prioritized Multi-Criteria Federated Learning

Vito Walter Anelli, Yashar Deldjoo, Tommaso Di Noia et al.

In Machine Learning scenarios, privacy is a crucial concern when models have to be trained with private data coming from users of a service, such as a recommender system, a location-based mobile service, a mobile phone text messaging service providing next word prediction, or a face image classification system. The main issue is that, often, data are collected, transferred, and processed by third parties. These transactions violate new regulations, such as GDPR. Furthermore, users usually are not willing to share private data such as their visited locations, the text messages they wrote, or the photos they took with a third party. On the other hand, users appreciate services that work based on their behaviors and preferences. In order to address these issues, Federated Learning (FL) has been recently proposed as a means to build ML models based on private datasets distributed over a large number of clients, while preventing data leakage. A federation of users is asked to train the same global model on their private data, while a central coordinating server receives locally computed updates from clients and aggregates them to obtain a better global model, without the need to use clients' actual data. In this work, we extend the FL approach by pushing forward the state-of-the-art approaches in the aggregation step of FL, which we deem crucial for building a high-quality global model. Specifically, we propose an approach that takes into account a suite of client-specific criteria that constitute the basis for assigning a score to each client based on a priority of criteria defined by the service provider. Extensive experiments on two publicly available datasets indicate the merits of the proposed approach compared to a standard FL baseline.

IRMay 20, 2020
A survey on Adversarial Recommender Systems: from Attack/Defense strategies to Generative Adversarial Networks

Yashar Deldjoo, Tommaso Di Noia, Felice Antonio Merra

Latent-factor models (LFM) based on collaborative filtering (CF), such as matrix factorization (MF) and deep CF methods, are widely used in modern recommender systems (RS) due to their excellent performance and recommendation accuracy. However, this success has been accompanied by a major emerging challenge: many applications of machine learning (ML) are adversarial in nature. In recent years, it has been shown that these methods are vulnerable to adversarial examples, i.e., subtle but non-random perturbations designed to force recommendation models to produce erroneous outputs. The goal of this survey is two-fold: (i) to present recent advances on adversarial machine learning (AML) for the security of RS (i.e., attacking and defending recommendation models), (ii) to show another successful application of AML in generative adversarial networks (GANs) for generative applications, thanks to their ability for learning (high-dimensional) data distributions. In this survey, we provide an exhaustive literature review of 74 articles published in major RS and ML journals and conferences. This review serves as a reference for the RS community, working on the security of RS or on generative models using GANs to improve their quality.

IRAug 29, 2019
Towards Evaluating User Profiling Methods Based on Explicit Ratings on Item Features

Luca Luciano Costanzo, Yashar Deldjoo, Maurizio Ferrari Dacrema et al.

In order to improve the accuracy of recommendations, many recommender systems nowadays use side information beyond the user rating matrix, such as item content. These systems build user profiles as estimates of users' interest in content (e.g., movie genre, director or cast) and then evaluate the performance of the recommender system as a whole, e.g., by their ability to recommend relevant and novel items to the target user. The user profile modelling stage, which is a key stage in content-driven RS, is rarely evaluated properly due to the lack of publicly available datasets that contain user preferences on content features of items. To raise awareness of this fact, we investigate differences between explicit user preferences and implicit user profiles. We create a dataset of explicit preferences towards content features of movies, which we release publicly. We then compare the collected explicit user feature preferences and implicit user profiles built via state-of-the-art user profiling models. Our results show a maximum average pairwise cosine similarity of 58.07% between the explicit feature preferences and the implicit user profiles modelled by the best investigated profiling method and considering movies' genres only. For actors and directors, this maximum similarity is only 9.13% and 17.24%, respectively. This low similarity between explicit and implicit preference models encourages a more in-depth study to investigate and improve this important user profile modelling step, which will eventually translate into better recommendations.
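The similarity figure reported above can be read as a per-user cosine similarity between two feature-preference vectors, averaged over users. A minimal sketch follows; pairing each user's explicit vector with the same user's implicit vector, and the dictionary layout, are assumptions about how the measure is computed.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    num = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return num / (norm_a * norm_b)

def average_profile_similarity(explicit, implicit):
    """Mean cosine similarity over users that appear in both profile dictionaries."""
    users = explicit.keys() & implicit.keys()
    return sum(cosine_similarity(explicit[u], implicit[u]) for u in users) / len(users)

# Toy genre-preference vectors (hypothetical order: action, comedy, drama).
explicit = {"u1": [1.0, 0.0, 0.0], "u2": [0.0, 1.0, 0.0]}
implicit = {"u1": [2.0, 0.0, 0.0], "u2": [0.0, 0.0, 3.0]}
avg = average_profile_similarity(explicit, implicit)
```

Here the first user's profiles point in the same direction and the second user's are orthogonal, so the average is 0.5.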

IRAug 21, 2019
Assessing the Impact of a User-Item Collaborative Attack on Class of Users

Yashar Deldjoo, Tommaso Di Noia, Felice Antonio Merra

Collaborative Filtering (CF) models lie at the core of most recommendation systems due to their state-of-the-art accuracy. They are commonly adopted in e-commerce and online services for their impact on sales volume and/or diversity, and their impact on companies' outcome. However, CF models are only as good as the interaction data they work with. As these models rely on outside sources of information, counterfeit data such as user ratings or reviews can be injected by attackers to manipulate the underlying data and alter the impact of resulting recommendations, thus implementing a so-called shilling attack. While previous works have focused on evaluating shilling attack strategies from a global perspective, paying particular attention to the effect of the size of attacks and attacker's knowledge, in this work we explore the effectiveness of shilling attacks under novel aspects. First, we investigate the effect of attack strategies crafted on a target user in order to push the recommendation of a low-ranking item to a higher position, referred to as user-item attack. Second, we evaluate the effectiveness of attacks in altering the impact of different CF models by contemplating the class of the target user, from the perspective of the richness of her profile (i.e., cold vs. warm user). Finally, similar to previous work, we contemplate the size of attack (i.e., the number of fake profiles injected) in examining their success. The results of experiments on two widely used datasets in business and movie domains, namely Yelp and MovieLens, suggest that warm and cold users exhibit contrasting behaviors in datasets with different characteristics.

LGAug 20, 2019
Towards Effective Device-Aware Federated Learning

Vito Walter Anelli, Yashar Deldjoo, Tommaso Di Noia et al.

With the wealth of information produced by social networks, smartphones, medical or financial applications, speculations have been raised about the sensitivity of such data in terms of users' personal privacy and data security. To address the above issues, Federated Learning (FL) has been recently proposed as a means to leave data and computational resources distributed over a large number of nodes (clients) where a central coordinating server aggregates only locally computed updates without knowing the original data. In this work, we extend the FL framework by pushing forward the state of the art in the field on several dimensions: (i) unlike the original FedAvg approach relying solely on a single criterion (i.e., local dataset size), a suite of domain- and client-specific criteria constitutes the basis to compute each local client's contribution, (ii) the multi-criteria contribution of each device is computed in a prioritized fashion by leveraging a priority-aware aggregation operator used in the field of information retrieval, and (iii) a mechanism is proposed for online-adjustment of the aggregation operator parameters via a local search strategy with backtracking. Extensive experiments on a publicly available dataset indicate the merits of the proposed approach compared to the standard FedAvg baseline.
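The FedAvg baseline mentioned above weights each client's update by its local dataset size. The sketch below shows that aggregation step and a generic score-weighted variant. How the paper derives per-client scores from prioritized criteria is not reproduced here; `client_scores` is a hypothetical placeholder for whatever such a procedure outputs.

```python
def weighted_aggregate(client_updates, client_scores):
    """Average parameter vectors using non-negative per-client scores as weights."""
    total = sum(client_scores)
    dim = len(client_updates[0])
    return [sum(update[k] * score
                for update, score in zip(client_updates, client_scores)) / total
            for k in range(dim)]

def fedavg(client_updates, client_sizes):
    """FedAvg aggregation: the weight of a client is its local dataset size."""
    return weighted_aggregate(client_updates, client_sizes)

# Two clients, two model parameters each.
updates = [[1.0, 0.0], [3.0, 2.0]]
global_by_size = fedavg(updates, client_sizes=[1, 3])
# Hypothetical scores from some multi-criteria procedure (assumption, not the paper's operator).
global_by_score = weighted_aggregate(updates, client_scores=[0.5, 0.5])
```

With sizes 1 and 3 the second client dominates the average; with equal scores the result is the plain mean of the two updates.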

IRAug 19, 2019
Recommender Systems Fairness Evaluation via Generalized Cross Entropy

Yashar Deldjoo, Vito Walter Anelli, Hamed Zamani et al.

Fairness in recommender systems has been considered with respect to sensitive attributes of users (e.g., gender, race) or items (e.g., revenue in a multistakeholder setting). Regardless, the concept has been commonly interpreted as some form of equality -- i.e., the degree to which the system is meeting the information needs of all its users in an equal sense. In this paper, we argue that fairness in recommender systems does not necessarily imply equality, but instead it should consider a distribution of resources based on merits and needs. We present a probabilistic framework based on generalized cross entropy to evaluate fairness of recommender systems under this perspective, where we show that the proposed framework is flexible and explanatory by allowing domain knowledge to be incorporated (through an ideal fair distribution), which can help to understand which item or user aspects a recommendation algorithm is over- or under-representing. Results on two real-world datasets show the merits of the proposed evaluation framework both in terms of user and item fairness.
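To make the idea of comparing a model's distribution against an ideal fair distribution concrete, here is a minimal sketch using one common parameterization of generalized cross entropy, GCE = (sum_j p_f(a_j)^beta * p_m(a_j)^(1 - beta) - 1) / (beta * (1 - beta)). Whether this matches the paper's exact definition, and which value of `beta` the authors use, are assumptions that should be checked against the paper.

```python
def generalized_cross_entropy(p_fair, p_model, beta=0.5):
    """Divergence between an ideal fair distribution and the model's distribution.

    p_fair, p_model: probabilities over the same attribute values (each sums to 1).
    beta must differ from 0 and 1. For 0 < beta < 1 the value is 0 when the two
    distributions coincide and negative otherwise.
    """
    if beta in (0, 1):
        raise ValueError("beta must differ from 0 and 1")
    total = sum((pf ** beta) * (pm ** (1.0 - beta))
                for pf, pm in zip(p_fair, p_model))
    return (total - 1.0) / (beta * (1.0 - beta))

# Ideal: two item groups receive equal exposure (a domain-knowledge assumption).
ideal = [0.5, 0.5]
gce_balanced = generalized_cross_entropy(ideal, [0.5, 0.5])
gce_skewed = generalized_cross_entropy(ideal, [0.9, 0.1])
```

The ideal distribution does not have to be uniform: a merit- or need-based target can be passed as `p_fair`, which is the flexibility the abstract highlights.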

IRJul 31, 2019
Session-Based Hotel Recommendations: Challenges and Future Directions

Jens Adamczak, Gerard-Paul Leyson, Peter Knees et al.

In the year 2019, the Recommender Systems Challenge deals with a real-world task from the area of e-tourism for the first time, namely the recommendation of hotels in booking sessions. In this context, this article aims at identifying and investigating what we believe are important domain-specific challenges recommendation systems research in hotel search is facing, from both academic and industry perspectives. We focus on three main challenges, namely dealing with (1) multiple stakeholders and value-awareness in recommendations, (2) sparsity of user data and the extensive cold-start problem, and (3) dynamic input data and computational requirements. To this end, we review the state of the art toward solving these challenges and discuss shortcomings. We detail possible future directions and visions we contemplate for the further evolution of the field. This article should, therefore, serve two purposes: giving the interested reader an overview of current challenges in the field and inspiring new approaches for the ACM Recommender Systems Challenge 2019 and beyond.

IROct 9, 2017
Current Challenges and Visions in Music Recommender Systems Research

Markus Schedl, Hamed Zamani, Ching-Wei Chen et al.

Music recommender systems (MRS) have experienced a boom in recent years, thanks to the emergence and success of online streaming services, which nowadays make available almost all music in the world at the user's fingertip. While today's MRS considerably help users to find interesting music in these huge catalogs, MRS research is still facing substantial challenges. In particular, when it comes to building, incorporating, and evaluating recommendation strategies that integrate information beyond simple user--item interactions or content-based descriptors, but dig deep into the very essence of listener needs, preferences, and intentions, MRS research becomes a big endeavor and related publications quite sparse. The purpose of this trends and survey article is twofold. We first identify and shed light on what we believe are the most pressing challenges MRS research is facing, from both academic and industry perspectives. We review the state of the art towards solving these challenges and discuss its limitations. Second, we detail possible future directions and visions we contemplate for the further evolution of the field. The article should therefore serve two purposes: giving the interested reader an overview of current challenges in MRS research and providing guidance for young researchers by identifying interesting, yet under-researched, directions in the field.

IRApr 20, 2017
Using Mise-En-Scène Visual Features based on MPEG-7 and Deep Learning for Movie Recommendation

Yashar Deldjoo, Massimo Quadrana, Mehdi Elahi et al.

Item features play an important role in movie recommender systems, where recommendations can be generated by using explicit or implicit preferences of users on traditional features (attributes) such as tag, genre, and cast. Typically, movie features are human-generated, either editorially (e.g., genre and cast) or by leveraging the wisdom of the crowd (e.g., tag), and as such, they are prone to noise and are expensive to collect. Moreover, these features are often rare or absent for new items, making it difficult or even impossible to provide good quality recommendations. In this paper, we show that users' preferences on movies can be better described in terms of the mise-en-scène features, i.e., the visual aspects of a movie that characterize design, aesthetics and style (e.g., colors, textures). We use both MPEG-7 visual descriptors and Deep Learning hidden layers as examples of mise-en-scène features that can visually describe movies. Interestingly, mise-en-scène features can be computed automatically from video files or even from trailers, offering more flexibility in handling new items, avoiding the need for costly and error-prone human-based tagging, and providing good scalability. We have conducted a set of experiments on a large catalogue of 4K movies. Results show that recommendations based on mise-en-scène features consistently provide the best performance with respect to richer sets of more traditional features, such as genre and tag.

CVJul 30, 2016
Sparse vs. Non-sparse: Which One Is Better for Practical Visual Tracking?

Yashar Deldjoo, Shengping Zhang, Bahman Zanj et al.

Recently, sparse representation based visual tracking methods have attracted increasing attention in the computer vision community. Although they achieve superior performance to traditional tracking methods, a basic question has not yet been answered: is the sparsity constraint really needed for visual tracking? To answer this question, in this paper, we first propose a robust non-sparse representation based tracker and then conduct extensive experiments to compare it against several state-of-the-art sparse representation based trackers. Our experimental results and analysis indicate that the proposed non-sparse tracker achieves tracking accuracy competitive with sparse trackers while running faster, which supports the use of our non-sparse tracker in practical applications.