Alan Said

IR
h-index37
6papers
57citations
Novelty26%
AI Score39

6 Papers

8.9CYApr 24
Trust as a Situated User State in Social LLM-Based Chatbots: A Longitudinal Study of Snapchat's My AI

Annie Landerberg, Kari Flatmo, Alan Said

Social chatbots based on large language models are increasingly embedded in everyday platforms, yet how users develop trust in these systems over time remains unclear. We present a four-week longitudinal qualitative survey study (N = 27) of trust formation in Snapchat's My AI, a socially embedded conversational agent. Our findings show that trust is shaped by perceived ability, conversational behavior, human-likeness, transparency, privacy concerns, and trust in the host platform. Trust does not remain stable, but evolves through interaction as users adapt their expectations, refine their prompting strategies, and actively regulate how and when they rely on the system. These processes reflect a continuous negotiation of trust, not a one-time evaluation. While conversational fluency supports engagement, excessive anthropomorphism and limited transparency can undermine trust over time. We synthesize these findings into a conceptual model that frames trust as a dynamic user state shaped by interaction context and expectations, with implications for the design of human-centered and adaptive conversational agents.

LGSep 5, 2024
Understanding Fairness in Recommender Systems: A Healthcare Perspective

Veronica Kecki, Alan Said

Fairness in AI-driven decision-making systems has become a critical concern, especially when these systems directly affect human lives. This paper explores the public's comprehension of fairness in healthcare recommendations. We conducted a survey where participants selected from four fairness metrics -- Demographic Parity, Equal Accuracy, Equalized Odds, and Positive Predictive Value -- across different healthcare scenarios to assess their understanding of these concepts. Our findings reveal that fairness is a complex and often misunderstood concept, with a generally low level of public understanding regarding fairness metrics in recommender systems. This study highlights the need for enhanced information and education on algorithmic fairness to support informed decision-making in using these systems. Furthermore, the results suggest that a one-size-fits-all approach to fairness may be insufficient, pointing to the importance of context-sensitive designs in developing equitable AI systems.

IRNov 25, 2024
Recommender Systems for Good (RS4Good): Survey of Use Cases and a Call to Action for Research that Matters

Dietmar Jannach, Alan Said, Marko Tkalčič et al.

In the area of recommender systems, the vast majority of research efforts is spent on developing increasingly sophisticated recommendation models, also using increasingly more computational resources. Unfortunately, most of these research efforts target a very small set of application domains, mostly e-commerce and media recommendation. Furthermore, many of these models are never evaluated with users, let alone put into practice. The scientific, economic and societal value of much of these efforts by scholars therefore remains largely unclear. To achieve a stronger positive impact resulting from these efforts, we posit that we as a research community should more often address use cases where recommender systems contribute to societal good (RS4Good). In this opinion piece, we first discuss a number of examples where the use of recommender systems for problems of societal concern has been successfully explored in the literature. We then proceed by outlining a paradigmatic shift that is needed to conduct successful RS4Good research, where the key ingredients are interdisciplinary collaborations and longitudinal evaluation approaches with humans in the loop.

IRSep 11, 2025
We're Still Doing It (All) Wrong: Recommender Systems, Fifteen Years Later

Alan Said, Maria Soledad Pera, Michael D. Ekstrand

In 2011, Xavier Amatriain sounded the alarm: recommender systems research was "doing it all wrong" [1]. His critique, rooted in statistical misinterpretation and methodological shortcuts, remains as relevant today as it was then. But rather than correcting course, we added new layers of sophistication on top of the same broken foundations. This paper revisits Amatriain's diagnosis and argues that many of the conceptual, epistemological, and infrastructural failures he identified still persist, in more subtle or systemic forms. Drawing on recent work in reproducibility, evaluation methodology, environmental impact, and participatory design, we showcase how the field's accelerating complexity has outpaced its introspection. We highlight ongoing community-led initiatives that attempt to shift the paradigm, including workshops, evaluation frameworks, and calls for value-sensitive and participatory research. At the same time, we contend that meaningful change will require not only new metrics or better tooling, but a fundamental reframing of what recommender systems research is for, who it serves, and how knowledge is produced and validated. Our call is not just for technical reform, but for a recommender systems research agenda grounded in epistemic humility, human impact, and sustainable practice.

IRAug 7, 2025
On the Reliability of Sampling Strategies in Offline Recommender Evaluation

Bruno L. Pereira, Alan Said, Rodrygo L. T. Santos

Offline evaluation plays a central role in benchmarking recommender systems when online testing is impractical or risky. However, it is susceptible to two key sources of bias: exposure bias, where users only interact with items they are shown, and sampling bias, introduced when evaluation is performed on a subset of logged items rather than the full catalog. While prior work has proposed methods to mitigate sampling bias, these are typically assessed on fixed logged datasets rather than for their ability to support reliable model comparisons under varying exposure conditions or relative to true user preferences. In this paper, we investigate how different combinations of logging and sampling choices affect the reliability of offline evaluation. Using a fully observed dataset as ground truth, we systematically simulate diverse exposure biases and assess the reliability of common sampling strategies along four dimensions: sampling resolution (recommender model separability), fidelity (agreement with full evaluation), robustness (stability under exposure bias), and predictive power (alignment with ground truth). Our findings highlight when and how sampling distorts evaluation outcomes and offer practical guidance for selecting strategies that yield faithful and robust offline comparisons.

IRJan 31, 2021
Improving Accountability in Recommender Systems Research Through Reproducibility

Alejandro Bellogín, Alan Said

Reproducibility is a key requirement for scientific progress. It allows the reproduction of the works of others, and, as a consequence, to fully trust the reported claims and results. In this work, we argue that, by facilitating reproducibility of recommender systems experimentation, we indirectly address the issues of accountability and transparency in recommender systems research from the perspectives of practitioners, designers, and engineers aiming to assess the capabilities of published research works. These issues have become increasingly prevalent in recent literature. Reasons for this include societal movements around intelligent systems and artificial intelligence striving towards fair and objective use of human behavioral data (as in Machine Learning, Information Retrieval, or Human-Computer Interaction). Society has grown to expect explanations and transparency standards regarding the underlying algorithms making automated decisions for and around us. This work surveys existing definitions of these concepts, and proposes a coherent terminology for recommender systems research, with the goal to connect reproducibility to accountability. We achieve this by introducing several guidelines and steps that lead to reproducible and, hence, accountable experimental workflows and research. We additionally analyze several instantiations of recommender system implementations available in the literature, and discuss the extent to which they fit in the introduced framework. With this work, we aim to shed light on this important problem, and facilitate progress in the field by increasing the accountability of research.