IR | Feb 24 | Code
RMIT-ADM+S at the MMU-RAG NeurIPS 2025 Competition
Kun Ran, Marwah Alaofi, Danula Hettiachchi et al.
This paper presents the award-winning RMIT-ADM+S system for the Text-to-Text track of the NeurIPS 2025 MMU-RAG Competition. We introduce Routing-to-RAG (R2RAG), a research-focused retrieval-augmented generation (RAG) architecture composed of lightweight components that dynamically adapt the retrieval strategy based on inferred query complexity and evidence sufficiency. The system uses smaller LLMs, enabling operation on a single consumer-grade GPU while supporting complex research tasks. It builds on the G-RAG system, winner of the ACM SIGIR 2025 LiveRAG Challenge, and extends it with modules informed by qualitative review of outputs. R2RAG won the Best Dynamic Evaluation award in the Open Source category, demonstrating high effectiveness with careful design and efficient use of resources.
HC | Mar 2, 2023
Helpful, Misleading or Confusing: How Humans Perceive Fundamental Building Blocks of Artificial Intelligence Explanations
Edward Small, Yueqing Xuan, Danula Hettiachchi et al.
Explainable artificial intelligence techniques are developed at breakneck speed, but suitable evaluation approaches lag behind. With explainers becoming increasingly complex and a lack of consensus on how to assess their utility, it is challenging to judge the benefit and effectiveness of different explanations. To address this gap, we take a step back from sophisticated predictive algorithms and instead look into explainability of simple decision-making models. In this setting, we aim to assess how people perceive comprehensibility of their different representations such as mathematical formulation, graphical representation and textual summarisation (of varying complexity and scope). This allows us to capture how diverse stakeholders -- engineers, researchers, consumers, regulators and the like -- judge intelligibility of fundamental concepts that more elaborate artificial intelligence explanations are built from. This position paper charts our approach to establishing appropriate evaluation methodology as well as a conceptual and practical framework to facilitate setting up and executing relevant user studies.
45.6 | IR | May 13
Task-Aware Automated User Profile Generation for Recommendation Simulation Using Large Language Models
Xinye Wanyan, Chenglong Ma, Danula Hettiachchi et al.
Large Language Model (LLM)-based agent simulation has emerged as a promising approach to meet the increasing demand for real-time and rigorous evaluation in modern recommender systems. A typical LLM-driven simulation framework comprises three essential components: the profile module, memory module, and action module. However, existing studies have primarily concentrated on enhancing the memory and action modules, with limited attention to profile generation, which plays a pivotal role in ensuring realistic agent behaviours and aligning simulated interactions with real user dynamics. Moreover, the scarcity of datasets specifically designed for recommendation simulations has led to heavy reliance on manually crafted profiles, significantly limiting the scalability and generalisability of simulation frameworks across different datasets. To address these challenges, this work proposes an Automated Profile Generation Framework for Recommendation Simulation, APG4RecSim, that constructs realistic, coherent, and robust user profiles with minimal supervision. Extensive experiments on three benchmark datasets demonstrate that APG4RecSim achieves the best overall performance on discrimination, ranking, and rating tasks, improving ranking quality by up to 7% in nDCG@10 and reducing rating distribution divergence by 8% in JSD compared to existing profile-generation baselines. Beyond overall performance gains, our results show that profiles generated by APG4RecSim are resilient to popularity- and position-induced biases and maintain stable performance across datasets and different LLMs.
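For readers unfamiliar with the two metrics quoted in this abstract, the following is a minimal sketch of how nDCG@10 and Jensen-Shannon divergence (JSD) are commonly computed. It is not taken from the APG4RecSim paper: the linear-gain form of DCG, the base-2 logarithm, and the function names are assumptions made for illustration, and the paper may use different variants.

```python
import math

def dcg_at_k(relevances, k):
    # Linear-gain DCG (an assumption; some papers use 2**rel - 1 as the gain).
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    # Normalise by the DCG of the same items in ideal (descending) order.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

def jsd(p, q):
    # Jensen-Shannon divergence in bits between two discrete distributions
    # given as equal-length lists of probabilities.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Higher nDCG@10 means the simulated ranking places relevant items closer to the top; lower JSD means the simulated rating distribution is closer to the real one.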
CV | Dec 15, 2021 | Code
Does a Face Mask Protect my Privacy?: Deep Learning to Predict Protected Attributes from Masked Face Images
Sachith Seneviratne, Nuran Kasthuriarachchi, Sanka Rasnayaka et al.
Contactless and efficient systems are implemented rapidly to advocate preventive methods in the fight against the COVID-19 pandemic. Despite the benefits of such systems, there is potential for exploitation by invading user privacy. In this work, we analyse the privacy invasiveness of face biometric systems by predicting privacy-sensitive soft-biometrics using masked face images. We train and apply a CNN based on the ResNet-50 architecture with 20,003 synthetic masked images and measure the privacy invasiveness. Despite the popular belief that wearing a mask benefits privacy, we show that there is no significant difference in privacy invasiveness when a mask is worn. In our experiments we were able to accurately predict sex (94.7%), race (83.1%) and age (MAE 6.21 and RMSE 8.33) from masked face images. Our proposed approach can serve as a baseline utility to evaluate the privacy-invasiveness of artificial intelligence systems that make use of privacy-sensitive information. We open-source all contributions for reproducibility and broader use by the research community.
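The age result above is reported as both MAE and RMSE. As a reminder of how the two differ, here is a minimal sketch of the standard definitions; the function names and the sample ages are illustrative assumptions, not data from the paper.

```python
import math

def mae(y_true, y_pred):
    # Mean absolute error: average size of the errors.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Root mean squared error: squaring weights large errors more heavily.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical ages, for illustration only.
true_ages = [30, 40]
predicted_ages = [33, 36]
print(mae(true_ages, predicted_ages))   # 3.5
print(rmse(true_ages, predicted_ages))  # about 3.54
```

RMSE is never smaller than MAE on the same predictions, so a gap between the two indicates that some individual errors are considerably larger than the average error.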
13.6 | HC | Mar 12
Applying Value Sensitive Design to Location-Based Services: Designing for Shared Spaces and Local Conditions
Hiruni Kegalle, Flora D. Salim, Mark Sanderson et al.
Location-Based Services (LBS) such as ride-sharing, accommodation, food delivery, and location-driven social media platforms entangle digital systems with physical spaces, thereby generating impacts that extend beyond users to others who share the same environments. Existing design approaches struggle to address the dual challenge of value tensions that arise in shared physical spaces and the locality-specific contexts in which LBS operate. To respond, we introduce Location-Aware Value Sensitive Design (LA-VSD), a domain-specific adaptation of VSD tailored to the distinctive characteristics of LBS. LA-VSD guides designers through three heuristics to help (1) identify and prioritise stakeholders through local space-sharing scenarios, (2) adapt empirical methods to capture values and tensions in context, and (3) support value-aligned interactions across both digital and physical layers of the service. Through a case study of e-scooter sharing in Melbourne, Australia, we demonstrate how LA-VSD enables more grounded, context-aware, and actionable design of LBS.
HC | Nov 29, 2021
Proceedings of the CSCW 2021 Workshop -- Investigating and Mitigating Biases in Crowdsourced Data
Danula Hettiachchi, Mark Sanderson, Jorge Goncalves et al.
This volume contains the position papers presented at CSCW 2021 Workshop - Investigating and Mitigating Biases in Crowdsourced Data, held online on 23rd October 2021, at the 24th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2021). The workshop explored how specific crowdsourcing workflows, worker attributes, and work practices contribute to biases in data. The workshop also included discussions on research directions to mitigate labelling biases, particularly in a crowdsourced context, and the implications of such methods for the workers.
HC | Nov 15, 2021
A Survey on Task Assignment in Crowdsourcing
Danula Hettiachchi, Vassilis Kostakos, Jorge Goncalves
Quality improvement methods are essential to gathering high-quality crowdsourced data, both for research and industry applications. A popular and broadly applicable method is task assignment that dynamically adjusts crowd workflow parameters. In this survey, we review task assignment methods that address: heterogeneous task assignment, question assignment, and plurality problems in crowdsourcing. We discuss and contrast how these methods estimate worker performance, and highlight potential challenges in their implementation. Finally, we discuss future research directions for task assignment methods, and how crowdsourcing platforms and other stakeholders can benefit from them.
HC | May 20, 2021
The Challenge of Variable Effort Crowdsourcing and How Visible Gold Can Help
Danula Hettiachchi, Mike Schaekermann, Tristan McKinney et al.
We consider a class of variable effort human annotation tasks in which the number of labels required per item can greatly vary (e.g., finding all faces in an image, named entities in a text, bird calls in an audio recording, etc.). In such tasks, some items require far more effort than others to annotate. Furthermore, the per-item annotation effort is not known until after each item is annotated since determining the number of labels required is an implicit part of the annotation task itself. On an image bounding-box task with crowdsourced annotators, we show that annotator accuracy and recall consistently drop as effort increases. We hypothesize reasons for this drop and investigate a set of approaches to counteract it. Firstly, we benchmark on this task a set of general best-practice methods for quality crowdsourcing. Notably, only one of these methods actually improves quality: the use of visible gold questions that provide periodic feedback to workers on their accuracy as they work. Given these promising results, we then investigate and evaluate variants of the visible gold approach, yielding further improvement. Final results show a 7% improvement in bounding-box accuracy over the baseline. We discuss the generality of the visible gold approach and promising directions for future research.
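The visible gold idea described in this abstract (known-answer items whose results are shown back to the worker as they go) can be illustrated with a small sketch. This is a simplified assumption-laden illustration, not the authors' implementation: it uses a single label per item instead of bounding boxes, and the names `annotate_with_visible_gold`, `gold_labels`, and `annotate` are hypothetical.

```python
def annotate_with_visible_gold(items, gold_labels, annotate):
    """Collect labels and give feedback after each gold item.

    items: ordered item ids shown to the worker.
    gold_labels: dict mapping gold item ids to their known correct label.
    annotate: callable standing in for the worker; maps an item id to a label.
    """
    labels = {}
    feedback = []
    correct = seen = 0
    for item in items:
        labels[item] = annotate(item)
        if item in gold_labels:
            # Gold item: compare with the known answer and report running accuracy.
            seen += 1
            correct += int(labels[item] == gold_labels[item])
            feedback.append(f"Gold accuracy so far: {correct}/{seen}")
    return labels, feedback


# Hypothetical session: two gold items interleaved with two regular items.
labels, feedback = annotate_with_visible_gold(
    items=["a", "g1", "b", "g2"],
    gold_labels={"g1": 2, "g2": 1},
    annotate=lambda item: 2,  # a worker who always answers 2
)
print(feedback)
```

In a real variable-effort task the comparison would score a set of labels per item (for example, matched bounding boxes) and not a single value.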