Yuri Nakao

AI
h-index: 6
Papers: 8
Citations: 92
Novelty: 41%
AI Score: 33

8 Papers

AI | Jun 1, 2022
Towards Responsible AI: A Design Space Exploration of Human-Centered Artificial Intelligence User Interfaces to Investigate Fairness

Yuri Nakao, Lorenzo Strappelli, Simone Stumpf et al.

As artificial intelligence (AI) that aids or automates decision-making advances rapidly, a particular concern is its fairness. To create reliable, safe and trustworthy systems through human-centred artificial intelligence (HCAI) design, recent efforts have produced user interfaces (UIs) for AI experts to investigate the fairness of AI models. In this work, we provide a design space exploration that supports not only data scientists but also domain experts in investigating AI fairness. Using loan applications as an example, we held a series of workshops with loan officers and data scientists to elicit their requirements. We instantiated these requirements into FairHIL, a UI to support human-in-the-loop fairness investigations, and describe how this UI could be generalized to other use cases. We evaluated FairHIL through a think-aloud user study. Our work contributes better designs for investigating an AI model's fairness, moving closer towards responsible AI.

HC | Apr 22, 2022
Towards Involving End-users in Interactive Human-in-the-loop AI Fairness

Yuri Nakao, Simone Stumpf, Subeida Ahmed et al.

Ensuring fairness in artificial intelligence (AI) is important to counteract bias and discrimination in far-reaching applications. Recent work has started to investigate how humans judge fairness and how to support machine learning (ML) experts in making their AI models fairer. Drawing inspiration from an Explainable AI (XAI) approach called "explanatory debugging" used in interactive machine learning, our work explores designing interpretable and interactive human-in-the-loop interfaces that allow ordinary end-users without any technical or domain background to identify potential fairness issues and possibly fix them in the context of loan decisions. Through workshops with end-users, we co-designed and implemented a prototype system that allowed end-users to see why predictions were made, and then to change weights on features to "debug" fairness issues. We evaluated the use of this prototype system through an online study. To investigate the implications of diverse human values about fairness around the globe, we also explored how cultural dimensions might play a role in using this prototype. Our results contribute to the design of interfaces that allow end-users to be involved in judging and addressing AI fairness through a human-in-the-loop approach.
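The "see why, then change weights" loop described above can be illustrated with a deliberately simple model. The sketch below uses a hypothetical additive loan-scoring model, not the prototype's actual model (the abstract does not specify it); the feature names, weights, applicants, and the 0.5 cutoff are all invented. Each prediction is explained as per-feature contributions, and the user "debugs" by shifting a weight and re-checking per-group approval rates.

```python
# Hypothetical additive loan-scoring model: score = sum of weight * feature value.
# Feature names, weights, applicants and the 0.5 cutoff are invented for illustration.

def explain(weights, features):
    """Per-feature contribution to the score: the 'why' shown to the end-user."""
    return {name: weights[name] * features[name] for name in weights}

def approve(weights, features, cutoff=0.5):
    """Approve when the summed contributions reach the cutoff."""
    return sum(explain(weights, features).values()) >= cutoff

def approval_rates(weights, applicants):
    """Approval rate per group, so a user can see whether a weight change narrows a gap."""
    totals, approved = {}, {}
    for a in applicants:
        g = a["group"]
        totals[g] = totals.get(g, 0) + 1
        approved[g] = approved.get(g, 0) + approve(weights, a["features"])
    return {g: approved[g] / totals[g] for g in totals}

applicants = [
    {"group": "A", "features": {"income": 0.6, "postcode_score": 0.9}},
    {"group": "A", "features": {"income": 0.4, "postcode_score": 0.8}},
    {"group": "B", "features": {"income": 0.6, "postcode_score": 0.1}},
    {"group": "B", "features": {"income": 0.5, "postcode_score": 0.2}},
]

before = {"income": 0.5, "postcode_score": 0.5}
print(approval_rates(before, applicants))  # {'A': 1.0, 'B': 0.0}

# The user judges "postcode_score" to be an unfair proxy and shifts weight to income.
after = {"income": 0.9, "postcode_score": 0.1}
print(approval_rates(after, applicants))   # {'A': 0.5, 'B': 0.5}
```

An additive model is used here only because its contributions are exact and easy to display; the actual prototype may explain predictions differently.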

AI | Jul 16, 2024
EARN Fairness: Explaining, Asking, Reviewing, and Negotiating Artificial Intelligence Fairness Metrics Among Stakeholders

Lin Luo, Yuri Nakao, Mathieu Chollet et al.

Numerous fairness metrics have been proposed and employed by artificial intelligence (AI) experts to quantitatively measure bias and define fairness in AI models. Recognizing the need to accommodate stakeholders' diverse fairness understandings, efforts are underway to solicit their input. However, conveying AI fairness metrics to stakeholders without AI expertise, capturing their personal preferences, and seeking a collective consensus remain challenging and underexplored. To bridge this gap, we propose a new framework, EARN Fairness, which facilitates collective metric decisions among stakeholders without requiring AI expertise. The framework features an adaptable interactive system and a stakeholder-centered EARN Fairness process to Explain fairness metrics, Ask stakeholders' personal metric preferences, Review metrics collectively, and Negotiate a consensus on metric selection. To gather empirical results, we applied the framework to a credit rating scenario and conducted a user study involving 18 decision subjects without AI knowledge. We identify their personal metric preferences and their acceptable level of unfairness in individual sessions. Subsequently, we uncover how they reach metric consensus in team sessions. Our work shows that the EARN Fairness framework enables stakeholders to express personal preferences and reach consensus, providing practical guidance for implementing human-centered AI fairness in high-risk contexts. Through this approach, we aim to harmonize the fairness expectations of diverse stakeholders, fostering more equitable and inclusive AI fairness.
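One reason explaining metrics to stakeholders matters is that different metrics can disagree about the same decisions. In the hypothetical sketch below (invented data; the two metrics and the 0.1 tolerance are assumptions, not EARN's actual metric set), both groups are approved at equal rates, so demographic parity holds, while creditworthy applicants in group b are approved only half as often as in group a, so equal opportunity does not. The last step mimics checking each metric against a stakeholder's acceptable level of unfairness.

```python
def selection_rate(y_pred, group, g):
    """Share of group g that receives a positive decision."""
    preds = [p for p, gr in zip(y_pred, group) if gr == g]
    return sum(preds) / len(preds)

def true_positive_rate(y_true, y_pred, group, g):
    """Share of group g's truly creditworthy members who are approved."""
    hits = [p for t, p, gr in zip(y_true, y_pred, group) if gr == g and t == 1]
    return sum(hits) / len(hits)

def fairness_gaps(y_true, y_pred, group, a, b):
    """Absolute between-group gaps for two common fairness metrics."""
    return {
        "demographic_parity_difference":
            abs(selection_rate(y_pred, group, a) - selection_rate(y_pred, group, b)),
        "equal_opportunity_difference":
            abs(true_positive_rate(y_true, y_pred, group, a)
                - true_positive_rate(y_true, y_pred, group, b)),
    }

# Invented credit decisions: y_true = would repay, y_pred = approved.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
group = ["a"] * 4 + ["b"] * 4

gaps = fairness_gaps(y_true, y_pred, group, "a", "b")
print(gaps)  # {'demographic_parity_difference': 0.0, 'equal_opportunity_difference': 0.5}

ACCEPTABLE_GAP = 0.1  # hypothetical stakeholder-chosen tolerance
print({m: gap <= ACCEPTABLE_GAP for m, gap in gaps.items()})
# {'demographic_parity_difference': True, 'equal_opportunity_difference': False}
```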

AI | Dec 13, 2023
Human-in-the-loop Fairness: Integrating Stakeholder Feedback to Incorporate Fairness Perspectives in Responsible AI

Evdoxia Taka, Yuri Nakao, Ryosuke Sonoda et al.

Fairness is a growing concern for high-risk decision-making using artificial intelligence (AI), but ensuring it through purely technical means is challenging: there is no universally accepted fairness measure, fairness is context-dependent, and there might be conflicting perspectives on what is considered fair. Thus, involving stakeholders, often without a background in AI or fairness, is a promising avenue. Research that directly involves stakeholders is in its infancy, and many questions remain about how to support stakeholders in giving feedback on fairness and how this feedback can be integrated into AI models. Our work follows an approach in which stakeholders give feedback on specific decision instances and their outcomes with respect to fairness, and this feedback is then used to retrain an AI model. To investigate this approach, we conducted two studies of a complex AI model for credit rating used in loan applications. In study 1, we collected feedback from 58 lay users on loan application decisions and conducted offline experiments to investigate the effects on accuracy and fairness metrics. In study 2, we deepened this investigation by showing 66 participants the results of their feedback with respect to fairness, and then conducted further offline analyses. Our work contributes two datasets and associated code frameworks to bootstrap further research, highlights the opportunities and challenges of employing lay user feedback to improve AI fairness, and discusses practical implications for developing AI applications that more closely reflect stakeholder views about fairness.
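The abstract does not state how instance-level feedback is folded back into the model, so the sketch below assumes one simple rule: rejections that a lay user flags as unfair are relabelled as approvals and up-weighted, and the model is retrained. A one-feature decision stump stands in for the complex credit-rating model, and all data are invented. It only illustrates the feedback-then-retrain loop and the kind of accuracy effect the offline experiments examined.

```python
import math

def fit_stump(xs, ys, weights):
    """One-feature model: approve if x >= t. Picks t with the lowest weighted error;
    candidates are scanned in ascending order, so ties go to the lowest threshold."""
    best_t, best_err = None, math.inf
    for t in sorted(set(xs)) + [math.inf]:
        err = sum(w for x, y, w in zip(xs, ys, weights) if (x >= t) != bool(y))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def apply_feedback(ys, weights, flagged, feedback_weight=2.0):
    """Hypothetical integration rule: every rejection that a lay user flagged as
    unfair is relabelled as an approval and up-weighted before retraining."""
    ys, weights = list(ys), list(weights)
    for i in flagged:
        ys[i], weights[i] = 1, feedback_weight
    return ys, weights

def accuracy(xs, ys, t):
    """Share of instances where the stump agrees with the given labels."""
    return sum((x >= t) == bool(y) for x, y in zip(xs, ys)) / len(xs)

# Invented credit scores and original repayment labels.
xs = [0.2, 0.35, 0.4, 0.55, 0.7, 0.9]
ys = [0, 0, 0, 1, 1, 1]
w = [1.0] * len(xs)

t_before = fit_stump(xs, ys, w)                   # 0.55
ys_fb, w_fb = apply_feedback(ys, w, flagged=[2])  # a user flags the applicant at x = 0.4
t_after = fit_stump(xs, ys_fb, w_fb)              # 0.4
print(t_before, t_after, accuracy(xs, ys, t_after))  # accuracy on original labels: 5/6
```

Measuring accuracy against the original labels after retraining makes the trade-off explicit: the feedback changes a decision the original labels considered correct.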

CY | Mar 3, 2025
Towards Multi-Stakeholder Evaluation of ML Models: A Crowdsourcing Study on Metric Preferences in Job-matching System

Takuya Yokota, Yuri Nakao

While machine learning (ML) technology affects diverse stakeholders, there is no one-size-fits-all metric for evaluating the quality of its outputs, including performance and fairness. Using predetermined metrics without soliciting stakeholder opinions is problematic because it unfairly disregards stakeholders in the ML pipeline. In this study, to establish practical ways of incorporating diverse stakeholder opinions into the selection of metrics for ML, we use crowdsourcing to investigate participants' preferences for different metrics. We ask 837 participants to choose the better of two hypothetical ML models in a hypothetical job-matching system twenty times, and calculate their utility values for seven metrics. To examine the participants' feedback in detail, we divide them into five clusters based on their utility values and analyze the tendencies of each cluster, including their preferences for metrics and common attributes. Based on the results, we discuss the points that should be considered when selecting appropriate metrics and evaluating ML models with multiple stakeholders.
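The abstract does not say how utility values were computed from the paired choices. One common way to turn repeated two-option choices into per-attribute utilities is a logistic choice model, in which the probability of picking model A grows with the utility-weighted difference in metrics. The sketch below assumes that model, fits it by plain gradient ascent with a small L2 penalty, and uses one invented participant with two invented metrics (accuracy and a fairness gap) rather than the study's seven; the clustering step is not shown.

```python
import math

def fit_utilities(diffs, chose_first, lr=0.1, steps=2000, l2=0.01):
    """Estimate one utility weight per metric from repeated two-model choices.

    diffs[k][j]    = metric j of model A minus metric j of model B in trial k
    chose_first[k] = 1 if the participant picked model A in trial k, else 0
    Assumed choice model: P(pick A) = sigmoid(sum_j u[j] * diffs[k][j]),
    fitted by gradient ascent on the L2-penalised log-likelihood.
    """
    n_metrics = len(diffs[0])
    u = [0.0] * n_metrics
    for _ in range(steps):
        grad = [-l2 * uj for uj in u]  # gradient of the L2 penalty
        for d, y in zip(diffs, chose_first):
            z = sum(uj * dj for uj, dj in zip(u, d))
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(n_metrics):
                grad[j] += (y - p) * d[j]  # log-likelihood gradient
        u = [uj + lr * gj for uj, gj in zip(u, grad)]
    return u

# One invented participant; metrics per model are [accuracy, fairness gap].
diffs = [[0.1, 0.0], [-0.1, 0.0],     # A more / less accurate, equally fair
         [0.0, 0.1], [0.0, -0.1],     # A less / more fair, equally accurate
         [0.05, 0.1], [-0.05, -0.1]]  # trade-off trials
chose_first = [1, 0, 0, 1, 0, 1]

u_accuracy, u_gap = fit_utilities(diffs, chose_first)
print(u_accuracy, u_gap)  # expect u_accuracy > 0 and u_gap < 0
```

A positive utility means the participant favours higher values of that metric; a negative one means they favour lower values, as expected for a fairness gap. Per-participant utility vectors like these are what a clustering step could then group.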

AI | Sep 22, 2025
"I think this is fair": Uncovering the Complexities of Stakeholder Decision-Making in AI Fairness Assessment

Lin Luo, Yuri Nakao, Mathieu Chollet et al.

Assessing fairness in artificial intelligence (AI) typically involves AI experts who select protected features and fairness metrics and set fairness thresholds. However, little is known about how stakeholders, particularly those affected by AI outcomes but lacking AI expertise, assess fairness. To address this gap, we conducted a qualitative study with 30 stakeholders without AI expertise, representing potential decision subjects in a credit rating scenario, to examine how they assess fairness when placed in the role of deciding which features to prioritize, which metrics to use, and which thresholds to set. We reveal that stakeholders' fairness decisions are more complex than typical AI expert practices: they considered features far beyond legally protected features, tailored metrics to specific contexts, set diverse yet stricter fairness thresholds, and even preferred designing customized fairness. Our results extend the understanding of how stakeholders can meaningfully contribute to AI fairness governance and mitigation, underscoring the importance of incorporating stakeholders' nuanced fairness judgments.

AI | May 12, 2025
Accountability of Generative AI: Exploring a Precautionary Approach for "Artificially Created Nature"

Yuri Nakao

The rapid development of generative artificial intelligence (AI) technologies raises concerns about the accountability of sociotechnical systems. Current generative AI systems rely on complex mechanisms that make it difficult even for experts to fully trace the reasons behind their outputs. This paper first examines existing research on AI transparency and accountability and argues that transparency is not a sufficient condition for accountability, although it can contribute to improving it. We then argue that if generative AI cannot be made transparent, generative AI technology becomes "artificially created nature" in a metaphorical sense, and we suggest using the precautionary principle to consider AI risks. Finally, we propose that a platform for citizen participation is needed to address the risks of generative AI.

LG | Oct 26, 2020
One-vs.-One Mitigation of Intersectional Bias: A General Method to Extend Fairness-Aware Binary Classification

Kenji Kobayashi, Yuri Nakao

With the widespread adoption of machine learning in the real world, the impact of discriminatory bias has attracted attention, and various methods to mitigate this bias have been proposed in recent years. However, most of them do not consider intersectional bias, which arises when people belonging to specific subgroups of a protected group are treated worse once multiple sensitive attributes are taken into account. To mitigate this bias, we propose a method called One-vs.-One Mitigation, which applies a comparison between each pair of subgroups defined by the sensitive attributes to fairness-aware machine learning for binary classification. We compare our method with conventional fairness-aware binary classification methods in comprehensive settings covering three approaches (pre-processing, in-processing, and post-processing), six metrics (the ratio and difference of demographic parity, equalized odds, and equal opportunity), and two real-world datasets (Adult and COMPAS). Our method mitigates intersectional bias considerably better than the conventional methods in all settings. These results open up the potential of fairness-aware binary classification for solving more realistic problems that arise when there are multiple sensitive attributes.
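A minimal sketch of why intersectional bias can escape single-attribute checks, and of comparing every pair of subgroups. It only measures the demographic-parity difference and ratio for each subgroup pair; it does not implement the paper's mitigation, which applies fairness-aware binary classification across these pairs. The attributes, subgroup labels, and predictions are invented.

```python
from itertools import combinations

def positive_rates(y_pred, keys):
    """Positive-prediction rate for each key (a group or an intersectional subgroup)."""
    counts, positives = {}, {}
    for pred, key in zip(y_pred, keys):
        counts[key] = counts.get(key, 0) + 1
        positives[key] = positives.get(key, 0) + pred
    return {k: positives[k] / counts[k] for k in counts}

def one_vs_one_parity(y_pred, subgroups):
    """Demographic-parity difference and ratio for every pair of subgroups."""
    rates = positive_rates(y_pred, subgroups)
    report = {}
    for a, b in combinations(sorted(rates), 2):
        lo, hi = sorted((rates[a], rates[b]))
        report[(a, b)] = {"difference": hi - lo, "ratio": lo / hi if hi > 0 else 1.0}
    return report

# Invented predictions for two binary sensitive attributes (sex, race).
subgroups = [("M", "X")] * 2 + [("M", "Y")] * 2 + [("F", "X")] * 2 + [("F", "Y")] * 2
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]

# Each attribute on its own looks perfectly balanced ...
print(positive_rates(y_pred, [s[0] for s in subgroups]))  # {'M': 0.5, 'F': 0.5}
print(positive_rates(y_pred, [s[1] for s in subgroups]))  # {'X': 0.5, 'Y': 0.5}

# ... but the pairwise subgroup comparison exposes the intersectional gap.
report = one_vs_one_parity(y_pred, subgroups)
print(max(r["difference"] for r in report.values()))  # 1.0
print(min(r["ratio"] for r in report.values()))       # 0.0
```

With k subgroups there are k(k-1)/2 pairs, so the number of comparisons grows quadratically as more sensitive attributes are combined.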