Irina Shklovski

CY
h-index: 49
6 papers
86 citations
Novelty: 40%
AI Score: 28

6 Papers

CY · Aug 12, 2023
Ground Truth Or Dare: Factors Affecting The Creation Of Medical Datasets For Training AI

Hubert D. Zając, Natalia R. Avlona, Tariq O. Andersen et al.

One of the core goals of responsible AI development is ensuring high-quality training datasets. Many researchers have pointed to the importance of the annotation step in the creation of high-quality data, but less attention has been paid to the work that enables data annotation. We define this work as the design of ground truth schema and explore the challenges involved in creating medical datasets even before any annotations are made. Based on extensive work in three health-tech organisations, we describe five external and internal factors that condition medical dataset creation processes. The three external factors are regulatory constraints, the context of creation and use, and commercial and operational pressures. These factors condition medical data collection and shape the ground truth schema design. The two internal factors are epistemic differences and limits of labelling, which directly shape the design of the ground truth schema. Discussions of what constitutes high-quality data need to pay attention to the factors that shape and constrain what it is possible to create, to ensure responsible AI design.

HC · Feb 13, 2025
Show Me the Work: Fact-Checkers' Requirements for Explainable Automated Fact-Checking

Greta Warren, Irina Shklovski, Isabelle Augenstein

The pervasiveness of large language models and generative AI in online media has amplified the need for effective automated fact-checking to assist fact-checkers in tackling the increasing volume and sophistication of misinformation. The complex nature of fact-checking demands that automated fact-checking systems provide explanations that enable fact-checkers to scrutinise their outputs. However, it is unclear how these explanations should align with the decision-making and reasoning processes of fact-checkers to be effectively integrated into their workflows. Through semi-structured interviews with fact-checking professionals, we bridge this gap by: (i) providing an account of how fact-checkers assess evidence, make decisions, and explain their processes; (ii) examining how fact-checkers use automated tools in practice; and (iii) identifying fact-checker explanation requirements for automated fact-checking tools. The findings show unmet explanation needs and identify important criteria for replicable fact-checking explanations that trace the model's reasoning path, reference specific evidence, and highlight uncertainty and information gaps.

CL · May 23, 2025
Explaining Sources of Uncertainty in Automated Fact-Checking

Jingyi Sun, Greta Warren, Irina Shklovski et al.

Understanding sources of a model's uncertainty regarding its predictions is crucial for effective human-AI collaboration. Prior work proposes using numerical uncertainty or hedges ("I'm not sure, but ..."), which do not explain uncertainty that arises from conflicting evidence, leaving users unable to resolve disagreements or rely on the output. We introduce CLUE (Conflict-and-Agreement-aware Language-model Uncertainty Explanations), the first framework to generate natural language explanations of model uncertainty by (i) identifying relationships between spans of text that expose claim-evidence or inter-evidence conflicts and agreements that drive the model's predictive uncertainty in an unsupervised way, and (ii) generating explanations via prompting and attention steering that verbalize these critical interactions. Across three language models and two fact-checking datasets, we show that CLUE produces explanations that are more faithful to the model's uncertainty and more consistent with fact-checking decisions than prompting for uncertainty explanations without span-interaction guidance. Human evaluators judge our explanations to be more helpful, more informative, less redundant, and more logically consistent with the input than this baseline. CLUE requires no fine-tuning or architectural changes, making it plug-and-play for any white-box language model. By explicitly linking uncertainty to evidence conflicts, it offers practical support for fact-checking and generalises readily to other tasks that require reasoning over complex information.

AI · Mar 24, 2024
Public Perceptions of Fairness Metrics Across Borders

Yuya Sasaki, Sohei Tokuno, Haruka Maeda et al.

Which fairness metrics are appropriate in your context? Perceptions of fairness may be discordant even when outcomes comply with established fairness metrics. Several questionnaire-based surveys have compared fairness metrics against human perceptions of fairness. However, these surveys were limited in scope, including only a few hundred participants within a single country. In this study, we conduct an international survey to evaluate public perceptions of various fairness metrics in decision-making scenarios. We collected responses from 1,000 participants in each of China, France, Japan, and the United States, 4,000 participants in total, to analyze preferences for fairness metrics. Our survey consists of three distinct scenarios paired with four fairness metrics. This investigation explores the relationship between personal attributes and the choice of fairness metrics, uncovering a significant influence of national context on these preferences.

CY · Dec 13, 2021
Can Machine Learning be Moral?

Miguel Sicart, Irina Shklovski, Mirabelle Jones

The ethics of machine learning has become an unavoidable topic in the AI community. The deployment of machine learning systems in multiple social contexts has resulted in closer ethical scrutiny of the design, development, and application of these systems. The AI/ML community has come to terms with the imperative to think about the ethical implications of machine learning, not only as a product but also as a practice (Birhane, 2021; Shen et al., 2021). The critical question troubling many debates is what can constitute an ethically accountable machine learning system. In this paper we explore possibilities for ethical evaluation of machine learning methodologies. We scrutinize techniques, methods and technical practices in machine learning from a relational ethics perspective, taking into consideration how machine learning systems are part of the world and how they relate to different forms of agency. Taking a page from Phil Agre (1997), we use the notion of a critical technical practice as a means of analyzing machine learning approaches. Our radical proposal is that supervised learning appears to be the only machine learning method that is ethically defensible.

HC · Jun 25, 2019
Peril v. Promise: IoT and the Ethical Imaginaries

Funda Ustek-Spilda, Alison Powell, Irina Shklovski et al.

Future scenarios often associated with the Internet of Things (IoT) oscillate between the peril of IoT for the future of humanity and the promise of an ever-connected and efficient future. Such a dichotomous positioning creates problems not only for expanding the technology's field of application but also for ensuring ethical and responsible design and production. As part of VirtEU (Values and Ethics in Innovation for Responsible Technology in Europe) (EU Horizon 2020 FP7), we have conducted ethnographic research with IoT developers and designers in the main European IoT hubs, such as London, Amsterdam, Barcelona and Belgrade, to identify the challenges they face in their day-to-day work. In this paper, we focus on IoT and ethical imaginaries, exploring the practical challenges IoT developers face when designing, producing and marketing IoT technologies. We argue that neither top-down ethical frameworks that overlook the situated capabilities of developers nor solutionist approaches that treat ethical issues as technical problems are likely to provide an alternative to the dichotomous imaginary of the future.