Alessandro Bozzon

LG
h-index32
15papers
258citations
Novelty34%
AI Score40

15 Papers

LGMay 9, 2022
Towards a multi-stakeholder value-based assessment framework for algorithmic systems

Mireia Yurrita, Dave Murray-Rust, Agathe Balayn et al.

In an effort to regulate Machine Learning-driven (ML) systems, current auditing processes mostly focus on detecting harmful algorithmic biases. While these strategies have proven to be impactful, some values outlined in documents dealing with ethics in ML-driven systems are still underrepresented in auditing processes. Such unaddressed values mainly deal with contextual factors that cannot be easily quantified. In this paper, we develop a value-based assessment framework that is not limited to bias auditing and that covers prominent ethical principles for algorithmic systems. Our framework presents a circular arrangement of values with two bipolar dimensions that make common motivations and potential tensions explicit. In order to operationalize these high-level principles, values are then broken down into specific criteria and their manifestations. However, some of these value-specific criteria are mutually exclusive and require negotiation. As opposed to some other auditing frameworks that merely rely on ML researchers' and practitioners' input, we argue that it is necessary to include stakeholders that present diverse standpoints to systematically negotiate and consolidate value and criteria tensions. To that end, we map stakeholders with different insight needs, and assign tailored means for communicating value manifestations to them. We, therefore, contribute to current ML auditing practices with an assessment framework that visualizes closeness and tensions between values and we give guidelines on how to operationalize them, while opening up the evaluation and deliberation process to a wide range of stakeholders.

88.1HCApr 21
"Label from Somewhere": Reflexive Annotating for Situated AI Alignment

Anne Arzberger, Celine Offerman, Ujwal Gadiraju et al.

AI alignment relies on annotator judgments, yet annotation pipelines often treat annotators as interchangeable, obscuring how their social position shapes annotation. We introduce reflexive annotating as a probe that invites crowd workers to reflect on how their positionality informs subjective annotation judgments in a language model alignment context. Through a qualitative study with crowd workers (N=30) and follow-up interviews (N=5), we examine how our probe shapes annotators' behaviour, experience, and the situated metadata it elicits. We find that reflexive annotating captures epistemic metadata beyond static demographics by eliciting intersectional reasoning, surfacing positional humility, and nudging viewpoint change. Crucially, we also denote tensions between reflexive engagement and affective demands such as emotional exposure. We discuss the implications of our work for richer value elicitation and alignment practices that treat annotator judgments as situated and selectively integrate positional metadata.

LGJul 19, 2022
Metadata Representations for Queryable ML Model Zoos

Ziyu Li, Rihan Hai, Alessandro Bozzon et al.

Machine learning (ML) practitioners and organizations are building model zoos of pre-trained models, containing metadata describing properties of the ML models and datasets that are useful for reporting, auditing, reproducibility, and interpretability purposes. The metatada is currently not standardised; its expressivity is limited; and there is no interoperable way to store and query it. Consequently, model search, reuse, comparison, and composition are hindered. In this paper, we advocate for standardized ML model meta-data representation and management, proposing a toolkit supported to help practitioners manage and query that metadata.

HCSep 30, 2024
Factory Operators' Perspectives on Cognitive Assistants for Knowledge Sharing: Challenges, Risks, and Impact on Work

Samuel Kernan Freire, Tianhao He, Chaofan Wang et al.

In the shift towards human-centered manufacturing, our two-year longitudinal study investigates the real-world impact of deploying Cognitive Assistants (CAs) in factories. The CAs were designed to facilitate knowledge sharing among factory operators. Our investigation focused on smartphone-based voice assistants and LLM-powered chatbots, examining their usability and utility in a real-world factory setting. Based on the qualitative feedback we collected during the deployments of CAs at the factories, we conducted a thematic analysis to investigate the perceptions, challenges, and overall impact on workflow and knowledge sharing. Our results indicate that while CAs have the potential to significantly improve efficiency through knowledge sharing and quicker resolution of production issues, they also introduce concerns around workplace surveillance, the types of knowledge that can be shared, and shortcomings compared to human-to-human knowledge sharing. Additionally, our findings stress the importance of addressing privacy, knowledge contribution burdens, and tensions between factory operators and their managers.

CVMar 1
Data-Efficient Brushstroke Generation with Diffusion Models for Oil Painting

Dantong Qin, Alessandro Bozzon, Xian Yang et al.

Many creative multimedia systems are built upon visual primitives such as strokes or textures, which are difficult to collect at scale and fundamentally different from natural image data. This data scarcity makes it challenging for modern generative models to learn expressive and controllable primitives, limiting their use in process-aware content creation. In this work, we study the problem of learning human-like brushstroke generation from a small set of hand-drawn samples (n=470) and propose StrokeDiff, a diffusion-based framework with Smooth Regularization (SmR). SmR injects stochastic visual priors during training, providing a simple mechanism to stabilize diffusion models under sparse supervision without altering the inference process. We further show how the learned primitives can be made controllable through a Bézier-based conditioning module and integrated into a complete stroke-based painting pipeline, including prediction, generation, ordering, and compositing. This demonstrates how data-efficient primitive modeling can support expressive and structured multimedia content creation. Experiments indicate that the proposed approach produces diverse and structurally coherent brushstrokes and enables paintings with richer texture and layering, validated by both automatic metrics and human evaluation.

LGApr 5, 2024
Model Selection with Model Zoo via Graph Learning

Ziyu Li, Hilco van der Wilk, Danning Zhan et al.

Pre-trained deep learning (DL) models are increasingly accessible in public repositories, i.e., model zoos. Given a new prediction task, finding the best model to fine-tune can be computationally intensive and costly, especially when the number of pre-trained models is large. Selecting the right pre-trained models is crucial, yet complicated by the diversity of models from various model families (like ResNet, Vit, Swin) and the hidden relationships between models and datasets. Existing methods, which utilize basic information from models and datasets to compute scores indicating model performance on target datasets, overlook the intrinsic relationships, limiting their effectiveness in model selection. In this study, we introduce TransferGraph, a novel framework that reformulates model selection as a graph learning problem. TransferGraph constructs a graph using extensive metadata extracted from models and datasets, while capturing their inherent relationships. Through comprehensive experiments across 16 real datasets, both images and texts, we demonstrate TransferGraph's effectiveness in capturing essential model-dataset relationships, yielding up to a 32% improvement in correlation between predicted performance and the actual fine-tuning results compared to the state-of-the-art methods.

AIOct 5, 2021
Empowering Local Communities Using Artificial Intelligence

Yen-Chia Hsu, Ting-Hao 'Kenneth' Huang, Himanshu Verma et al.

Artificial Intelligence (AI) is increasingly used to analyze large amounts of data in various practices, such as object recognition. We are specifically interested in using AI-powered systems to engage local communities in developing plans or solutions for pressing societal and environmental concerns. Such local contexts often involve multiple stakeholders with different and even contradictory agendas, resulting in mismatched expectations of these systems' behaviors and desired outcomes. There is a need to investigate if AI models and pipelines can work as expected in different contexts through co-creation and field deployment. Based on case studies in co-creating AI-powered systems with local people, we explain challenges that require more attention and provide viable paths to bridge AI research with citizen needs. We advocate for developing new collaboration approaches and mindsets that are needed to co-create AI-powered systems in multi-stakeholder contexts to address local concerns.

LGJan 21, 2021
Active Hybrid Classification

Evgeny Krivosheev, Fabio Casati, Alessandro Bozzon

Hybrid crowd-machine classifiers can achieve superior performance by combining the cost-effectiveness of automatic classification with the accuracy of human judgment. This paper shows how crowd and machines can support each other in tackling classification problems. Specifically, we propose an architecture that orchestrates active learning and crowd classification and combines them in a virtuous cycle. We show that when the pool of items to classify is finite we face learning vs. exploitation trade-off in hybrid classification, as we need to balance crowd tasks optimized for creating a training dataset with tasks optimized for classifying items in the pool. We define the problem, propose a set of heuristics and evaluate the approach on three real-world datasets with different characteristics in terms of machine and crowd classification performance, showing that our active hybrid approach significantly outperforms baselines.

IRNov 11, 2020
Active Learning from Crowd in Document Screening

Evgeny Krivosheev, Burcu Sayin, Alessandro Bozzon et al.

In this paper, we explore how to efficiently combine crowdsourcing and machine intelligence for the problem of document screening, where we need to screen documents with a set of machine-learning filters. Specifically, we focus on building a set of machine learning classifiers that evaluate documents, and then screen them efficiently. It is a challenging task since the budget is limited and there are countless number of ways to spend the given budget on the problem. We propose a multi-label active learning screening specific sampling technique -- objective-aware sampling -- for querying unlabelled documents for annotating. Our algorithm takes a decision on which machine filter need more training data and how to choose unlabeled items to annotate in order to minimize the risk of overall classification errors rather than minimizing a single filter error. We demonstrate that objective-aware sampling significantly outperforms the state of the art active learning sampling strategies.

IROct 27, 2020
Assessing Viewpoint Diversity in Search Results Using Ranking Fairness Metrics

Tim Draws, Nava Tintarev, Ujwal Gadiraju et al.

The way pages are ranked in search results influences whether the users of search engines are exposed to more homogeneous, or rather to more diverse viewpoints. However, this viewpoint diversity is not trivial to assess. In this paper we use existing and novel ranking fairness metrics to evaluate viewpoint diversity in search result rankings. We conduct a controlled simulation study that shows how ranking fairness metrics can be used for viewpoint diversity, how their outcome should be interpreted, and which metric is most suitable depending on the situation. This paper lays out important ground work for future research to measure and assess viewpoint diversity in real search result rankings.

LGNov 6, 2019
Designing Evaluations of Machine Learning Models for Subjective Inference: The Case of Sentence Toxicity

Agathe Balayn, Alessandro Bozzon

Machine Learning (ML) is increasingly applied in real-life scenarios, raising concerns about bias in automatic decision making. We focus on bias as a notion of opinion exclusion, that stems from the direct application of traditional ML pipelines to infer subjective properties. We argue that such ML systems should be evaluated with subjectivity and bias in mind. Considering the lack of evaluation standards yet to create evaluation benchmarks, we propose an initial list of specifications to define prior to creating evaluation datasets, in order to later accurately evaluate the biases. With the example of a sentence toxicity inference system, we illustrate how the specifications support the analysis of biases related to subjectivity. We highlight difficulties in instantiating these specifications and list future work for the crowdsourcing community to help the creation of appropriate evaluation datasets.

LGNov 6, 2019
Unfairness towards subjective opinions in Machine Learning

Agathe Balayn, Alessandro Bozzon, Zoltan Szlavik

Despite the high interest for Machine Learning (ML) in academia and industry, many issues related to the application of ML to real-life problems are yet to be addressed. Here we put forward one limitation which arises from a lack of adaptation of ML models and datasets to specific applications. We formalise a new notion of unfairness as exclusion of opinions. We propose ways to quantify this unfairness, and aid understanding its causes through visualisation. These insights into the functioning of ML-based systems hint at methods to mitigate unfairness.

IRSep 5, 2017
Interacting Attention-gated Recurrent Networks for Recommendation

Wenjie Pei, Jie Yang, Zhu Sun et al.

Capturing the temporal dynamics of user preferences over items is important for recommendation. Existing methods mainly assume that all time steps in user-item interaction history are equally relevant to recommendation, which however does not apply in real-world scenarios where user-item interactions can often happen accidentally. More importantly, they learn user and item dynamics separately, thus failing to capture their joint effects on user-item interactions. To better model user and item dynamics, we present the Interacting Attention-gated Recurrent Network (IARN) which adopts the attention model to measure the relevance of each time step. In particular, we propose a novel attention scheme to learn the attention scores of user and item history in an interacting way, thus to account for the dependencies between user and item dynamics in shaping user-item interactions. By doing so, IARN can selectively memorize different time steps of a user's history when predicting her preferences over different items. Our model can therefore provide meaningful interpretations for recommendation results, which could be further enhanced by auxiliary features. Extensive validation on real-world datasets shows that IARN consistently outperforms state-of-the-art methods.

SIJun 30, 2017
Chatbots as Conversational Recommender Systems in Urban Contexts

Pavel Kucherbaev, Achilleas Psyllidis, Alessandro Bozzon

In this paper, we outline the vision of chatbots that facilitate the interaction between citizens and policy-makers at the city scale. We report the results of a co-design session attended by more than 60 participants. We give an outlook of how some challenges associated with such chatbot systems could be addressed in the future.

SIFeb 11, 2017
On the Improvement of Quality and Reliability of Trust Cues in Micro-task Crowdsourcing (Position paper)

Jie Yang, Alessandro Bozzon

Micro-task crowdsourcing has become a successful mean to obtain high-quality data from a large crowd of diverse people. In this context, trust between all the involved actors (i.e. requesters, workers, and platform owners) is a critical factor for acceptance and long-term success. As actors have no expectation for "real life" meetings, thus trust can only be attributed through computer-mediated trust cues like workers qualifications and requester ratings. Such cues are often the result of technical or social assessments that are performed in isolation, considering only a subset of relevant properties, and with asynchronous and asymmetrical interactions. In this paper, we advocate for a new generation of micro-task crowdsourcing systems that pursue an holistic understanding of trust, by offering an open, transparent, privacy-friendly, and socially-aware view on the all the actors of a micro-task crowdsourcing environment.