LG · AI · CY · Aug 18, 2024

Say My Name: a Model's Bias Discovery Framework

arXiv:2408.09570v2 · 6 citations · h-index: 15
AI Analysis

This paper addresses interpretable bias discovery for non-expert machine learning users, though it appears incremental: it builds on existing debiasing methods by adding semantic interpretation.

The paper tackles the problem of identifying biases in deep learning models by introducing SaMyNa, a tool that semantically identifies the biases a model has learned, and demonstrates its effectiveness at detecting biases, and even disclaiming them, on traditional benchmarks.

In the last few years, due to the broad applicability of deep learning to downstream tasks and its end-to-end training capabilities, growing concerns have been raised about potential biases toward specific, non-representative patterns. Many works on unsupervised debiasing leverage the tendency of deep models to learn "easier" samples, for example by clustering the latent space to obtain bias pseudo-labels. However, interpreting such pseudo-labels is not trivial, especially for a non-expert end user, as they do not provide semantic information about the bias features. To address this issue, we introduce "Say My Name" (SaMyNa), the first tool to identify biases within deep models semantically. Unlike existing methods, our approach focuses on biases learned by the model. Our text-based pipeline enhances explainability and supports debiasing efforts: applicable during either training or post-hoc validation, our method can disentangle task-related information and serves as a tool to analyze biases. Evaluation on traditional benchmarks demonstrates its effectiveness in detecting biases and even disclaiming them, showcasing its broad applicability for model diagnosis.
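The abstract mentions prior unsupervised debiasing work that clusters the latent space to obtain bias pseudo-labels. The sketch below illustrates that prior idea only, not SaMyNa itself. It makes several assumptions not taken from the paper: plain k-means as the clustering algorithm, two clusters, and synthetic Gaussian vectors standing in for real model embeddings. The function name `kmeans_pseudo_labels` is hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def kmeans_pseudo_labels(features, k=2, n_iter=50, seed=0):
    """Cluster latent features with plain k-means (an assumed choice).

    Returns one integer cluster id per sample. These ids are the
    "pseudo-labels": they carry no semantic meaning on their own.
    """
    rng = random.Random(seed)
    # Initialise centroids from k randomly chosen samples.
    centroids = [list(p) for p in rng.sample(features, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every sample.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in features
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(features, new_labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean_vector(members)
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return new_labels


# Synthetic stand-in for latent features: two well-separated groups.
data_rng = random.Random(0)
group_a = [[data_rng.gauss(-5.0, 0.5) for _ in range(4)] for _ in range(50)]
group_b = [[data_rng.gauss(5.0, 0.5) for _ in range(4)] for _ in range(50)]
latents = group_a + group_b

pseudo_labels = kmeans_pseudo_labels(latents, k=2)
```

The output is a list of cluster ids such as 0 and 1. Nothing in those integers says which feature separates the groups, which is the interpretability gap the abstract describes.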

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
