LG | AI | ML | Oct 9, 2019

Who's responsible? Jointly quantifying the contribution of the learning algorithm and training data

arXiv:1910.04214v2 | 15 citations
Originality: Incremental advance
AI Analysis

The paper addresses accountability in machine learning, which matters more as failures become more common. Its contribution is incremental: it provides a formal method for an existing challenge.

The paper tackles the problem of attributing responsibility for poor model performance on subpopulations between the learning algorithm and training data, proposing Extended Shapley as a principled framework to quantify their joint contributions.

A learning algorithm $A$ trained on a dataset $D$ is revealed to have poor performance on some subpopulation at test time. Where should the responsibility for this lie? It can be argued that the data is responsible, if for example training $A$ on a more representative dataset $D'$ would have improved the performance. But it can similarly be argued that $A$ itself is at fault, if training a different variant $A'$ on the same dataset $D$ would have improved performance. As ML becomes widespread and such failure cases more common, these types of questions are proving to be far from hypothetical. With this motivation in mind, in this work we provide a rigorous formulation of the joint credit assignment problem between a learning algorithm $A$ and a dataset $D$. We propose Extended Shapley as a principled framework for this problem, and experiment empirically with how it can be used to address questions of ML accountability.
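
To make the idea of joint credit assignment concrete, here is a minimal sketch of the classic Shapley value on a toy game. It is not the paper's Extended Shapley definition, which the abstract does not spell out. Treating the algorithm as one more player alongside two data points, the player names `A`, `d1`, `d2`, and every number in `toy_value` are assumptions made purely for illustration.

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(players, value):
    """Exact classic Shapley values: average each player's marginal
    contribution over all orderings of the players."""
    totals = {p: Fraction(0) for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += Fraction(value(with_p)) - Fraction(value(coalition))
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


def toy_value(coalition):
    """Hypothetical performance score (illustrative numbers only).

    Each data point contributes 10 under a baseline learner; if the
    algorithm "A" is also present, each data point contributes 5 more.
    The algorithm alone, with no data, is worth nothing.
    """
    n_data = sum(1 for p in coalition if p.startswith("d"))
    bonus = 5 * n_data if "A" in coalition else 0
    return 10 * n_data + bonus


phi = shapley_values(["A", "d1", "d2"], toy_value)
for player, credit in phi.items():
    print(player, float(credit))
```

In this toy game the credits sum to the value of the full coalition (30), and the two data points receive equal credit because they are interchangeable. Exact enumeration is factorial in the number of players, so it only serves as an illustration for very small games.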

