xGEMs: Generating Examplars to Explain Black-Box Models
This paper provides a method for interpreting black-box models, which matters for AI transparency and fairness, though the contribution appears incremental because it builds on existing manifold and generative modeling approaches.
The authors tackle the problem of understanding black-box classifier behavior by proposing xGEMs, a framework that uses an unsupervised implicit generative model as a proxy for the data manifold. Samples are perturbed along that manifold until they cross the classifier's decision boundary, which enables quantitative detection of model bias and analysis of how model behavior changes as training progresses.
This work proposes xGEMs, or manifold-guided exemplars, a framework to understand black-box classifier behavior by exploring the landscape of the underlying data manifold as data points cross decision boundaries. To do so, we train an unsupervised implicit generative model, which we treat as a proxy for the data manifold. We summarize black-box model behavior quantitatively by perturbing data samples along the manifold. We demonstrate xGEMs' ability to detect and quantify bias in model learning and to reveal how model behavior changes as training progresses.
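The idea of perturbing a sample along the manifold until it crosses a decision boundary can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: a hypothetical linear decoder (`decode`) stands in for the generative model, a hypothetical logistic classifier (`classify`) stands in for the black box and is assumed to expose gradients, and the search is plain gradient descent on the cross-entropy toward the target label, starting from the sample's latent code.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def decode(z, W, c):
    # Hypothetical linear decoder standing in for the generative model G(z).
    return W @ z + c

def classify(x, w, b):
    # Hypothetical logistic classifier standing in for the black box f(x).
    return sigmoid(w @ x + b)

def find_exemplar(z0, W, c, w, b, target, lr=0.5, max_steps=1000):
    """Walk the latent code z until the decoded point is assigned `target`."""
    z = np.asarray(z0, dtype=float).copy()
    path = [z.copy()]
    for _ in range(max_steps):
        p = classify(decode(z, W, c), w, b)
        if (p > 0.5) == bool(target):
            break  # the decoded point has crossed the decision boundary
        # Chain rule for cross-entropy through the logistic classifier
        # and the linear decoder: dL/dz = (p - target) * W^T w.
        grad = (p - target) * (W.T @ w)
        z -= lr * grad
        path.append(z.copy())
    return decode(z, W, c), np.array(path)

# Toy setup (all values assumed): a 2-D latent space mapped into 3-D data.
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
c = np.zeros(3)
w = np.array([1.0, -1.0, 0.5])
b = 0.0

z0 = np.array([-1.0, 1.0])   # latent code of the sample to explain
x0 = decode(z0, W, c)        # the sample itself, initially class 0
exemplar, path = find_exemplar(z0, W, c, w, b, target=1)

# Latent distance travelled before the label flips, one possible summary.
latent_distance = np.linalg.norm(path[-1] - path[0])
```

Because every candidate is produced by `decode`, the exemplar stays on the (toy) manifold rather than drifting into arbitrary input space. A quantity such as `latent_distance` is one way such a walk could be summarized numerically; the specific summary statistics used in the paper are not stated in this abstract.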