LG | ML | May 29, 2021

Understanding Instance-based Interpretability of Variational Auto-Encoders

arXiv:2105.14203v4 | 30 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the lack of interpretability tools for unsupervised learning, a gap that affects both researchers and practitioners. It is incremental in that it adapts existing methods to a new context.

The paper studies instance-based interpretability for unsupervised learning, specifically for variational auto-encoders (VAEs). It investigates influence functions, introduces VAE-TracIn as a computationally efficient solution, and evaluates it on real-world datasets.

Instance-based interpretation methods have been widely studied for supervised learning methods as they help explain how black box neural networks predict. However, instance-based interpretations remain ill-understood in the context of unsupervised learning. In this paper, we investigate influence functions [Koh and Liang, 2017], a popular instance-based interpretation method, for a class of deep generative models called variational auto-encoders (VAEs). We formally frame the counter-factual question answered by influence functions in this setting, and through theoretical analysis, examine what they reveal about the impact of training samples on classical unsupervised learning methods. We then introduce VAE-TracIn, a computationally efficient and theoretically sound solution for VAEs based on Pruthi et al. [2020]. Finally, we evaluate VAE-TracIn on several real-world datasets with extensive quantitative and qualitative analysis.
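As a rough illustration of the checkpoint-based influence score from Pruthi et al. [2020] that VAE-TracIn builds on, the sketch below sums, over saved training checkpoints, the learning rate times the dot product of the loss gradients at a training point and at a test point. This is not the paper's implementation. The names `tracin_score` and `toy_grad` are hypothetical, and the quadratic loss is an assumed stand-in for a VAE training loss, chosen only because its gradient can be written by hand.

```python
import numpy as np


def tracin_score(grad_fn, checkpoints, learning_rates, x_train, x_test):
    """TracIn-style influence of x_train on the loss at x_test.

    Sums, over saved checkpoints, the learning rate times the dot product
    of the per-example loss gradients at x_train and x_test.
    """
    score = 0.0
    for theta, lr in zip(checkpoints, learning_rates):
        g_train = grad_fn(theta, x_train)
        g_test = grad_fn(theta, x_test)
        score += lr * float(np.dot(g_train, g_test))
    return score


def toy_grad(theta, x):
    # Assumed stand-in loss (not a VAE): 0.5 * ||x - theta||^2,
    # whose gradient with respect to theta is theta - x.
    return theta - x


# Two hypothetical checkpoints of the parameter vector theta.
checkpoints = [np.zeros(2), np.array([0.5, 0.0])]
learning_rates = [0.1, 0.1]

x_train = np.array([1.0, 0.0])
x_opposite = np.array([-1.0, 0.0])

# Influence of a training point on itself (self-influence).
self_influence = tracin_score(toy_grad, checkpoints, learning_rates, x_train, x_train)
# Influence of the same training point on a point on the other side of theta.
opposite_influence = tracin_score(toy_grad, checkpoints, learning_rates, x_train, x_opposite)
```

In this toy setting the self-influence is positive, since it is a sum of squared gradient norms, while the influence on the opposite point is negative because the two gradients point in opposing directions. For an actual VAE, the per-example gradient would come from automatic differentiation of the model's training loss rather than a closed form.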

Code Implementations: 1 repo