CV · LG · Aug 24, 2024

Explainable Concept Generation through Vision-Language Preference Learning for Understanding Neural Networks' Internal Representations

Aditya Taparia, Som Sagar, Ransalu Senanayake

arXiv:2408.13438v3 · 7.64 citations · h-index: 21

Originality: Incremental advance

AI Analysis

This work aims to reduce the manual effort that practitioners spend on concept-based explainable AI. It is an incremental advance because it builds on existing vision-language models.

The paper tackles the labor-intensive process of manually creating concept image sets for explaining neural networks by framing it as an image generation problem. It demonstrates that the authors' reinforcement learning-based preference optimization (RLPO) method generates diverse concepts efficiently and reliably.

Understanding the inner representation of a neural network helps users improve models. Concept-based methods have become a popular choice for explaining deep neural networks post-hoc because, unlike most other explainable AI techniques, they can be used to test high-level visual "concepts" that are not directly related to feature attributes. For instance, the concept of "stripes" is important to classify an image as a zebra. Concept-based explanation methods, however, require practitioners to guess and manually collect multiple candidate concept image sets, making the process labor-intensive and prone to overlooking important concepts. Addressing this limitation, in this paper, we frame concept image set creation as an image generation problem. However, since naively using a standard generative model does not result in meaningful concepts, we devise a reinforcement learning-based preference optimization (RLPO) algorithm that fine-tunes a vision-language generative model from approximate textual descriptions of concepts. Through a series of experiments, we demonstrate our method's ability to efficiently and reliably articulate diverse concepts that are otherwise challenging to craft manually.
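The zebra/"stripes" example describes what a concept-based method tests once a concept image set exists. The following is a minimal sketch, assuming a TCAV-style test (a technique the abstract does not name) and toy, hand-written activation and gradient vectors. It scores a concept by the fraction of class examples whose logit increases when the layer activations move along the concept direction. It is not the paper's RLPO algorithm. The helper names are hypothetical, and the concept direction uses a difference of means rather than the trained linear classifier that TCAV uses.

```python
"""TCAV-style concept sensitivity sketch (illustrative only, not the paper's RLPO method)."""
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def concept_activation_vector(concept_acts: List[Vector], random_acts: List[Vector]) -> List[float]:
    """Hypothetical simplification: the concept direction is the difference between the
    mean activation of concept images (e.g. "stripes") and that of random images.
    TCAV proper uses the normal of a linear classifier separating the two sets."""
    dim = len(concept_acts[0])
    mean_c = [sum(a[i] for a in concept_acts) / len(concept_acts) for i in range(dim)]
    mean_r = [sum(a[i] for a in random_acts) / len(random_acts) for i in range(dim)]
    return [c - r for c, r in zip(mean_c, mean_r)]


def tcav_score(class_grads: List[Vector], cav: Vector) -> float:
    """Fraction of class examples whose directional derivative along the CAV is positive.
    Each entry of class_grads is assumed to be the gradient of the class logit
    (e.g. "zebra") with respect to the layer activations, obtained elsewhere via autograd."""
    if not class_grads:
        raise ValueError("class_grads must be non-empty")
    positive = sum(1 for g in class_grads if dot(g, cav) > 0)
    return positive / len(class_grads)


if __name__ == "__main__":
    # Toy 2-D activations: stripe images load strongly on the first dimension.
    stripes = [[1.0, 0.2], [0.9, 0.1]]
    random_imgs = [[0.1, 0.3], [0.2, 0.1]]
    cav = concept_activation_vector(stripes, random_imgs)

    # Toy gradients of the zebra logit for three zebra images.
    zebra_grads = [[0.5, 0.1], [0.3, -0.2], [-0.4, 0.0]]
    score = tcav_score(zebra_grads, cav)
    print(f"TCAV-style score for 'stripes' -> zebra: {score:.2f}")  # prints 0.67
```

A score near 1 would suggest the class is consistently sensitive to the concept. In this setting, the burden the paper targets is assembling the concept image set (`stripes` above) in the first place.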

