LG AI · Sep 20, 2024

The FIX Benchmark: Extracting Features Interpretable to eXperts

Helen Jin, Shreya Havaldar, Chaehyeon Kim, Anton Xue, Weiqiu You, Helen Qu, Marco Gatti, Daniel A Hashimoto, Bhuvnesh Jain, Amin Madani, Masao Sako, Lyle Ungar

arXiv:2409.13684v4 · 10.45 citations · h-index: 20 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the difficulty domain experts face in interpreting model predictions when interpretable features are not readily available. The contribution is incremental: it focuses on benchmarking rather than proposing new methods.

The paper tackles the problem of automatically extracting features aligned with expert knowledge from high-dimensional data. It presents the FIX benchmark and the FIXScore measure, which reveal that popular feature-based explanation methods align poorly with expert-specified knowledge across domains such as cosmology, psychology, and medicine.

Feature-based methods are commonly used to explain model predictions, but these methods often implicitly assume that interpretable features are readily available. However, this is often not the case for high-dimensional data, and it can be hard even for domain experts to mathematically specify which features are important. Can we instead automatically extract collections or groups of features that are aligned with expert knowledge? To address this gap, we present FIX (Features Interpretable to eXperts), a benchmark for measuring how well a collection of features aligns with expert knowledge. In collaboration with domain experts, we propose FIXScore, a unified expert alignment measure applicable to diverse real-world settings across cosmology, psychology, and medicine domains in vision, language, and time series data modalities. With FIXScore, we find that popular feature-based explanation methods have poor alignment with expert-specified knowledge, highlighting the need for new methods that can better identify features interpretable to experts.
