LG · AI · LO · Aug 21, 2025

Locally Pareto-Optimal Interpretations for Black-Box Machine Learning Models

arXiv:2508.15220v1 · 1 citation · h-index: 72 · ATVA
Originality: Incremental advance
AI Analysis

This work addresses the scalability limitations in generating Pareto-optimal interpretations for black-box models, which is important for improving trustworthiness in AI applications, though it is incremental as it builds on existing multi-objective techniques.

The paper tackles the problem of synthesizing interpretations for black-box machine learning models that balance accuracy and explainability. It develops a framework that uses local optimality guarantees to make synthesis scalable, and demonstrates that the resulting interpretations closely match those from methods with global guarantees.

Creating meaningful interpretations for black-box machine learning models involves balancing two often conflicting objectives: accuracy and explainability. Exploring the trade-off between these objectives is essential for developing trustworthy interpretations. While many techniques for multi-objective interpretation synthesis have been developed, they typically lack formal guarantees on the Pareto-optimality of the results. Methods that do provide such guarantees, on the other hand, often face severe scalability limitations when exploring the Pareto-optimal space. To address this, we develop a framework based on local optimality guarantees that enables more scalable synthesis of interpretations. Specifically, we consider the problem of synthesizing a set of Pareto-optimal interpretations with local optimality guarantees, within the immediate neighborhood of each solution. Our approach begins with a multi-objective learning or search technique, such as Multi-Objective Monte Carlo Tree Search, to generate a best-effort set of Pareto-optimal candidates with respect to accuracy and explainability. We then verify local optimality for each candidate as a Boolean satisfiability problem, which we solve using a SAT solver. We demonstrate the efficacy of our approach on a set of benchmarks, comparing it against previous methods for exploring the Pareto-optimal front of interpretations. In particular, we show that our approach yields interpretations that closely match those synthesized by methods offering global guarantees.
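The two-stage idea in the abstract (generate a best-effort Pareto set, then verify each candidate within its immediate neighborhood) can be illustrated with a minimal sketch. This is not the paper's implementation: the paper encodes the local check as a Boolean satisfiability problem and uses a SAT solver, whereas the sketch below swaps in brute-force enumeration of neighbours. The names `dominates`, `pareto_front`, `is_locally_pareto_optimal`, the integer-indexed candidates, the `TOY_SCORES` table, and the "adjacent index" neighbourhood are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Iterable, List, Tuple

# (accuracy, explainability); both objectives are maximized in this sketch.
Score = Tuple[float, float]


def dominates(a: Score, b: Score) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(candidates: Iterable[int], score: Callable[[int], Score]) -> List[int]:
    """Keep the candidates that no other candidate in the same set dominates."""
    scored = [(c, score(c)) for c in candidates]
    return [c for c, s in scored if not any(dominates(t, s) for _, t in scored)]


def is_locally_pareto_optimal(
    candidate: int,
    score: Callable[[int], Score],
    neighbours: Callable[[int], Iterable[int]],
) -> bool:
    """Brute-force stand-in for the SAT query: no immediate neighbour dominates the candidate."""
    s = score(candidate)
    return not any(dominates(score(n), s) for n in neighbours(candidate))


# Hypothetical toy space: five interpretations indexed 0..4 with made-up scores.
TOY_SCORES = {
    0: (0.50, 1.0),
    1: (0.70, 0.8),
    2: (0.65, 0.6),
    3: (0.90, 0.4),
    4: (0.85, 0.2),
}


def toy_score(c: int) -> Score:
    return TOY_SCORES[c]


def toy_neighbours(c: int) -> List[int]:
    """Assumed neighbourhood: interpretations whose index differs by one."""
    return [n for n in (c - 1, c + 1) if n in TOY_SCORES]


# Stage 1: suppose a best-effort search returned only these candidates (it missed 1).
best_effort = pareto_front([0, 2, 3], toy_score)

# Stage 2: keep only candidates that survive the local check.
verified = [c for c in best_effort if is_locally_pareto_optimal(c, toy_score, toy_neighbours)]
```

In this toy run, candidate 2 looks Pareto-optimal among the candidates the search returned, but it is rejected in the second stage because its unexplored neighbour 1 dominates it. That is the kind of gap the local verification step is meant to catch.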

