Neural Concept Verifier: Scaling Prover-Verifier Games via Concept Encodings
This work addresses interpretable and verifiable AI for high-dimensional inputs such as images, though the contribution appears incremental, since it builds on the existing Prover-Verifier Game (PVG) and Concept Bottleneck Model (CBM) frameworks.
The paper tackles the challenge of applying Prover-Verifier Games to high-dimensional images by introducing the Neural Concept Verifier, which combines concept encodings with nonlinear predictors, resulting in improved performance over Concept Bottleneck Models and pixel-based baselines on complex datasets.
While Prover-Verifier Games (PVGs) offer a promising path toward verifiability in nonlinear classification models, they have not yet been applied to complex inputs such as high-dimensional images. Conversely, Concept Bottleneck Models (CBMs) effectively translate such data into interpretable concepts but are limited by their reliance on low-capacity linear predictors. In this work, we introduce the Neural Concept Verifier (NCV), a unified framework combining PVGs with concept encodings for interpretable, nonlinear classification in high-dimensional settings. NCV achieves this by utilizing recent minimally supervised concept discovery models to extract structured concept encodings from raw inputs. A prover then selects a subset of these encodings, which a verifier -- implemented as a nonlinear predictor -- uses exclusively for decision-making. Our evaluations show that NCV outperforms CBM and pixel-based PVG classifier baselines on high-dimensional, logically complex datasets and also helps mitigate shortcut behavior. Overall, we demonstrate NCV as a promising step toward high-performing, verifiable AI.
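The prover/verifier data flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The names `prover_select`, `apply_mask`, and `TinyVerifier`, the top-k selection rule, masking by zeroing, the dimensions, and the random untrained weights are all assumptions made for illustration. The concept encodings are hard-coded where a concept discovery model would normally produce them, and the training game between prover and verifier is omitted entirely.

```python
import math
import random


def prover_select(concepts, scores, k):
    """Return a binary mask keeping the k concepts with the highest scores.

    Top-k selection is an assumed stand-in for a learned prover.
    """
    ranked = sorted(range(len(concepts)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:k])
    return [1 if i in keep else 0 for i in range(len(concepts))]


def apply_mask(concepts, mask):
    """Zero out every concept the prover did not select (assumed masking scheme)."""
    return [c * m for c, m in zip(concepts, mask)]


class TinyVerifier:
    """One-hidden-layer tanh MLP: a hypothetical nonlinear verifier.

    Weights are random and untrained; only the data flow is illustrated.
    """

    def __init__(self, n_in, n_hidden, n_classes, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_classes)]

    def predict_proba(self, x):
        # The nonlinear hidden layer is what distinguishes this from a linear CBM head.
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]
        # Numerically stable softmax over class logits.
        top = max(logits)
        exps = [math.exp(l - top) for l in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Hypothetical concept encodings for one image (e.g. concept activations).
concepts = [0.9, 0.1, 0.0, 0.7, 0.3, 0.05]
# Hypothetical prover scores, one per concept.
scores = [2.0, -1.0, 0.5, 1.5, 0.0, -0.5]

mask = prover_select(concepts, scores, k=2)
masked = apply_mask(concepts, mask)

verifier = TinyVerifier(n_in=6, n_hidden=4, n_classes=3)
probs = verifier.predict_proba(masked)
```

Because the verifier only ever receives `masked`, changing the value of an unselected concept cannot change its prediction. That property is what makes the selected subset usable as an explanation of the decision in this sketch.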