LG · Oct 24, 2025

Mechanistic Interpretability for Neural TSP Solvers

Reuben Narad, Leonard Boussioux, Michael Wagner

arXiv:2510.21693v1 · 3 citations · h-index: 2

Originality: Incremental advance

AI Analysis

This paper addresses interpretability in neural combinatorial optimization for operations research and offers insight into model-internal computations. The advance is incremental: it adapts existing interpretability methods to a new domain.

The paper tackles the opacity of neural TSP solvers by applying sparse autoencoders to a Transformer-based model, revealing that the solver develops interpretable features, such as boundary detectors and cluster-sensitive features, without explicit supervision.
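
One way to make the "boundary detector" claim concrete is to measure how often a feature's active nodes lie on the convex hull of the instance. The sketch below is illustrative only and is not taken from the paper: the function names (`convex_hull_indices`, `hull_precision`), the activation threshold, and the choice of Andrew's monotone chain for the hull are all assumptions made for this example.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull_indices(points):
    """Return the set of indices of strict convex-hull vertices.

    Uses Andrew's monotone chain; collinear points on an edge are
    not counted as hull vertices.
    """
    order = sorted(range(len(points)), key=lambda i: points[i])
    if len(order) < 3:
        return set(order)

    def half_hull(indices):
        chain = []
        for i in indices:
            # Pop while the last two points and the new one do not turn left.
            while len(chain) >= 2 and cross(
                points[chain[-2]], points[chain[-1]], points[i]
            ) <= 0:
                chain.pop()
            chain.append(i)
        return chain

    lower = half_hull(order)
    upper = half_hull(reversed(order))
    return set(lower) | set(upper)


def hull_precision(points, activations, threshold=0.0):
    """Fraction of nodes with activation > threshold that are hull vertices.

    `activations` is a hypothetical per-node activation vector for one
    SAE feature on one TSP instance.
    """
    hull = convex_hull_indices(points)
    active = [i for i, a in enumerate(activations) if a > threshold]
    if not active:
        return 0.0
    return sum(i in hull for i in active) / len(active)
```

A value near 1 across many instances would be consistent with a feature behaving as a boundary detector; the complementary quantity (the fraction of hull nodes on which the feature fires) would be needed to judge coverage.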

Neural networks have advanced combinatorial optimization, with Transformer-based solvers achieving near-optimal solutions on the Traveling Salesman Problem (TSP) in milliseconds. However, these models operate as black boxes, providing no insight into the geometric patterns they learn or the heuristics they employ during tour construction. We address this opacity by applying sparse autoencoders (SAEs), a mechanistic interpretability technique, to a Transformer-based TSP solver, representing the first application of activation-based interpretability methods to operations research models. We train a pointer network with reinforcement learning on 100-node instances, then fit an SAE to the encoder's residual stream to discover an overcomplete dictionary of interpretable features. Our analysis reveals that the solver naturally develops features mirroring fundamental TSP concepts: boundary detectors that activate on convex-hull nodes, cluster-sensitive features responding to locally dense regions, and separator features encoding geometric partitions. These findings provide the first model-internal account of what neural TSP solvers compute before node selection, demonstrate that geometric structure emerges without explicit supervision, and suggest pathways toward transparent hybrid systems that combine neural efficiency with algorithmic interpretability. Interactive feature explorer: https://reubennarad.github.io/TSP_interp
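
For readers unfamiliar with the SAE step described in the abstract, the following is a minimal sketch of a sparse autoencoder applied to per-node residual-stream vectors. It assumes a common ReLU encoder with an L1 sparsity penalty; the paper may use a different variant, and the names (`init_sae`, `sae_forward`, `sae_loss`), the dimensions, and the coefficient `l1_coeff` are hypothetical choices for illustration, not the authors' settings.

```python
import numpy as np


def init_sae(d_model, d_dict, seed=0):
    """Initialise an overcomplete SAE (d_dict > d_model) with unit-norm decoder rows."""
    rng = np.random.default_rng(seed)
    w_dec = rng.normal(size=(d_dict, d_model))
    w_dec /= np.linalg.norm(w_dec, axis=1, keepdims=True)
    return {
        "W_enc": w_dec.T.copy(),      # (d_model, d_dict), starts tied to the decoder
        "b_enc": np.zeros(d_dict),
        "W_dec": w_dec,               # (d_dict, d_model), one row per dictionary feature
        "b_dec": np.zeros(d_model),
    }


def sae_forward(params, x):
    """Encode activations x of shape (n_nodes, d_model) and reconstruct them."""
    pre = (x - params["b_dec"]) @ params["W_enc"] + params["b_enc"]
    features = np.maximum(0.0, pre)   # ReLU keeps feature activations non-negative
    x_hat = features @ params["W_dec"] + params["b_dec"]
    return features, x_hat


def sae_loss(params, x, l1_coeff=1e-3):
    """Mean squared reconstruction error plus an L1 penalty on feature activations."""
    features, x_hat = sae_forward(params, x)
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    sparsity = np.mean(np.sum(np.abs(features), axis=-1))
    return recon + l1_coeff * sparsity
```

In practice the parameters would be fitted by gradient descent on `sae_loss` over activations collected from many TSP instances, after which each column of the feature matrix can be inspected node by node for geometric patterns.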
