AI | LG | Sep 29, 2025

TDHook: A Lightweight Framework for Interpretability

arXiv:2509.25475v1 | h-index: 1 | Has Code
Originality: Incremental advance
AI Analysis

TDHook is a practical tool for researchers and practitioners in computer vision, NLP, and reinforcement learning who need interpretability methods for complex models. The contribution is incremental: it builds on existing methods.

The paper addresses the difficulty of applying interpretability methods to complex deep learning models, those with multiple inputs and outputs or composed of separate networks, which often do not fit existing frameworks. It introduces TDHook, a lightweight framework that achieves up to a 2x speed-up over Captum in the authors' controlled benchmark (integrated gradients on multi-target pipelines).

Interpretability of Deep Neural Networks (DNNs) is a growing field driven by the study of vision and language models. Yet some use cases, like image captioning, and some domains, like Deep Reinforcement Learning (DRL), require complex modelling, with multiple inputs and outputs, or rely on composable, separate networks. As a consequence, they rarely fit natively into the API of popular interpretability frameworks. We thus present TDHook, an open-source, lightweight, generic interpretability framework based on $\texttt{tensordict}$ and applicable to any $\texttt{torch}$ model. It focuses on handling complex composed models, which can be trained for Computer Vision, Natural Language Processing, Reinforcement Learning or any other domain. The library features ready-to-use methods for attribution and probing, together with a flexible get-set API for interventions, and aims to bridge the gap between these method classes to make modern interpretability pipelines more accessible. TDHook is designed with minimal dependencies, requiring roughly half as much disk space as $\texttt{transformer\_lens}$, and, in our controlled benchmark, achieves up to a $2\times$ speed-up over $\texttt{captum}$ when running integrated gradients for multi-target pipelines on both CPU and GPU. In addition, to demonstrate the value of our work, we showcase concrete use cases of our library with composed interpretability pipelines in Computer Vision (CV) and Natural Language Processing (NLP), as well as with complex models in DRL.
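
To make the benchmarked setting concrete, the following is a minimal sketch of integrated gradients on a model with several named inputs and several targets. It is written in plain torch and does not use or reproduce TDHook's API. The toy model, the dictionary keys, and the helper integrated_gradients are all hypothetical names chosen for illustration, and the integral is approximated with a simple midpoint rule.

```python
import torch
from torch import nn


class TwoInputModel(nn.Module):
    """Hypothetical toy model with two named inputs and two named outputs."""

    def __init__(self):
        super().__init__()
        self.image_branch = nn.Linear(4, 3)
        self.text_branch = nn.Linear(2, 3)
        self.value_head = nn.Linear(3, 1)
        self.policy_head = nn.Linear(3, 2)

    def forward(self, inputs):
        hidden = torch.tanh(
            self.image_branch(inputs["image"]) + self.text_branch(inputs["text"])
        )
        return {"value": self.value_head(hidden), "policy": self.policy_head(hidden)}


def integrated_gradients(model, inputs, baselines, target_fn, steps=64):
    """Midpoint-rule approximation of integrated gradients for dict inputs.

    target_fn maps the model's output dict to one scalar per batch element.
    """
    keys = list(inputs)
    avg_grads = {k: torch.zeros_like(inputs[k]) for k in keys}
    for i in range(steps):
        alpha = (i + 0.5) / steps
        # Point on the straight path from baseline to input, for every input.
        point = {
            k: (baselines[k] + alpha * (inputs[k] - baselines[k]))
            .detach()
            .requires_grad_(True)
            for k in keys
        }
        # Batch elements are independent, so summing gives per-sample gradients.
        score = target_fn(model(point)).sum()
        grads = torch.autograd.grad(score, [point[k] for k in keys])
        for k, g in zip(keys, grads):
            avg_grads[k] += g / steps
    # Attribution = (input - baseline) * average gradient along the path.
    return {k: (inputs[k] - baselines[k]) * avg_grads[k] for k in keys}


torch.manual_seed(0)
model = TwoInputModel().double()
inputs = {
    "image": torch.randn(5, 4, dtype=torch.float64),
    "text": torch.randn(5, 2, dtype=torch.float64),
}
baselines = {k: torch.zeros_like(v) for k, v in inputs.items()}

# One attribution pass per target: this is the "multi-target" part.
targets = {
    "value": lambda out: out["value"][:, 0],
    "policy_0": lambda out: out["policy"][:, 0],
}
attributions = {
    name: integrated_gradients(model, inputs, baselines, fn)
    for name, fn in targets.items()
}
```

Each entry of attributions holds one tensor per input key, with the same shape as that input. A useful sanity check is the completeness property: for each target, the attributions summed over all inputs should approximately equal the difference between the model's output at the input and at the baseline. Note that this naive version runs one full pass per target, which is the kind of repeated cost a multi-target benchmark would expose.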

