4.6LGMay 24
Continuous-Depth Field Theory for Transformer Patching and Mechanistic InterpretabilityDavid N. Olivieri, Antonio F. Pérez Rodríguez
Mechanistic interpretability often uses activation patching, causal tracing, path patching, and steering directions to reveal behaviorally meaningful directions in Transformer activation space. This paper develops a field-theoretic framework for organizing and predicting such interventions. Treating the residual stream as a depth-token field, we formulate patching as localized source insertion, patch effects as sensitivity-field predictions, downstream propagation as empirical Green-function response, and patch selection as an adjoint variational problem. Empirically, we test the forward response theory in GPT-2-style autoregressive Transformers by applying localized residual-field interventions and observing the induced residual-field differences and logit-difference responses. We identify a bounded local linear regime; predict patch effects from first-order sensitivities across residual sites; measure structured anisotropic propagation across depth and token position; construct response descriptions from high-sensitivity sites and sliced Green operators; and show that prompt-induced residual displacements can transfer answer behavior. These results establish response objects, namely sensitivities, propagated fields, and Green-operator slices, as a practical language for organizing patching experiments and as the forward mathematical basis for formulating patch-site inference and cross-scale transfer.formulated.
12.1AIMay 13
Sheaf-Theoretic Transport and Obstruction for Detecting Scientific Theory Shift in AI AgentsDavid N. Olivieri, Roque J. Hernández
Scientific theory shift in AI agents requires more than fitting equations to data. An artificial scientific agent must detect whether an existing representational framework remains transportable into a new regime, or whether its language has become locally-to-globally obstructed and must be extended. This paper develops a finite sheaf-theoretic framework for detecting theory-shift candidates through transport and obstruction. Contexts are organized as a local-to-global structure in which source, overlap, target, and validation charts are fitted, restricted, and tested for gluing. Obstruction measures failure of coherence through residual fit, overlap incompatibility, constraint violation, limiting-relation failure, and representational cost. We evaluate the framework on a controlled transition-card benchmark designed to separate deformation within a source language from extension of that language. The main result is direct obstruction ranking: the intended deformation or extension is usually the lowest-obstruction candidate, and transition type is separated in the benchmark. A constellation kernel over the same signatures is included only as a secondary representational-similarity probe. The aim is not to reconstruct historical paradigm shifts or solve open-ended autonomous theory invention, but to isolate a finite diagnostic subproblem for AI agents: detecting when representational transport fails and extension becomes the coherent next move.
0.0AIMay 7
Patch-Effect Graph Kernels for LLM InterpretabilityRuben Fernandez-Boullon, David N. Olivieri
Mechanistic interpretability aims to reverse-engineer transformer computations by identifying causal circuits through activation patching. However, scaling these interventions across diverse prompts and task families produces high-dimensional, unstructured datasets that are difficult to compare systematically. We propose a framework that reframes mechanistic analysis as a graph machine-learning problem by representing activation-patching profiles as patch-effect graphs over model components. We introduce three graph-construction methods: direct-influence via causal mediation, partial-correlation, and co-influence and apply graph kernels to analyze the resulting structures. Evaluating this approach on GPT-2 Small using Indirect Object Identification (IOI) and related tasks, we find that patch-effect graphs preserve discriminative structural signals. Specifically, localized edge-slot features provide higher classification accuracy than global graph-shape descriptors. A screened paired-patching validation suggests that CI and PC selected candidate edges correspond to stronger activation-influence effects than random or low-rank candidates. Crucially, by evaluating these representations against rigorous prompt-only and raw patch-effect controls, we make the evidential scope of the benchmark explicit: graph features compress structured patching signal, while raw tensors and surface cues define strong baselines that any circuit-level claim should address. Ultimately, our framework provides a compression and evaluation pipeline for comparing patching-derived structures under controlled baselines, separating robust slice-discriminative evidence from stronger task-general causal-circuit claims.
IRMar 26, 2014
KPCA Spatio-temporal trajectory point cloud classifier for recognizing human actions in a CBVR systemIván Gómez-Conde, David N. Olivieri
We describe a content based video retrieval (CBVR) software system for identifying specific locations of a human action within a full length film, and retrieving similar video shots from a query. For this, we introduce the concept of a trajectory point cloud for classifying unique actions, encoded in a spatio-temporal covariant eigenspace, where each point is characterized by its spatial location, local Frenet-Serret vector basis, time averaged curvature and torsion and the mean osculating hyperplane. Since each action can be distinguished by their unique trajectories within this space, the trajectory point cloud is used to define an adaptive distance metric for classifying queries against stored actions. Depending upon the distance to other trajectories, the distance metric uses either large scale structure of the trajectory point cloud, such as the mean distance between cloud centroids or the difference in hyperplane orientation, or small structure such as the time averaged curvature and torsion, to classify individual points in a fuzzy-KNN. Our system can function in real-time and has an accuracy greater than 93% for multiple action recognition within video repositories. We demonstrate the use of our CBVR system in two situations: by locating specific frame positions of trained actions in two full featured films, and video shot retrieval from a database with a web search application.