Differentiable Causal Computations via Delayed Trace
This work gives machine learning researchers a theoretical framework for differentiable causal computations. Its contribution is incremental, extending existing categorical methods.
The paper models causal computations with feedback in category theory by introducing a "delayed trace" operation. It then constructs a differential operator for these computations that supports backpropagation through time without unrolling and satisfies properties such as a chain rule and a Schwarz theorem.
We investigate causal computations, which take sequences of inputs to sequences of outputs such that the $n$th output depends only on the first $n$ inputs. We model these in category theory via a construction taking a Cartesian category $C$ to another category $St(C)$ with a novel trace-like operation called "delayed trace", which lacks the yanking and dinaturality axioms of the usual trace. The delayed trace operation provides a feedback mechanism in $St(C)$ with an implicit guardedness guarantee. When $C$ is equipped with a Cartesian differential operator, we construct a differential operator for $St(C)$ using an abstract version of backpropagation through time, a technique from machine learning based on unrolling of functions. This yields a range of properties for backpropagation through time, including a chain rule and a Schwarz theorem. Our differential operator is also able to compute the derivative of a stateful network without requiring the network to be unrolled.
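As a concrete illustration of the notions above, the following Python sketch models a causal computation as a stateful step function whose state is fed back with a one-step delay, and then differentiates it tick by tick without building an unrolled copy. It is only a sketch under stated assumptions, not the paper's categorical construction: all names (`run_delayed_feedback`, `step`, `run_tangent`) and the example recurrence are hypothetical, and the derivative shown is a simple forward tangent chosen for brevity.

```python
from typing import Callable, List, Tuple

# All names here are hypothetical illustrations, not taken from the paper.
Step = Callable[[float, float], Tuple[float, float]]  # (state, input) -> (new_state, output)


def run_delayed_feedback(step_fn: Step, init_state: float, inputs: List[float]) -> List[float]:
    """Run a stateful step over an input sequence.

    The state produced at tick n is fed back in at tick n + 1, so the
    feedback loop is delayed by one step and output n sees inputs 1..n only.
    """
    state = init_state
    outputs = []
    for x in inputs:
        state, y = step_fn(state, x)
        outputs.append(y)
    return outputs


def step(state: float, x: float) -> Tuple[float, float]:
    """Example recurrence (assumed): new_state = state * x + x, output = new_state."""
    new_state = state * x + x
    return new_state, new_state


def run_tangent(init_state: float, inputs: List[float], d_inputs: List[float]) -> List[float]:
    """Push an input perturbation through the same loop, one tick at a time.

    The pair (state, d_state) acts as the state of a new stateful
    computation, so no unrolled copy of the network is constructed.
    """
    state, d_state = init_state, 0.0
    d_outputs = []
    for x, dx in zip(inputs, d_inputs):
        # Product rule applied to new_state = state * x + x, using the old state.
        d_state = d_state * x + state * dx + dx
        state, _ = step(state, x)
        d_outputs.append(d_state)
    return d_outputs


xs = [1.0, 2.0, 3.0]
ys = run_delayed_feedback(step, 0.0, xs)  # [1.0, 4.0, 15.0]

# Causality: changing the third input leaves the first two outputs unchanged.
ys_changed = run_delayed_feedback(step, 0.0, [1.0, 2.0, 99.0])
causal = ys[:2] == ys_changed[:2]  # True

# Sensitivity of every output to a unit perturbation of the first input.
dys = run_tangent(0.0, xs, [1.0, 0.0, 0.0])  # [1.0, 2.0, 6.0]
```

For this recurrence the third output equals $6x_1 + 9$ when $x_2 = 2$ and $x_3 = 3$, which agrees with the tangent value 6. Backpropagation through time would instead propagate sensitivities in the reverse direction; this sketch does not attempt that.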