Neural network layers as parametric spans
This work gives a mathematical framework for neural network layers that may help researchers in machine learning theory by offering a single approach to defining and differentiating layers.
The authors address the difficulty of defining increasingly complex neural network layers mathematically. They introduce a general definition of a linear layer, built on a categorical framework that combines integration theory and parametric spans. The definition generalizes classical layers and guarantees that the derivatives needed for backpropagation exist and can be computed.
Properties such as composability and automatic differentiation have made artificial neural networks a pervasive tool in applications. As they were applied to more challenging problems, neural networks became progressively more complex and thus harder to define from a mathematical perspective. We present a general definition of a linear layer arising from a categorical framework based on the notions of integration theory and parametric spans. This definition generalizes and encompasses classical layers (e.g., dense, convolutional), while guaranteeing the existence and computability of the layer's derivatives for backpropagation.
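To make the idea concrete, the following is a minimal sketch in a finite, discrete setting, not the authors' categorical construction. It assumes a layer is described by a finite set of edges, each with a source index into the input, a target index into the output, and a parameter index into the weights, and it replaces integration by finite sums. All names (`span_layer`, `span_layer_vjp`, `dense_span`, `conv1d_span`, `src`, `tgt`, `par`) are hypothetical and chosen for illustration only.

```python
def span_layer(x, w, src, tgt, par, n_out):
    """Forward pass: for every edge e, add w[par[e]] * x[src[e]] to y[tgt[e]]."""
    y = [0.0] * n_out
    for s, t, p in zip(src, tgt, par):
        y[t] += w[p] * x[s]
    return y


def span_layer_vjp(x, w, dy, src, tgt, par):
    """Backward pass: the same edges traversed in the opposite direction.

    Returns the gradients with respect to the input x and the weights w,
    given the gradient dy with respect to the output.
    """
    dx = [0.0] * len(x)
    dw = [0.0] * len(w)
    for s, t, p in zip(src, tgt, par):
        dx[s] += w[p] * dy[t]
        dw[p] += x[s] * dy[t]
    return dx, dw


def dense_span(n_in, n_out):
    """Edges of a dense layer: one edge and one distinct weight per (output, input) pair."""
    src, tgt, par = [], [], []
    for j in range(n_out):
        for i in range(n_in):
            src.append(i)
            tgt.append(j)
            par.append(j * n_in + i)
    return src, tgt, par


def conv1d_span(n_in, k):
    """Edges of a 1D 'valid' convolution: the weight depends only on the offset d."""
    n_out = n_in - k + 1
    src, tgt, par = [], [], []
    for j in range(n_out):
        for d in range(k):
            src.append(j + d)
            tgt.append(j)
            par.append(d)
    return src, tgt, par, n_out


x = [1.0, 2.0, 3.0, 4.0]

# Convolution with kernel [1, -1]: each output is x[j] - x[j + 1].
c_src, c_tgt, c_par, c_out = conv1d_span(len(x), 2)
y_conv = span_layer(x, [1.0, -1.0], c_src, c_tgt, c_par, c_out)

# Dense layer whose 2x4 weight matrix (row-major) selects x[0] and x[1].
d_src, d_tgt, d_par = dense_span(len(x), 2)
w_dense = [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
y_dense = span_layer(x, w_dense, d_src, d_tgt, d_par, 2)

# Gradients of the convolution for an all-ones output gradient.
dx_conv, dw_conv = span_layer_vjp(x, [1.0, -1.0], [1.0, 1.0, 1.0], c_src, c_tgt, c_par)
```

In this sketch the dense and convolutional layers share one forward routine and one backward routine and differ only in their index maps, which is the sense in which a single definition can cover both. Weight sharing in the convolution shows up as many edges pointing at the same parameter index.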