Topos and Stacks of Deep Neural Networks
This work proposes a foundational mathematical theory for understanding DNNs. It could affect ML/AI broadly by offering new tools for analyzing generalization and semantics, although the contribution is theoretical and its application of existing mathematical concepts is incremental.
The paper establishes a mathematical framework linking deep neural networks (DNNs) to Grothendieck topoi and stacks, proposing that invariance structures in the layers of architectures such as CNNs or LSTMs correspond to stacks and may explain generalization. It introduces semantic information measures and homotopical invariants for DNNs, and classifies semantic structures using geometric fibrant objects in a Quillen model category.
Every known artificial deep neural network (DNN) corresponds to an object in a canonical Grothendieck topos; its learning dynamics correspond to a flow of morphisms in this topos. Invariance structures in the layers (as in CNNs or LSTMs) correspond to Giraud's stacks. This invariance is supposed to be responsible for the generalization property, that is, extrapolation from learning data under constraints. The fibers represent pre-semantic categories (Culioli, Thom), over which artificial languages are defined, with internal logics that may be intuitionistic, classical or linear (Girard). The semantic functioning of a network is its ability to express theories in such a language in order to answer questions in output about input data. Quantities and spaces of semantic information are defined by analogy with the homological interpretation of Shannon's entropy (P. Baudot and D. Bennequin, 2015). They generalize the measures found by Carnap and Bar-Hillel (1952). Amazingly, the above semantic structures are classified by geometric fibrant objects in a closed model category of Quillen; they then give rise to homotopical invariants of DNNs and of their semantic functioning. Intensional type theories (Martin-Löf) organize these objects and the fibrations between them. Information contents and exchanges are analyzed by Grothendieck's derivators.
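For orientation, the two classical ingredients named in the abstract can be recalled in their standard textbook form. This is a reminder under the usual conventions, not a statement of the paper's own definitions. The notation is an assumption introduced here: m(s) is the logical probability of a sentence s in the Carnap and Bar-Hillel setting, and X.H(Y) is the entropy of Y averaged over the values of X.

```latex
% Carnap and Bar-Hillel (1952): content and information of a sentence s,
% where m(s) is the logical probability of s (notation assumed here).
\[
  \operatorname{cont}(s) = 1 - m(s),
  \qquad
  \operatorname{inf}(s) = -\log_2 m(s) = \log_2 \frac{1}{1 - \operatorname{cont}(s)} .
\]

% Shannon entropy of a random variable X with law P.
\[
  H(X) = -\sum_{x} P(X = x)\,\log_2 P(X = x) .
\]

% Chain rule, read as a 1-cocycle condition in the homological interpretation:
% X.H(Y) is the entropy of Y conditioned on X, averaged over the values of X.
\[
  H(X, Y) = H(X) + X.H(Y),
  \qquad
  X.H(Y) = \sum_{x} P(X = x)\, H(Y \mid X = x) .
\]
```

The first pair of formulas shows why a less probable sentence carries more semantic information. The last identity is the relation that the homological viewpoint interprets as a cocycle equation.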