The Network Structure of Mathlib
For developers of formal mathematics libraries, this work quantifies structural inefficiencies and mismatches between human-designed taxonomies and logical dependencies.
The paper analyzes the dependency structure of Lean 4's Mathlib, revealing a 50.9% coupling across namespaces, a median of 1.6% imported scope usage, and network centrality capturing infrastructure rather than mathematical relevance.
The ongoing development of Lean 4's Mathlib has produced a macroscopic structural complexity that interweaves logical, mathematical, and infrastructural dependencies. We present a network analysis of this library, extracting its dependency structure into a multilayer graph of 308,129 declarations, 8.4 million edges, and 7,563 modules. By introducing graph decompositions that isolate explicit edges from those synthesized by the compiler or driven by proofs, we quantify the structural properties of formalized mathematics. Our analysis reveals three findings. First, taxonomies designed by humans diverge from logical structures, exhibiting a 50.9% coupling across namespaces. Second, developers utilize a median of 1.6% of the imported scope. Third, formalization compresses semantic hierarchies, with network centrality capturing language infrastructure rather than mathematical relevance.