Lean Formalization of Generalization Error Bound by Rademacher Complexity
This work provides a rigorous, machine-checkable proof for a fundamental theoretical result in machine learning, addressing the need for formal verification in the field.
The authors formalized the generalization error bound using Rademacher complexity in the Lean 4 theorem prover, based on Mathlib 4's probability theory, achieving this for the first time and including an application to L^2-regularization models.
We formalize the generalization error bound using the Rademacher complexity for the Lean 4 theorem prover based on the probability theory in the Mathlib 4 library. Generalization error quantifies the gap between a learning machine's performance on given training data versus unseen test data, and the Rademacher complexity is a powerful tool to upper-bound the generalization error of a variety of modern learning problems. Previous studies have only formalized extremely simple cases such as bounds by parameter counts and analyses for very simple models (decision stumps). Formalizing the Rademacher complexity bound, also known as the uniform law of large numbers, requires substantial development and is achieved for the first time in this study. In the course of development, we formalize the Rademacher complexity and its unique arguments such as symmetrization, and clarify the topological assumptions on hypothesis classes under which the bound holds. As an application, we also present the formalization of generalization error bound for $L^2$-regularization models.