Everywhere Learning: Artificial Intelligence with Pointwise Constraints
This work provides a theoretical foundation for training AI systems to satisfy hard constraints everywhere, which is crucial for safety-critical applications where worst-case performance matters.
The paper introduces 'everywhere learning,' a new AI training paradigm that enforces loss constraints with probability one over the data distribution, rather than minimizing average loss. The authors develop an approximate duality theory and a generalization analysis. They show that dual variables reweight the data towards points where the constraints are harder to satisfy, that generalization is controlled by the mismatch between this reweighted distribution and the data distribution, and that a sparse L1 penalty on constraint relaxations can be used to control generalization.
Everywhere learning is a new paradigm whereby Artificial Intelligence (AI) systems are trained to satisfy loss constraints with probability one over the data distribution. This contrasts with the standard paradigm of training AI systems to minimize average losses. We develop an approximate duality theory to substantiate a generalization analysis that establishes the proximity between solutions of empirical and statistical everywhere learning problems. Our results show that dual variables reweight the data distribution towards points where loss constraints are more difficult to satisfy, and that generalization is controlled by the mismatch between the concentration of mass of the data distribution and the concentration of mass on points where constraints are more difficult to satisfy. We further show that we can control generalization with a sparse L1 penalty on constraint relaxations. We illustrate the merits of everywhere learning with an experiment in agentic classification for language model tasks.
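To make the role of the dual variables concrete, the following is a minimal sketch of an empirical pointwise-constrained problem solved by primal-dual gradient steps. It is an illustration under our own assumptions, not the authors' algorithm: the function name `primal_dual_everywhere`, the squared loss, the linear model, the use of the average loss as the objective, and the parameters `eps` (per-sample loss bound), `c` (L1 penalty weight on the relaxations), and both step sizes are all hypothetical. The sketch relies on one standard fact: when each constraint `loss_i <= eps + s_i` is relaxed by a slack `s_i >= 0` that is penalized by `c * s_i`, minimizing the Lagrangian over the slacks confines each dual variable to the interval `[0, c]`.

```python
import numpy as np

def primal_dual_everywhere(X, y, eps, c, lr_theta=0.05, lr_dual=0.05, steps=2000):
    """Hypothetical sketch: per-sample constraints loss_i(theta) <= eps + s_i,
    with slacks s_i >= 0 penalized by c * s_i (an L1 penalty on relaxations).

    Assumptions (not from the source): linear model, squared loss,
    average loss as the objective, plain gradient descent/ascent.
    """
    n, d = X.shape
    theta = np.zeros(d)   # primal variable (model parameters)
    lam = np.zeros(n)     # one dual variable per data point

    for _ in range(steps):
        residual = X @ theta - y
        losses = 0.5 * residual ** 2          # per-sample losses

        # Primal step: gradient of the average of (1 + lam_i) * loss_i.
        # Points with larger lam_i receive more weight in the update.
        grad = X.T @ ((1.0 + lam) * residual) / n
        theta -= lr_theta * grad

        # Dual step: ascend on the constraint violation loss_i - eps,
        # then project onto [0, c], the box induced by the L1 slack penalty.
        lam = np.clip(lam + lr_dual * (losses - eps), 0.0, c)

    return theta, lam
```

In this sketch, `lam` grows only on samples whose loss exceeds `eps`, so the weighted gradient shifts emphasis towards the points where the constraint is hardest to satisfy. The cap `c` limits how far that reweighting can drift from the uniform empirical weights, which is one way to read the trade-off the abstract describes between constraint satisfaction and distributional mismatch.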