Implicit Bilevel Optimization: Differentiating through Bilevel Optimization Programming
The paper tackles the integration of bilevel optimization programming into deep learning by proposing BiGrad, a method for end-to-end learning with bilevel programming as a layer, which extends existing single-level approaches.
The work addresses the need to model complex interactions in areas such as Robust AI or Privacy-preserving AI, and represents an incremental advance over prior single-level methods.
Bilevel Optimization Programming is used to model complex and conflicting interactions between agents, for example in Robust AI or Privacy-preserving AI. Integrating bilevel mathematical programming within deep learning is thus an essential objective for the Machine Learning community. Previously proposed approaches consider only single-level programming. In this paper, we extend existing single-level optimization programming approaches and propose Differentiating through Bilevel Optimization Programming (BiGrad) for end-to-end learning of models that use Bilevel Programming as a layer. BiGrad has wide applicability and can be used in modern machine learning frameworks. It applies to both continuous and combinatorial bilevel optimization problems. For the combinatorial case, we describe a class of gradient estimators that reduces the computational complexity; for the continuous case, the gradient computation takes advantage of the push-back approach (i.e., vector-Jacobian product) for an efficient implementation. Experiments show that BiGrad successfully extends existing single-level approaches to Bilevel Programming.
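To illustrate the continuous case mentioned above, the following is a minimal sketch of implicit differentiation through a lower-level optimality condition, evaluated as a vector-Jacobian product. It is not the BiGrad implementation: the quadratic toy problem, the matrices `A` and `B`, the target `t`, the weight `lam`, and all function names are assumptions chosen so that the lower-level solution has a closed form and the result can be checked against finite differences.

```python
import numpy as np

# Toy problem (all names and values are illustrative assumptions):
#   lower level: y*(x) = argmin_y 0.5 * y^T A y - y^T B x   (A symmetric positive definite)
#   upper level: f(x, y) = 0.5 * ||y - t||^2 + 0.5 * lam * ||x||^2
A = np.array([[3.0, 1.0], [1.0, 2.0]])
B = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]])
t = np.array([1.0, -1.0])
lam = 0.1

def solve_lower(x):
    # Lower-level stationarity condition: A y - B x = 0.
    return np.linalg.solve(A, B @ x)

def upper(x):
    # Upper-level objective evaluated at the lower-level solution.
    y = solve_lower(x)
    return 0.5 * np.sum((y - t) ** 2) + 0.5 * lam * np.sum(x ** 2)

def hypergradient(x):
    # Implicit function theorem on A y - B x = 0 gives dy/dx = A^{-1} B.
    # The vector-Jacobian product (dy/dx)^T v needs one linear solve
    # and never forms dy/dx explicitly.
    y = solve_lower(x)
    v = y - t                    # partial derivative of f with respect to y
    u = np.linalg.solve(A.T, v)  # solve A^T u = v
    return lam * x + B.T @ u     # direct term plus implicit term

def finite_difference(x, eps=1e-6):
    # Central differences, used only to check the implicit gradient.
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        grad[i] = (upper(x + e) - upper(x - e)) / (2 * eps)
    return grad

x0 = np.array([0.5, -0.3, 0.2])
g_implicit = hypergradient(x0)
g_numeric = finite_difference(x0)
```

In this sketch the two gradients should agree up to numerical error. The design point is that the cost of the backward pass is one linear solve against the lower-level Hessian, regardless of the dimension of `x`.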