General Loss Functions Lead to (Approximate) Interpolation in High Dimensions
The paper provides a unified theoretical framework for understanding the implicit bias of gradient descent in overparameterized machine learning. The contribution is incremental in that it builds on prior work.
The paper characterizes the implicit bias of gradient descent for general convex losses in overparameterized settings. It shows that, in high dimensions, this implicit bias approximates the minimum-norm interpolation obtained from training on the squared loss, and it derives new approximate equivalences for general convex losses.
We provide a unified framework that applies to a general family of convex losses across binary and multiclass settings in the overparameterized regime to approximately characterize the implicit bias of gradient descent in closed form. Specifically, we show that the implicit bias is approximated by (but not exactly equal to) the minimum-norm interpolation in high dimensions, which arises from training on the squared loss. In contrast to prior work, which was tailored to exponentially-tailed losses and used the intermediate support-vector-machine formulation, our framework directly builds on the primal-dual analysis of Ji and Telgarsky (2021), allowing us to provide new approximate equivalences for general convex losses through a novel sensitivity analysis. Our framework also recovers existing exact equivalence results for exponentially-tailed losses across binary and multiclass settings. Finally, we provide evidence for the tightness of our techniques and use our results to demonstrate the effect of certain loss functions designed for out-of-distribution problems on the closed-form solution.
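To make the central claim concrete, the following NumPy sketch compares gradient descent against the minimum-norm interpolation on synthetic data. It is an illustration only, not the paper's code or experimental setup: the Gaussian features, the sizes `n = 20` and `d = 500`, the step sizes, the number of steps, and all function names are assumptions chosen for the example. With the squared loss and zero initialization, gradient descent stays in the row space of the data and converges to the minimum-norm interpolation; with the logistic loss (one instance of a convex loss), only the direction of the iterate is compared, since its norm keeps growing on separable data.

```python
import numpy as np

def min_norm_interpolator(X, y):
    """Minimum-l2-norm solution of X w = y (assumes n < d and full row rank)."""
    return np.linalg.pinv(X) @ y

def gd_squared(X, y, steps=2000):
    """Gradient descent on (1/2n)||Xw - y||^2 from zero initialization."""
    n, d = X.shape
    w = np.zeros(d)
    lr = n / np.linalg.norm(X, 2) ** 2  # 1/L, with L = sigma_max(X)^2 / n
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n
        w -= lr * grad
    return w

def gd_logistic(X, y, steps=2000):
    """Gradient descent on the mean logistic loss log(1 + exp(-y * <x, w>))."""
    n, d = X.shape
    w = np.zeros(d)
    lr = 4 * n / np.linalg.norm(X, 2) ** 2  # 1/L, with L = sigma_max(X)^2 / (4n)
    for _ in range(steps):
        margins = y * (X @ w)
        weights = 1.0 / (1.0 + np.exp(margins))  # sigmoid(-margin)
        grad = -X.T @ (y * weights) / n
        w -= lr * grad
    return w

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Synthetic overparameterized problem (hypothetical sizes: d much larger than n).
rng = np.random.default_rng(0)
n, d = 20, 500
X = rng.standard_normal((n, d))
y = rng.choice([-1.0, 1.0], size=n)

w_mni = min_norm_interpolator(X, y)
w_sq = gd_squared(X, y)
w_log = gd_logistic(X, y)

print("squared loss, distance to MNI:", np.linalg.norm(w_sq - w_mni))
print("logistic loss, cosine with MNI:", cosine(w_log, w_mni))
```

In this toy setting the squared-loss iterate should match the minimum-norm interpolation up to numerical error, while the logistic-loss iterate is expected to be closely aligned with it in direction without being identical, which is the kind of approximate equivalence the abstract describes. Shrinking `d` toward `n` is one way to observe the alignment degrade outside the high-dimensional regime.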