OrthoGrad Improves Neural Calibration
For machine learning practitioners working on uncertainty-critical applications, this work addresses neural network overconfidence, offering an incremental improvement through a novel optimization method.
The paper tackles overconfidence in neural networks by introducing OrthoGrad ($\perp$Grad), a geometry-aware modification to gradient-based optimization that constrains descent directions. On CIFAR-10 with 10% labeled data, it matches SGD in accuracy while yielding statistically significant improvements in test loss, predictive entropy, and confidence measures.
We study $\perp$Grad, a geometry-aware modification to gradient-based optimization that constrains descent directions to address overconfidence, a key limitation of standard optimizers in uncertainty-critical applications. By enforcing orthogonality between gradient updates and weight vectors, $\perp$Grad alters optimization trajectories without architectural changes. On CIFAR-10 with 10% labeled data, $\perp$Grad matches SGD in accuracy while achieving statistically significant improvements in test loss ($p=0.05$), predictive entropy ($p=0.001$), and confidence measures. These effects show consistent trends across corruption levels and architectures. $\perp$Grad is optimizer-agnostic, incurs minimal overhead, and remains compatible with post-hoc calibration techniques. Theoretically, we characterize convergence and stationary points for a simplified $\perp$Grad variant, revealing that orthogonalization constrains loss reduction pathways to avoid confidence inflation and encourage decision-boundary improvements. Our findings suggest that geometric interventions in optimization can improve predictive uncertainty estimates at low computational cost.
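To make the orthogonality constraint concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the constraint is enforced by projecting out the component of the gradient parallel to the weight vector, $g_\perp = g - \frac{\langle w, g\rangle}{\langle w, w\rangle} w$, and then taking an ordinary SGD step along $g_\perp$. The function names, the `eps` stabilizer, and the learning rate are illustrative assumptions; the abstract does not specify whether the projection is applied per layer, per neuron, or over all parameters.

```python
def dot(u, v):
    """Inner product of two equal-length vectors given as lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def orthogonalize(grad, weights, eps=1e-30):
    """Return grad with its component along weights removed.

    Assumption: orthogonality is enforced by a standard vector projection.
    eps is a hypothetical guard against division by zero for all-zero weights.
    """
    scale = dot(weights, grad) / (dot(weights, weights) + eps)
    return [g - scale * w for g, w in zip(grad, weights)]


def orthogonal_sgd_step(weights, grad, lr=0.1):
    """One plain SGD step taken along the orthogonalized gradient."""
    g_perp = orthogonalize(grad, weights)
    return [w - lr * g for w, g in zip(weights, g_perp)]


# Toy example: a 2-D weight vector and an arbitrary gradient.
w = [3.0, 4.0]
g = [1.0, 2.0]

g_perp = orthogonalize(g, w)
w_new = orthogonal_sgd_step(w, g)

# The projected gradient has (numerically) zero inner product with the weights.
print(abs(dot(w, g_perp)) < 1e-9)
```

Because the projected gradient has no component along $w$, a step of this form cannot reduce the loss merely by rescaling the weights along their current direction, which is one route by which logits, and hence confidence, can be inflated. Under the same assumption, the projection could be applied to the raw gradient before it is handed to any base optimizer, which is consistent with the optimizer-agnostic claim above.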