From Robustness to Improved Generalization and Calibration in Pre-trained Language Models
This work aims to make pre-trained language models more reliable and effective on natural language processing tasks. The contribution is incremental, since it builds on existing robustness techniques from computer vision.
The authors introduce JacHess, a two-phase regularization method that improves generalization and calibration in pre-trained language models by smoothing their representations through Jacobian and Hessian regularization. On the GLUE benchmark, JacHess yields significant improvements over unregularized fine-tuning and other similar regularization methods.
Enhancing generalization and uncertainty quantification in pre-trained language models (PLMs) is crucial for their effectiveness and reliability. Building on machine learning research that established the importance of robustness for improving generalization, we investigate the role of representation smoothness, achieved via Jacobian and Hessian regularization, in enhancing PLM performance. Although such regularization methods have proven effective in computer vision, their application in natural language processing (NLP), where PLM inputs are derived from a discrete domain, poses unique challenges. We introduce a novel two-phase regularization approach, JacHess, which minimizes the norms of the Jacobian and Hessian matrices within PLM intermediate representations relative to their inputs. Our evaluation using the GLUE benchmark demonstrates that JacHess significantly improves in-domain generalization and calibration in PLMs, outperforming unregularized fine-tuning and other similar regularization methods.
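To make the idea of representation smoothness concrete, the following is a minimal sketch of a smoothness penalty on a toy function. It is not the authors' JacHess implementation and does not model its two phases. The function names (`jacobian_probe`, `curvature_probe`, `smoothness_penalty`), the weights `lam_jac` and `lam_hess`, the finite-difference step `eps`, and the use of random sign probes are all assumptions made for illustration. For random sign vectors v, the mean of ||Jv||^2 is an unbiased estimate of the squared Frobenius norm of the Jacobian J. The Hessian term here is only a curvature proxy: the squared second directional derivative along the same probes.

```python
import random


def _shift(x, v, step):
    """Return x + step * v for plain Python lists."""
    return [xi + step * vi for xi, vi in zip(x, v)]


def jacobian_probe(f, x, v, eps=1e-3):
    """Central-difference estimate of the Jacobian-vector product J(x) v."""
    plus, minus = f(_shift(x, v, eps)), f(_shift(x, v, -eps))
    return [(a - b) / (2 * eps) for a, b in zip(plus, minus)]


def curvature_probe(f, x, v, eps=1e-3):
    """Central-difference estimate of v^T H_k(x) v for every output k."""
    plus, mid, minus = f(_shift(x, v, eps)), f(x), f(_shift(x, v, -eps))
    return [(a - 2 * c + b) / eps ** 2 for a, c, b in zip(plus, mid, minus)]


def smoothness_penalty(f, x, lam_jac=1.0, lam_hess=1.0, n_probes=8, seed=0):
    """Hypothetical penalty: weighted first- and second-order sensitivity of f at x.

    f maps a list of floats (standing in for a continuous input such as a
    token embedding) to a list of floats (standing in for an intermediate
    representation).
    """
    rng = random.Random(seed)
    jac_sq = hess_sq = 0.0
    for _ in range(n_probes):
        # Random sign (Rademacher) probe direction in input space.
        v = [rng.choice((-1.0, 1.0)) for _ in x]
        jac_sq += sum(c * c for c in jacobian_probe(f, x, v))
        hess_sq += sum(c * c for c in curvature_probe(f, x, v))
    return (lam_jac * jac_sq + lam_hess * hess_sq) / n_probes
```

In a fine-tuning setting, a term of this kind would be added to the task loss so that the optimizer trades accuracy against sensitivity to input perturbations. A practical implementation would compute the derivatives with automatic differentiation rather than finite differences.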