LG · CL · May 23, 2023

Decoupled Rationalization with Asymmetric Learning Rates: A Flexible Lipschitz Restraint

arXiv:2305.13599v3 · 18 citations · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses a specific bottleneck in interpretable NLP models and offers an incremental improvement for researchers and practitioners in the field.

The paper tackles the degeneration problem in self-explaining rationalization models: the predictor overfits to uninformative pieces selected by a poorly trained generator, which in turn leads to sub-optimal rationale selection. The proposed method, DR, decouples the generator and predictor by assigning them asymmetric learning rates, which restrains the predictor's Lipschitz constant and improves performance on two benchmarks.

A self-explaining rationalization model is generally constructed as a cooperative game in which a generator selects the most human-intelligible pieces from the input text as rationales, followed by a predictor that makes predictions based on the selected rationales. However, such a cooperative game may incur the degeneration problem, where the predictor overfits to the uninformative pieces selected by a not-yet-well-trained generator, which in turn leads the generator to converge to a sub-optimal model that tends to select senseless pieces. In this paper, we theoretically bridge degeneration with the predictor's Lipschitz continuity. Then, we empirically propose a simple but effective method named DR, which can naturally and flexibly restrain the Lipschitz constant of the predictor, to address the problem of degeneration. The main idea of DR is to decouple the generator and predictor and assign them asymmetric learning rates. A series of experiments conducted on two widely used benchmarks has verified the effectiveness of the proposed method. Code: https://github.com/jugechengzi/Rationalization-DR.
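The idea of assigning the generator and predictor asymmetric learning rates can be illustrated with a toy sketch. This is not the authors' implementation: `ParamGroup`, `sgd_step`, `linear_lipschitz`, the learning rates, and the fixed gradient are all hypothetical, and the choice to give the predictor the smaller rate is an assumption made for illustration. The sketch relies only on one standard fact: for a linear map f(x) = w·x, the L2 Lipschitz constant equals the norm of w, so a group that takes smaller steps from the same starting point accumulates a smaller norm.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ParamGroup:
    """A named set of scalar parameters updated with its own learning rate."""
    name: str
    params: List[float]
    lr: float


def sgd_step(group: ParamGroup, grads: List[float]) -> None:
    """Plain SGD update, p <- p - lr * g, using the group's own learning rate."""
    if len(grads) != len(group.params):
        raise ValueError("gradient and parameter lengths differ")
    group.params = [p - group.lr * g for p, g in zip(group.params, grads)]


def linear_lipschitz(weights: List[float]) -> float:
    """L2 Lipschitz constant of the linear map f(x) = w . x, i.e. ||w||_2."""
    return math.sqrt(sum(w * w for w in weights))


# Hypothetical settings: the two modules are decoupled into separate groups,
# and the predictor is (by assumption) given the smaller learning rate.
generator = ParamGroup("generator", [0.0, 0.0], lr=0.1)
predictor = ParamGroup("predictor", [0.0, 0.0], lr=0.01)

# A fixed toy gradient stands in for real backpropagated gradients.
toy_grads = [1.0, -2.0]

for _ in range(5):
    sgd_step(generator, toy_grads)
    sgd_step(predictor, toy_grads)

print(generator.name, linear_lipschitz(generator.params))
print(predictor.name, linear_lipschitz(predictor.params))
```

In a real training loop the gradients change at every step and the predictor is not linear, so this only shows the mechanism of keeping two separately tuned learning rates, not the paper's theoretical result.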

Code Implementations: 1 repo
