CL

Sep 15, 2021

Making Heads and Tails of Models with Marginal Calibration for Sparse Tagsets

Michael Kranzlein, Nelson F. Liu, Nathan Schneider

arXiv:2109.07494v1

30.7662 citations

Has Code

Originality: Incremental advance

AI Analysis

This work addresses calibration for tagging models with sparse tagsets, an incremental advance in the interpretability of natural language processing models.

The paper tackles the problem of measuring and reducing calibration error in probabilistic tagging models with sparse tagsets. It shows that post-hoc recalibration techniques reduce calibration error across the marginal distribution for two existing sequence taggers, and it proposes tag frequency grouping to achieve a more equitable reduction of error across frequency bands.
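To make "calibration error across the marginal distribution" concrete, here is a minimal sketch of one common binned estimator. It scores every (token, tag) probability rather than only the top prediction. This is an illustration under assumptions, not the paper's exact definition: the function name `marginal_calibration_error`, the equal-width binning, and the default of 10 bins are all choices made for this example.

```python
import numpy as np

def marginal_calibration_error(probs, labels, n_bins=10):
    """Binned calibration error over every (token, tag) probability.

    probs:  (n_tokens, n_tags) array of predicted tag probabilities.
    labels: (n_tokens,) array of gold tag indices.
    NOTE: hypothetical helper; the paper's estimator may differ.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    n_tokens = probs.shape[0]

    # 1.0 where a tag is the gold tag for its token, 0.0 elsewhere.
    hits = np.zeros_like(probs)
    hits[np.arange(n_tokens), labels] = 1.0

    conf = probs.ravel()
    hit = hits.ravel()

    # Equal-width bins over [0, 1]; a probability of exactly 1.0
    # is placed in the last bin.
    bin_ids = np.minimum((conf * n_bins).astype(int), n_bins - 1)

    error = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            # Gap between mean confidence and empirical frequency,
            # weighted by the share of probabilities in this bin.
            gap = abs(conf[mask].mean() - hit[mask].mean())
            error += mask.mean() * gap
    return error
```

With a sparse tagset, most of the scored probabilities sit near zero, so the low-confidence bins dominate this average. That is one reason a single aggregate number can hide poor calibration on rare tags.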

For interpreting the behavior of a probabilistic model, it is useful to measure a model's calibration--the extent to which it produces reliable confidence scores. We address the open problem of calibration for tagging models with sparse tagsets, and recommend strategies to measure and reduce calibration error (CE) in such models. We show that several post-hoc recalibration techniques all reduce calibration error across the marginal distribution for two existing sequence taggers. Moreover, we propose tag frequency grouping (TFG) as a way to measure calibration error in different frequency bands. Further, recalibrating each group separately promotes a more equitable reduction of calibration error across the tag frequency spectrum.
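The idea of tag frequency grouping can be sketched as follows: sort tags by how often they occur, split them into bands, and measure calibration error within each band. This is a rough illustration only. The abstract does not say how the bands are chosen, so the equal-sized split, the function names, and the binned error estimator below are all assumptions.

```python
import numpy as np

def group_tags_by_frequency(tag_counts, n_groups=3):
    """Split tag indices into bands, most frequent band first.

    NOTE: equal-sized bands are an assumption made for this sketch.
    """
    order = np.argsort(-np.asarray(tag_counts), kind="stable")
    return np.array_split(order, n_groups)

def binned_error(conf, hit, n_bins=10):
    """Weighted gap between mean confidence and empirical frequency."""
    bin_ids = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    error = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            error += mask.mean() * abs(conf[mask].mean() - hit[mask].mean())
    return error

def calibration_error_by_group(probs, labels, groups, n_bins=10):
    """Calibration error computed separately for each band of tags."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    # 1.0 where a tag is the gold tag for its token, 0.0 elsewhere.
    hits = np.zeros_like(probs)
    hits[np.arange(len(labels)), labels] = 1.0
    return [
        binned_error(probs[:, g].ravel(), hits[:, g].ravel(), n_bins)
        for g in groups
    ]

# Toy usage: four tags with made-up counts, two tokens.
groups = group_tags_by_frequency([5, 100, 1, 50], n_groups=2)
errors = calibration_error_by_group(
    [[0.0, 1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]], [1, 0], groups
)
```

Reporting one error per band shows whether rare tags are worse calibrated than frequent ones. Recalibrating each group separately would then mean fitting a post-hoc recalibrator on each band's held-out probabilities on its own, instead of fitting a single one for the whole tagset.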
