CL · Sep 15, 2021

Making Heads and Tails of Models with Marginal Calibration for Sparse Tagsets

arXiv:2109.07494v1662 citations
Originality: Incremental advance
AI Analysis

This work addresses calibration for tagging models with sparse tagsets, an incremental advance in model interpretability for natural language processing.

The paper tackles the problem of measuring and reducing calibration error in probabilistic tagging models with sparse tagsets. It shows that post-hoc recalibration techniques reduce calibration error across the marginal distribution for two existing sequence taggers, and it proposes tag frequency grouping to achieve a more equitable error reduction across frequency bands.

For interpreting the behavior of a probabilistic model, it is useful to measure a model's calibration--the extent to which it produces reliable confidence scores. We address the open problem of calibration for tagging models with sparse tagsets, and recommend strategies to measure and reduce calibration error (CE) in such models. We show that several post-hoc recalibration techniques all reduce calibration error across the marginal distribution for two existing sequence taggers. Moreover, we propose tag frequency grouping (TFG) as a way to measure calibration error in different frequency bands. Further, recalibrating each group separately promotes a more equitable reduction of calibration error across the tag frequency spectrum.
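To make the two measurement ideas in the abstract concrete, here is a minimal sketch rather than the authors' implementation. It assumes calibration error is estimated with equal-width confidence bins over every (token, tag) probability, which is one common way to score the marginal distribution, and that tag frequency groups are formed by sorting tags by gold count and splitting them into equally sized bands. The function names, the bin count, and the grouping rule are all illustrative assumptions; the paper's exact estimator and grouping may differ.

```python
import numpy as np


def marginal_calibration_error(probs, gold, tag_ids=None, n_bins=10):
    """Binned calibration error over every (token, tag) probability.

    probs:   (n_tokens, n_tags) array of predicted probabilities.
    gold:    (n_tokens,) array of gold tag indices.
    tag_ids: optional subset of tag columns to score (hypothetical
             hook used here for per-group measurement).
    """
    probs = np.asarray(probs, dtype=float)
    gold = np.asarray(gold)
    n_tokens, _ = probs.shape
    # Indicator matrix: 1 where the column is the gold tag for that token.
    hits = np.zeros_like(probs)
    hits[np.arange(n_tokens), gold] = 1.0
    if tag_ids is not None:
        probs, hits = probs[:, tag_ids], hits[:, tag_ids]
    conf, hit = probs.ravel(), hits.ravel()
    # Assign each probability to one of n_bins equal-width bins on [0, 1].
    bins = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    error = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            # Gap between mean confidence and empirical frequency,
            # weighted by the share of predictions in this bin.
            gap = abs(conf[mask].mean() - hit[mask].mean())
            error += mask.mean() * gap
    return error


def tag_frequency_groups(gold, n_tags, n_groups=3):
    """Split tag indices into bands, most frequent tags first (assumed rule)."""
    counts = np.bincount(np.asarray(gold), minlength=n_tags)
    order = np.argsort(-counts, kind="stable")
    return np.array_split(order, n_groups)
```

Calling `marginal_calibration_error(probs, gold, tag_ids=group)` for each `group` returned by `tag_frequency_groups` gives one error value per frequency band, which is the kind of per-band comparison the abstract describes. Under the same assumptions, a post-hoc recalibrator could then be fitted separately on each band.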

Code Implementations: 1 repo