CL · AI · Mar 17

Uncertainty Estimation for the Open-Set Text Classification systems

arXiv:2604.08560 · 73.0 · h-index: 2 · Has Code
Predicted impact top 85% in CL · last 90 days · Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of building robust and trustworthy recognition systems for text classification, particularly in open-set scenarios, though it is incremental as it adapts an existing method to a new domain.

The paper tackles uncertainty estimation for open-set text classification by adapting the Holistic Uncertainty Estimation method to address text and gallery uncertainties, achieving improvements of 40-365% in Prediction Rejection Ratio over a baseline across multiple datasets.
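For readers unfamiliar with the metric, the following is a minimal sketch of a generic Prediction Rejection Ratio: the area gained by rejecting samples in order of decreasing uncertainty, divided by the area gained by an oracle that rejects all errors first, both measured against random rejection. It assumes binary per-sample error indicators and plain classification error. The paper reports PRR at fixed FPIR values in an open-set protocol, so its exact computation may differ. The function names here are illustrative and are not taken from the authors' code.

```python
def rejection_curve(errors, order):
    """Retained errors (as a fraction of all samples) after rejecting
    the first k samples in `order`, for k = 0..n."""
    n = len(errors)
    remaining = sum(errors)
    curve = [remaining / n]
    for idx in order:
        remaining -= errors[idx]
        curve.append(remaining / n)
    return curve


def prediction_rejection_ratio(errors, uncertainty):
    """Generic PRR: 1.0 for a perfect ranking, about 0 for a random one,
    negative when the uncertainty ranking is worse than random."""
    n = len(errors)
    base = sum(errors) / n
    if base == 0 or base == 1:
        raise ValueError("PRR is undefined when all predictions are right or all are wrong")

    # Reject the most uncertain samples first.
    by_uncertainty = sorted(range(n), key=lambda i: -uncertainty[i])
    # Oracle: reject the actual errors first.
    oracle = sorted(range(n), key=lambda i: -errors[i])

    # Random rejection removes errors in proportion to the rejected fraction.
    random_curve = [base * (1 - k / n) for k in range(n + 1)]
    unc_curve = rejection_curve(errors, by_uncertainty)
    oracle_curve = rejection_curve(errors, oracle)

    area_unc = sum(r - u for r, u in zip(random_curve, unc_curve))
    area_oracle = sum(r - o for r, o in zip(random_curve, oracle_curve))
    return area_unc / area_oracle


# Toy data (hypothetical): errors are 1 where the prediction was wrong.
errors = [1, 0, 0, 1]
good_scores = [0.9, 0.1, 0.2, 0.8]   # high uncertainty on the wrong predictions
bad_scores = [0.1, 0.9, 0.8, 0.2]    # high uncertainty on the correct predictions
print(prediction_rejection_ratio(errors, good_scores))
print(prediction_rejection_ratio(errors, bad_scores))
```

Normalizing by the oracle area makes scores comparable across datasets with different base error rates, which is presumably why the listing can quote PRR side by side for four datasets.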

Accurate uncertainty estimation is essential for building robust and trustworthy recognition systems. In this paper, we consider the open-set text classification (OSTC) task and uncertainty estimation for it. In OSTC, a text sample must either be classified as one of the known classes or rejected as unknown. To account for the different uncertainty types encountered in OSTC, we adapt the Holistic Uncertainty Estimation (HolUE) method to the text domain. Our approach addresses two major causes of prediction errors in text recognition systems: text uncertainty, which stems from ill-formulated queries, and gallery uncertainty, which is related to the ambiguity of the data distribution. By capturing these sources, it becomes possible to predict when the system will make a recognition error. We propose a new OSTC benchmark and conduct extensive experiments on a wide range of data, using authorship attribution, intent classification, and topic classification datasets. HolUE achieves a 40-365% improvement in Prediction Rejection Ratio (PRR) over the quality-based SCF baseline across datasets: 365% on Yahoo Answers (0.79 vs 0.17 at FPIR 0.1), 347% on DBPedia (0.85 vs 0.19), 240% on PAN authorship attribution (0.51 vs 0.15 at FPIR 0.5), and 40% on CLINC150 intent classification (0.73 vs 0.52). Our code and protocols are publicly available at https://github.com/Leonid-Erlygin/text_uncertainty.git
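The percentage figures above are relative improvements over the baseline, (HolUE - SCF) / SCF. This short check recomputes them from the PRR pairs quoted in the abstract. The dictionary and function names are illustrative only, and the values are copied from the text rather than from the authors' result files.

```python
# PRR pairs as quoted in the abstract: (HolUE, SCF baseline).
reported_prr = {
    "Yahoo Answers": (0.79, 0.17),
    "DBPedia": (0.85, 0.19),
    "PAN authorship attribution": (0.51, 0.15),
    "CLINC150": (0.73, 0.52),
}


def relative_improvement_pct(new, baseline):
    """Relative gain of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


improvements = {
    name: round(relative_improvement_pct(holue, scf))
    for name, (holue, scf) in reported_prr.items()
}

for name, pct in improvements.items():
    print(f"{name}: {pct}%")
```

Because the baseline PRR values on the topic datasets are small (0.17 and 0.19), modest absolute gains translate into very large relative percentages, so the absolute PRR pairs are worth reading alongside the headline range.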

