LG · AI · Feb 6, 2023

Trust, but Verify: Using Self-Supervised Probing to Improve Trustworthiness

arXiv:2302.02628v1 · 4 citations · h-index: 10
Originality: Incremental advance
AI Analysis

This work addresses the trustworthiness of deep learning models in practical deployment. Its contribution is incremental: a plug-and-play framework that enhances existing trustworthiness methods rather than replacing them.

The paper tackles untrustworthy predictive confidence scores in deep learning models, which are often overconfident. It introduces a self-supervised probing framework that improves trustworthiness on three tasks: misclassification detection, calibration, and out-of-distribution detection. Extensive experiments on multiple benchmarks support these improvements.
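Calibration, one of the tasks named above, is commonly scored with expected calibration error (ECE): predictions are grouped into confidence bins B_b, and ECE = sum_b (|B_b| / N) * |acc(B_b) - conf(B_b)|, i.e. the size-weighted gap between each bin's accuracy and its average confidence. The sketch below is a generic NumPy version for illustration, not the paper's evaluation code; the function name and the choice of 15 equal-width bins are assumptions.

```python
import numpy as np

def expected_calibration_error(probs: np.ndarray, labels: np.ndarray, n_bins: int = 15) -> float:
    """ECE with equal-width confidence bins (illustrative sketch).

    probs:  (N, C) predicted class probabilities
    labels: (N,) integer ground-truth labels
    """
    confidences = probs.max(axis=1)                 # top-1 confidence per sample
    predictions = probs.argmax(axis=1)              # top-1 predicted class
    accuracies = (predictions == labels).astype(float)

    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # |accuracy - mean confidence| in this bin, weighted by the bin's share of samples
            gap = abs(accuracies[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)
```

For example, a classifier that always reports 0.9 confidence but is right only half the time gets an ECE of 0.4, which is the kind of overconfidence the paper targets.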

Trustworthy machine learning is of primary importance to the practical deployment of deep learning models. While state-of-the-art models achieve astonishingly good accuracy, recent literature reveals that their predictive confidence scores unfortunately cannot be trusted: for example, they are often overconfident when their predictions are wrong, and even on obvious outliers. In this paper, we introduce a new approach, self-supervised probing, which enables us to check and mitigate the overconfidence of a trained model, thereby improving its trustworthiness. We provide a simple yet effective framework that can be flexibly applied to existing trustworthiness-related methods in a plug-and-play manner. Extensive experiments on three trustworthiness-related tasks (misclassification detection, calibration, and out-of-distribution detection) across various benchmarks verify the effectiveness of the proposed probing framework.
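To make the idea of probing concrete, here is a minimal sketch under explicit assumptions; it is not the authors' implementation. It assumes the self-supervised task is rotation prediction, with a hypothetical probe head that predicts which of four rotations (0, 90, 180, 270 degrees) was applied to an input using the trained classifier's features, and that the probe's logits are already computed. It also uses a hypothetical plug-and-play rule, a weighted geometric mean of the classifier's maximum softmax probability and the probe's confidence; the paper's actual way of combining the probing result with existing methods may differ. The names `probing_confidence`, `probed_score`, and `alpha` are illustrative only.

```python
import numpy as np

ROTATIONS = 4  # assumed probe task: identify which of 4 rotations was applied

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def probing_confidence(rotation_logits: np.ndarray) -> np.ndarray:
    """rotation_logits: (N, 4, 4). For each of N inputs, the hypothetical probe's
    logits over the 4 rotation classes for each of the input's 4 rotated copies.
    Returns (N,): mean probability the probe assigns to the true rotation."""
    probs = softmax(rotation_logits)              # (N, 4, 4)
    true_rot = np.arange(ROTATIONS)
    p_true = probs[:, true_rot, true_rot]         # (N, 4): copy r scored on class r
    return p_true.mean(axis=1)

def probed_score(class_logits: np.ndarray, rotation_logits: np.ndarray,
                 alpha: float = 0.5) -> np.ndarray:
    """Hypothetical combination rule: weighted geometric mean of the baseline
    maximum softmax probability (MSP) and the probing confidence.
    alpha = 0 recovers the unmodified MSP score."""
    msp = softmax(class_logits).max(axis=-1)      # (N,) baseline confidence
    probe = probing_confidence(rotation_logits)   # (N,) probe confidence
    return msp ** (1.0 - alpha) * probe ** alpha
```

Setting `alpha=0.0` recovers the baseline score, which isolates the probe's effect: an input whose rotated copies the probe cannot recognize has its confidence pulled down even when the classifier itself is highly confident.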

Code Implementations: 1 repo
