LG, SE · Jul 23, 2021

Estimating Predictive Uncertainty Under Program Data Distribution Shift

arXiv:2107.10989v17.511 citations · Has Code

Originality: Synthesis-oriented

AI Analysis

The paper addresses the reliability of uncertainty estimation for deep learning in software engineering, but its contribution is incremental: it extends existing methods to a new domain.

The paper tackles the problem of predictive uncertainty estimation for deep learning models under data distribution shifts in programming tasks, finding that program distribution shift degrades model performance and existing uncertainty methods have limitations on program datasets.

Deep learning (DL) techniques have achieved great success in predictive accuracy across a variety of tasks, but deep neural networks (DNNs) have been shown to produce highly overconfident scores even for abnormal samples. Well-defined uncertainty indicates whether a model's output should (or should not) be trusted, and it thus becomes critical in real-world scenarios, which typically involve input distributions that have shifted due to many factors. Existing uncertainty approaches assume that testing samples from a different data distribution induce unreliable model predictions and thus have higher uncertainty scores. They quantify model uncertainty by calibrating a DL model's confidence in a given input and evaluate their effectiveness on computer vision (CV) and natural language processing (NLP) tasks. However, the reliability of these methodologies may be compromised on programming tasks due to differences in data representations and shift patterns. In this paper, we first define three different types of distribution shift in program data and build a large-scale shifted Java dataset. We implement two common programming language tasks on our dataset to study the effect of each distribution shift on DL model performance. We also propose a large-scale benchmark of existing state-of-the-art predictive uncertainty methods on programming tasks and investigate their effectiveness under data distribution shift. Experiments show that program distribution shift does degrade DL model performance to varying degrees and that existing uncertainty methods all present certain limitations in quantifying uncertainty on program datasets.
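To make the idea of a confidence-based uncertainty score concrete, here is a minimal sketch of two generic baselines: one minus the maximum softmax probability, and predictive entropy. The abstract does not name the specific methods benchmarked, so this is an illustrative assumption and not the paper's implementation; the function names and the example logits are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_prob_uncertainty(logits):
    """Uncertainty as 1 - max softmax probability (hypothetical helper)."""
    return 1.0 - max(softmax(logits))


def predictive_entropy(logits):
    """Shannon entropy (in nats) of the predicted class distribution."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Made-up logits for illustration only: one peaked, one flat prediction.
confident_logits = [8.0, 0.5, 0.1]
ambiguous_logits = [1.0, 1.0, 1.0]

for name, logits in [("confident", confident_logits),
                     ("ambiguous", ambiguous_logits)]:
    print(name,
          round(max_prob_uncertainty(logits), 4),
          round(predictive_entropy(logits), 4))
```

Under the assumption stated in the abstract, a shifted input should behave more like the flat case and receive the higher score. An overconfident model breaks that assumption by producing a peaked distribution even on inputs it has not seen before.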
