LG AI ML · Oct 9, 2021

The Neural Testbed: Evaluating Joint Predictions

Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Botao Hao, Morteza Ibrahimi, Dieterich Lawson, Xiuyuan Lu, Brendan O'Donoghue, Benjamin Van Roy

arXiv:2110.04629v417.226 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for better evaluation of uncertainty quantification in machine learning, particularly for joint predictions. It is an incremental improvement in benchmarking methods.

The paper tackles the problem of evaluating agents that generate predictive distributions, introducing The Neural Testbed as an open-source benchmark for assessing both marginal and joint predictions. Results show that some Bayesian deep learning agents perform poorly on joint predictions despite accurate marginal ones, and that joint prediction quality impacts downstream decision tasks.

Predictive distributions quantify uncertainties ignored by point estimates. This paper introduces The Neural Testbed: an open-source benchmark for controlled and principled evaluation of agents that generate such predictions. Crucially, the testbed assesses agents not only on the quality of their marginal predictions per input, but also on their joint predictions across many inputs. We evaluate a range of agents using a simple neural network data generating process. Our results indicate that some popular Bayesian deep learning agents do not fare well with joint predictions, even when they can produce accurate marginal predictions. We also show that the quality of joint predictions drives performance in downstream decision tasks. We find these results are robust across a wide range of generative models, and highlight the practical importance of joint predictions to the community.
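The distinction between marginal and joint predictions can be illustrated with a small sketch. This is not the testbed's own code or API: the function names, the toy agents, and the use of negative log-likelihood as the score are all assumptions made for illustration. Each agent is represented as a list of ensemble members, where `ensemble_probs[m][t][c]` is the probability member `m` assigns to class `c` at input `t`.

```python
import math

def marginal_log_loss(ensemble_probs, labels):
    """Average negative log-likelihood, scoring each input separately."""
    n_models = len(ensemble_probs)
    total = 0.0
    for t, y in enumerate(labels):
        # Marginal prediction at input t: average over ensemble members.
        p = sum(member[t][y] for member in ensemble_probs) / n_models
        total -= math.log(p)
    return total / len(labels)

def joint_log_loss(ensemble_probs, labels):
    """Negative log-likelihood of the whole label sequence at once."""
    n_models = len(ensemble_probs)
    # Each member scores the full sequence (product over inputs);
    # the joint prediction averages these products over members.
    joint = sum(
        math.prod(member[t][y] for t, y in enumerate(labels))
        for member in ensemble_probs
    ) / n_models
    return -math.log(joint)

# Hypothetical agent 1: two members that disagree with each other but are
# each consistent across both inputs, so predictions are coupled.
coupled = [
    [[0.9, 0.1], [0.9, 0.1]],  # member 0 favours class 0 on both inputs
    [[0.1, 0.9], [0.1, 0.9]],  # member 1 favours class 1 on both inputs
]
# Hypothetical agent 2: a single member with the same 50/50 marginals,
# treating the two inputs as independent.
product = [
    [[0.5, 0.5], [0.5, 0.5]],
]

labels = [0, 0]  # both inputs share the same true class

print("marginal:", marginal_log_loss(coupled, labels),
      marginal_log_loss(product, labels))
print("joint:   ", joint_log_loss(coupled, labels),
      joint_log_loss(product, labels))
```

In this toy setup the two agents are indistinguishable on marginal loss, yet the coupled agent scores better on joint loss because it captures that the two labels tend to agree. This is the kind of difference a marginal-only evaluation cannot detect.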
