LG CL ML · Dec 29, 2019

Dirichlet uncertainty wrappers for actionable algorithm accuracy accountability and auditability

arXiv:1912.12628v14.18 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses accountability and auditability challenges for auditors and the ML community in complex data products, though it is incremental as it builds on existing uncertainty estimation methods.

The authors tackled the problem of auditing black-box machine learning models by proposing a wrapper that adds an uncertainty measure to predictions, enabling decision rejection to mitigate accuracy risk. Results showed the wrapper's uncertainty measure effectively correlated with misclassifications in a sentiment analysis API scenario.

Machine learning models are becoming a utility in many applications. Companies deliver pre-trained models encapsulated as application programming interfaces (APIs), which developers combine with third-party components and their own models and data to create complex data products that solve specific problems. The complexity of such products, together with the lack of control over and knowledge of the internals of each component, causes unavoidable effects such as lack of transparency, difficulty in auditing, and the emergence of potentially uncontrolled risks. Such products are effectively black boxes. Accountability of such solutions is a challenge for auditors and the machine learning community. In this work, we propose a wrapper that, given a black-box model, enriches its output prediction with a measure of uncertainty. By using this wrapper, we make the black box auditable for accuracy risk (risk derived from low-quality or uncertain decisions), and at the same time we provide an actionable mechanism to mitigate that risk in the form of decision rejection: we can choose not to issue a prediction when the risk or uncertainty in that decision is significant. Based on the resulting uncertainty measure, we advocate for a rejection system that keeps the more confident predictions and discards the more uncertain ones, improving the trustworthiness of the resulting system. We showcase the proposed technique and methodology in a practical scenario where a simulated sentiment analysis API based on natural language processing is applied to different domains. Results demonstrate the effectiveness of the uncertainty computed by the wrapper and its high correlation with poor-quality predictions and misclassifications.
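The decision-rejection mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' Dirichlet wrapper: as a stand-in uncertainty measure it uses the Shannon entropy of the black box's output probabilities, and the function names (`entropy`, `predict_or_reject`, `accuracy_on_accepted`) and the threshold value are hypothetical choices made only for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector.

    Assumption: used here as a simple stand-in uncertainty score;
    the paper's wrapper computes its own Dirichlet-based measure.
    """
    return -sum(p * math.log(p) for p in probs if p > 0)


def predict_or_reject(probs, threshold):
    """Return (label, uncertainty); label is None when the prediction is rejected."""
    u = entropy(probs)
    if u > threshold:
        return None, u  # too uncertain: do not issue a prediction
    label = max(range(len(probs)), key=lambda i: probs[i])
    return label, u


def accuracy_on_accepted(batch_probs, labels, threshold):
    """Accuracy over accepted predictions, plus the fraction accepted (coverage)."""
    accepted = 0
    correct = 0
    for probs, y in zip(batch_probs, labels):
        pred, _ = predict_or_reject(probs, threshold)
        if pred is None:
            continue
        accepted += 1
        correct += int(pred == y)
    coverage = accepted / len(labels) if labels else 0.0
    acc = correct / accepted if accepted else float("nan")
    return acc, coverage


# Hypothetical black-box outputs for three inputs (two classes).
batch = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
labels = [0, 0, 1]
acc, coverage = accuracy_on_accepted(batch, labels, threshold=0.6)
```

Lowering the threshold rejects more predictions, trading coverage for accuracy on the predictions that remain. An auditor could sweep the threshold to see how well a given uncertainty measure separates correct predictions from misclassifications.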
