ML LG Jun 15, 2020

Algebraic Ground Truth Inference: Non-Parametric Estimation of Sample Errors by AI Algorithms

Andrés Corrada-Emmanuel, Edward Pantridge, Edward Zahrebelski, Aditya Chaganti, Simeon Simeonov

arXiv:2006.08312v1 1.4

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of monitoring classifiers in privacy-sensitive or autonomous settings where ground truth is unavailable. It is an incremental advance because it builds on existing non-parametric estimation methods.

The paper tackles the problem of estimating classifier errors without ground truth in production systems. It uses algebraic geometry to build a non-parametric estimator for ensembles of weak binary classifiers, and in the experiments where ground truth is available the estimates are accurate to better than one part in a hundred.

Binary classification is widely used in ML production systems. Methods for monitoring classifiers in a constrained event space are well known. However, real-world production systems often lack the ground truth these methods require. Privacy concerns may also mean that the ground truth needed to evaluate the classifiers cannot be made available. In these autonomous settings, non-parametric estimators of performance are an attractive solution. They do not require theoretical models of how the classifiers made errors in any given sample. They only estimate how many errors there are in a sample of an industrial or robotic datastream. We construct one such non-parametric estimator of the sample errors for an ensemble of weak binary classifiers. Our approach uses algebraic geometry to reformulate the self-assessment problem for ensembles of binary classifiers as an exact polynomial system. The polynomial formulation can then be used to prove, as an algebraic geometry algorithm, that no general solution to the self-assessment problem is possible. However, specific solutions are possible in settings where the engineering context puts the classifiers close to independent errors. The practical utility of the method is illustrated on a real-world dataset from an online advertising campaign and a sample of common classification benchmarks. In the experiments where we have ground truth, the accuracy estimates are correct to better than one part in a hundred. For the online advertising campaign data, where we have no ground truth, the estimates are checked with an internal consistency approach whose validity we conjecture as an algebraic geometry theorem. We call this approach algebraic ground truth inference.
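To make the idea of a polynomial system over observable voting statistics concrete, here is a minimal Python sketch of a much simpler, textbook-style relative of the problem. It is not the estimator from the paper. It assumes exactly three classifiers whose errors are independent, and it additionally assumes (a simplification not stated in the abstract) that each classifier has the same accuracy on both classes. With votes encoded as +1/-1, the pairwise agreement moments then satisfy m_ij = c_i * c_j with c_i = 2 * a_i - 1, a three-equation polynomial system that can be solved without any labels. The function names `simulate_votes` and `estimate_accuracies` are hypothetical.

```python
import math
import random


def simulate_votes(n, accuracies, prevalence, seed=0):
    """Synthetic +1/-1 votes from classifiers with independent, symmetric errors.

    The hidden label is drawn here only to generate the votes; it is never
    returned, so the estimator below cannot see it.
    """
    rng = random.Random(seed)
    votes = []
    for _ in range(n):
        y = 1 if rng.random() < prevalence else -1
        votes.append(tuple(y if rng.random() < a else -y for a in accuracies))
    return votes


def estimate_accuracies(votes):
    """Estimate three accuracies from pairwise agreement moments alone.

    Assumes independent errors and class-symmetric accuracy (illustrative
    assumptions). Solves m12 = c1*c2, m13 = c1*c3, m23 = c2*c3.
    """
    n = len(votes)

    def moment(i, j):
        # Mean of the product of two +1/-1 votes: P(agree) - P(disagree).
        return sum(v[i] * v[j] for v in votes) / n

    m12, m13, m23 = moment(0, 1), moment(0, 2), moment(1, 2)
    if min(m12, m13, m23) <= 0:
        raise ValueError("moments must be positive; assumptions are violated")

    # The system has two roots, (c1, c2, c3) and its negation. Taking the
    # positive square roots selects the 'all better than chance' root.
    c1 = math.sqrt(m12 * m13 / m23)
    c2 = math.sqrt(m12 * m23 / m13)
    c3 = math.sqrt(m13 * m23 / m12)

    # Sampling noise can push c slightly above 1, so clamp before converting.
    return tuple((1 + min(c, 1.0)) / 2 for c in (c1, c2, c3))


if __name__ == "__main__":
    votes = simulate_votes(100_000, (0.9, 0.8, 0.7), prevalence=0.3)
    print(estimate_accuracies(votes))
```

In this simplified setting the prevalence of the positive class drops out of the pairwise moments, which is why three equations suffice. Even here the labels-free system cannot tell a set of good classifiers from its mirror image of bad ones, so the better-than-chance assumption is needed to pick a root. When the independence assumption fails, the moments no longer factor in this way and the sketch gives biased answers.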
