LG | ML | Jun 9, 2021

Partial Wasserstein and Maximum Mean Discrepancy distances for bridging the gap between outlier detection and drift detection

arXiv:2106.12893v2 | 8 citations
AI Analysis

The paper addresses the need for reliable monitoring of ML applications, so that performance assurances obtained during testing remain valid in deployment. The contribution appears incremental, in that it combines existing approaches.

The paper tackles the problem of monitoring machine learning systems by bridging outlier detection and drift detection, proposing a method that compares a set of inputs to an automatically selected part of the reference distribution.

With the rise of machine learning and deep learning based applications in practice, monitoring, i.e. verifying that these applications operate within specification, has become an important practical problem. An important aspect of this monitoring is checking whether the inputs (or intermediates) have strayed from the distribution they were validated for, which can void the performance assurances obtained during testing. There are two common approaches. The perhaps more classical one is outlier detection or novelty detection, where we ask of a single input whether it is an outlier, i.e. exceedingly unlikely to have originated from a reference distribution. The second, perhaps more recent, approach is to consider a larger number of inputs and compare their distribution to a reference distribution (e.g. sampled during testing). This is done under the label drift detection. In this work, we bridge the gap between outlier detection and drift detection by comparing a given number of inputs to an automatically chosen part of the reference distribution.
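The idea of comparing a batch to an automatically chosen part of the reference can be illustrated with a small one-dimensional sketch. It is not the paper's implementation. It assumes scalar inputs, absolute-difference cost, and a chosen reference part consisting of exactly as many points as the batch (so the used reference mass fraction is n/m). The function names `partial_w1_1d` and `full_w1_1d` and the toy data are hypothetical. In one dimension an optimal matching can be taken to be order-preserving, which allows a simple dynamic program.

```python
from bisect import bisect_right


def partial_w1_1d(batch, reference):
    """Mean cost of matching every batch point to a distinct reference point (1D).

    Assumption for this sketch: the 'chosen part' of the reference is the
    subset of len(batch) points that minimises the total |x - y| cost.
    """
    xs, ys = sorted(batch), sorted(reference)
    n, m = len(xs), len(ys)
    if n > m:
        raise ValueError("reference must have at least as many points as the batch")
    inf = float("inf")
    # prev[j]: min cost of matching the first i batch points using only
    # the first j reference points (i = 0 initially, so the cost is 0).
    prev = [0.0] * (m + 1)
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(i, m + 1):
            skip = cur[j - 1]  # leave reference point j unused
            use = prev[j - 1] + abs(xs[i - 1] - ys[j - 1])  # match i to j
            cur[j] = min(skip, use)
        prev = cur
    return prev[m] / n


def full_w1_1d(batch, reference):
    """Ordinary 1D Wasserstein-1 distance between two empirical distributions,
    computed as the integral of |F_batch - F_reference|."""
    xs, ys = sorted(batch), sorted(reference)
    pts = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(pts, pts[1:]):
        fx = bisect_right(xs, left) / len(xs)
        fy = bisect_right(ys, left) / len(ys)
        total += abs(fx - fy) * (right - left)
    return total


# Toy data: the reference has two clusters, near 0 and near 5.
reference = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3]
batch_in = [0.05, 0.15, 0.25]   # covers only one reference cluster
batch_out = [10.0, 10.1, 10.2]  # far from every reference point

print("full W1, in-cluster batch:   ", full_w1_1d(batch_in, reference))
print("partial W1, in-cluster batch:", partial_w1_1d(batch_in, reference))
print("partial W1, outlying batch:  ", partial_w1_1d(batch_out, reference))
```

In this toy setting the full distance is large for the in-cluster batch, because half of the reference mass sits in the other cluster, while the partial variant stays small. The outlying batch yields a large partial distance, which is the outlier-like behaviour. With a batch of size one, the partial score reduces to the distance to the nearest reference point.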

Code Implementations: 1 repo
