CL, AI, LG | Jun 1, 2023

Estimating Semantic Similarity between In-Domain and Out-of-Domain Samples

arXiv:2306.01206v1222 citations | h-index: 8
Originality: Synthesis-oriented
AI Analysis

This paper addresses a practical challenge for machine learning practitioners: reliably detecting OOD samples. The contribution appears incremental, since it builds on prior definitions and methods.

The paper tackles the problem of defining and analyzing out-of-domain (OOD) samples and seeks an unsupervised method to identify them without a trained model. Results across 12 datasets from 4 tasks suggest that unsupervised metrics are promising for this purpose.

Prior work typically describes out-of-domain (OOD) or out-of-distribution (OODist) samples as those that originate from dataset(s) or source(s) different from the training set but for the same task. Compared to in-domain (ID) samples, models are known to usually perform worse on OOD samples, although this observation is not consistent. Another thread of research has focused on OOD detection, albeit mostly using supervised approaches. In this work, we first consolidate and present a systematic analysis of multiple definitions of OOD and OODist as discussed in prior literature. Then, we analyze the performance of a model under ID and OOD/OODist settings in a principled way. Finally, we seek to identify an unsupervised method for reliably identifying OOD/OODist samples without using a trained model. The results of our extensive evaluation using 12 datasets from 4 different tasks suggest the promising potential of unsupervised metrics in this task.
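To make the idea of an unsupervised, model-free similarity metric concrete, here is a minimal sketch. It is an illustration only: the abstract does not name the metrics the authors evaluate, so the choice of cosine similarity over bag-of-words term counts, the function names (`term_frequencies`, `cosine_similarity`, `corpus_similarity`), and the toy sentences are all assumptions rather than the paper's method.

```python
import math
import re
from collections import Counter


def term_frequencies(texts):
    """Count lowercase word tokens across a list of strings."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z0-9']+", text.lower()))
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty corpus shares nothing with any other corpus
    return dot / (norm_a * norm_b)


def corpus_similarity(reference_texts, candidate_texts):
    """Hypothetical helper: similarity of a candidate corpus to a reference
    (in-domain) corpus. No trained model is involved."""
    return cosine_similarity(
        term_frequencies(reference_texts),
        term_frequencies(candidate_texts),
    )


# Toy data, invented for illustration only.
reference = ["the movie was great", "a dull and boring film"]
near = ["the film was great fun"]
far = ["stock prices fell sharply today"]

near_score = corpus_similarity(reference, near)
far_score = corpus_similarity(reference, far)
```

In this sketch a lower score against the reference corpus marks a candidate set as less similar to the in-domain data. Where to place a threshold, and whether lexical overlap tracks model performance at all, are empirical questions of the kind the paper's evaluation addresses.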

Code Implementations: 1 repo
