LGMLDec 23, 2023

Optimal Decision Tree and Adaptive Submodular Ranking with Noisy Outcomes

arXiv:2312.15357v24 citations
Originality Incremental advance
AI Analysis

This addresses a fundamental limitation in active learning for applications like clinical trials where noisy outcomes are common, though it is incremental as it extends existing deterministic methods to handle noise.

The paper tackles the Optimal Decision Tree problem in active learning where test outcomes can be noisy, even persistently, by developing approximation algorithms with near-optimal guarantees that degrade continuously with noise levels, and numerically shows costs close to the information-theoretic minimum in applications like toxic chemical identification and linear classifier learning.

In pool-based active learning, the learner is given an unlabeled data set and aims to efficiently learn the unknown hypothesis by querying the labels of the data points. This can be formulated as the classical Optimal Decision Tree (ODT) problem: Given a set of tests, a set of hypotheses, and an outcome for each pair of test and hypothesis, our objective is to find a low-cost testing procedure (i.e., decision tree) that identifies the true hypothesis. This optimization problem has been extensively studied under the assumption that each test generates a deterministic outcome. However, in numerous applications, for example, clinical trials, the outcomes may be uncertain, which renders the ideas from the deterministic setting invalid. In this work, we study a fundamental variant of the ODT problem in which some test outcomes are noisy, even in the more general case where the noise is persistent, i.e., repeating a test gives the same noisy output. Our approximation algorithms provide guarantees that are nearly best possible and hold for the general case of a large number of noisy outcomes per test or per hypothesis where the performance degrades continuously with this number. We numerically evaluated our algorithms for identifying toxic chemicals and learning linear classifiers, and observed that our algorithms have costs very close to the information-theoretic minimum.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes