LG · Aug 27, 2021

A method of supervised learning from conflicting data with hidden contexts

arXiv:2108.12113v3 · 1 citation
Originality: Highly original
AI Analysis

This paper addresses a fundamental limitation of supervised learning in settings where the input-output relationship depends on hidden contexts, such as open-ended training. Its solution may improve robustness in applications like domain adaptation and learning from noisy data.

The paper tackles the problem of supervised learning when training data comes from multiple hidden domains with conflicting input-output relationships, which standard methods fail to handle. It proposes LEAF, a method that learns to allocate data to different predictive models, and validates it theoretically and empirically on synthetic and real-world tasks.

Conventional supervised learning assumes a stable input-output relationship. However, this assumption fails in open-ended training settings where the input-output relationship depends on hidden contexts. In this work, we formulate a more general supervised learning problem in which training data is drawn from multiple unobservable domains, each potentially exhibiting distinct input-output maps. This inherent conflict in the data renders standard empirical risk minimization (ERM) training ineffective. To address this challenge, we propose a method, LEAF, that introduces an allocation function, which learns to assign conflicting data to different predictive models. We establish a connection between LEAF and a variant of the Expectation-Maximization algorithm, allowing us to derive an analytical expression for the allocation function. Finally, we provide a theoretical analysis of LEAF and empirically validate its effectiveness on both synthetic and real-world tasks involving conflicting data.
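To make the idea of allocating conflicting data concrete, here is a minimal sketch. It is not the paper's LEAF algorithm: it swaps in a plain hard-EM loop over two linear models, where the "allocation" is simply an argmin over per-model squared errors. The toy data (two hidden contexts with slopes +2 and -2) and the names `make_conflicting_data`, `fit_erm`, and `fit_with_allocation` are all assumptions made for illustration.

```python
import numpy as np


def make_conflicting_data(n=400, noise=0.05, seed=0):
    """Toy data (assumed): the same x maps to 2x or -2x depending on a hidden context."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    context = rng.integers(0, 2, size=n)  # hidden: never shown to the learner
    slopes = np.array([2.0, -2.0])
    y = slopes[context] * x + noise * rng.normal(size=n)
    return x, y, context


def fit_erm(x, y):
    """Standard ERM: one least-squares slope for all data (no intercept)."""
    return float(x @ y / (x @ x))


def fit_with_allocation(x, y, n_models=2, n_iters=20, seed=0):
    """Hard-EM stand-in for an allocation scheme (not the paper's method).

    Alternates between (1) allocating each sample to the model that currently
    predicts it best and (2) refitting each model on its allocated samples.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n_models)  # random initial slopes
    assign = np.zeros(len(x), dtype=int)
    for _ in range(n_iters):
        # Allocation step: squared error of every sample under every model.
        sq_err = (y[:, None] - x[:, None] * w[None, :]) ** 2
        assign = sq_err.argmin(axis=1)
        # Update step: least-squares refit of each model on its own samples.
        for k in range(n_models):
            mask = assign == k
            if mask.any():
                w[k] = x[mask] @ y[mask] / (x[mask] @ x[mask])
    return w, assign


x, y, _ = make_conflicting_data()
w_erm = fit_erm(x, y)
w_alloc, assign = fit_with_allocation(x, y)

mse_erm = np.mean((y - w_erm * x) ** 2)
mse_alloc = np.mean((y - w_alloc[assign] * x) ** 2)
print("ERM slope:", w_erm, "MSE:", mse_erm)
print("Allocated slopes:", np.sort(w_alloc), "MSE:", mse_alloc)
```

On this toy problem the single ERM model averages the two conflicting maps and ends up with a slope near zero, while the two allocated models each recover one of the hidden maps. Note that the allocation step here uses the label `y`, so it only describes training-time assignment; how predictions are selected at test time is outside the scope of this sketch.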

