LG · AI · ML · Jun 18, 2021

Dependency Structure Misspecification in Multi-Source Weak Supervision Models

arXiv:2106.10302v1 · 9 citations
Originality: Incremental advance
AI Analysis

The paper addresses a critical awareness gap for weak supervision practitioners: ignoring dependency structures can degrade downstream classifier performance. The contribution is incremental, since it analyzes one specific type of misspecification.

The paper tackles label model misspecification in data programming. It analyzes how over-specifying dependency structures among labeling functions leads to modeling errors, derives theoretical bounds on those errors, and shows empirically that they can be substantial.

Data programming (DP) has proven to be an attractive alternative to costly hand-labeling of data. In DP, users encode domain knowledge into labeling functions (LFs), heuristics that label a subset of the data noisily and may have complex dependencies. A label model is then fit to the LFs to produce an estimate of the unknown class label. The effects of label model misspecification on test set performance of a downstream classifier are understudied. This presents a serious awareness gap to practitioners, in particular since the dependency structure among LFs is frequently ignored in field applications of DP. We analyse modeling errors due to structure over-specification. We derive novel theoretical bounds on the modeling error and empirically show that this error can be substantial, even when modeling a seemingly sensible structure.
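To make the setup in the abstract concrete, the following is a minimal sketch of the data programming pipeline: a few labeling functions that vote or abstain, and a label model that aggregates their votes. Everything in it is an illustrative assumption rather than the paper's method. The spam/ham task, the four heuristics, the example texts, and the use of unweighted majority vote as the label model are all hypothetical. Majority vote treats the LFs as independent, so it does not model any dependency structure.

```python
# Hypothetical label values; real DP libraries use their own conventions.
ABSTAIN, HAM, SPAM = -1, 0, 1

# Illustrative labeling functions (LFs): each labels a subset of the data
# noisily and abstains elsewhere. None of these come from the paper.
def lf_contains_link(text):
    return SPAM if "http" in text else ABSTAIN

def lf_contains_url_scheme(text):
    # Fires on almost exactly the same inputs as lf_contains_link,
    # so the two LFs are strongly dependent.
    return SPAM if "http://" in text or "https://" in text else ABSTAIN

def lf_short_message(text):
    return HAM if len(text.split()) < 5 else ABSTAIN

def lf_mentions_prize(text):
    return SPAM if "prize" in text.lower() else ABSTAIN

LFS = [lf_contains_link, lf_contains_url_scheme,
       lf_short_message, lf_mentions_prize]

def apply_lfs(texts):
    """Build the label matrix: one row per text, one column per LF."""
    return [[lf(t) for lf in LFS] for t in texts]

def majority_vote(row):
    """Simplest possible label model: unweighted vote over non-abstaining
    LFs. It assumes the LFs are independent; ties and all-abstain rows
    yield ABSTAIN."""
    votes = [v for v in row if v != ABSTAIN]
    spam = sum(v == SPAM for v in votes)
    ham = len(votes) - spam
    if spam == ham:
        return ABSTAIN
    return SPAM if spam > ham else HAM

texts = [
    "Win a prize now at http://example.com today",
    "see you soon",
    "ok http://example.com",
    "Meeting notes attached for the quarterly review",
]

label_matrix = apply_lfs(texts)
labels = [majority_vote(row) for row in label_matrix]
```

In the third text, the two link-based LFs both vote SPAM and outvote the single HAM vote from the length heuristic. If the dependent pair were counted once, the row would be a tie. This shows, in a toy setting, why the assumed dependency structure can change the estimated label. It does not reproduce the paper's analysis of over-specified structures.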
