Reverse Supervision at Scale: Exponential Search Meets the Economics of Annotation
This work addresses the problem of reducing annotation costs in machine learning; its contribution is incremental, since it reinforces the existing need for human oversight.
The paper analyzes a reversed-supervision strategy that searches over labelings of a large unlabeled set to minimize error on a small labeled set. It finds that the exponential cost of this search persists even with much faster computation, so human input remains essential to ground learning in the task's semantics.
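A minimal Python sketch of the brute-force strategy follows. It assumes binary labels, a toy two-dimensional dataset, and a nearest-centroid classifier as the learner; these are illustrative choices, and the names \texttt{reversed\_supervision} and \texttt{years\_to\_enumerate} are hypothetical. The sketch counts the labelings it scores and estimates wall-clock time for larger \(n\), treating a quantum speedup as Grover-style quadratic (about \(\sqrt{2^n}\) queries, constant factor ignored).

```python
import itertools
import math

Point = tuple[float, float]

def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def nearest_centroid_predict(train_x, train_y, query):
    """Stand-in learner (an assumption): predict the label of the nearest class centroid."""
    centroids = {}
    for label in set(train_y):
        pts = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = tuple(sum(c) / len(pts) for c in zip(*pts))
    return min(centroids, key=lambda lab: dist2(centroids[lab], query))

def reversed_supervision(B, A_x, A_y):
    """Hypothetical brute force: try every binary labeling of B, keep the lowest error on A."""
    best_labels, best_err, evaluated = None, math.inf, 0
    for labels in itertools.product((0, 1), repeat=len(B)):  # 2**len(B) candidates
        evaluated += 1
        preds = [nearest_centroid_predict(B, labels, x) for x in A_x]
        err = sum(p != y for p, y in zip(preds, A_y)) / len(A_y)
        if err < best_err:  # strict: ties keep the first labeling found
            best_labels, best_err = labels, err
    return best_labels, best_err, evaluated

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def years_to_enumerate(n, labelings_per_second, grover=False):
    """Years to score 2**n labelings; Grover-style search needs ~sqrt(2**n) queries (pi/4 ignored)."""
    queries = math.sqrt(2.0 ** n) if grover else 2.0 ** n
    return queries / labelings_per_second / SECONDS_PER_YEAR

B = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]  # unlabeled set, n = 4
A_x = [(0.0, 0.5), (5.0, 5.5)]                        # small labeled set
A_y = [0, 1]
best, err, evaluated = reversed_supervision(B, A_x, A_y)
print(best, err, evaluated)  # (0, 0, 0, 1) 0.0 16
```

On this toy set, several labelings of \(B\), including (0, 0, 0, 1) and the intended (0, 0, 1, 1), reach zero error on \(A\); the search returns the first one it meets, so a two-point labeled set does not by itself pin down the intended labeling. For timing, \texttt{years\_to\_enumerate(100, 1e18)} is roughly \(4.0 \times 10^4\) years, and a Grover-style search over \(n = 200\) at the same rate costs exactly as much as classical enumeration over \(n = 100\): a quadratic speedup only doubles the tractable \(n\).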
We analyze a reversed-supervision strategy that searches over labelings of a large unlabeled set \(B\) to minimize error on a small labeled set \(A\). With binary labels and \(n = |B|\), the search space contains \(2^n\) labelings, and the complexity remains exponential under any constant-factor speedup (e.g., from massively parallel hardware); even a quadratic quantum speedup in the style of Grover's algorithm only reduces the count to about \(2^{n/2}\). Consequently, arbitrarily fast computation, short of an exponential speedup, does not obviate the need for informative labels or priors. In practice, the machine learning pipeline still requires an initial human contribution: specifying the objective, defining the classes, and providing a seed set of representative annotations that inject inductive bias and align models with task semantics. Synthetic labels from generative AI can partially substitute for human annotation, provided their quality is human-grade and they are anchored by a human-specified objective, seed supervision, and validation. In this view, generative models function as \emph{label amplifiers}, leveraging small human-curated cores via active, semi-supervised, and self-training loops, while humans retain oversight for calibration, drift detection, and failure auditing. Thus, extreme computational speed reduces wall-clock time but not the fundamental supervision needs of learning; initial human (or human-grade) input remains necessary to ground the system in the intended task.
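As a minimal illustration of a label amplifier, the following Python sketch grows a small human-labeled seed set by self-training: it pseudo-labels only those unlabeled points whose nearest class centroid wins by a clear margin, and leaves ambiguous points for human review. The nearest-centroid learner, the squared-distance margin rule, and the names \texttt{self\_train} and \texttt{margin} are assumptions for illustration, not the method analyzed here.

```python
Point = tuple[float, float]

def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def centroids(xs, ys):
    """Mean point of each class present in ys."""
    out = {}
    for label in set(ys):
        pts = [x for x, y in zip(xs, ys) if y == label]
        out[label] = tuple(sum(c) / len(pts) for c in zip(*pts))
    return out

def self_train(seed_x, seed_y, unlabeled, margin=1.0, max_rounds=10):
    """Hypothetical self-training loop seeded by human labels.

    A point is pseudo-labeled only if its nearest centroid is closer than the
    runner-up by at least `margin` (in squared distance); the rest stay in the
    pool for human annotation or auditing.
    """
    lab_x, lab_y = list(seed_x), list(seed_y)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(lab_x, lab_y)
        accepted, remaining = [], []
        for x in pool:
            ranked = sorted((dist2(c, x), label) for label, c in cents.items())
            if len(ranked) > 1 and ranked[1][0] - ranked[0][0] >= margin:
                accepted.append((x, ranked[0][1]))
            else:
                remaining.append(x)
        if not accepted:  # nothing confident left: stop and hand the pool to a human
            break
        for x, y in accepted:
            lab_x.append(x)
            lab_y.append(y)
        pool = remaining
    return lab_x, lab_y, pool

seed_x = [(0.0, 0.0), (5.0, 5.0)]  # human-curated core
seed_y = [0, 1]
unlabeled = [(0.0, 1.0), (1.0, 0.0), (4.0, 5.0), (5.0, 4.0), (2.5, 2.5)]
lab_x, lab_y, pool = self_train(seed_x, seed_y, unlabeled)
print(lab_y, pool)  # [0, 1, 0, 0, 1, 1] [(2.5, 2.5)]
```

The midpoint (2.5, 2.5) is equidistant from both class centroids, so it is never pseudo-labeled and is returned for human review, which mirrors the oversight role the text assigns to people. In practice the margin would need calibrating against held-out human labels; the value 1.0 here is arbitrary.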