LG · ML · Dec 2, 2018

Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale

arXiv:1812.00417v2 · 155 citations
Originality: Highly original
AI Analysis

This paper addresses the labeling bottleneck for machine learning applications by deploying weak supervision at industrial scale, in a practical setting at Google.

The paper tackles the high cost of labeling training data by using existing organizational knowledge as weak supervision, which it reports reduces development time and cost by an order of magnitude. It introduces Snorkel DryBell, which on three classification tasks at Google produces classifiers comparable in quality to ones trained with tens of thousands of hand-labeled examples. It also converts non-servable organizational resources into servable models, for an average 52% performance improvement.

Labeling training data is one of the most costly bottlenecks in developing machine learning-based applications. We present a first-of-its-kind study showing how existing knowledge resources from across an organization can be used as weak supervision in order to bring development time and cost down by an order of magnitude, and introduce Snorkel DryBell, a new weak supervision management system for this setting. Snorkel DryBell builds on the Snorkel framework, extending it in three critical aspects: flexible, template-based ingestion of diverse organizational knowledge, cross-feature production serving, and scalable, sampling-free execution. On three classification tasks at Google, we find that Snorkel DryBell creates classifiers of comparable quality to ones trained with tens of thousands of hand-labeled examples, converts non-servable organizational resources to servable models for an average 52% performance improvement, and executes over millions of data points in tens of minutes.
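To make the idea of weak supervision concrete, here is a minimal sketch in Python. It is not Snorkel DryBell's implementation: the labeling functions, their names, and the example task are hypothetical, and the votes are combined by simple majority vote as a stand-in for the generative label model that Snorkel learns to estimate source accuracies.

```python
from collections import Counter

# Label conventions assumed for this sketch.
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

# Hypothetical labeling functions: each encodes one noisy heuristic or
# knowledge resource and may abstain when it has no opinion.
def lf_keyword(text):
    return POSITIVE if "sale" in text.lower() else ABSTAIN

def lf_blocklist(text):
    blocked = {"invoice", "receipt"}
    return NEGATIVE if any(w in text.lower() for w in blocked) else ABSTAIN

def lf_exclaim(text):
    return POSITIVE if text.count("!") >= 2 else ABSTAIN

LABELING_FUNCTIONS = [lf_keyword, lf_blocklist, lf_exclaim]

def majority_vote(text, lfs=LABELING_FUNCTIONS):
    """Combine labeling-function votes; abstain on no votes or a tie."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes)
    top_label, top_count = counts.most_common(1)[0]
    # If more than one label shares the top count, the vote is tied.
    if list(counts.values()).count(top_count) > 1:
        return ABSTAIN
    return top_label

if __name__ == "__main__":
    for doc in ["Big sale today!!", "Your invoice is attached", "hello"]:
        print(doc, "->", majority_vote(doc))
```

The labels produced this way are noisy training labels for a downstream classifier, not final predictions. In this setup the downstream model can use features that are available at serving time even when the labeling functions themselves rely on resources that are not.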

