Training Classifiers with Natural Language Explanations
This work addresses the inefficiency of data labeling for machine learning practitioners by reducing annotation time. The contribution is incremental, however, because it builds on existing weak supervision and semantic parsing techniques.
The authors tackle the problem of training classifiers with limited labeled data by introducing BabbleLabble, a framework in which annotators provide natural language explanations that are converted into labeling functions, which in turn generate noisy labels for unlabeled data. On three relation extraction tasks, users trained classifiers with comparable F1 scores 5-100 times faster than with traditional labeling.
Training accurate classifiers requires many labels, but each label provides only limited information (one bit for binary classification). In this work, we propose BabbleLabble, a framework for training classifiers in which an annotator provides a natural language explanation for each labeling decision. A semantic parser converts these explanations into programmatic labeling functions that generate noisy labels for an arbitrary amount of unlabeled data, which are then used to train a classifier. On three relation extraction tasks, we find that users are able to train classifiers with comparable F1 scores 5-100$\times$ faster by providing explanations instead of just labels. Furthermore, given the inherent imperfection of labeling functions, we find that a simple rule-based semantic parser suffices.
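To make the idea of a labeling function concrete, the following is a minimal sketch of how explanations, once turned into functions, could assign noisy labels to unlabeled examples. It is not BabbleLabble's implementation: the two explanations, the function names, the `between` field, and the example sentences are all hypothetical, the functions are written by hand rather than produced by a semantic parser, and plain majority voting is assumed here only as a simple way to combine votes.

```python
# Illustrative sketch only; names and data are hypothetical, not from the paper.
ABSTAIN = 0  # a labeling function may decline to vote on an example


def lf_wife_between(example):
    # Hypothetical explanation: "True, because the words 'his wife'
    # appear between the two people."
    return 1 if "his wife" in example["between"] else ABSTAIN


def lf_far_apart(example):
    # Hypothetical explanation: "False, because the two people are
    # more than 10 words apart."
    return -1 if len(example["between"].split()) > 10 else ABSTAIN


def majority_vote(votes):
    # Assumed aggregation rule: sign of the summed votes, abstain on ties.
    total = sum(votes)
    if total > 0:
        return 1
    if total < 0:
        return -1
    return ABSTAIN


def apply_labeling_functions(examples, labeling_functions):
    # Each example receives one noisy label combining all function votes.
    return [
        majority_vote([lf(example) for lf in labeling_functions])
        for example in examples
    ]


# "between" holds the text separating the two candidate person mentions.
unlabeled = [
    {"between": "and his wife"},
    {"between": "met with reporters before flying to Paris to see "
                "the president and later spoke with"},
    {"between": "and"},
]

noisy_labels = apply_labeling_functions(
    unlabeled, [lf_wife_between, lf_far_apart]
)
print(noisy_labels)
```

Examples on which every function abstains receive no label and would simply be left out of the training set; the remaining noisy labels would then be used to train the downstream classifier.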