Learning Rules-First Classifiers
This work aims to improve the reliability and interpretability of classifiers, with sentiment analysis as a motivating domain. The contribution is incremental in that it builds on existing rule-based and linear classification concepts.
The paper tackles the problem of classifiers failing in cases where humans can easily justify a classification. It focuses on a setting where the label is perfectly predictable from certain features (rules) when they are present in the input, and is otherwise predictable by a linear classifier. The authors define a hypothesis class for this setting, determine its sample complexity, and derive a simple, efficient algorithm whose sample complexity is close to optimal among efficient algorithms. Experiments on synthetic and sentiment analysis data demonstrate the method's accuracy and interpretability.
Complex classifiers may exhibit "embarrassing" failures in cases where humans can easily provide a justified classification. Avoiding such failures is obviously of key importance. In this work, we focus on one such setting, where a label is perfectly predictable if the input contains certain features, or rules, and otherwise it is predictable by a linear classifier. We define a hypothesis class that captures this notion and determine its sample complexity. We also give evidence that efficient algorithms cannot achieve this sample complexity. We then derive a simple and efficient algorithm and show that its sample complexity is close to optimal among efficient algorithms. Experiments on synthetic and sentiment analysis data demonstrate the efficacy of the method, both in terms of accuracy and interpretability.
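To make the setting above concrete, the following is a minimal sketch of how a rules-first predictor might behave at inference time. It is not the paper's algorithm and does not cover learning. The function name `rules_first_predict`, the binary feature encoding, the labels in {-1, +1}, and the tie-breaking convention (the lowest-indexed firing rule wins) are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def rules_first_predict(
    x: Sequence[int],
    rules: Dict[int, int],
    w: Sequence[float],
    b: float = 0.0,
) -> int:
    """Predict a label in {-1, +1} for a binary feature vector x.

    rules maps a feature index to the label that feature forces when it is
    present. If no rule feature is present, fall back to the linear
    classifier sign(w . x + b). (Hypothetical interface, for illustration.)
    """
    # Rules take priority: check them before the linear model.
    # Assumption: if several rules fire, the lowest feature index wins.
    for idx in sorted(rules):
        if x[idx]:
            return rules[idx]

    # No rule fired, so use the linear fallback.
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy example: feature 0 forces +1, feature 2 forces -1.
example_rules = {0: 1, 2: -1}
example_w = [0.0, 1.0, 0.0, -2.0]

# Feature 0 is present, so the rule overrides the negative linear score.
pred_rule = rules_first_predict([1, 0, 0, 1], example_rules, example_w)
# No rule feature is present, so the linear classifier decides.
pred_linear = rules_first_predict([0, 0, 0, 1], example_rules, example_w)
```

Part of the interpretability appeal of such a predictor is that any prediction made by a rule can be justified by pointing at the single feature that triggered it, while the linear model handles the remaining inputs.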