Editing a classifier by rewriting its prediction rules
This work addresses the need for efficient model adaptation and bias mitigation in machine learning, though it appears incremental because it builds on existing editing techniques.
The authors tackle the problem of modifying a classifier's behavior by directly rewriting its prediction rules, which enables adaptation to new environments and removal of spurious features with virtually no additional data collection.
We present a methodology for modifying the behavior of a classifier by directly rewriting its prediction rules. Our approach requires virtually no additional data collection and can be applied to a variety of settings, including adapting a model to new environments and modifying it to ignore spurious features. Our code is available at https://github.com/MadryLab/EditingClassifiers.
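The abstract does not spell out how a prediction rule is rewritten, so the following is only a toy sketch of the general idea, not the authors' method or the API of the linked repository. It assumes a single linear layer and uses a standard minimum-norm rank-one update so that a chosen input direction (`key`) maps to a chosen output (`new_value`). The names `matvec`, `rewrite_rule`, `W`, and `spurious` are hypothetical and introduced here for illustration.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply matrix W by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rewrite_rule(W: Matrix, key: Vector, new_value: Vector) -> Matrix:
    """Return an edited copy of W such that W' @ key == new_value.

    Assumed illustration: the minimum-norm rank-one update
        W' = W + (new_value - W key) key^T / (key^T key),
    which leaves the output unchanged for inputs orthogonal to key.
    """
    norm_sq = sum(k * k for k in key)
    if norm_sq == 0.0:
        raise ValueError("key must be non-zero")
    # How far the current output for `key` is from the desired output.
    residual = [v - o for v, o in zip(new_value, matvec(W, key))]
    # Add the outer product residual * key^T / ||key||^2 to W.
    return [
        [w + r * k / norm_sq for w, k in zip(row, key)]
        for row, r in zip(W, residual)
    ]


# Toy 2-class linear classifier over 3 hypothetical features.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
spurious = [0.0, 0.0, 1.0]  # feature direction the model should ignore
W_edited = rewrite_rule(W, spurious, [0.0, 0.0])
```

In this sketch the edit needs no training data: a single (key, value) pair specifies the rule to rewrite, and inputs orthogonal to `spurious` produce the same logits before and after the edit.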