SparCAssist: A Model Risk Assessment Assistant Based on Sparse Generated Counterfactuals
This work addresses the need for better risk assessment tools for NLP models, offering a practical solution for human annotators, though it appears incremental as it builds on existing counterfactual generation methods.
The authors address risk assessment for machine learning models trained on language tasks by introducing SparCAssist, a tool that evaluates a model on counterfactual instances generated by replacing tokens in rationale subsequences. The resulting system aids human annotators in assessing deployment risk, and the counterfactuals it produces can be used to train more robust NLP models.
We introduce SparCAssist, a general-purpose risk assessment tool for machine learning models trained for language tasks. It evaluates a model's risk by inspecting its behavior on counterfactuals, namely out-of-distribution instances generated from a given data instance. The counterfactuals are generated by replacing tokens in rationale subsequences identified by ExPred, and the replacements are retrieved using HotFlip or Masked-Language-Model-based algorithms. The main purpose of our system is to help human annotators assess the model's risk at deployment. The counterfactual instances generated during the assessment are a by-product and can be used to train more robust NLP models in the future.
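The generation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon-based `toy_predict` classifier, the hand-written rationale mask (standing in for ExPred's output), and the fixed candidate list (standing in for HotFlip or masked-language-model retrieval) are all hypothetical placeholders. The sketch only shows the general idea of editing a single token inside the rationale and keeping the edits that change the model's prediction.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy lexicon standing in for a trained NLP model.
TOY_WEIGHTS: Dict[str, float] = {
    "great": 2.0, "good": 1.0, "boring": -1.5, "awful": -2.0,
}


def toy_predict(tokens: List[str]) -> int:
    """Return 1 (positive) if the summed lexicon score is > 0, else 0."""
    score = sum(TOY_WEIGHTS.get(t, 0.0) for t in tokens)
    return 1 if score > 0 else 0


def sparse_counterfactuals(
    tokens: List[str],
    rationale_mask: List[bool],
    candidates: List[str],
    predict: Callable[[List[str]], int],
) -> List[Tuple[List[str], int]]:
    """Replace one rationale token at a time; keep edits that flip the label.

    rationale_mask marks which tokens belong to the rationale (assumed to
    come from a rationale extractor); candidates is an assumed replacement
    vocabulary. Tokens outside the rationale are never edited, which keeps
    each counterfactual sparse (exactly one token differs from the input).
    """
    original_label = predict(tokens)
    flipped: List[Tuple[List[str], int]] = []
    for i, (tok, in_rationale) in enumerate(zip(tokens, rationale_mask)):
        if not in_rationale:
            continue
        for cand in candidates:
            if cand == tok:
                continue
            edited = tokens[:i] + [cand] + tokens[i + 1:]
            new_label = predict(edited)
            if new_label != original_label:
                flipped.append((edited, new_label))
    return flipped


tokens = ["the", "movie", "was", "great"]
mask = [False, False, False, True]          # only "great" is in the rationale
candidates = ["good", "boring", "awful"]    # hypothetical replacement pool
result = sparse_counterfactuals(tokens, mask, candidates, toy_predict)
for edited, label in result:
    print(" ".join(edited), "->", label)
```

In this toy setting, replacing "great" with "good" leaves the prediction unchanged, while "boring" and "awful" flip it. A human annotator reviewing such flips could then judge whether the model's sensitivity to each edit is reasonable.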