CL, AI, LG · May 6, 2023

ANTONIO: Towards a Systematic Method of Generating NLP Benchmarks for Verification

arXiv:2305.04003v3 · 7 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of verifying NLP models for researchers and practitioners, but it is incremental as it adapts existing verification methods rather than introducing a new paradigm.

The paper tackles the difficulty of verifying NLP models by analyzing technical reasons and proposing methods to prepare datasets and models for existing verification tools, implementing them in the ANTONIO library and evaluating it on the R-U-A-Robot benchmark.

Verification of machine learning models used in Natural Language Processing (NLP) is known to be a hard problem. In particular, many known neural network verification methods that work for computer vision and other numeric datasets do not work for NLP. Here, we study technical reasons that underlie this problem. Based on this analysis, we propose practical methods and heuristics for preparing NLP datasets and models in a way that renders them amenable to known verification methods based on abstract interpretation. We implement these methods as a Python library called ANTONIO that links to the neural network verifiers ERAN and Marabou. We evaluate the tool on R-U-A-Robot, an NLP dataset suggested as a benchmark for verifying legally critical NLP applications. We hope that, thanks to its general applicability, this work will open novel possibilities for including NLP verification problems into neural network verification competitions, and will popularise NLP problems within this community.
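To make the idea of verification by abstract interpretation concrete, the following is a minimal sketch of interval bound propagation over a box of sentence embeddings. It is not ANTONIO's API and not the method of ERAN or Marabou, which are far more precise. All function names, the toy 2-dimensional embeddings, and the tiny two-layer network are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def bounding_box(points: List[Vector]) -> Tuple[Vector, Vector]:
    """Smallest axis-aligned box (hyper-rectangle) containing all points."""
    lower = [min(col) for col in zip(*points)]
    upper = [max(col) for col in zip(*points)]
    return lower, upper


def affine_bounds(W: Matrix, b: Vector, lower: Vector, upper: Vector) -> Tuple[Vector, Vector]:
    """Propagate a box through x -> Wx + b using interval arithmetic."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        lo = hi = bias
        for w, l, u in zip(row, lower, upper):
            # A positive weight maps lower to lower; a negative weight swaps them.
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                lo += w * u
                hi += w * l
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi


def relu_bounds(lower: Vector, upper: Vector) -> Tuple[Vector, Vector]:
    """ReLU is monotone, so it can be applied to each bound separately."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


def certify(layers: List[Tuple[Matrix, Vector]], lower: Vector, upper: Vector, target: int) -> bool:
    """Return True if every point in the box is provably classified as `target`.

    Sound but incomplete: False means "could not prove", not "property violated".
    """
    for i, (W, b) in enumerate(layers):
        lower, upper = affine_bounds(W, b, lower, upper)
        if i < len(layers) - 1:  # ReLU on hidden layers only
            lower, upper = relu_bounds(lower, upper)
    return all(lower[target] > upper[j] for j in range(len(upper)) if j != target)


# Hypothetical embeddings of one sentence and two of its perturbations.
embeddings = [[1.0, 0.0], [1.2, 0.1], [0.9, -0.1]]
box_lower, box_upper = bounding_box(embeddings)

# Hypothetical two-layer ReLU classifier with two output classes.
network = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
]

print(certify(network, box_lower, box_upper, target=0))
```

The box is built from the embeddings themselves rather than from the raw text, which is why the way a dataset is embedded and grouped matters for whether a verifier can prove anything useful about it.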

Code Implementations: 1 repo