CL · LG · May 22, 2023

Cross-functional Analysis of Generalisation in Behavioural Learning

arXiv:2305.12951v1 · 4 citations
Originality: Incremental advance
AI Analysis

The paper addresses the overestimation of model performance in NLP evaluation. The advance is incremental because it builds on existing behavioural testing frameworks.

The paper tackles the risk that models trained on behavioural test suites overfit to them. It introduces BeLUGA, which evaluates generalisation at several levels of granularity, and applies it to three NLP tasks (sentiment analysis, paraphrase identification and reading comprehension) under a range of regularisation and domain generalisation methods.

In behavioural testing, system functionalities underrepresented in the standard evaluation setting (with a held-out test set) are validated through controlled input-output pairs. Optimising performance on the behavioural tests during training (behavioural learning) would improve coverage of phenomena not sufficiently represented in the i.i.d. data and could lead to seemingly more robust models. However, there is the risk that the model narrowly captures spurious correlations from the behavioural test suite, leading to overestimation and misrepresentation of model performance -- one of the original pitfalls of traditional evaluation. In this work, we introduce BeLUGA, an analysis method for evaluating behavioural learning considering generalisation across dimensions of different granularity levels. We optimise behaviour-specific loss functions and evaluate models on several partitions of the behavioural test suite controlled to leave out specific phenomena. An aggregate score measures generalisation to unseen functionalities (or overfitting). We use BeLUGA to examine three representative NLP tasks (sentiment analysis, paraphrase identification and reading comprehension) and compare the impact of a diverse set of regularisation and domain generalisation methods on generalisation performance.
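The evaluation idea in the abstract (train on part of the behavioural test suite, evaluate on the phenomena that were left out, then aggregate) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: the partitioning is a plain leave-one-functionality-out split, the aggregate is an unweighted mean of held-out pass rates, and the names `leave_one_functionality_out`, `pass_rate`, `aggregate_held_out_score`, `memorising_fit` and the toy suite are hypothetical. The abstract does not state how BeLUGA defines its partitions or its aggregate score.

```python
from collections import defaultdict
from statistics import mean

# A test case is (functionality_name, input_text, expected_label).
# This tuple layout is an assumption made for the sketch.


def leave_one_functionality_out(suite):
    """Yield (held_out_name, train_cases, eval_cases), one split per functionality."""
    by_func = defaultdict(list)
    for case in suite:
        by_func[case[0]].append(case)
    for held_out in sorted(by_func):
        train = [c for name, cases in by_func.items() if name != held_out for c in cases]
        yield held_out, train, by_func[held_out]


def pass_rate(predict, cases):
    """Fraction of cases where the prediction matches the expected label."""
    return mean([1.0 if predict(text) == label else 0.0 for _, text, label in cases])


def aggregate_held_out_score(suite, fit):
    """Train on all but one functionality, score the held-out one, average the scores.

    `fit` takes a list of training cases and returns a predict(text) callable.
    The unweighted mean is an illustrative choice, not the paper's score.
    """
    per_functionality = {}
    for held_out, train, eval_cases in leave_one_functionality_out(suite):
        predict = fit(train)
        per_functionality[held_out] = pass_rate(predict, eval_cases)
    return mean(list(per_functionality.values())), per_functionality


# Toy suite and a deliberately poor "model" that only memorises its training inputs.
toy_suite = [
    ("negation", "not good", 0),
    ("negation", "not bad", 1),
    ("intensifier", "very good", 1),
    ("intensifier", "very bad", 0),
]


def memorising_fit(train):
    table = {text: label for _, text, label in train}
    # Unseen inputs fall back to label 0, so the model cannot generalise.
    return lambda text: table.get(text, 0)


overall, per_functionality = aggregate_held_out_score(toy_suite, memorising_fit)
print(overall, per_functionality)
```

A model that memorises the suite would pass every case it was trained on, yet on held-out functionalities it only matches the cases that happen to share its fallback label. That gap between seen and unseen performance is the kind of overestimation the abstract warns about.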

Code Implementations: 1 repo
