Benchmarking Neural Network Generalization for Grammar Induction
This work provides a standardized tool for evaluating generalization in grammar induction, addressing a gap in previous work. The contribution is incremental, as it builds on existing formal language methods.
The paper tackles the problem of measuring neural network generalization by introducing a benchmark based on fully specified formal languages, and finds that networks trained with a Minimum Description Length objective generalize better and with less data than those using standard loss functions.
How well do neural networks generalize? Even for grammar induction tasks, where the target generalization is fully known, previous work has left the question open, testing only very limited ranges beyond the training set and using differing success criteria. We provide a measure of neural network generalization based on fully specified formal languages. Given a model and a formal grammar, the method assigns a generalization score representing how well the model generalizes to unseen samples, in inverse relation to the amount of data it was trained on. The benchmark includes languages such as $a^nb^n$, $a^nb^nc^n$, $a^nb^mc^{n+m}$, Dyck-1, and Dyck-2. We evaluate selected architectures using the benchmark and find that networks trained with a Minimum Description Length (MDL) objective generalize better, and with less data, than networks trained using standard loss functions. The benchmark is available at https://github.com/taucompling/bliss.
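To make "fully specified" concrete, the following is a minimal sketch of membership tests for the languages named above. It is an illustration only and is not taken from the benchmark repository: the function names, the bracket alphabets chosen for Dyck-1 and Dyck-2, and the decision to accept the empty string ($n = 0$) are all assumptions.

```python
import re

# Hypothetical bracket alphabets; the benchmark may use different symbols.
DYCK1 = {"(": ")"}
DYCK2 = {"(": ")", "[": "]"}


def is_anbn(s: str) -> bool:
    """True if s is in a^n b^n (n >= 0 is an assumption)."""
    m = re.fullmatch(r"(a*)(b*)", s)
    return m is not None and len(m.group(1)) == len(m.group(2))


def is_anbncn(s: str) -> bool:
    """True if s is in a^n b^n c^n."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return m is not None and len(m.group(1)) == len(m.group(2)) == len(m.group(3))


def is_anbmcnm(s: str) -> bool:
    """True if s is in a^n b^m c^(n+m)."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return m is not None and len(m.group(1)) + len(m.group(2)) == len(m.group(3))


def is_dyck(s: str, pairs: dict) -> bool:
    """True if s is a well-nested bracket string over the given pairs."""
    closers = set(pairs.values())
    stack = []  # expected closing brackets, innermost last
    for ch in s:
        if ch in pairs:
            stack.append(pairs[ch])
        elif ch in closers:
            if not stack or stack.pop() != ch:
                return False
        else:
            return False  # symbol outside the alphabet
    return not stack
```

Because each language has an exact membership test of this kind, a model's behavior on strings far longer than anything in its training set can be checked against the ground truth, which is what allows generalization to be scored without ambiguity.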