Extrapolation in NLP
Addresses the challenge of generalization in NLP, which matters for model robustness, but the contribution appears incremental because it applies existing models to a known problem.
The paper argues that models which capture global structure, rather than only fitting the training data locally, extrapolate more easily to examples outside the training space, and it demonstrates this with the Decomposable Attention Model and word2vec.
We argue that extrapolation to examples outside the training space will often be easier for models that capture global structures, rather than just maximise their local fit to the training data. We show that this is true for two popular models: the Decomposable Attention Model and word2vec.