Concept-aware Training Improves In-context Learning Ability of Language Models
This work addresses the challenge of enhancing in-context learning efficiency in language models, offering a method that reduces training-data requirements while maintaining performance; the contribution is incremental, as it builds on prior theoretical insights.
The paper tackles the problem of improving language models' in-context learning ability by proposing Concept-aware Training (CoAT), which constructs training scenarios in which capturing analogical reasoning concepts benefits the model. Models trained with CoAT on only two QA datasets perform comparably to larger models trained on over 1600 tasks.
Many recent language models (LMs) of the Transformer family exhibit so-called in-context learning (ICL) ability, manifested in the LMs' ability to modulate their function by a task described in a natural language input. Previous work curating these models assumes that ICL emerges from vast over-parametrization or the scale of multi-task training. However, a complementary branch of recent theoretical work attributes ICL emergence to specific properties of training data and creates functional in-context learners in small-scale, synthetic settings. Inspired by recent findings on data properties driving the emergence of ICL, we propose a method to create LMs that better utilize in-context information, by constructing training scenarios where it is beneficial for the LM to capture analogical reasoning concepts. We find that the data sampling of Concept-aware Training (CoAT) consistently improves the models' reasoning ability. As a result, the in-context learners trained with CoAT on only two datasets of a single (QA) task perform comparably to larger models trained on 1600+ tasks.
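To make the idea of "training scenarios where it is beneficial to capture reasoning concepts" more concrete, here is a minimal sketch of concept-aware demonstration sampling. It assumes that every training example carries a set of annotated reasoning-concept labels and that demonstrations are chosen to share at least one concept with the target example, so the model can only exploit the context by recognizing the shared concept. The names `Example`, `sample_concept_sharing_demos`, and `build_training_prompt`, the concept labels, and the "Question/Answer" prompt format are all hypothetical illustrations, not the authors' implementation.

```python
import random
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Example:
    # Hypothetical record: question, gold answer, and a set of
    # reasoning-concept labels (assumed to be available as annotations).
    question: str
    answer: str
    concepts: frozenset = field(default_factory=frozenset)


def sample_concept_sharing_demos(
    target: Example, pool: Sequence[Example], k: int, rng: random.Random
) -> List[Example]:
    """Return up to k demonstrations sharing at least one concept with target.

    Assumption: sharing a concept label is the selection criterion; if fewer
    than k candidates exist, fewer demonstrations are returned.
    """
    candidates = [
        ex for ex in pool if ex is not target and ex.concepts & target.concepts
    ]
    rng.shuffle(candidates)  # randomize which concept-sharing examples are used
    return candidates[:k]


def build_training_prompt(target: Example, demos: Sequence[Example]) -> str:
    # Demonstrations first, then the target question; during training the
    # model would be asked to generate target.answer after this prompt.
    parts = [f"Question: {d.question}\nAnswer: {d.answer}" for d in demos]
    parts.append(f"Question: {target.question}\nAnswer:")
    return "\n\n".join(parts)


# Usage with toy, made-up examples.
pool = [
    Example("Who is older, A (b. 1950) or B (b. 1960)?", "A", frozenset({"compare"})),
    Example("Which is longer, river X (100 km) or river Y (80 km)?", "X", frozenset({"compare"})),
    Example("How many letters are in 'cat'?", "3", frozenset({"count"})),
]
target = Example(
    "Which is taller, tower P (300 m) or tower Q (250 m)?", "P", frozenset({"compare"})
)
demos = sample_concept_sharing_demos(target, pool, k=2, rng=random.Random(0))
prompt = build_training_prompt(target, demos)
print(prompt)
```

Here the counting example is never selected as a demonstration for the comparison question, because it shares no concept label with it; a random sampler would not enforce this.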