On the Emergence and Test-Time Use of Structural Information in Large Language Models
This work addresses how language models learn structural information and how that learning can be improved, a question relevant to researchers in AI and linguistics. Its contribution is incremental, as it builds on existing work.
The paper investigates how large language models learn structural information from data and use it at test time. It finds that the emergence of such learning correlates with complex reasoning tasks, but that test-time compositional generation remains limited.
Learning structural information from observational data is central to producing new knowledge outside the training corpus. This holds for mechanistic understanding in scientific discovery as well as for flexible test-time compositional generation. We therefore study how language models learn abstract structures and use the learnt structural information at test time. To ensure a controlled setup, we design a natural language dataset based on linguistic structural transformations. We empirically show that the emergence of learning structural information correlates with complex reasoning tasks, and that the ability to perform test-time compositional generation remains limited.