"What's my model inside of?": Exploring the role of environments for grounded natural language understanding
This work addresses the challenge of improving data collection and model development in grounded NLP, with potential applications for knowledge workers like scientists, although its contribution appears incremental, since it builds on existing ecological and embodied theories of cognition.
The thesis tackled the problem of grounded natural language understanding by exploring environment design, developing novel training and annotation methods using text-based game environments, and proposing a new benchmark for commonsense reasoning in large language models. It resulted in Breakpoint Transformers, an approach for modeling intermediate semantic information in long texts, and a design for AI-augmented social thinking environments for knowledge workers.
In contrast to classical cognitive science, which studied brains in isolation, ecological approaches focused on the role of the body and environment in shaping cognition. Similarly, in this thesis we adopt an ecological approach to grounded natural language understanding (NLU) research. Grounded language understanding studies language understanding systems situated in the context of events, actions and percepts in naturalistic or simulated virtual environments. Where classic research tends to focus on designing new models and optimization methods while treating environments as given, we explore the potential of environment design for improving data collection and model development. We developed novel training and annotation approaches for procedural text understanding based on text-based game environments. We also drew upon embodied cognitive linguistics literature to propose a roadmap for grounded NLP research, and to inform the development of a new benchmark for measuring the progress of large language models on challenging commonsense reasoning tasks. We leveraged the richer supervision provided by text-based game environments to develop Breakpoint Transformers, a novel approach to modeling intermediate semantic information in long narrative or procedural texts. Finally, we integrated theories on the role of environments in collective human intelligence to propose a design for AI-augmented "social thinking environments" for knowledge workers like scientists.