On Reality and the Limits of Language Data: Aligning LLMs with Human Norms
This paper addresses the problem of grounding LLMs in reality, a question of interest to AI researchers, but its contribution is incremental: it builds on existing work to identify specific limitations.
The paper investigated whether large language models (LLMs) can understand the physical world using only language data by comparing GPT-3 against human norms on a reasoning test, finding strengths in verbal relations like synonymy but weaknesses in areas like affordance and spatial relations.
Recent advancements in Large Language Models (LLMs) harness linguistic associations in vast natural language data for practical applications. However, their ability to understand the physical world using only language data remains an open question. After reviewing existing protocols, we explore this question using a novel and tightly controlled reasoning test (ART) and compare human norms against versions of GPT-3. Our findings highlight the categories of common-sense relations that models could learn directly from data, as well as areas of weakness. GPT-3 offers evidence for verbal reasoning on a par with human subjects for several relations, including Synonymy, Antonymy, and Default inheritance. Without reinforcement learning from human judgements, GPT-3 appears to perform at the lower end of the reference interval for Has-part and Contained-in. Weaknesses were also observed in affordance characteristics through Necessary-quality, Order-of-size, and Order-of-intensity. Combining LLMs with symbolic world grounding is a promising direction to address the limitations of associative learning.
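To make the phrase "lower end of the reference interval" concrete, the sketch below shows one way a model's score could be placed relative to a set of human scores. It is an illustration only, not the paper's procedure: the percentile-based definition of the interval, the 95% default coverage, the function names, and the scores in the usage note are all assumptions introduced here.

```python
from typing import Sequence, Tuple


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """Percentile by linear interpolation; q is in [0, 100]."""
    if not sorted_vals:
        raise ValueError("need at least one value")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_vals) - 1)
    frac = pos - lower
    return sorted_vals[lower] + (sorted_vals[upper] - sorted_vals[lower]) * frac


def reference_interval(
    human_scores: Sequence[float], coverage: float = 95.0
) -> Tuple[float, float]:
    """Central interval covering `coverage` percent of the human scores.

    Assumption: the interval is defined by symmetric percentiles,
    e.g. the 2.5th and 97.5th for 95% coverage.
    """
    vals = sorted(human_scores)
    tail = (100.0 - coverage) / 2.0
    return percentile(vals, tail), percentile(vals, 100.0 - tail)


def position_in_interval(model_score: float, interval: Tuple[float, float]) -> float:
    """0.0 means the lower bound, 1.0 the upper bound; values outside
    [0, 1] mean the score falls outside the interval."""
    low, high = interval
    if high == low:
        raise ValueError("degenerate interval")
    return (model_score - low) / (high - low)
```

With hypothetical human scores of 0, 10, ..., 100 and an 80% interval, the bounds are 10 and 90, and a hypothetical model score of 30 sits a quarter of the way up the interval, which is what "the lower end" would look like under these assumptions.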