Beneath Surface Similarity: Large Language Models Make Reasonable Scientific Analogies after Structure Abduction
This work addresses how to assess analogical reasoning in AI, a question relevant to both cognitive science and AI research. Its contribution is incremental: it builds on existing benchmarks and documents model limitations rather than delivering a major breakthrough.
The paper evaluates analogical reasoning in Large Language Models (LLMs) by introducing analogical structure abduction, a task grounded in cognitive psychology. It finds that LLMs such as ChatGPT and GPT-4 struggle with this task, indicating a need for improvement.
The vital role of analogical reasoning in human cognition allows us to grasp novel concepts by linking them with familiar ones through shared relational structures. Despite the attention previous research has given to word analogies, this work suggests that Large Language Models (LLMs) often overlook the structures that underpin these analogies, raising questions about the efficacy of word analogies as a measure of analogical reasoning skills akin to human cognition. In response to this, our paper introduces a task of analogical structure abduction, grounded in cognitive psychology, designed to abduce structures that form an analogy between two systems. In support of this task, we establish a benchmark called SCAR, containing 400 scientific analogies from 13 distinct fields, tailored for evaluating analogical reasoning with structure abduction. The empirical evidence underlines the continued challenges faced by LLMs, including ChatGPT and GPT-4, in mastering this task, signifying the need for future exploration to enhance their abilities.
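To make the task concrete, the following is a minimal sketch of what a structure abduction instance and its scoring might look like. It rests on assumptions: the solar system and atom analogy is a textbook illustration rather than an entry taken from SCAR, and the names `gold_mapping`, `mapping_accuracy`, and `is_one_to_one` are hypothetical, not part of the benchmark or the paper's evaluation code. The scoring shown is a plain per-concept exact match, which may differ from the metric the paper reports.

```python
from typing import Dict, List

# Hypothetical instance (illustrative only, not taken from SCAR):
# two systems, each described by a list of concepts.
source_system: List[str] = ["sun", "planet", "gravity"]
target_system: List[str] = ["nucleus", "electron", "electromagnetic force"]

# Assumed gold structure: each source concept is paired with the target
# concept that plays the same relational role.
gold_mapping: Dict[str, str] = {
    "sun": "nucleus",
    "planet": "electron",
    "gravity": "electromagnetic force",
}


def mapping_accuracy(predicted: Dict[str, str], gold: Dict[str, str]) -> float:
    """Return the fraction of source concepts mapped to their gold target."""
    if not gold:
        return 0.0
    correct = sum(1 for src, tgt in gold.items() if predicted.get(src) == tgt)
    return correct / len(gold)


def is_one_to_one(mapping: Dict[str, str]) -> bool:
    """Check that no two source concepts share the same target concept."""
    return len(set(mapping.values())) == len(mapping)


# A hypothetical model output that gets one of three pairings right.
predicted: Dict[str, str] = {
    "sun": "nucleus",
    "planet": "electromagnetic force",
    "gravity": "electron",
}

score = mapping_accuracy(predicted, gold_mapping)
print(f"accuracy = {score:.2f}, one-to-one = {is_one_to_one(predicted)}")
```

Under this assumed scoring, a model is credited only when it recovers the correspondence between roles in the two systems, which is what separates structure abduction from matching words by surface similarity.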