A criterion for Artificial General Intelligence: hypothetic-deductive reasoning, tested on ChatGPT
This work addresses the challenge of defining and testing AGI capabilities, a question of interest to AI researchers and developers; its contribution is incremental, since it builds on existing reasoning frameworks.
The paper evaluates whether an advanced AI such as ChatGPT can perform hypothetic-deductive reasoning, proposed as a criterion for Artificial General Intelligence, and finds that the chatbot has a limited capacity for such reasoning once the problems become somewhat complex.
We argue that a key reasoning skill that any advanced AI, say GPT-4, should master in order to qualify as a 'thinking machine', or AGI, is hypothetic-deductive reasoning. Problem-solving or question-answering can quite generally be construed as involving two steps: hypothesizing that a certain set of hypotheses T applies to the problem or question at hand, and deducing the solution or answer from T; hence the term hypothetic-deductive reasoning. An elementary proxy of hypothetic-deductive reasoning is causal reasoning. We propose simple tests for both types of reasoning and apply them to ChatGPT. Our study shows that, at present, the chatbot has a limited capacity for either type of reasoning as soon as the problems considered are somewhat complex. However, we submit that if an AI were capable of this type of reasoning in a sufficiently wide range of contexts, it would be an AGI.
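The two-step scheme described above (hypothesize a set of hypotheses T, then deduce the answer from T) can be illustrated with a toy sketch. It is not the test procedure applied to ChatGPT. It assumes hypotheses can be written as simple if-then rules over string-valued facts, and the names `deduce`, `hypothetic_deductive`, and the example theories are hypothetical, introduced only for illustration.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# A rule is (premises, conclusion): if all premises hold, the conclusion holds.
Rule = Tuple[FrozenSet[str], str]


def deduce(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Deduction step: forward-chain until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


def hypothetic_deductive(
    observations: Set[str],
    candidate_theories: Dict[str, List[Rule]],
    question: Set[str],
) -> Optional[Tuple[str, Set[str]]]:
    """Hypothesis step: try each candidate theory T in turn, and keep the
    first one from which an answer to the question can be deduced."""
    for name, rules in candidate_theories.items():
        answers = deduce(observations, rules) & question
        if answers:
            return name, answers
    return None  # no candidate theory applies to this problem


# Toy problem (assumed for illustration): is the ground slippery?
theories = {
    "astronomy": [(frozenset({"eclipse"}), "dark")],
    "weather": [
        (frozenset({"rain"}), "wet_ground"),
        (frozenset({"wet_ground"}), "slippery"),
    ],
}
result = hypothetic_deductive({"rain"}, theories, {"slippery"})
print(result)
```

In this sketch the "astronomy" theory is tried first and rejected because nothing relevant follows from it, while the "weather" theory yields the answer through a two-link causal chain, which also hints at why causal reasoning can serve as an elementary proxy for the general scheme.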