Chatbots as Problem Solvers: Playing Twenty Questions with Role Reversals
This work addresses the challenge of assessing AI reasoning capabilities for applications in education, design, and neuroscience, though it is incremental in extending existing game formats.
The paper tackles the problem of evaluating deductive reasoning in chat AI by testing ChatGPT in a multi-role twenty-questions game, finding that it guesses random objects in 12 questions on average, with 94% accuracy across 16 experimental setups.
New chat AI applications like ChatGPT offer an advanced understanding of question context and memory across multi-step tasks, which makes it possible to design experiments that test their deductive reasoning. This paper proposes a multi-role, multi-step challenge in which ChatGPT plays the classic twenty-questions game and also switches roles from questioner to answerer. The main empirical result is that this generation of chat applications can guess random object names in fewer than twenty questions (12 on average) and guesses correctly 94% of the time across sixteen different experimental setups. The research introduces four novel cases: the chatbot fields the questions, asks the questions, plays both the question and answer roles, and finally tries to guess appropriate contextual emotions. One task that humans typically fail but trained chat applications complete is a bilingual game of twenty questions (English answers to Spanish questions). Future variations address direct problem-solving, using a similar inquisitive format to arrive deductively at novel outcomes such as patentable inventions or combination thinking. Featured applications of this dialogue format include complex protein designs, neuroscience metadata, and child development educational materials.
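The role-reversal protocol described above can be pictured as a single game loop in which either seat (questioner or answerer) may be filled by the chatbot. The following is a minimal Python sketch under stated assumptions, not the paper's implementation: `play_game`, `toy_questioner`, `toy_answerer`, `summarize`, the `GUESS:` prefix convention, and the six-object toy knowledge base are all hypothetical stand-ins chosen for illustration. In an actual experiment, either callable would wrap a chat model instead of the toy rules used here.

```python
from typing import Callable, List, Tuple

# Hypothetical conventions for this sketch:
#   a questioner maps the dialogue so far to its next question, or to a
#   final guess written as "GUESS: <object>";
#   an answerer maps (secret, question) to "yes" or "no".
History = List[Tuple[str, str]]
Questioner = Callable[[History], str]
Answerer = Callable[[str, str], str]

# Toy knowledge base (an assumption): each object has a unique property set.
TOY_OBJECTS = {
    "cat": {"alive", "animal", "small"},
    "elephant": {"alive", "animal"},
    "rose": {"alive", "small"},
    "oak": {"alive"},
    "coin": {"small"},
    "car": set(),
}
PROPERTIES = ["alive", "animal", "small"]
PREFIX = "Is it "


def toy_answerer(secret: str, question: str) -> str:
    """Answer 'Is it <property>?' truthfully from the toy knowledge base."""
    prop = question[len(PREFIX):].rstrip("?")
    return "yes" if prop in TOY_OBJECTS[secret] else "no"


def toy_questioner(history: History) -> str:
    """Filter candidates by past answers; guess once a single one remains."""
    candidates = list(TOY_OBJECTS)
    for question, answer in history:
        prop = question[len(PREFIX):].rstrip("?")
        candidates = [
            c for c in candidates
            if (prop in TOY_OBJECTS[c]) == (answer == "yes")
        ]
    if len(candidates) == 1:
        return "GUESS: " + candidates[0]
    return PREFIX + PROPERTIES[len(history)] + "?"


def play_game(secret: str, questioner: Questioner, answerer: Answerer,
              max_questions: int = 20) -> Tuple[bool, int]:
    """Run one game; return (guessed_correctly, turns_used).

    A final guess counts as one turn, as in the classic game.
    """
    history: History = []
    for turn in range(1, max_questions + 1):
        question = questioner(history)
        if question.startswith("GUESS:"):
            guess = question[len("GUESS:"):].strip().lower()
            return guess == secret.lower(), turn
        history.append((question, answerer(secret, question)))
    return False, max_questions


def summarize(results: List[Tuple[bool, int]]) -> Tuple[float, float]:
    """Return (accuracy, average number of turns) over a set of games."""
    wins = sum(1 for won, _ in results if won)
    average_turns = sum(turns for _, turns in results) / len(results)
    return wins / len(results), average_turns


if __name__ == "__main__":
    results = [play_game(s, toy_questioner, toy_answerer) for s in TOY_OBJECTS]
    accuracy, average_turns = summarize(results)
    print(f"accuracy={accuracy:.2f}, average turns={average_turns:.1f}")
```

Because the two roles are plain callables, reversing roles amounts to passing a different function into the `questioner` or `answerer` slot, and `summarize` yields the same two quantities the abstract reports (accuracy and average number of questions). The toy numbers it produces say nothing about ChatGPT's actual performance.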