AI, Jun 20, 2024

Does GPT Really Get It? A Hierarchical Scale to Quantify Human vs AI's Understanding of Algorithms

arXiv:2406.14722v31 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for rigorous criteria to track AI's progress in cognitive understanding, particularly for researchers in AI and cognitive science, though it is incremental in applying existing interdisciplinary frameworks to AI.

The paper asks whether AI truly understands algorithms and how that understanding can be quantified. It proposes a hierarchical scale grounded in philosophy, psychology, and education, then uses the scale in a study comparing human subjects with GPT models, revealing similarities and differences.

As Large Language Models (LLMs) perform (and sometimes excel at) more and more complex cognitive tasks, a natural question is whether AI really understands. The study of understanding in LLMs is in its infancy, and the community has yet to incorporate well-trodden research in philosophy, psychology, and education. We initiate this, specifically focusing on understanding algorithms, and propose a hierarchy of levels of understanding. We use the hierarchy to design and conduct a study with human subjects (undergraduate and graduate students) as well as large language models (generations of GPT), revealing interesting similarities and differences. We expect that our rigorous criteria will be useful to keep track of AI's progress in such cognitive domains.

