AI | CL | Nov 8, 2024

LLMs as Method Actors: A Model for Prompt Engineering and Architecture

arXiv:2411.05778v2 | 2 citations | h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses the challenge of enhancing LLM reasoning for complex tasks like puzzle-solving, though it appears incremental as it builds on existing prompt engineering methods.

The paper aims to improve LLM performance on the challenging Connections word puzzle by introducing the "Method Actors" mental model for prompt engineering. With GPT-4o, the share of puzzles solved rises from 41% with a "Chain of Thoughts" approach to 86% with the strongest "Method Actor" approach.

We introduce "Method Actors" as a mental model for guiding LLM prompt engineering and prompt architecture. Under this mental model, LLMs should be thought of as actors; prompts as scripts and cues; and LLM responses as performances. We apply this mental model to the task of improving LLM performance at playing Connections, a New York Times word puzzle game that prior research identified as a challenging benchmark for evaluating LLM reasoning. Our experiments with GPT-4o show that a "Method Actors" approach can significantly improve LLM performance over both a vanilla and "Chain of Thoughts" approach. A vanilla approach solves 27% of Connections puzzles in our dataset and a "Chain of Thoughts" approach solves 41% of puzzles, whereas our strongest "Method Actor" approach solves 86% of puzzles. We also test OpenAI's newest model designed specifically for complex reasoning tasks, o1-preview. When asked to solve a puzzle all at once, o1-preview solves 79% of Connections puzzles in our dataset, and when allowed to build puzzle solutions one guess at a time over multiple API calls, o1-preview solves 100% of the puzzles. Incorporating a "Method Actor" prompt architecture increases the percentage of puzzles that o1-preview solves perfectly from 76% to 87%.
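The abstract's idea of building a solution "one guess at a time over multiple API calls" can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `Guesser` callable (a stand-in for one LLM API call), and the four-mistake limit (borrowed from the standard game rules) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical stand-in for one LLM API call: given the words still on the
# board, return a proposed group of four words.
Guesser = Callable[[List[str]], List[str]]


def solve_one_guess_at_a_time(
    words: List[str],
    solution: List[Set[str]],
    guesser: Guesser,
    max_mistakes: int = 4,  # assumption: the standard game's mistake limit
) -> Tuple[List[Set[str]], int]:
    """Ask for one group per call, removing confirmed groups from the board."""
    remaining = list(words)
    unsolved = [set(group) for group in solution]
    found: List[Set[str]] = []
    mistakes = 0
    while unsolved and mistakes < max_mistakes:
        guess = set(guesser(remaining))  # one "API call" per loop iteration
        if guess in unsolved:
            unsolved.remove(guess)
            found.append(guess)
            remaining = [w for w in remaining if w not in guess]
        else:
            mistakes += 1
    return found, mistakes


def oracle_guesser(category_of: Dict[str, str]) -> Guesser:
    """Toy guesser that already knows the answer; it only exercises the loop."""
    def guess(remaining: List[str]) -> List[str]:
        target = category_of[remaining[0]]
        return [w for w in remaining if category_of[w] == target]
    return guess
```

In a real setup the guesser would wrap a model call and could also receive the history of earlier wrong guesses; that feedback is left out here to keep the sketch short.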

Code Implementations: 1 repo
