AI CL LG

May 13, 2025

Automated Meta Prompt Engineering for Alignment with the Theory of Mind

Aaron Baughman, Rahul Agarwal, Eduardo Morales, Gozde Akay

arXiv:2505.09024v1

1 citation, h-index: 1

Originality: Incremental advance

AI Analysis

This paper addresses the problem of aligning AI-generated content with the mental models of human content reviewers in sports and entertainment. The contribution appears incremental, since it builds on existing LLM and reinforcement learning techniques.

The paper tackles the Theory of Mind (ToM) alignment problem with a meta-prompting method that optimizes the similarity between a human's mental expectations and an LLM's neural states, using agentic reinforcement learning with an LLM as a Judge (LLMaaJ). In a live production system, the expectations of human content reviewers were 100% aligned with the AI output 53.8% of the time, with an average iteration count of 4.38. The method also increased content quality by extending the coverage of tennis action.

We introduce a method of meta-prompting that jointly produces fluent text for complex tasks while optimizing the similarity of neural states between a human's mental expectation and a Large Language Model's (LLM) neural processing. A technique of agentic reinforcement learning is applied, in which an LLM as a Judge (LLMaaJ) teaches another LLM, through in-context learning, how to produce content by interpreting the intended and unintended traits of the generated text. To measure human mental beliefs around content production, users modified long-form AI-generated text articles before publication at the US Open 2024 tennis Grand Slam. An LLMaaJ can then solve the Theory of Mind (ToM) alignment problem by anticipating human edits and including them in the text an LLM creates. Throughout experimentation, and by interpreting the results of a live production system, the expectations of human content reviewers were 100% aligned with the AI output 53.8% of the time, with an average iteration count of 4.38. The geometric interpretation of content traits such as factualness, novelty, repetitiveness, and relevancy over a Hilbert vector space, which combines spatial volume (the importance of all traits) with vertex alignment (the relevance of individual traits), enabled the LLMaaJ to optimize on human ToM. This resulted in an increase in content quality by extending the coverage of tennis action. Our work, which was deployed at the US Open 2024, has been used across other live events within sports and entertainment.
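The judge-and-refine loop described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the cosine score stands in for the "vertex alignment" idea only and does not model spatial volume, and the names `cosine_alignment`, `refine_until_aligned`, `toy_generate`, `toy_judge`, the 0.99 threshold, and all trait values are assumptions made up for illustration. A real system would replace the two toy functions with calls to a generator LLM and a judge LLM.

```python
import math
from typing import Callable, Dict, Tuple

# Trait names taken from the abstract; the numeric values used below are invented.
TRAITS = ["factualness", "novelty", "repetitiveness", "relevancy"]


def cosine_alignment(expected: Dict[str, float], observed: Dict[str, float]) -> float:
    """Cosine similarity between two trait vectors (an assumed alignment score)."""
    dot = sum(expected[t] * observed[t] for t in TRAITS)
    norm_e = math.sqrt(sum(expected[t] ** 2 for t in TRAITS))
    norm_o = math.sqrt(sum(observed[t] ** 2 for t in TRAITS))
    if norm_e == 0 or norm_o == 0:
        return 0.0
    return dot / (norm_e * norm_o)


def refine_until_aligned(
    expected: Dict[str, float],
    generate: Callable[[str], str],
    judge: Callable[[str], Dict[str, float]],
    prompt: str,
    threshold: float = 0.99,  # hypothetical stopping threshold
    max_iters: int = 10,
) -> Tuple[str, int, float]:
    """Append judge feedback to the prompt until the trait vectors align."""
    score = 0.0
    for iteration in range(1, max_iters + 1):
        text = generate(prompt)
        observed = judge(text)
        score = cosine_alignment(expected, observed)
        if score >= threshold:
            return prompt, iteration, score
        # Feed back the single trait that is furthest from the human expectation.
        worst = max(TRAITS, key=lambda t: abs(expected[t] - observed[t]))
        direction = "more" if expected[worst] > observed[worst] else "less"
        prompt = f"{prompt}\nFeedback: aim for {direction} {worst}."
    return prompt, max_iters, score


def toy_generate(prompt: str) -> str:
    """Stand-in for a generator LLM: simply echoes the prompt."""
    return prompt


def toy_judge(text: str) -> Dict[str, float]:
    """Stand-in for an LLM judge: novelty rises with each 'more novelty' hint."""
    novelty = min(1.0, 0.2 + 0.2 * text.count("more novelty"))
    return {"factualness": 1.0, "novelty": novelty,
            "repetitiveness": 0.1, "relevancy": 1.0}


# Hypothetical trait profile a human reviewer expects.
expected_traits = {"factualness": 1.0, "novelty": 0.8,
                   "repetitiveness": 0.1, "relevancy": 1.0}

final_prompt, iterations, final_score = refine_until_aligned(
    expected_traits, toy_generate, toy_judge, "Write a match recap."
)
print(iterations, round(final_score, 3))
```

The loop mirrors the in-context teaching idea: the judge's reading of intended and unintended traits is folded back into the prompt, and the iteration count at which the loop stops is the kind of quantity the abstract reports as an average.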
