CLAIApr 3, 2025

LLMs as Deceptive Agents: How Role-Based Prompting Induces Semantic Ambiguity in Puzzle Tasks

arXiv:2504.02254v12 citationsh-index: 1
Originality Incremental advance
AI Analysis

This research addresses ethical concerns for deploying autonomous language systems in educational and entertainment contexts by highlighting their potential to mislead users through ambiguity.

The study investigated how large language models (LLMs) acting as autonomous agents generate deceptive puzzles by exploiting semantic ambiguity, finding that adversarial prompting significantly increases ambiguity, raising cognitive load and reducing fairness in puzzle-solving tasks.

Recent advancements in Large Language Models (LLMs) have not only showcased impressive creative capabilities but also revealed emerging agentic behaviors that exploit linguistic ambiguity in adversarial settings. In this study, we investigate how an LLM, acting as an autonomous agent, leverages semantic ambiguity to generate deceptive puzzles that mislead and challenge human users. Inspired by the popular puzzle game "Connections", we systematically compare puzzles produced through zero-shot prompting, role-injected adversarial prompts, and human-crafted examples, with an emphasis on understanding the underlying agent decision-making processes. Employing computational analyses with HateBERT to quantify semantic ambiguity, alongside subjective human evaluations, we demonstrate that explicit adversarial agent behaviors significantly heighten semantic ambiguity -- thereby increasing cognitive load and reducing fairness in puzzle solving. These findings provide critical insights into the emergent agentic qualities of LLMs and underscore important ethical considerations for evaluating and safely deploying autonomous language systems in both educational technologies and entertainment.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes