CL CCFeb 19, 2025

A Knapsack by Any Other Name: Presentation impacts LLM performance on NP-hard problems

Alex Duchnowski, Ellie Pavlick, Alexander Koller

arXiv:2502.13776v213.07 citationsh-index: 9EMNLP

Originality Incremental advance

AI Analysis

This highlights LLMs' reliance on training data and poor generalization, which is incremental for understanding AI reasoning limitations.

The study investigated how problem presentation affects LLM performance on NP-hard problems, finding that LLMs solve textbook versions more accurately than real-life or inverted ones, with reasoning models showing high variance.

To investigate the effect of problem presentation on LLMs' ability to solve optimization problems, we introduce the dataset of Everyday Hard Optimization Problems (EHOP), a collection of NP-hard problems expressed in natural language. EHOP includes problem formulations that could be found in computer science textbooks (e.g., graph coloring), versions that are dressed up as problems that could arise in real life (e.g., party planning), and variants with inverted rules. We find that state-of-the-art LLMs, across multiple prompting strategies, systematically solve textbook problems more accurately than their real-life and inverted counterparts. While reasoning models are more capable, they nonetheless show high variance across problem presentations, suggesting they lack a truly robust reasoning mechanism. We argue that this constitutes evidence that LLMs are still heavily dependent on what was seen in training and struggle to generalize to novel problems.

View on arXiv PDF

Similar