CL, Jan 14, 2025

ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving

arXiv:2501.08203v2, 11 citations, h-index: 11
Originality: Synthesis-oriented
AI Analysis

This work addresses the reliability of LLMs on noisy inputs, which matters to users of math and AI applications, but it is incremental: it builds on existing robustness studies.

The researchers investigated the robustness of large language models (LLMs) to noisy inputs in math problem-solving, finding that all tested models were vulnerable to punctuation-based noise, with performance degrading as noise increased.

While Large Language Models (LLMs) have shown impressive capabilities in math problem-solving tasks, their robustness to noisy inputs is not well-studied. We propose ArithmAttack to examine how robust LLMs are when they encounter noisy prompts that contain extra noise in the form of punctuation marks. While easy to implement, ArithmAttack does not cause any information loss, since words are neither added to nor deleted from the context. We evaluate the robustness of eight LLMs, including Llama3, Mistral, Mathstral, and DeepSeek, on noisy GSM8K and MultiArith datasets. Our experiments suggest that all the studied models are vulnerable to such noise, with more noise leading to poorer performance.
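
The kind of punctuation noise described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `add_punctuation_noise`, the `NOISE_MARKS` set, the `noise_ratio` parameter, and the choice to insert marks only between whitespace-separated words are all assumptions made for illustration. The sketch only shows the stated property that no word is added or deleted.

```python
import random

# Assumed noise alphabet; the set used in the paper may differ.
NOISE_MARKS = [".", ",", ";", ":", "!", "?"]


def add_punctuation_noise(prompt, noise_ratio=0.3, seed=0):
    """Insert standalone punctuation marks between the words of a prompt.

    noise_ratio is the number of inserted marks relative to the number
    of words (a hypothetical definition, chosen for this sketch).
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    words = prompt.split()
    n_marks = int(len(words) * noise_ratio)

    noisy = list(words)
    for _ in range(n_marks):
        # Any gap is allowed, including before the first and after the last word.
        position = rng.randint(0, len(noisy))
        noisy.insert(position, rng.choice(NOISE_MARKS))
    return " ".join(noisy)


question = "Tom has 3 apples and buys 4 more. How many apples does he have?"
noisy_question = add_punctuation_noise(question, noise_ratio=0.5, seed=1)
print(noisy_question)

# Dropping the inserted marks recovers the original word sequence.
recovered = [tok for tok in noisy_question.split() if tok not in NOISE_MARKS]
print(recovered == question.split())
```

Because the marks are inserted as separate tokens, every original word, including the numbers, survives unchanged, so any drop in a model's accuracy on the noisy prompt reflects sensitivity to the noise rather than missing information.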

Code Implementations: 1 repo
