SE · May 26

LLM Based Web Accessibility Repair: An Empirical Study of Detection, Remediation, and Cost

arXiv:2605.27716 · 24.1 · h-index: 1
AI Analysis

For web developers and accessibility practitioners, this work provides empirical evidence that current LLMs are insufficient for complete, reliable accessibility remediation, and it argues for hybrid approaches that pair LLMs with rule-based validation.

The paper evaluates LLM-based agents (Kimi K2.5) for web accessibility detection and repair. Detection performance is comparable to rule-based tools (F1 ~0.65). Generated fixes are syntactically valid in over 99.7% of cases and reduce violations from 3.98 to 1.7 per file. However, fewer than 26% of cases are fully resolved, and iterative refinement increases cost without improving outcomes.

Ensuring web accessibility at scale remains challenging because rule-based tools provide limited coverage while manual remediation is costly and error-prone. This paper evaluates large language model (LLM) based agents, specifically Kimi K2.5, for automated accessibility detection and repair compared with rule-based approaches. For detection, the LLM achieves performance comparable to rule-based tools, with an F1 of around 0.65 overall and 0.83 on violations that require semantic understanding, but it is less reliable for syntactic and layout-related violations. For remediation, LLM-generated fixes are syntactically valid in over 99.7 percent of cases and improve accessibility compliance in 80.2 percent of instances, reducing violations from 3.98 to 1.7 per file. However, fewer than 26 percent of cases are fully resolved, and about 30 percent of patches introduce structural changes. We also find that iterative agent-based refinement increases computational cost by 52 percent and API usage by a factor of 1.64 without improving remediation outcomes. These findings indicate that while LLMs are effective for partial accessibility repair, they are insufficient for complete and reliable remediation. Scalable accessibility solutions require hybrid approaches that combine LLM capabilities with rule-based validation and constraint-aware correction mechanisms.
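
To put the reported figures on a common scale, the short Python sketch below converts the abstract's before/after violation counts into a relative reduction and the 1.64 times API-usage multiplier into a percentage increase. The function names are illustrative assumptions, not taken from the paper, and the inputs are only the numbers quoted in the abstract.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a quantity dropped, relative to its starting value."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


def increase_from_multiplier(multiplier: float) -> float:
    """Convert a 'k times' multiplier into a fractional increase (1.64 -> 0.64)."""
    return multiplier - 1.0


# Figures quoted in the abstract (mean violations per file, API-usage multiplier).
violations_before = 3.98
violations_after = 1.7
api_usage_multiplier = 1.64

reduction = relative_reduction(violations_before, violations_after)
api_increase = increase_from_multiplier(api_usage_multiplier)

print(f"Mean violations per file fell by {reduction:.1%}")
print(f"API usage rose by {api_increase:.0%}")
```

Read this way, the average file loses a little over half of its violations, which is consistent with the abstract's point that most repairs are partial: fewer than 26 percent of cases reach zero remaining violations.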

