Towards Scalable Web Accessibility Audit with MLLMs as Copilots
This work addresses the challenge of scaling web accessibility audits for developers and auditors, though it appears incremental: it builds on the existing WCAG-EM methodology and adds AI assistance.
The paper tackles the resource-intensive nature of web accessibility auditing by introducing AAA, an AI-enhanced framework that combines a multimodal large language model (MLLM) copilot with graph-based sampling to enable scalable, end-to-end audits. Its experiments indicate that fine-tuned small-scale models can serve as capable experts.
Ensuring web accessibility is crucial for advancing social welfare, justice, and equality in digital spaces, yet the vast majority of website user interfaces remain non-compliant, due in part to the resource-intensive and unscalable nature of current auditing practices. While WCAG-EM offers a structured methodology for site-wise conformance evaluation, it demands substantial human effort and lacks practical support for execution at scale. In this work, we present AAA, an auditing framework that operationalizes WCAG-EM through a human-AI partnership model. AAA is anchored by two key innovations: GRASP, a graph-based multimodal sampling method that ensures representative page coverage via learned embeddings of visual, textual, and relational cues; and MaC, a multimodal large language model-based copilot that supports auditors through cross-modal reasoning and intelligent assistance in high-effort tasks. Together, these components enable scalable, end-to-end web accessibility auditing, empowering human auditors with AI-enhanced assistance for real-world impact. We further contribute four novel datasets designed for benchmarking core stages of the audit pipeline. Extensive experiments demonstrate the effectiveness of our methods and show that small-scale language models, when fine-tuned, can serve as capable experts.