Bridging the Interpretation Gap in Accessibility Testing: Empathetic and Legal-Aware Bug Report Generation via Large Language Models
This work addresses the interpretation gap in accessibility testing for mobile applications, helping product managers and designers understand and act on accessibility issues. The contribution is incremental in that it builds on existing testing tools.
The paper addresses the difficulty that non-specialist stakeholders have in interpreting the low-level technical outputs of automated accessibility testing tools. It introduces HEAR, a framework that transforms these outputs into empathetic, stakeholder-oriented narratives. In an evaluation that includes a user study (N=12), HEAR generates factually grounded reports and substantially improves perceived empathy, urgency, persuasiveness, and awareness of legal risk compared with raw technical logs.
Modern automated accessibility testing tools for mobile applications have significantly improved the detection of interface violations, yet their impact on remediation remains limited. A key reason is that existing tools typically produce low-level, technical outputs that are difficult for non-specialist stakeholders, such as product managers and designers, to interpret in terms of real user harm and compliance risk. In this paper, we present \textsc{HEAR} (\underline{H}uman-c\underline{E}ntered \underline{A}ccessibility \underline{R}eporting), a framework that bridges this interpretation gap by transforming raw accessibility bug reports into empathetic, stakeholder-oriented narratives. Given the outputs of an existing accessibility testing tool, \textsc{HEAR} first reconstructs the UI context through semantic slicing and visual grounding, then dynamically injects disability-oriented personas matched to each violation type, and finally performs multi-layer reasoning to explain the physical barrier, functional blockage, and relevant legal or compliance concerns. We evaluate the framework on real-world accessibility issues collected from four popular Android applications and conduct a user study (N=12). The results show that \textsc{HEAR} generates factually grounded reports and substantially improves perceived empathy, urgency, persuasiveness, and awareness of legal risk compared with raw technical logs, while imposing little additional cognitive burden.
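The persona-matching and multi-layer reasoning steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Violation` type, the `PERSONAS` table, the violation-type names, and the prompt wording are all hypothetical, and the UI-context reconstruction (semantic slicing and visual grounding) is reduced to a plain string passed in by the caller. The sketch only shows how a persona keyed on violation type and the three reasoning layers might be assembled into a prompt for a language model.

```python
from dataclasses import dataclass


@dataclass
class Violation:
    """One finding from an accessibility testing tool (hypothetical schema)."""
    kind: str      # violation type, e.g. "low_contrast"
    element: str   # identifier of the affected UI element
    detail: str    # raw technical message from the tool


# Hypothetical mapping from violation type to a disability-oriented persona.
PERSONAS = {
    "missing_label": "a blind user who navigates with a screen reader",
    "low_contrast": "a user with low vision",
    "small_touch_target": "a user with a motor impairment",
}
DEFAULT_PERSONA = "a user who relies on assistive technology"

# The three reasoning layers named in the abstract.
REASONING_LAYERS = (
    "physical barrier",
    "functional blockage",
    "legal or compliance concern",
)


def select_persona(violation: Violation) -> str:
    """Pick the persona matched to the violation type, with a fallback."""
    return PERSONAS.get(violation.kind, DEFAULT_PERSONA)


def build_prompt(violation: Violation, ui_context: str) -> str:
    """Assemble a prompt from the UI context, persona, and reasoning layers."""
    layers = "\n".join(
        f"{i}. Explain the {name}."
        for i, name in enumerate(REASONING_LAYERS, start=1)
    )
    return (
        f"UI context: {ui_context}\n"
        f"Affected element: {violation.element}\n"
        f"Tool output: {violation.detail}\n"
        f"Persona: {select_persona(violation)}\n"
        f"Write a report for a product manager:\n{layers}"
    )


if __name__ == "__main__":
    v = Violation("low_contrast", "checkout_button", "contrast ratio 2.1:1")
    print(build_prompt(v, "Checkout screen with a single primary button"))
```

Keeping the persona table separate from the prompt template means that new violation types can be covered by adding an entry, without touching the reasoning structure. The resulting string would then be sent to whichever language model the pipeline uses, a step omitted here.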