48.6SEMar 25
Bridging the Interpretation Gap in Accessibility Testing: Empathetic and Legal-Aware Bug Report Generation via Large Language ModelsRyoya Koyama, Zhiyao Wang, Devi Karolita et al.
Modern automated accessibility testing tools for mobile applications have significantly improved the detection of interface violations, yet their impact on remediation remains limited. A key reason is that existing tools typically produce low-level, technical outputs that are difficult for non-specialist stakeholders, such as product managers and designers, to interpret in terms of real user harm and compliance risk. In this paper, we present \textsc{HEAR} (\underline{H}uman-c\underline{E}ntered \underline{A}ccessibility \underline{R}eporting), a framework that bridges this interpretation gap by transforming raw accessibility bug reports into empathetic, stakeholder-oriented narratives. Given the outputs of the existing accessibility testing tool, \textsc{HEAR} first reconstructs the UI context through semantic slicing and visual grounding, then dynamically injects disability-oriented personas matched to each violation type, and finally performs multi-layer reasoning to explain the physical barrier, functional blockage, and relevant legal or compliance concerns. We evaluate the framework on real-world accessibility issues collected from four popular Android applications and conduct a user study (N=12). The results show that \textsc{HEAR} generates factually grounded reports and substantially improves perceived empathy, urgency, persuasiveness, and awareness of legal risk compared with raw technical logs, while imposing little additional cognitive burden.
AIFeb 22
Robust Exploration in Directed Controller Synthesis via Reinforcement Learning with Soft Mixture-of-ExpertsToshihide Ubukata, Zhiyao Wang, Enhong Mu et al.
On-the-fly Directed Controller Synthesis (OTF-DCS) mitigates state-space explosion by incrementally exploring the system and relies critically on an exploration policy to guide search efficiently. Recent reinforcement learning (RL) approaches learn such policies and achieve promising zero-shot generalization from small training instances to larger unseen ones. However, a fundamental limitation is anisotropic generalization, where an RL policy exhibits strong performance only in a specific region of the domain-parameter space while remaining fragile elsewhere due to training stochasticity and trajectory-dependent bias. To address this, we propose a Soft Mixture-of-Experts framework that combines multiple RL experts via a prior-confidence gating mechanism and treats these anisotropic behaviors as complementary specializations. The evaluation on the Air Traffic benchmark shows that Soft-MoE substantially expands the solvable parameter space and improves robustness compared to any single expert.
CVJan 2, 2025Code
SVFR: A Unified Framework for Generalized Video Face RestorationZhiyao Wang, Xu Chen, Chengming Xu et al.
Face Restoration (FR) is a crucial area within image and video processing, focusing on reconstructing high-quality portraits from degraded inputs. Despite advancements in image FR, video FR remains relatively under-explored, primarily due to challenges related to temporal consistency, motion artifacts, and the limited availability of high-quality video data. Moreover, traditional face restoration typically prioritizes enhancing resolution and may not give as much consideration to related tasks such as facial colorization and inpainting. In this paper, we propose a novel approach for the Generalized Video Face Restoration (GVFR) task, which integrates video BFR, inpainting, and colorization tasks that we empirically show to benefit each other. We present a unified framework, termed as stable video face restoration (SVFR), which leverages the generative and motion priors of Stable Video Diffusion (SVD) and incorporates task-specific information through a unified face restoration framework. A learnable task embedding is introduced to enhance task identification. Meanwhile, a novel Unified Latent Regularization (ULR) is employed to encourage the shared feature representation learning among different subtasks. To further enhance the restoration quality and temporal stability, we introduce the facial prior learning and the self-referred refinement as auxiliary strategies used for both training and inference. The proposed framework effectively combines the complementary strengths of these tasks, enhancing temporal coherence and achieving superior restoration quality. This work advances the state-of-the-art in video FR and establishes a new paradigm for generalized video face restoration. Code and video demo are available at https://github.com/wangzhiyaoo/SVFR.git.