SE · Apr 21

ViBR: Automated Bug Replay from Video-based Reports using Vision-Language Models

Sidong Feng, Dingbang Wang, Nikola Tomic, Tingting Yu, Aldeida Aleti, Chunyang Chen

arXiv:2604.19905 · 72.1 · h-index: 24

AI Analysis

This addresses a significant challenge in software maintenance by enabling automated bug replay from video reports, though it is an incremental improvement over existing methods.

The paper tackles the problem of automatically reproducing bugs from GUI screen capture videos. It presents ViBR, which uses CLIP-based embedding similarity to segment the recording into actions and vision-language models to compare GUI states and guide the replay, and it reproduces 72% of bug recordings.

Bug reports play a critical role in software maintenance by helping users convey encountered issues to developers. Recently, GUI screen capture videos have gained popularity as a bug reporting artifact due to their ease of use and ability to retain rich contextual information. However, automatically reproducing bugs from such recordings remains a significant challenge. Existing methods often rely on fragile image-processing heuristics, explicit touch indicators, or pre-constructed UI transition graphs, which require non-trivial instrumentation and app-specific setup. This paper presents ViBR, a lightweight and fully automated approach that reproduces bugs directly from GUI recordings. Specifically, ViBR combines CLIP-based embedding similarity for action boundary segmentation with Vision-Language Models (VLMs) for region-aware GUI state comparison and guided bug replay. Experimental results show that ViBR successfully reproduces 72% of bug recordings, significantly outperforming state-of-the-art baselines and ablation variants.
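The abstract does not spell out how ViBR turns CLIP embeddings into action boundaries, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes each video frame has already been encoded into an embedding vector (for example by a CLIP image encoder) and marks a boundary wherever the cosine similarity between consecutive frames drops below a threshold. The function names `segment_actions` and `split_segments` and the default threshold of 0.9 are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def segment_actions(
    frame_embeddings: List[Sequence[float]], threshold: float = 0.9
) -> List[int]:
    """Return indices of frames that start a new GUI state.

    Hypothetical rule: a boundary is declared when consecutive frame
    embeddings are less similar than `threshold` (an assumed, tunable value).
    """
    boundaries = []
    for i in range(1, len(frame_embeddings)):
        if cosine_similarity(frame_embeddings[i - 1], frame_embeddings[i]) < threshold:
            boundaries.append(i)
    return boundaries


def split_segments(num_frames: int, boundaries: List[int]) -> List[Tuple[int, int]]:
    """Turn boundary indices into half-open (start, end) frame ranges."""
    starts = [0] + boundaries
    ends = boundaries + [num_frames]
    return list(zip(starts, ends))


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for real CLIP vectors.
    frames = [[1, 0], [1, 0], [0.99, 0.01], [0, 1], [0, 1], [1, 1]]
    bounds = segment_actions(frames)
    print(bounds)                               # [3, 5]
    print(split_segments(len(frames), bounds))  # [(0, 3), (3, 5), (5, 6)]
```

In this toy run, the near-identical first three frames form one stable state, and the sharp drops in similarity at frames 3 and 5 open new segments. Per the abstract, ViBR then hands the resulting GUI states to a VLM for region-aware comparison rather than relying on a single global threshold.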
