V2S: A Tool for Translating Video Recordings of Mobile App Usages into Replayable Scenarios
V2S reduces the manual effort that resource-constrained mobile developers spend extracting scenarios from videos by automating that extraction. The contribution is incremental, since it builds on existing neural object detection and image classification techniques.
The paper targets the manual effort of extracting information from screen recordings during mobile app development. It introduces V2S, an automated tool that translates video recordings of Android app usage into replayable scenarios and reproduces approximately 89% of the 3,534 GUI-based actions depicted in 175 videos.
Screen recordings are becoming increasingly important as rich software artifacts that inform mobile application development processes. However, the amount of manual effort required to extract information from these graphical artifacts can hinder resource-constrained mobile developers. This paper presents Video2Scenario (V2S), an automated tool that processes video recordings of Android app usages, utilizes neural object detection and image classification techniques to classify the depicted user actions, and translates these actions into a replayable scenario. We conducted a comprehensive evaluation to demonstrate V2S's ability to reproduce recorded scenarios across a range of devices and a diverse set of usage cases and applications. The results indicate that, based on its performance with 175 videos depicting 3,534 GUI-based actions, V2S is accurate in reproducing $\approx$89\% of actions from collected videos.
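To make the final translation step concrete, the following is a minimal sketch of how per-frame touch detections might be grouped into actions and turned into replay commands. It is not V2S's implementation: the `Touch` record, the helper names, and the thresholds (`max_gap`, `move_px`, `long_ms`, `fps`) are all hypothetical, and the detections are assumed to come from an upstream object detector that is not shown. The only external interface used is the standard `adb shell input tap` and `adb shell input swipe` commands.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple


@dataclass
class Touch:
    """Hypothetical detector output: one touch indicator seen on one frame."""
    frame: int
    x: int
    y: int


def group_touches(touches: List[Touch], max_gap: int = 1) -> List[List[Touch]]:
    """Group detections on (nearly) consecutive frames into candidate actions."""
    groups: List[List[Touch]] = []
    for t in sorted(touches, key=lambda t: t.frame):
        if groups and t.frame - groups[-1][-1].frame <= max_gap:
            groups[-1].append(t)
        else:
            groups.append([t])
    return groups


def classify(group: List[Touch], fps: int = 30,
             move_px: int = 30, long_ms: int = 500) -> Tuple[str, int]:
    """Label a group as tap, long_tap, or swipe (thresholds are assumptions)."""
    first, last = group[0], group[-1]
    duration_ms = round((last.frame - first.frame + 1) * 1000 / fps)
    if hypot(last.x - first.x, last.y - first.y) > move_px:
        return "swipe", duration_ms
    return ("long_tap" if duration_ms >= long_ms else "tap"), duration_ms


def to_adb(group: List[Touch], fps: int = 30) -> str:
    """Translate one grouped action into an adb input command string."""
    kind, ms = classify(group, fps)
    a, b = group[0], group[-1]
    if kind == "tap":
        return f"adb shell input tap {a.x} {a.y}"
    if kind == "long_tap":
        # A swipe with identical endpoints holds the touch for the duration.
        return f"adb shell input swipe {a.x} {a.y} {a.x} {a.y} {ms}"
    return f"adb shell input swipe {a.x} {a.y} {b.x} {b.y} {ms}"


detections = [Touch(0, 100, 200), Touch(1, 100, 200),
              Touch(10, 100, 800), Touch(11, 100, 600), Touch(12, 100, 400)]
scenario = [to_adb(g) for g in group_touches(detections)]
```

Here `scenario` holds one command per detected action, in playback order. A real replay would also need to preserve the timing between actions, which this sketch omits.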