Adam Roman

4.9SEApr 24Code

Jakub Zelek, Jakub Ruszil, Adam Roman et al.

Prime path coverage is a powerful structural testing criterion, but generating all prime paths in a directed graph remains computationally challenging due to the potentially exponential number of them. Existing approaches typically rely on enumerating large sets of candidate paths and filtering them, leading to high computational and memory overhead. In this paper, we present a new approach to prime path generation based on a structural characterization of prime paths in terms of strongly connected components. This characterization yields non-trivial necessary conditions for valid path endpoints and reduces the problem to constrained cycle enumeration in an augmented graph. As a result, we avoid explicitly enumerating all simple paths and instead generate only feasible candidates. Building on this insight, we design a streaming algorithm that outputs prime paths incrementally, using a Johnson-style traversal as a subroutine within a significantly reduced search space. The algorithm exploits SCC boundary crossings as natural pruning checkpoints - discarding partial paths the moment they are detected to be backward extendable, eliminating entire subtrees of the search space during traversal rather than filtering completed paths post hoc. We implement our method and evaluate it on a large dataset of real-world control-flow graphs extracted from open-source C++ and Python projects. The results demonstrate that our approach consistently outperforms existing methods, while maintaining stable inter-output delay in practice.

SDSep 1, 2023

Mi-Go: Test Framework which uses YouTube as Data Source for Evaluating Speech Recognition Models like OpenAI's Whisper

Tomasz Wojnar, Jaroslaw Hryszko, Adam Roman

This article introduces Mi-Go, a novel testing framework aimed at evaluating the performance and adaptability of general-purpose speech recognition machine learning models across diverse real-world scenarios. The framework leverages YouTube as a rich and continuously updated data source, accounting for multiple languages, accents, dialects, speaking styles, and audio quality levels. To demonstrate the effectiveness of the framework, the Whisper model, developed by OpenAI, was employed as a test object. The tests involve using a total of 124 YouTube videos to test all Whisper model versions. The results underscore the utility of YouTube as a valuable testing platform for speech recognition models, ensuring their robustness, accuracy, and adaptability to diverse languages and acoustic conditions. Additionally, by contrasting the machine-generated transcriptions against human-made subtitles, the Mi-Go framework can help pinpoint potential misuse of YouTube subtitles, like Search Engine Optimization.

Adam Roman

2 Papers