Characterizing and Detecting CUDA Program Bugs
This paper addresses bug detection for CUDA programmers and offers a novel tool for a specific domain, though its contribution is incremental in that it builds on existing bug detection methods.
The paper tackles the problem of CUDA program bugs by conducting the first empirical study of 319 bugs from 5 GitHub projects, which reveals categories such as synchronization bugs. It then introduces Simulee, a lightweight framework that detects 20 of the 27 studied synchronization bugs and 26 previously unknown ones.
While CUDA has become a major parallel computing platform and programming model for general-purpose GPU computing, CUDA-induced bug patterns have not yet been well explored. In this paper, we conduct the first empirical study to reveal important categories of CUDA program bug patterns, based on 319 bugs identified within 5 popular CUDA projects on GitHub. Our findings demonstrate that CUDA-specific characteristics may cause program bugs, such as synchronization bugs, that are rather difficult to detect. To detect such synchronization bugs efficiently, we establish the first lightweight general CUDA bug detection framework, namely Simulee, which simulates CUDA program execution by interpreting the corresponding LLVM bytecode and collects memory-access information to automatically detect CUDA synchronization bugs. To evaluate the effectiveness and efficiency of Simulee, we conduct a set of experiments. The results suggest that Simulee detects 20 of the 27 studied synchronization bugs and 26 previously unknown synchronization bugs, 10 of which have been confirmed by the developers.
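To make the idea of detecting synchronization bugs from collected memory-access information more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Simulee's actual algorithm: the names `Access`, `find_races`, and `trace_neighbor_read` are hypothetical, the "simulation" is a hand-written trace rather than interpreted LLVM bytecode, and only one kind of bug is modeled (a data race on shared memory between two barriers within a single thread block).

```python
from collections import defaultdict
from typing import NamedTuple


class Access(NamedTuple):
    """One recorded shared-memory access (hypothetical record format)."""
    thread: int     # simulated thread id within one block
    epoch: int      # number of barriers this thread has passed so far
    addr: int       # shared-memory index that was touched
    is_write: bool  # True for a store, False for a load


def find_races(accesses):
    """Return sorted (addr, epoch) pairs where at least two different threads
    touch the same address in the same barrier interval and at least one of
    those accesses is a write."""
    groups = defaultdict(list)
    for a in accesses:
        groups[(a.addr, a.epoch)].append(a)

    races = set()
    for key, group in groups.items():
        threads = {a.thread for a in group}
        writers = {a.thread for a in group if a.is_write}
        # A writer plus any other thread means an unordered conflicting pair.
        if writers and len(threads) > 1:
            races.add(key)
    return sorted(races)


def trace_neighbor_read(num_threads, with_barrier):
    """Hand-written trace of a toy kernel: each thread t stores to shared[t],
    optionally passes a barrier, then loads shared[(t + 1) % num_threads]."""
    trace = []
    for t in range(num_threads):
        epoch = 0
        trace.append(Access(t, epoch, t, True))
        if with_barrier:
            epoch += 1  # models a barrier such as __syncthreads()
        trace.append(Access(t, epoch, (t + 1) % num_threads, False))
    return trace


if __name__ == "__main__":
    print("without barrier:", find_races(trace_neighbor_read(4, False)))
    print("with barrier:   ", find_races(trace_neighbor_read(4, True)))
```

In this toy model, omitting the barrier leaves every store and its neighbor's load in the same interval, so each address is flagged; adding the barrier separates them and nothing is reported. A real detector would also need to handle cases this sketch ignores, such as barriers reached by only some threads, atomic operations, and accesses across blocks.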