CLOct 5, 2022
Large Language Models are Pretty Good Zero-Shot Video Game Bug DetectorsMohammad Reza Taesiri, Finlay Macklon, Yihe Wang et al.
Video game testing requires game-specific knowledge as well as common sense reasoning about the events in the game. While AI-driven agents can satisfy the first requirement, it is not yet possible to meet the second requirement automatically. Therefore, video game testing often still relies on manual testing, and human testers are required to play the game thoroughly to detect bugs. As a result, it is challenging to fully automate game testing. In this study, we explore the possibility of leveraging the zero-shot capabilities of large language models for video game bug detection. By formulating the bug detection problem as a question-answering task, we show that large language models can identify which event is buggy in a sequence of textual descriptions of events from a game. To this end, we introduce the GameBugDescriptions benchmark dataset, which consists of 167 buggy gameplay videos and a total of 334 question-answer pairs across 8 games. We extensively evaluate the performance of six models across the OPT and InstructGPT large language model families on our benchmark dataset. Our results show promising results for employing language models to detect video game bugs. With the proper prompting technique, we could achieve an accuracy of 70.66%, and on some video games, up to 78.94%. Our code, evaluation data and the benchmark can be found on https://asgaardlab.github.io/LLMxBugs
CVMar 21, 2022
CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot transfer learningMohammad Reza Taesiri, Finlay Macklon, Cor-Paul Bezemer
Gameplay videos contain rich information about how players interact with the game and how the game responds. Sharing gameplay videos on social media platforms, such as Reddit, has become a common practice for many players. Often, players will share gameplay videos that showcase video game bugs. Such gameplay videos are software artifacts that can be utilized for game testing, as they provide insight for bug analysis. Although large repositories of gameplay videos exist, parsing and mining them in an effective and structured fashion has still remained a big challenge. In this paper, we propose a search method that accepts any English text query as input to retrieve relevant videos from large repositories of gameplay videos. Our approach does not rely on any external information (such as video metadata); it works solely based on the content of the video. By leveraging the zero-shot transfer capabilities of the Contrastive Language-Image Pre-Training (CLIP) model, our approach does not require any data labeling or training. To evaluate our approach, we present the $\texttt{GamePhysics}$ dataset consisting of 26,954 videos from 1,873 games, that were collected from the GamePhysics section on the Reddit website. Our approach shows promising results in our extensive analysis of simple queries, compound queries, and bug queries, indicating that our approach is useful for object and event detection in gameplay videos. An example application of our approach is as a gameplay video search engine to aid in reproducing video game bugs. Please visit the following link for the code and the data: https://asgaardlab.github.io/CLIPxGamePhysics/
SEJan 18, 2022Code
A Taxonomy of Testable HTML5 Canvas IssuesFinlay Macklon, Markos Viggiato, Natalia Romanova et al.
The HTML5 <canvas> is widely used to display high quality graphics in web applications. However, the combination of web, GUI, and visual techniques that are required to build <canvas> applications, together with the lack of testing and debugging tools, makes developing such applications very challenging. To help direct future research on testing <canvas> applications, in this paper we present a taxonomy of testable <canvas> issues. First, we extracted 2,403 <canvas>-related issue reports from 123 open-source GitHub projects that use the HTML5 <canvas>. Second, we constructed our taxonomy by manually classifying a random sample of 332 issue reports. Our manual classification identified five broad categories of testable <canvas> issues, such as Visual and Performance issues. We found that Visual issues are the most frequent (35%), while Performance issues are relatively infrequent (5%). We also found that many testable <canvas> issues that present themselves visually on the <canvas> are actually caused by other components of the web application. Our taxonomy of testable <canvas> issues can be used to steer future research into <canvas> issues and testing.