SE LG PLMay 8, 2023

Modelling Concurrency Bugs Using Machine Learning

arXiv:2305.05531v1

Originality Synthesis-oriented

AI Analysis

This work addresses software safety and security for programmers by potentially improving debugging efficiency, but it is incremental as it focuses on comparing existing methods rather than introducing a new paradigm.

The paper tackles the problem of automatically detecting concurrency bugs in parallel programs by comparing machine learning approaches, using a synthetic dataset to simulate real-life programs and validating hypotheses about model limitations.

Artificial Intelligence has gained a lot of traction in the recent years, with machine learning notably starting to see more applications across a varied range of fields. One specific machine learning application that is of interest to us is that of software safety and security, especially in the context of parallel programs. The issue of being able to detect concurrency bugs automatically has intrigued programmers for a long time, as the added layer of complexity makes concurrent programs more prone to failure. The development of such automatic detection tools provides considerable benefits to programmers in terms of saving time while debugging, as well as reducing the number of unexpected bugs. We believe machine learning may help achieve this goal by providing additional advantages over current approaches, in terms of both overall tool accuracy as well as programming language flexibility. However, due to the presence of numerous challenges specific to the machine learning approach (correctly labelling a sufficiently large dataset, finding the best model types/architectures and so forth), we have to approach each issue of developing such a tool separately. Therefore, the focus of this project is on comparing both common and recent machine learning approaches. We abstract away the complexity of procuring a labelled dataset of concurrent programs under the form of a synthetic dataset that we define and generate with the scope of simulating real-life (concurrent) programs. We formulate hypotheses about fundamental limits of various machine learning model types which we then validate by running extensive tests on our synthetic dataset. We hope that our findings provide more insight in the advantages and disadvantages of various model types when modelling programs using machine learning, as well as any other related field (e.g. NLP).

View on arXiv PDF

Similar