SE AI | Mar 1, 2024

FlaKat: A Machine Learning-Based Categorization Framework for Flaky Tests

Shizhe Lin, Ryan Zheng He Liu, Ladan Tahvildari

arXiv:2403.01003v1 | 3.33 citations | h-index: 30

Originality: Synthesis-oriented

AI Analysis

This work addresses flaky test categorization for software developers, offering an incremental improvement by applying existing ML methods to a new dataset with sampling techniques.

The paper tackles the problem of categorizing flaky tests by root cause using machine learning. It proposes the FlaKat framework, which is reported to predict categories accurately, and a new evaluation metric, Flakiness Detection Capacity (FDC), whose ranking of classifiers agrees with the ranking given by F1 scores.

Flaky tests can pass or fail non-deterministically, without any alteration to the software system. Such tests are frequently encountered by developers and undermine the credibility of test suites. State-of-the-art research incorporates machine learning solutions into flaky test detection and achieves reasonably good accuracy. Moreover, the majority of automated flaky test repair solutions are designed for specific types of flaky tests. This research work proposes a novel categorization framework, called FlaKat, which uses machine-learning classifiers for fast and accurate prediction of the category of a given flaky test, where the category reflects the test's root cause. Sampling techniques are applied to address the imbalance between flaky test categories in the International Dataset of Flaky Tests (IDoFT). A new evaluation metric, called Flakiness Detection Capacity (FDC), is proposed for measuring the accuracy of classifiers from the perspective of information theory, and a proof of its effectiveness is provided. The final FDC results also agree with the F1 score regarding which classifier yields the best flakiness classification.
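The abstract mentions sampling to address class imbalance and the use of F1 scores to compare classifiers. The sketch below illustrates both ideas in plain Python under stated assumptions: random oversampling stands in for whichever sampling techniques the paper actually uses, the category labels are hypothetical placeholders, and the FDC metric is not reproduced because its definition is not given here.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    is as large as the largest class (an assumed, simple technique)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)
    target = max(len(group) for group in by_label.values())
    out_samples, out_labels = [], []
    for label, group in by_label.items():
        # Draw extra copies with replacement to reach the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, using F1 = 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for label in set(y_true) | set(y_pred):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare categories count equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical, imbalanced toy data: four tests in "cat_a", one in "cat_b".
tests = ["t1", "t2", "t3", "t4", "t5"]
cats = ["cat_a", "cat_a", "cat_a", "cat_a", "cat_b"]
balanced_tests, balanced_cats = random_oversample(tests, cats)
print(Counter(balanced_cats))
```

Oversampling should be applied to the training split only, so that duplicated tests do not leak into the evaluation data. Macro-averaged F1 is used here because it weights each category equally regardless of its size, which matters when categories are imbalanced.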
