LG ML | Aug 12, 2024

Overcoming Imbalanced Safety Data Using Extended Accident Triangle

Kailai Sun, Tianxiang Lan, Yang Miang Goh, Yueng-Hsiang Huang

arXiv:2408.07094v14.65 citations | h-index: 38 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses a common data imbalance issue in safety analytics for high-risk industries like construction and trucking, but it is incremental as it builds on existing oversampling techniques.

The paper tackles imbalanced datasets in safety analytics, which lead to inaccurate predictions and poor management decisions. It proposes three oversampling methods based on the extended accident triangle theory and reports robust improvements across different machine learning algorithms.

There is growing interest in using safety analytics and machine learning to support the prevention of workplace incidents, especially in high-risk industries like construction and trucking. Although existing safety analytics studies have made remarkable progress, they suffer from imbalanced datasets, a common problem in the field that results in prediction inaccuracies. This can lead to management problems, e.g., incorrect resource allocation and improper interventions. To overcome the imbalanced data problem, we extend the accident triangle theory and argue that the importance of data samples should be based on characteristics such as injury severity, accident frequency, and accident type. Accordingly, we propose three oversampling methods that assign different weights to samples in the minority class. We find robust improvements across different machine learning algorithms. Because open-source safety datasets are scarce, we share three imbalanced datasets, including a 9-year nationwide construction accident record dataset, together with the corresponding code.
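The idea of weighting minority-class samples during oversampling can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `weighted_oversample`, the toy data, and the `severity` weights are all hypothetical, and the sketch simply assumes that each minority sample carries a positive weight (for example, one derived from injury severity) and that samples are duplicated with probability proportional to that weight until the classes are balanced.

```python
import numpy as np

def weighted_oversample(X, y, weights, minority_label=1, seed=0):
    """Duplicate minority rows, drawn with probability proportional to
    `weights`, until the minority class matches the majority in size.
    Hypothetical helper for illustration only."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y)
    weights = np.asarray(weights, dtype=float)

    minority_idx = np.flatnonzero(y == minority_label)
    majority_idx = np.flatnonzero(y != minority_label)
    n_extra = len(majority_idx) - len(minority_idx)
    if n_extra <= 0:
        # Already balanced (or minority is larger): nothing to add.
        return X.copy(), y.copy()

    # Normalise the minority weights into sampling probabilities
    # (assumes at least one minority weight is positive).
    w = weights[minority_idx]
    p = w / w.sum()

    # Draw the extra minority rows with replacement.
    extra = rng.choice(minority_idx, size=n_extra, replace=True, p=p)
    keep = np.concatenate([np.arange(len(y)), extra])
    return X[keep], y[keep]

# Toy data: 8 majority rows (label 0) and 2 minority rows (label 1).
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)

# Assumed weights: the last minority sample counts three times as much,
# e.g. because it represents a more severe injury.
severity = np.ones(10)
severity[9] = 3.0

X_bal, y_bal = weighted_oversample(X, y, severity)
```

With these assumed weights, the last minority row is three times as likely as the other to be duplicated on each draw. As with any oversampling, the resampling should be applied to the training split only, so that duplicated rows do not leak into evaluation data.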

