CL LG | Jan 11, 2025

Sequential Classification of Aviation Safety Occurrences with Natural Language Processing

Aziida Nanyonga, Hassan Wasswa, Ugur Turhan, Oleksandra Molloy, Graham Wild

arXiv:2501.06490v1 | 14.716 citations | h-index: 10 | AIAA AVIATION 2023 Forum

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for automated classification of safety reports to aid aviation industry stakeholders in making safety-critical decisions, but it is incremental as it applies existing deep learning methods to a specific domain dataset.

The study tackled the problem of classifying aviation safety occurrences from unstructured text narratives using natural language processing models, achieving competitive performance with accuracies over 87.9% and high precision, recall, and F1 scores above 80%, 88%, and 85%, respectively.

Safety is a critical aspect of the air transport system, given that even slight operational anomalies can result in serious consequences. To reduce the chances of aviation safety occurrences, accidents and incidents are reported so that root causes can be established and safety recommendations proposed, among other purposes. However, the narratives describing pre-accident events are written as raw, unstructured, human-readable text that a computer system cannot readily interpret. The ability to classify and categorise safety occurrences from their textual narratives would help aviation industry stakeholders make informed safety-critical decisions. To classify and categorise safety occurrences, we applied natural language processing (NLP) and artificial intelligence (AI) models to process text narratives. The study aimed to answer the question: how well can the damage level caused to the aircraft in a safety occurrence be inferred from the text narrative using natural language processing? The classification performance of various deep learning models, including LSTM, BLSTM, GRU, and sRNN, as well as the combinations LSTM+GRU, BLSTM+GRU, sRNN+LSTM, sRNN+BLSTM, sRNN+GRU, sRNN+BLSTM+GRU, and sRNN+LSTM+GRU, was evaluated on a set of 27,000 safety occurrence reports from the NTSB. The results indicate that all models investigated performed competitively, recording an accuracy of over 87.9%, which is well above the 25% expected from random guessing in a four-class classification problem. The models also recorded high precision, recall, and F1 scores, above 80%, 88%, and 85%, respectively. sRNN slightly outperformed the other single models in terms of recall (90%) and accuracy (90%), while LSTM reported slightly better precision (87%).
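The reported metrics can be illustrated with a minimal sketch in plain Python. It computes accuracy, precision, recall, and F1 for a four-class problem and compares accuracy with the 25% uniform random-guess baseline. The abstract does not say how per-class scores were averaged, so macro averaging is an assumption here. The function name `evaluate_four_class` and the integer labels 0 to 3 are hypothetical stand-ins for the damage-level classes, and the toy labels are not data from the study.

```python
from collections import Counter


def evaluate_four_class(y_true, y_pred, num_classes=4):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Assumption: macro averaging (unweighted mean over classes); the
    abstract does not state which averaging scheme the authors used.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed

    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        p_den = tp[c] + fp[c]
        r_den = tp[c] + fn[c]
        precision = tp[c] / p_den if p_den else 0.0
        recall = tp[c] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / num_classes,
        "recall": sum(recalls) / num_classes,
        "f1": sum(f1s) / num_classes,
        # Expected accuracy of uniform random guessing over the classes.
        "chance": 1.0 / num_classes,
    }


if __name__ == "__main__":
    # Toy labels for illustration only (not from the NTSB reports).
    y_true = [0, 0, 1, 1, 2, 2, 3, 3]
    y_pred = [0, 0, 1, 2, 2, 2, 3, 0]
    for name, value in evaluate_four_class(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
```

Macro averaging weights each damage class equally, so a rarely occurring class affects the score as much as a common one. A support-weighted average would give different numbers on imbalanced data.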

