Large Language Models for Failure Mode Classification: An Investigation
This work addresses the need for reliability engineers to reduce manual analysis of work orders in the maintenance domain, but it is incremental as it applies existing LLM methods to a new task.
The paper tackled the problem of automating Failure Mode Classification (FMC) for maintenance by using Large Language Models (LLMs), achieving an F1 score of 0.80 with a fine-tuned GPT-3.5 model, which significantly improved over a baseline text classification model (F1=0.60) and the out-of-the-box GPT-3.5 (F1=0.46).
In this paper we present the first investigation into the effectiveness of Large Language Models (LLMs) for Failure Mode Classification (FMC). FMC, the task of automatically labelling an observation with a corresponding failure mode code, is a critical task in the maintenance domain as it reduces the need for reliability engineers to spend their time manually analysing work orders. We detail our approach to prompt engineering to enable an LLM to predict the failure mode of a given observation using a restricted code list. We demonstrate that the performance of a GPT-3.5 model (F1=0.80) fine-tuned on annotated data is a significant improvement over a currently available text classification model (F1=0.60) trained on the same annotated data set. The fine-tuned model also outperforms the out-of-the box GPT-3.5 (F1=0.46). This investigation reinforces the need for high quality fine-tuning data sets for domain-specific tasks using LLMs.