CL · May 29
Wind Turbine Maintenance Log Labelling Framework: LLM-Driven Data Correction and Enrichment via Semantic Extraction of Reliability Intelligence
Max Malyi, Jonathan Shek, Alasdair McDonald et al.
As wind turbine fleets age, data-driven reliability engineering is essential to optimise their operation and maintenance for service life extension and levelised cost of energy reduction. Failure event descriptions within historical maintenance logs are a source of valuable reliability intelligence. However, they typically appear as unstructured natural language entries, rendering them inaccessible for quantitative analysis. This paper presents a novel methodology leveraging a large language model (LLM) to systematically standardise and structure maintenance logs based on their free-text descriptors. Operating on a dataset of 16,316 maintenance logs from 280 turbines monitored over nine years, the developed model-agnostic framework autonomously corrected hierarchical system codes and extracted evidence-based taxonomies of maintenance actions and failure modes. The automated pipeline successfully structured over 70% of the dataset. It resolved pervasive misclassification issues, such as isolating previously unclassified pitch system faults and restoring missing system codes, and enriched the records by applying empirical taxonomies to label specific actions taken and failure modes addressed. By using system-based log batches to construct empirical dictionaries of failure modes, observable symptoms, dominant mechanisms, and candidate causes, this approach reduces the inherent subjectivity of manual failure modes and effects analysis (FMEA). Ultimately, the methodology provides a highly scalable, cost-effective blueprint for translating large sets of qualitative field observations into quantitative reliability metrics, laying the foundation for integrated root-cause analysis across the renewable energy sector, improved FMEA, and advanced predictive maintenance.
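The abstract above describes grouping logs into system-based batches before the LLM builds its empirical dictionaries. The following is a minimal sketch of that grouping step only. It assumes each log is a dict with hypothetical `system_code` and `description` keys and uses an arbitrary batch size. The framework's actual batching rules, prompts, and model calls are not given in the abstract and are not reproduced here.

```python
from collections import defaultdict
from typing import Iterator


def system_batches(logs: list[dict], batch_size: int = 50) -> Iterator[tuple[str, list[str]]]:
    """Group free-text log descriptions by system code, then yield fixed-size batches.

    Hypothetical schema: each log has 'system_code' and 'description' keys.
    Logs with a missing or empty code are collected under 'UNCLASSIFIED'.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for log in logs:
        code = log.get("system_code") or "UNCLASSIFIED"
        groups[code].append(log["description"])
    # Iterate systems in a stable (sorted) order so runs are reproducible
    for code in sorted(groups):
        texts = groups[code]
        for start in range(0, len(texts), batch_size):
            yield code, texts[start:start + batch_size]
```

Each yielded `(code, texts)` pair could then be passed to a model as one prompt, so that the model sees many descriptions from the same subsystem together.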
CL · Sep 8, 2025 · Code
A Comparative Benchmark of Large Language Models for Labelling Wind Turbine Maintenance Logs
Max Malyi, Jonathan Shek, Alasdair McDonald et al.
Effective Operation and Maintenance (O&M) is critical to reducing the Levelised Cost of Energy (LCOE) from wind power, yet the unstructured, free-text nature of turbine maintenance logs presents a significant barrier to automated analysis. Our paper addresses this by presenting a novel and reproducible framework for benchmarking Large Language Models (LLMs) on the task of classifying these complex industrial records. To promote transparency and encourage further research, this framework has been made publicly available as an open-source tool. We systematically evaluate a diverse suite of state-of-the-art proprietary and open-source LLMs, providing a foundational assessment of their trade-offs in reliability, operational efficiency, and model calibration. Our results quantify a clear performance hierarchy, identifying top models that exhibit high alignment with a benchmark standard and trustworthy, well-calibrated confidence scores. We also demonstrate that classification performance is highly dependent on the task's semantic ambiguity, with all models showing higher consensus on objective component identification than on interpretive maintenance actions. Given that no model achieves perfect accuracy and that calibration varies dramatically, we conclude that the most effective and responsible near-term application is a Human-in-the-Loop system, where LLMs act as a powerful assistant to accelerate and standardise data labelling for human experts, thereby enhancing O&M data quality and downstream reliability analysis.
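Calibration asks whether a model's stated confidence matches how often it is actually right. The abstract does not name the calibration metric it uses. As an assumed illustration, the sketch below computes expected calibration error (ECE) with equal-width bins. Here `confidences` are per-record model confidence scores in [0, 1] and `correct` marks agreement with the benchmark labels. Both names are hypothetical.

```python
def expected_calibration_error(
    confidences: list[float], correct: list[bool], n_bins: int = 10
) -> float:
    """ECE: bin predictions by confidence and average |accuracy - mean confidence|,
    weighting each bin by the fraction of predictions that fall in it."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need equal-length, non-empty inputs")
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes into the top bin
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if members:
            mean_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(ok for _, ok in members) / len(members)
            ece += len(members) / n * abs(accuracy - mean_conf)
    return ece
```

A low ECE means confidence scores can be used to route low-confidence labels to a human reviewer. That is the kind of Human-in-the-Loop triage the abstract recommends.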
SY · Nov 25, 2025
Analysis and Control of Acoustic Emissions from Marine Energy Converters
Jiaqin He, Max Malyi, Jonathan Shek
Environmental licensing related to underwater acoustic emissions represents a critical bottleneck for the commercial deployment of marine renewable energy. This study presents a control engineering framework to mitigate acoustic risks from tidal current converters without compromising project viability. A MATLAB/Simulink model of a tidal current converter was utilised to evaluate two distinct mitigation tiers: (1) architectural modification, comparing a geared induction generator against a direct-drive permanent magnet synchronous generator, and (2) operational control, analysing the impact of switching frequencies and maximum power point tracking coefficient tuning. Results indicate that lowering switching frequencies is ineffective, increasing power electronic losses by over 2000% with negligible acoustic benefit. Conversely, the direct-drive permanent magnet synchronous generator architecture reduced sound pressure levels, effectively eliminating mechanical tonal noise. For existing geared systems, de-tuning the maximum power point tracking coefficient by a factor of 1.2 reduced the probability of exceeding temporary threshold shift limits for marine mammals, with a quantified energy yield reduction of 3.58%. These findings propose a hierarchical mitigation strategy: selecting direct-drive topologies for acoustically sensitive sites, and utilising maximum power point tracking coefficient-based power curtailment as a transient operational mode during critical biological migration periods.
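A common baseline for maximum power point tracking is the optimal-torque law T_ref = K * omega^2, with K = 0.5 * rho * pi * R^5 * Cp_max / lambda_opt^3. The sketch below assumes that law. It also assumes that "de-tuning by a factor of 1.2" means multiplying K by 1.2, which is set through the hypothetical `detune` parameter. The abstract does not specify the controller structure or the direction of the scaling, so this is an illustration and not the model used in the study. The seawater density is a typical assumed value.

```python
import math

SEAWATER_DENSITY = 1025.0  # kg/m^3, typical value (assumption)


def optimal_torque_gain(radius_m: float, cp_max: float, tsr_opt: float,
                        rho: float = SEAWATER_DENSITY) -> float:
    """Gain K such that T_ref = K * omega^2 tracks peak Cp at the optimal tip-speed ratio.

    Derived from P = 0.5 * rho * pi * R^2 * Cp * v^3 with v = omega * R / tsr_opt.
    """
    return 0.5 * rho * math.pi * radius_m**5 * cp_max / tsr_opt**3


def torque_reference(omega: float, k_opt: float, detune: float = 1.0) -> float:
    """Generator torque demand in N*m for rotor speed omega in rad/s.

    detune is hypothetical: values above 1 demand more torque at a given speed,
    so the rotor settles at a lower speed and tip-speed ratio, capturing less power.
    """
    return detune * k_opt * omega**2
```

At the optimal tip-speed ratio, `torque_reference(omega, k) * omega` equals the ideal captured power. A `detune` above 1 therefore acts as a simple speed and power curtailment knob.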
CL · Sep 26, 2025
Exploratory Semantic Reliability Analysis of Wind Turbine Maintenance Logs using Large Language Models
Max Malyi, Jonathan Shek, Andre Biscaya
A wealth of operational intelligence is locked within the unstructured free-text of wind turbine maintenance logs, a resource largely inaccessible to traditional quantitative reliability analysis. While machine learning has been applied to this data, existing approaches typically stop at classification, categorising text into predefined labels. This paper addresses the gap in leveraging modern large language models (LLMs) for more complex reasoning tasks. We introduce an exploratory framework that uses LLMs to move beyond classification and perform deep semantic analysis. We apply this framework to a large industrial dataset to execute four analytical workflows: failure mode identification, causal chain inference, comparative site analysis, and data quality auditing. The results demonstrate that LLMs can function as powerful "reliability co-pilots," moving beyond labelling to synthesise textual information and generate actionable, expert-level hypotheses. This work contributes a novel and reproducible methodology for using LLMs as a reasoning tool, offering a new pathway to enhance operational intelligence in the wind energy sector by unlocking insights previously obscured in unstructured data.