DB Oct 31, 2024
Case ID detection based on time series data -- the mining use case
Edyta Brzychczy, Tomasz Pełech-Pilichowski, Ziemowit Dworakowski
Process mining is gaining popularity in business process analysis, including in heavy industry. It requires a specific data format called an event log, whose basic structure includes a case identifier (case ID), an activity (event) name, and a timestamp. For industrial processes, data is very often provided by a monitoring system as time series of low-level sensor readings. This data cannot be used directly for process mining because activities are not explicitly marked in the event log and, sometimes, no case ID is provided. We propose a novel rule-based algorithm for pattern identification that detects case IDs by identifying significant changes in the short-term mean values of a selected variable. We present our solution on a mining use case. We compare the computed results (identified patterns) with expert labels of the same dataset. Experiments show that the developed algorithm correctly detects IDs in most cases, in datasets with and without outliers, reaching F1 scores of 96.8% and 97%, respectively. We also evaluate our algorithm on a dataset from the manufacturing domain, reaching an F1 score of 92.6%.
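The idea of detecting case boundaries from significant changes in a short-term mean can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the `window` and `threshold` parameters, and the rule "a new case starts when the mean of the next `window` samples exceeds the mean of the previous `window` samples by more than `threshold`" are all assumptions made for illustration.

```python
from typing import List, Sequence


def detect_case_starts(values: Sequence[float], window: int,
                       threshold: float) -> List[int]:
    """Return indices where the short-term mean jumps up by more than `threshold`.

    Hypothetical rule, not taken from the paper: compare the mean of the
    `window` samples before index i with the mean of the `window` samples
    starting at i. A large upward jump opens a case; a large downward jump
    closes it, so one case produces only one start index.
    """
    starts: List[int] = []
    in_case = False
    for i in range(window, len(values) - window + 1):
        before = sum(values[i - window:i]) / window
        after = sum(values[i:i + window]) / window
        jump = after - before
        if not in_case and jump > threshold:
            starts.append(i)
            in_case = True
        elif in_case and jump < -threshold:
            in_case = False
    return starts


def assign_case_ids(values: Sequence[float], window: int,
                    threshold: float) -> List[int]:
    """Label each sample with a running case ID (0 means 'before the first case')."""
    starts = set(detect_case_starts(values, window, threshold))
    case_id = 0
    ids: List[int] = []
    for i in range(len(values)):
        if i in starts:
            case_id += 1
        ids.append(case_id)
    return ids


# Synthetic signal with two high-load periods (illustrative data only).
signal = [0, 0, 0, 0, 10, 10, 10, 10, 0, 0, 0, 0, 10, 10, 10, 10, 0, 0, 0]
case_starts = detect_case_starts(signal, window=3, threshold=8.0)
case_ids = assign_case_ids(signal, window=3, threshold=8.0)
```

In practice the window length and threshold would have to be tuned to the sensor and process at hand, and outliers would likely call for a more robust statistic than a plain mean.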
AI Nov 19, 2025
Balancing Natural Language Processing Accuracy and Normalisation in Extracting Medical Insights
Paulina Tworek, Miłosz Bargieł, Yousef Khan et al.
Extracting structured medical insights from unstructured clinical text using Natural Language Processing (NLP) remains an open challenge in healthcare, particularly in non-English contexts where resources are scarce. This study presents a comparative analysis of low-compute rule-based NLP methods and Large Language Models (LLMs) for information extraction from electronic health records (EHR) obtained from the Voivodeship Rehabilitation Hospital for Children in Ameryka, Poland. We evaluate both approaches by extracting patient demographics, clinical findings, and prescribed medications, while examining the effects of missing text normalisation and translation-induced information loss. Results demonstrate that rule-based methods provide higher accuracy in information retrieval tasks, particularly for age and sex extraction. However, LLMs offer greater adaptability and scalability, excelling in drug name recognition. The effectiveness of the LLMs was compared on texts originally in Polish and on the same texts translated into English, to assess the impact of translation. These findings highlight the trade-offs between accuracy, normalisation, and computational cost when deploying NLP in healthcare settings. We argue for hybrid approaches that combine the precision of rule-based systems with the adaptability of LLMs, offering a practical path toward more reliable and resource-efficient clinical NLP in real-world hospitals.
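To make concrete what a low-compute rule-based extractor for age and sex might look like, here is a minimal sketch. It is an illustration only: the regular expression, the keyword table, and the function name `extract_demographics` are hypothetical, the patterns are written for English text, and nothing here reproduces the rules used in the study (whose records are in Polish).

```python
import re
from typing import Dict, Optional, Union

# Hypothetical pattern: matches phrases such as "7-year-old" or "12 year old".
AGE_PATTERN = re.compile(r"\b(\d{1,3})[- ]year[- ]old\b", re.IGNORECASE)

# Hypothetical keyword table mapping surface words to a normalised sex label.
SEX_KEYWORDS = {
    "boy": "male",
    "male": "male",
    "girl": "female",
    "female": "female",
}


def extract_demographics(text: str) -> Dict[str, Optional[Union[int, str]]]:
    """Extract age (in years) and sex from free text using fixed rules.

    Returns None for a field when no rule fires.
    """
    age_match = AGE_PATTERN.search(text)
    age = int(age_match.group(1)) if age_match else None

    sex = None
    # Tokenise on letters only so that "female" is not mistaken for "male".
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in SEX_KEYWORDS:
            sex = SEX_KEYWORDS[token]
            break

    return {"age": age, "sex": sex}


record = extract_demographics("A 7-year-old girl admitted for rehabilitation.")
empty = extract_demographics("No demographic details recorded.")
```

Rules of this kind are cheap and precise when the phrasing is predictable, but they fail silently on unseen wording or unnormalised text, which is the trade-off against LLM-based extraction that the abstract describes.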