Sjaak Brinkkemper

5 papers

42 citations

Novelty 33%

AI Score 23

Ranked #181,446 of 201,326 authors (top 90%); #30,209 in CL (top 93%)

5 Papers

CL · Aug 19, 2024

Summarizing long regulatory documents with a multi-step pipeline

Mika Sie, Ruby Beek, Michiel Bots et al.

Due to their length and complexity, long regulatory texts are challenging to summarize. To address this, a multi-step extractive-abstractive architecture is proposed to handle lengthy regulatory documents more effectively. In this paper, we show that the effectiveness of a two-step architecture for summarizing long regulatory texts varies significantly depending on the model used. Specifically, the two-step architecture improves the performance of decoder-only models. For abstractive encoder-decoder models with short context lengths, the effectiveness of an extractive step varies, whereas for long-context encoder-decoder models, the extractive step worsens their performance. This research also highlights the challenges of evaluating generated texts, as evidenced by the differing results from human and automated evaluations. Most notably, human evaluations favoured language models pretrained on legal text, while automated metrics ranked general-purpose language models higher. The results underscore the importance of selecting the appropriate summarization strategy based on model architecture and context length.
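
The abstract does not say which extractive method precedes the abstractive model, so the following is only a minimal sketch of the two-step idea under stated assumptions: a simple word-frequency sentence scorer (a stand-in, not the paper's method) trims the document to a word budget, and `abstractive_model` is a hypothetical callable standing in for whichever summarization model is used.

```python
import re
from collections import Counter
from typing import Callable


def extractive_step(text: str, max_words: int) -> str:
    """Keep the highest-scoring sentences that fit within a word budget."""
    # Naive sentence split on whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Document-wide word frequencies serve as a crude salience signal (assumption).
    freqs = Counter(w.lower() for w in re.findall(r"\w+", text))

    def score(sentence: str) -> float:
        words = re.findall(r"\w+", sentence.lower())
        return sum(freqs[w] for w in words) / max(len(words), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen, used = [], 0
    for i in ranked:
        n = len(sentences[i].split())
        if used + n <= max_words:  # greedily fill the word budget
            chosen.append(i)
            used += n
    # Restore document order so the second step sees coherent text.
    return " ".join(sentences[i] for i in sorted(chosen))


def two_step_summarize(
    text: str,
    abstractive_model: Callable[[str], str],  # hypothetical: any text-to-summary function
    max_words: int = 512,  # assumed budget, e.g. sized to the model's context window
) -> str:
    condensed = extractive_step(text, max_words)
    return abstractive_model(condensed)
```

For example, `two_step_summarize(document, abstractive_model=str.upper, max_words=10)` runs the pipeline with a trivial placeholder in place of a real model. The `max_words` budget is where context length enters: a long-context model could take the full text directly, which is consistent with the abstract's finding that the extractive step can hurt such models.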

CL · Nov 22, 2023

Enhancing Summarization Performance through Transformer-Based Prompt Engineering in Automated Medical Reporting

Daphne van Zandvoort, Laura Wiersema, Tom Huibers et al.

Customized medical prompts enable Large Language Models (LLMs) to effectively address medical dialogue summarization. The process of medical reporting is often time-consuming for healthcare professionals. Implementing medical dialogue summarization techniques presents a viable solution to alleviate this time constraint by generating automated medical reports. The effectiveness of LLMs in this process is significantly influenced by the formulation of the prompt, which plays a crucial role in determining the quality and relevance of the generated reports. In this research, we used a combination of two distinct prompting strategies, shot prompting and pattern prompting, to enhance the performance of automated medical reporting. The automated medical reports are evaluated using the ROUGE score and a human evaluation by an expert panel. The two-shot prompting approach, combined with scope and domain context, outperforms the other methods and achieves the highest score when compared to the human reference set by a general practitioner. However, the automated reports are approximately twice as long as the human references, because both redundant and relevant statements are added to the report.
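
As an illustration of how the two strategies might be combined, the sketch below assembles a two-shot prompt preceded by scope and domain-context instructions (the pattern-prompting part). The template wording, the `Example` type, and `build_two_shot_prompt` are hypothetical; the abstract does not give the actual prompt used in the study.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One hypothetical demonstration pair: a consultation transcript and its report."""
    transcript: str
    report: str


def build_two_shot_prompt(scope: str, domain_context: str,
                          examples: list[Example], transcript: str) -> str:
    """Assemble a prompt: scope and domain context first, then two worked examples."""
    if len(examples) != 2:
        raise ValueError("two-shot prompting expects exactly two examples")
    parts = [f"Scope: {scope}", f"Domain context: {domain_context}", ""]
    for i, ex in enumerate(examples, start=1):
        parts += [f"Example {i} transcript:", ex.transcript,
                  f"Example {i} report:", ex.report, ""]
    # The new transcript comes last, leaving the report for the model to complete.
    parts += ["Transcript:", transcript, "Report:"]
    return "\n".join(parts)
```

Ending the prompt with an open `Report:` slot mirrors the format of the two examples, which is the usual way few-shot prompts steer a model toward the demonstrated output structure.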

CL · Nov 22, 2023

Comparative Experimentation of Accuracy Metrics in Automated Medical Reporting: The Case of Otitis Consultations

Wouter Faber, Renske Eline Bootsma, Tom Huibers et al.

Generative Artificial Intelligence (AI) can be used to automatically generate medical reports based on transcripts of medical consultations. The aim is to reduce the administrative burden that healthcare professionals face. The accuracy of the generated reports needs to be established to ensure their correctness and usefulness. There are several metrics for measuring the accuracy of AI-generated reports, but little work has been done on applying these metrics to medical reporting. A comparative experiment with 10 accuracy metrics has been performed on AI-generated medical reports, scored against the corresponding General Practitioner (GP) reports for Otitis consultations. The numbers of missing, incorrect, and additional statements in the generated reports have been correlated with the metric scores. In addition, we introduce and define a Composite Accuracy Score, which produces a single score for comparing the metrics within the field of automated medical reporting. Findings show that, based on the correlation study and the Composite Accuracy Score, the ROUGE-L and Word Mover's Distance metrics are the preferred metrics, which is not in line with previous work. These findings help determine the accuracy of an AI-generated medical report, which aids the development of systems that generate medical reports for GPs to reduce the administrative burden.
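
ROUGE-L, one of the two metrics the study prefers, scores a candidate by the longest common subsequence (LCS) it shares with a reference: with precision P = LCS / |candidate| and recall R = LCS / |reference|, the score is F = (1 + β²)PR / (R + β²P). The sketch below assumes plain lowercase whitespace tokenization with no stemming, so its scores can differ from those of standard ROUGE packages and from the configuration used in the paper. The Composite Accuracy Score is not defined in the abstract and is not reproduced here.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend a match diagonally, otherwise carry the best neighbouring value.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str, beta: float = 1.0) -> float:
    """ROUGE-L F-measure with naive lowercase whitespace tokenization (assumption)."""
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)
```

For instance, `rouge_l("police killed the gunman", "police kill the gunman")` returns 0.75: the LCS is three tokens ("police", "the", "gunman"), giving P = R = 3/4. Because LCS rewards in-order overlap without requiring contiguity, it tolerates reordering less than bag-of-words metrics but more than strict n-gram matching.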

SE · Nov 30, 2021

A Semi-automated Method for Domain-Specific Ontology Creation from Medical Guidelines

Omar ElAssy, Rik de Vendt, Fabiano Dalpiaz et al.

The automated capturing and summarization of medical consultations has the potential to reduce the administrative burden in healthcare. Consultations are structured conversations that broadly follow a guideline with a systematic examination of predefined observations and symptoms to diagnose and treat well-defined medical conditions. A key component in automated conversation summarization is the matching of the knowledge graph of the consultation transcript with a medical domain ontology for the interpretation of the consultation conversation. Existing general medical ontologies such as SNOMED CT provide a taxonomic view of the terminology, but they do not capture the essence of the guidelines that define consultations. As part of our research on medical conversation summarization, this paper puts forward a semi-automated method for generating an ontological representation of a medical guideline. The method, which takes as input the well-known SNOMED CT nomenclature and a medical guideline, maps the guideline to a so-called Medical Guideline Ontology (MGO), a machine-processable version of the guideline that can be used for interpreting the conversation during a consultation. We illustrate our approach by discussing the creation of an MGO of the medical condition of ear canal inflammation (Otitis Externa) given the corresponding guideline from a Dutch medical authority.

SE · Apr 2, 2021

An Empirical Characterization of Event Sourced Systems and Their Schema Evolution -- Lessons from Industry

Michiel Overeem, Marten Spoor, Slinger Jansen et al.

Event sourced systems are increasing in popularity because they are reliable, flexible, and scalable. In this article, we point a microscope at a software architecture pattern that is rapidly gaining popularity in industry, but has not received as much attention from the scientific community. We do so through constructivist grounded theory, which proves to be a suitable qualitative method for extracting architectural knowledge from practitioners. Based on the discussion of 19 event sourced systems, we explore the rationale for and the context of the event sourcing pattern. A description of the pattern itself and its relation to other patterns, as discussed with practitioners, is given. The description itself is grounded in the experience of 25 engineers, making it a reliable source for both new practitioners and scientists. We identify five challenges that practitioners experience: event system evolution, the steep learning curve, lack of available technology, rebuilding projections, and data privacy. For the first challenge of event system evolution, we uncover five tactics and solutions that support practitioners in their design choices when developing evolving event sourced systems: versioned events, weak schema, upcasting, in-place transformation, and copy-and-transform.
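
Of the five tactics, upcasting is perhaps the easiest to illustrate: events stored under an old schema are converted to the current schema when they are read, so the event store itself is never rewritten (unlike in-place transformation). The sketch below assumes events are plain dictionaries carrying `type` and `version` fields; the `CustomerRegistered` event and its v1-to-v2 change are hypothetical.

```python
from typing import Any, Callable

Event = dict[str, Any]  # assumed event shape: a dict with "type" and "version" keys


def _customer_registered_v1_to_v2(e: Event) -> Event:
    """Hypothetical schema change: split a single "name" field into first/last name."""
    first, _, last = e["name"].partition(" ")
    return {"type": "CustomerRegistered", "version": 2,
            "first_name": first, "last_name": last}


# Registry of upcasters keyed by (event type, version they upgrade from).
UPCASTERS: dict[tuple[str, int], Callable[[Event], Event]] = {
    ("CustomerRegistered", 1): _customer_registered_v1_to_v2,
}


def upcast(event: Event) -> Event:
    """Apply upcasters one version at a time until no newer version is registered."""
    while (event["type"], event["version"]) in UPCASTERS:
        event = UPCASTERS[(event["type"], event["version"])](event)
    return event
```

Chaining upcasters one version at a time means each schema change needs only a single converter, and projections only ever see the latest version. Each upcaster returns a new dictionary, so the stored event is left untouched.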