Milad Moradi

h-index13

23papers

1,813citations

Novelty31%

AI Score45

Ranked #41,799 of 194,257 authors (top 22%)#8,456 in CL (top 27%)

23 Papers

9.9CLJan 27, 2023Code

ThoughtSource: A central hub for large language model reasoning data

Simon Ott, Konstantin Hebenstreit, Valentin Liévin et al.

Large language models (LLMs) such as GPT-4 have recently demonstrated impressive results across a wide range of tasks. LLMs are still limited, however, in that they frequently fail at complex reasoning, their reasoning processes are opaque, they are prone to 'hallucinate' facts, and there are concerns about their underlying biases. Letting models verbalize reasoning steps as natural language, a technique known as chain-of-thought prompting, has recently been proposed as a way to address some of these issues. Here we present ThoughtSource, a meta-dataset and software library for chain-of-thought (CoT) reasoning. The goal of ThoughtSource is to improve future artificial intelligence systems by facilitating qualitative understanding of CoTs, enabling empirical evaluations, and providing training data. This first release of ThoughtSource integrates seven scientific/medical, three general-domain and five math word question answering datasets.

32.2CLApr 25, 2022Code

A global analysis of metrics used for measuring performance in natural language processing

Kathrin Blagec, Georg Dorffner, Milad Moradi et al.

Measuring the performance of natural language processing models is challenging. Traditionally used metrics, such as BLEU and ROUGE, originally devised for machine translation and summarization, have been shown to suffer from low correlation with human judgment and a lack of transferability to other tasks and languages. In the past 15 years, a wide range of alternative metrics have been proposed. However, it is unclear to what extent this has had an impact on NLP benchmarking efforts. Here we provide the first large-scale cross-sectional analysis of metrics used for measuring performance in natural language processing. We curated, mapped and systematized more than 3500 machine learning model performance results from the open repository 'Papers with Code' to enable a global and comprehensive analysis. Our results suggest that the large majority of natural language processing metrics currently used have properties that may result in an inadequate reflection of a models' performance. Furthermore, we found that ambiguities and inconsistencies in the reporting of metrics may lead to difficulties in interpreting and comparing model performances, impairing transparency and reproducibility in NLP research.

6.8CVMar 30, 2023Code

Model-agnostic explainable artificial intelligence for object detection in image data

Milad Moradi, Ke Yan, David Colwell et al.

In recent years, deep neural networks have been widely used for building high-performance Artificial Intelligence (AI) systems for computer vision applications. Object detection is a fundamental task in computer vision, which has been greatly progressed through developing large and intricate AI models. However, the lack of transparency is a big challenge that may not allow the widespread adoption of these models. Explainable artificial intelligence is a field of research where methods are developed to help users understand the behavior, decision logics, and vulnerabilities of AI systems. Previously, few explanation methods were developed for object detection based on random masking. However, random masks may raise some issues regarding the actual importance of pixels within an image. In this paper, we design and implement a black-box explanation method named Black-box Object Detection Explanation by Masking (BODEM) through adopting a hierarchical random masking approach for object detection systems. We propose a hierarchical random masking framework in which coarse-grained masks are used in lower levels to find salient regions within an image, and fine-grained mask are used to refine the salient regions in higher levels. Experimentations on various object detection datasets and models showed that BODEM can effectively explain the behavior of object detectors. Moreover, our method outperformed Detector Randomized Input Sampling for Explanation (D-RISE) and Local Interpretable Model-agnostic Explanations (LIME) with respect to different quantitative measures of explanation effectiveness. The experimental results demonstrate that BODEM can be an effective method for explaining and validating object detection systems in black-box testing scenarios.

10.6SEApr 8Code

An empirical study of LoRA-based fine-tuning of large language models for automated test case generation

Milad Moradi, Ke Yan, David Colwell et al.

Automated test case generation from natural language requirements remains a challenging problem in software engineering due to the ambiguity of requirements and the need to produce structured, executable test artifacts. Recent advances in LLMs have shown promise in addressing this task; however, their effectiveness depends on task-specific adaptation and efficient fine-tuning strategies. In this paper, we present a comprehensive empirical study on the use of parameter-efficient fine-tuning, specifically LoRA, for requirement-based test case generation. We evaluate multiple LLM families, including open-source and proprietary models, under a unified experimental pipeline. The study systematically explores the impact of key LoRA hyperparameters, including rank, scaling factor, and dropout, on downstream performance. We propose an automated evaluation framework based on GPT-4o, which assesses generated test cases across nine quality dimensions. Experimental results demonstrate that LoRA-based fine-tuning significantly improves the performance of all open-source models, with Ministral-8B achieving the best results among them. Furthermore, we show that a fine-tuned 8B open-source model can achieve performance comparable to pre-fine-tuned GPT-4.1 models, highlighting the effectiveness of parameter-efficient adaptation. While GPT-4.1 models achieve the highest overall performance, the performance gap between proprietary and open-source models is substantially reduced after fine-tuning. These findings provide important insights into model selection, fine-tuning strategies, and evaluation methods for automated test generation. In particular, they demonstrate that cost-efficient, locally deployable open-source models can serve as viable alternatives to proprietary systems when combined with well-designed fine-tuning approaches.

4.7CVApr 8

Multi-modal user interface control detection using cross-attention

Milad Moradi, Ke Yan, David Colwell et al.

Detecting user interface (UI) controls from software screenshots is a critical task for automated testing, accessibility, and software analytics, yet it remains challenging due to visual ambiguities, design variability, and the lack of contextual cues in pixel-only approaches. In this paper, we introduce a novel multi-modal extension of YOLOv5 that integrates GPT-generated textual descriptions of UI images into the detection pipeline through cross-attention modules. By aligning visual features with semantic information derived from text embeddings, our model enables more robust and context-aware UI control detection. We evaluate the proposed framework on a large dataset of over 16,000 annotated UI screenshots spanning 23 control classes. Extensive experiments compare three fusion strategies, i.e. element-wise addition, weighted sum, and convolutional fusion, demonstrating consistent improvements over the baseline YOLOv5 model. Among these, convolutional fusion achieved the strongest performance, with significant gains in detecting semantically complex or visually ambiguous classes. These results establish that combining visual and textual modalities can substantially enhance UI element detection, particularly in edge cases where visual information alone is insufficient. Our findings open promising opportunities for more reliable and intelligent tools in software testing, accessibility support, and UI analytics, setting the stage for future research on efficient, robust, and generalizable multi-modal detection systems.

0.3CLAug 6, 2019Code

Clustering of Deep Contextualized Representations for Summarization of Biomedical Texts

Milad Moradi, Matthias Samwald

In recent years, summarizers that incorporate domain knowledge into the process of text summarization have outperformed generic methods, especially for summarization of biomedical texts. However, construction and maintenance of domain knowledge bases are resource-intense tasks requiring significant manual annotation. In this paper, we demonstrate that contextualized representations extracted from the pre-trained deep language model BERT, can be effectively used to measure the similarity between sentences and to quantify the informative content. The results show that our BERT-based summarizer can improve the performance of biomedical summarization. Although the summarizer does not use any sources of domain knowledge, it can capture the context of sentences more accurately than the comparison methods. The source code and data are available at https://github.com/BioTextSumm/BERT-based-Summ.

7.3AIApr 18, 2024

A critical review of methods and challenges in large language models

Milad Moradi, Ke Yan, David Colwell et al.

This critical review provides an in-depth analysis of Large Language Models (LLMs), encompassing their foundational principles, diverse applications, and advanced training methodologies. We critically examine the evolution from Recurrent Neural Networks (RNNs) to Transformer models, highlighting the significant advancements and innovations in LLM architectures. The review explores state-of-the-art techniques such as in-context learning and various fine-tuning approaches, with an emphasis on optimizing parameter efficiency. We also discuss methods for aligning LLMs with human preferences, including reinforcement learning frameworks and human feedback mechanisms. The emerging technique of retrieval-augmented generation, which integrates external knowledge into LLMs, is also evaluated. Additionally, we address the ethical considerations of deploying LLMs, stressing the importance of responsible and mindful application. By identifying current gaps and suggesting future research directions, this review provides a comprehensive and critical overview of the present state and potential advancements in LLMs. This work serves as an insightful guide for researchers and practitioners in artificial intelligence, offering a unified perspective on the strengths, limitations, and future prospects of LLMs.

3.3SEMay 1, 2024

Artificial intelligence for context-aware visual change detection in software test automation

Milad Moradi, Ke Yan, David Colwell et al.

Automated software testing is integral to the software development process, streamlining workflows and ensuring product reliability. Visual testing, particularly for user interface (UI) and user experience (UX) validation, plays a vital role in maintaining software quality. However, conventional techniques such as pixel-wise comparison and region-based visual change detection often fail to capture contextual similarities, subtle variations, and spatial relationships between UI elements. In this paper, we propose a novel graph-based approach for context-aware visual change detection in software test automation. Our method leverages a machine learning model (YOLOv5) to detect UI controls from software screenshots and constructs a graph that models their contextual and spatial relationships. This graph structure is then used to identify correspondences between UI elements across software versions and to detect meaningful changes. The proposed method incorporates a recursive similarity computation that combines structural, visual, and textual cues, offering a robust and holistic model of UI changes. We evaluate our approach on a curated dataset of real-world software screenshots and demonstrate that it reliably detects both simple and complex UI changes. Our method significantly outperforms pixel-wise and region-based baselines, especially in scenarios requiring contextual understanding. We also discuss current limitations related to dataset diversity, baseline complexity, and model generalization, and outline planned future improvements. Overall, our work advances the state of the art in visual change detection and provides a practical solution for enhancing the reliability and maintainability of evolving software interfaces.

0.5CLDec 24, 2023

Multi-level biomedical NER through multi-granularity embeddings and enhanced labeling

Fahime Shahrokh, Nasser Ghadiri, Rasoul Samani et al.

Biomedical Named Entity Recognition (NER) is a fundamental task of Biomedical Natural Language Processing for extracting relevant information from biomedical texts, such as clinical records, scientific publications, and electronic health records. The conventional approaches for biomedical NER mainly use traditional machine learning techniques, such as Conditional Random Fields and Support Vector Machines or deep learning-based models like Recurrent Neural Networks and Convolutional Neural Networks. Recently, Transformer-based models, including BERT, have been used in the domain of biomedical NER and have demonstrated remarkable results. However, these models are often based on word-level embeddings, limiting their ability to capture character-level information, which is effective in biomedical NER due to the high variability and complexity of biomedical texts. To address these limitations, this paper proposes a hybrid approach that integrates the strengths of multiple models. In this paper, we proposed an approach that leverages fine-tuned BERT to provide contextualized word embeddings, a pre-trained multi-channel CNN for character-level information capture, and following by a BiLSTM + CRF for sequence labelling and modelling dependencies between the words in the text. In addition, also we propose an enhanced labelling method as part of pre-processing to enhance the identification of the entity's beginning word and thus improve the identification of multi-word entities, a common challenge in biomedical NER. By integrating these models and the pre-processing method, our proposed model effectively captures both contextual information and detailed character-level information. We evaluated our model on the benchmark i2b2/2010 dataset, achieving an F1-score of 90.11. These results illustrate the proficiency of our proposed model in performing biomedical Named Entity Recognition.

7.7AIFeb 25, 2022

Deep Learning, Natural Language Processing, and Explainable Artificial Intelligence in the Biomedical Domain

Milad Moradi, Matthias Samwald

In this article, we first give an introduction to artificial intelligence and its applications in biology and medicine in Section 1. Deep learning methods are then described in Section 2. We narrow down the focus of the study on textual data in Section 3, where natural language processing and its applications in the biomedical domain are described. In Section 4, we give an introduction to explainable artificial intelligence and discuss the importance of explainability of artificial intelligence systems, especially in the biomedical domain.

1.4CLNov 16, 2021Code

Improving the robustness and accuracy of biomedical language models through adversarial training

Milad Moradi, Matthias Samwald

Deep transformer neural network models have improved the predictive accuracy of intelligent text processing systems in the biomedical domain. They have obtained state-of-the-art performance scores on a wide variety of biomedical and clinical Natural Language Processing (NLP) benchmarks. However, the robustness and reliability of these models has been less explored so far. Neural NLP models can be easily fooled by adversarial samples, i.e. minor changes to input that preserve the meaning and understandability of the text but force the NLP system to make erroneous decisions. This raises serious concerns about the security and trust-worthiness of biomedical NLP systems, especially when they are intended to be deployed in real-world use cases. We investigated the robustness of several transformer neural language models, i.e. BioBERT, SciBERT, BioMed-RoBERTa, and Bio-ClinicalBERT, on a wide range of biomedical and clinical text processing tasks. We implemented various adversarial attack methods to test the NLP systems in different attack scenarios. Experimental results showed that the biomedical NLP models are sensitive to adversarial samples; their performance dropped in average by 21 and 18.9 absolute percent on character-level and word-level adversarial noise, respectively. Conducting extensive adversarial training experiments, we fine-tuned the NLP models on a mixture of clean samples and adversarial inputs. Results showed that adversarial training is an effective defense mechanism against adversarial noise; the models robustness improved in average by 11.3 absolute percent. In addition, the models performance on clean data increased in average by 2.4 absolute present, demonstrating that adversarial training can boost generalization abilities of biomedical NLP systems.

5.1CLSep 6, 2021Code

GPT-3 Models are Poor Few-Shot Learners in the Biomedical Domain

Milad Moradi, Kathrin Blagec, Florian Haberl et al.

Deep neural language models have set new breakthroughs in many tasks of Natural Language Processing (NLP). Recent work has shown that deep transformer language models (pretrained on large amounts of texts) can achieve high levels of task-specific few-shot performance comparable to state-of-the-art models. However, the ability of these large language models in few-shot transfer learning has not yet been explored in the biomedical domain. We investigated the performance of two powerful transformer language models, i.e. GPT-3 and BioBERT, in few-shot settings on various biomedical NLP tasks. The experimental results showed that, to a great extent, both the models underperform a language model fine-tuned on the full training data. Although GPT-3 had already achieved near state-of-the-art results in few-shot knowledge transfer on open-domain NLP tasks, it could not perform as effectively as BioBERT, which is orders of magnitude smaller than GPT-3. Regarding that BioBERT was already pretrained on large biomedical text corpora, our study suggests that language models may largely benefit from in-domain pretraining in task-specific few-shot learning. However, in-domain pretraining seems not to be sufficient; novel pretraining and few-shot learning strategies are required in the biomedical NLP domain.

1.6CLAug 27, 2021Code

Deep learning models are not robust against noise in clinical text

Milad Moradi, Kathrin Blagec, Matthias Samwald

Artificial Intelligence (AI) systems are attracting increasing interest in the medical domain due to their ability to learn complicated tasks that require human intelligence and expert knowledge. AI systems that utilize high-performance Natural Language Processing (NLP) models have achieved state-of-the-art results on a wide variety of clinical text processing benchmarks. They have even outperformed human accuracy on some tasks. However, performance evaluation of such AI systems have been limited to accuracy measures on curated and clean benchmark datasets that may not properly reflect how robustly these systems can operate in real-world situations. In order to address this challenge, we introduce and implement a wide variety of perturbation methods that simulate different types of noise and variability in clinical text data. While noisy samples produced by these perturbation methods can often be understood by humans, they may cause AI systems to make erroneous decisions. Conducting extensive experiments on several clinical text processing tasks, we evaluated the robustness of high-performance NLP models against various types of character-level and word-level noise. The results revealed that the NLP models performance degrades when the input contains small amounts of noise. This study is a significant step towards exposing vulnerabilities of AI models utilized in clinical text processing systems. The proposed perturbation methods can be used in performance evaluation tests to assess how robustly clinical NLP models can operate on noisy data, in real-world settings.

31.8CLAug 27, 2021Code

Evaluating the Robustness of Neural Language Models to Input Perturbations

Milad Moradi, Matthias Samwald

High-performance neural language models have obtained state-of-the-art results on a wide range of Natural Language Processing (NLP) tasks. However, results for common benchmark datasets often do not reflect model reliability and robustness when applied to noisy, real-world data. In this study, we design and implement various types of character-level and word-level perturbation methods to simulate realistic scenarios in which input texts may be slightly noisy or different from the data distribution on which NLP systems were trained. Conducting comprehensive experiments on different NLP tasks, we investigate the ability of high-performance language models such as BERT, XLNet, RoBERTa, and ELMo in handling different types of input perturbations. The results suggest that language models are sensitive to input perturbations and their performance can decrease even when small changes are introduced. We highlight that models need to be further improved and that current benchmarks are not reflecting model robustness well. We argue that evaluations on perturbed inputs should routinely complement widely-used benchmarks in order to yield a more realistic understanding of NLP systems robustness.

0.2CLAug 16, 2021

Hybrid deep learning methods for phenotype prediction from clinical notes

Sahar Khalafi, Nasser Ghadiri, Milad Moradi

Identifying patient cohorts from clinical notes in secondary electronic health records is a fundamental task in clinical information management. However, with the growing number of clinical notes, it becomes challenging to analyze the data manually for phenotype detection. Automatic extraction of clinical concepts would helps to identify the patient phenotypes correctly. This paper proposes a novel hybrid model for automatically extracting patient phenotypes using natural language processing and deep learning models to determine the patient phenotypes without dictionaries and human intervention. The model is based on a neural bidirectional sequence model (BiLSTM or BiGRU) and a CNN layer for phenotypes identification. An extra CNN layer is run parallel to the hybrid model to extract more features related to each phenotype. We used pre-trained embeddings such as FastText and Word2vec separately as the input layers to evaluate other embedding's performance. Experimental results using MIMIC III database in internal comparison demonstrate that the proposed model achieved significant performance improvement over existing models. The enhanced version of our model with an extra CNN layer obtained a relatively higher F1-score than the original hybrid model. We also showed that BiGRU layer with FastText embedding had better performance than BiLSTM layer to identify patient phenotypes.

5.7AIOct 21, 2020

Explaining black-box text classifiers for disease-treatment information extraction

Milad Moradi, Matthias Samwald

Deep neural networks and other intricate Artificial Intelligence (AI) models have reached high levels of accuracy on many biomedical natural language processing tasks. However, their applicability in real-world use cases may be limited due to their vague inner working and decision logic. A post-hoc explanation method can approximate the behavior of a black-box AI model by extracting relationships between feature values and outcomes. In this paper, we introduce a post-hoc explanation method that utilizes confident itemsets to approximate the behavior of black-box classifiers for medical information extraction. Incorporating medical concepts and semantics into the explanation process, our explanator finds semantic relations between inputs and outputs in different parts of the decision space of a black-box classifier. The experimental results show that our explanation method can outperform perturbation and decision set based explanators in terms of fidelity and interpretability of explanations produced for predictions on a disease-treatment information extraction task.

13.8AIAug 6, 2020

A critical analysis of metrics used for measuring progress in artificial intelligence

Kathrin Blagec, Georg Dorffner, Milad Moradi et al.

Comparing model performances on benchmark datasets is an integral part of measuring and driving progress in artificial intelligence. A model's performance on a benchmark dataset is commonly assessed based on a single or a small set of performance metrics. While this enables quick comparisons, it may entail the risk of inadequately reflecting model performance if the metric does not sufficiently cover all performance characteristics. It is unknown to what extent this might impact benchmarking efforts. To address this question, we analysed the current landscape of performance metrics based on data covering 3867 machine learning model performance results from the open repository 'Papers with Code'. Our results suggest that the large majority of metrics currently used have properties that may result in an inadequate reflection of a models' performance. While alternative metrics that address problematic properties have been proposed, they are currently rarely used. Furthermore, we describe ambiguities in reported metrics, which may lead to difficulties in interpreting and comparing model performances.

19.4AIMay 5, 2020

Post-hoc explanation of black-box classifiers using confident itemsets

Milad Moradi, Matthias Samwald

Black-box Artificial Intelligence (AI) methods, e.g. deep neural networks, have been widely utilized to build predictive models that can extract complex relationships in a dataset and make predictions for new unseen data records. However, it is difficult to trust decisions made by such methods since their inner working and decision logic is hidden from the user. Explainable Artificial Intelligence (XAI) refers to systems that try to explain how a black-box AI model produces its outcomes. Post-hoc XAI methods approximate the behavior of a black-box by extracting relationships between feature values and the predictions. Perturbation-based and decision set methods are among commonly used post-hoc XAI systems. The former explanators rely on random perturbations of data records to build local or global linear models that explain individual predictions or the whole model. The latter explanators use those feature values that appear more frequently to construct a set of decision rules that produces the same outcomes as the target black-box. However, these two classes of XAI methods have some limitations. Random perturbations do not take into account the distribution of feature values in different subspaces, leading to misleading approximations. Decision sets only pay attention to frequent feature values and miss many important correlations between features and class labels that appear less frequently but accurately represent decision boundaries of the model. In this paper, we address the above challenges by proposing an explanation method named Confident Itemsets Explanation (CIE). We introduce confident itemsets, a set of feature values that are highly correlated to a specific class label. CIE utilizes confident itemsets to discretize the whole decision space of a model to smaller subspaces.

0.3CLAug 6, 2019

Text Summarization in the Biomedical Domain

Milad Moradi, Nasser Ghadiri

This chapter gives an overview of recent advances in the field of biomedical text summarization. Different types of challenges are introduced, and methods are discussed concerning the type of challenge that they address. Biomedical literature summarization is explored as a leading trend in the field, and some future lines of work are pointed out. Underlying methods of recent summarization systems are briefly explained and the most significant evaluation results are mentioned. The primary purpose of this chapter is to review the most significant research efforts made in the current decade toward new methods of biomedical text summarization. As the main parts of this chapter, current trends are discussed and new challenges are introduced.

0.6CLMar 7, 2019

Small-world networks for summarization of biomedical articles

Milad Moradi

In recent years, many methods have been developed to identify important portions of text documents. Summarization tools can utilize these methods to extract summaries from large volumes of textual information. However, to identify concepts representing central ideas within a text document and to extract the most informative sentences that best convey those concepts still remain two crucial tasks in summarization methods. In this paper, we introduce a graph-based method to address these two challenges in the context of biomedical text summarization. We show that how a summarizer can discover meaningful concepts within a biomedical text document using the Helmholtz principle. The summarizer considers the meaningful concepts as the main topics and constructs a graph based on the topics that the sentences share. The summarizer can produce an informative summary by extracting those sentences having higher values of the degree. We assess the performance of our method for summarization of biomedical articles using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) toolkit. The results show that the degree can be a useful centrality measure to identify important sentences in this type of graph-based modelling. Our method can improve the performance of biomedical text summarization compared to some state-of-the-art and publicly available summarizers. Combining a concept-based modelling strategy and a graph-based approach to sentence extraction, our summarizer can produce summaries with the highest scores of informativeness among the comparison methods. This research work can be regarded as a start point to the study of small-world networks in summarization of biomedical texts.

5.1DCSep 11, 2016

A centralized reinforcement learning method for multi-agent job scheduling in Grid

Milad Moradi

One of the main challenges in Grid systems is designing an adaptive, scalable, and model-independent method for job scheduling to achieve a desirable degree of load balancing and system efficiency. Centralized job scheduling methods have some drawbacks, such as single point of failure and lack of scalability. Moreover, decentralized methods require a coordination mechanism with limited communications. In this paper, we propose a multi-agent approach to job scheduling in Grid, named Centralized Learning Distributed Scheduling (CLDS), by utilizing the reinforcement learning framework. The CLDS is a model free approach that uses the information of jobs and their completion time to estimate the efficiency of resources. In this method, there are a learner agent and several scheduler agents that perform the task of learning and job scheduling with the use of a coordination strategy that maintains the communication cost at a limited level. We evaluated the efficiency of the CLDS method by designing and performing a set of experiments on a simulated Grid system under different system scales and loads. The results show that the CLDS can effectively balance the load of system even in large scale and heavy loaded Grids, while maintains its adaptive performance and scalability.

10.8IRSep 10, 2016

Quantifying the informativeness for biomedical literature summarization: An itemset mining method

Milad Moradi, Nasser Ghadiri

Objective: Automatic text summarization tools can help users in the biomedical domain to access information efficiently from a large volume of scientific literature and other sources of text documents. In this paper, we propose a summarization method that combines itemset mining and domain knowledge to construct a concept-based model and to extract the main subtopics from an input document. Our summarizer quantifies the informativeness of each sentence using the support values of itemsets appearing in the sentence. Methods: To address the concept-level analysis of text, our method initially maps the original document to biomedical concepts using the UMLS. Then, it discovers the essential subtopics of the text using a data mining technique, namely itemset mining, and constructs the summarization model. The employed itemset mining algorithm extracts a set of frequent itemsets containing correlated and recurrent concepts of the input document. The summarizer selects the most related and informative sentences and generates the final summary. Results: We evaluate the performance of our itemset-based summarizer using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics, performing a set of experiments. The results show that the itemset-based summarizer performs better than the compared methods. The itemset-based summarizer achieves the best scores for all the assessed ROUGE metrics . Conclusion: Compared to the statistical, similarity, and word frequency methods, the proposed method demonstrates that the summarization model obtained from the concept extraction and itemset mining provides the summarizer with an effective metric for measuring the informative content of sentences. This can lead to an improvement in the performance of biomedical literature summarization.

6.3CLMay 10, 2016

Different approaches for identifying important concepts in probabilistic biomedical text summarization

Milad Moradi, Nasser Ghadiri

Automatic text summarization tools help users in biomedical domain to acquire their intended information from various textual resources more efficiently. Some of the biomedical text summarization systems put the basis of their sentence selection approach on the frequency of concepts extracted from the input text. However, it seems that exploring other measures rather than the frequency for identifying the valuable content of the input document, and considering the correlations existing between concepts may be more useful for this type of summarization. In this paper, we describe a Bayesian summarizer for biomedical text documents. The Bayesian summarizer initially maps the input text to the Unified Medical Language System (UMLS) concepts, then it selects the important ones to be used as classification features. We introduce different feature selection approaches to identify the most important concepts of the text and to select the most informative content according to the distribution of these concepts. We show that with the use of an appropriate feature selection approach, the Bayesian biomedical summarizer can improve the performance of summarization. We perform extensive evaluations on a corpus of scientific papers in biomedical domain. The results show that the Bayesian summarizer outperforms the biomedical summarizers that rely on the frequency of concepts, the domain-independent and baseline methods based on the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics. Moreover, the results suggest that using the meaningfulness measure and considering the correlations of concepts in the feature selection step lead to a significant increase in the performance of summarization.