Quim Motger

CL
4 papers
64 citations
Novelty 23%
AI Score 33

4 Papers

CL Aug 2, 2024
Leveraging Encoder-only Large Language Models for Mobile App Review Feature Extraction

Quim Motger, Alessio Miaschi, Felice Dell'Orletta et al.

Mobile app review analysis presents unique challenges due to the low quality, subjective bias, and noisy content of user-generated documents. Extracting features from these reviews is essential for tasks such as feature prioritization and sentiment analysis, but it remains a challenging task. Meanwhile, encoder-only models based on the Transformer architecture have shown promising results for classification and information extraction tasks for multiple software engineering processes. This study explores the hypothesis that encoder-only large language models can enhance feature extraction from mobile app reviews. By leveraging crowdsourced annotations from an industrial context, we redefine feature extraction as a supervised token classification task. Our approach includes extending the pre-training of these models with a large corpus of user reviews to improve contextual understanding and employing instance selection techniques to optimize model fine-tuning. Empirical evaluations demonstrate that this method improves the precision and recall of extracted features and enhances performance efficiency. Key contributions include a novel approach to feature extraction, annotated datasets, extended pre-trained models, and an instance selection mechanism for cost-effective fine-tuning. This research provides practical methods and empirical evidence in applying large language models to natural language processing tasks within mobile app reviews, offering improved performance in feature extraction.

20.7 SE Apr 15
Characterizing Datasets for LLM-based Requirements Engineering: A Systematic Mapping Study

Quim Motger, Carlota Catot, Xavier Franch

Large Language Models (LLMs) depend on high-quality, domain-specific natural language datasets. This dependency is particularly pronounced in Requirements Engineering (RE), where core activities rely on textual artifacts such as requirements, specifications, and stakeholder feedback. Despite the increasing use of LLMs in RE, data scarcity remains a widely reported limitation. While several datasets support LLM-based RE research, they are scattered across studies and lack systematic characterization, hindering reuse, comparability and assessment. This paper addresses this gap by examining which public datasets are used in LLM-based RE, how they can be consistently characterized, and which RE tasks and dataset properties remain under-represented. We report on a systematic mapping study of 45 primary studies referencing 62 publicly available datasets. Each dataset is characterized using a structured scheme covering multiple dimensions, including relevant descriptors such as artifact type, granularity, RE activity, supported task, application domain, and language, among others. The results reveal notable imbalances, including an incomplete adoption of open-science practices, limited dataset support for elicitation activities, and a lack of language and socio-technical diversity. The resulting catalogue and characterisation scheme support informed dataset selection, comparison, and reuse, contributing to stronger empirical foundations for LLM-based RE research and evaluation.

CL Jun 21, 2021
Software-Based Dialogue Systems: Survey, Taxonomy and Challenges

Quim Motger, Xavier Franch, Jordi Marco

The use of natural language interfaces in the field of human-computer interaction is undergoing intense study through dedicated scientific and industrial research. The latest contributions in the field, including deep learning approaches like recurrent neural networks, the potential of context-aware strategies and user-centred design approaches, have brought back the attention of the community to software-based dialogue systems, generally known as conversational agents or chatbots. Nonetheless, and given the novelty of the field, a generic, context-independent overview on the current state of research of conversational agents covering all research perspectives involved is missing. Motivated by this context, this paper reports a survey of the current state of research of conversational agents through a systematic literature review of secondary studies. The conducted research is designed to develop an exhaustive perspective through a clear presentation of the aggregated knowledge published by recent literature within a variety of domains, research focuses and contexts. As a result, this research proposes a holistic taxonomy of the different dimensions involved in the conversational agents' field, which is expected to help researchers and to lay the groundwork for future research in the field of natural language interfaces.

SE Feb 16, 2021
Improved management of issue dependencies in issue trackers of large collaborative projects

Mikko Raatikainen, Quim Motger, Clara Marie Lüders et al.

Issue trackers, such as Jira, have become the prevalent collaborative tools in software engineering for managing issues, such as requirements, development tasks, and software bugs. However, issue trackers inherently focus on the lifecycle of single issues, although issues have and express dependencies on other issues that constitute issue dependency networks in large complex collaborative projects. The objective of this study is to develop supportive solutions for the improved management of dependent issues in an issue tracker. This study follows the Design Science methodology, consisting of eliciting drawbacks and constructing and evaluating a solution and system. The study was carried out in the context of The Qt Company's Jira, which exemplifies an actively used, almost two-decade-old issue tracker with over 100,000 issues. The drawbacks capture how users operate with issue trackers to handle issue information in large, collaborative, and long-lived projects. The basis of the solution is to keep issues and dependencies as separate objects and automatically construct an issue graph. Dependency detections complement the issue graph by proposing missing dependencies, while consistency checks and diagnoses identify conflicting issue priorities and release assignments. Jira's plugin and service-based system architecture realize the functional and quality concerns of the system implementation. We show how to adopt the intelligent supporting techniques of an issue tracker in a complex use context and a large data-set. The solution considers an integrated and holistic system view, practical applicability and utility, and the practical characteristics of issue data, such as inherent incompleteness.