CL Apr 2, 2023
A Data-centric Framework for Improving Domain-specific Machine Reading Comprehension Datasets
Iva Bojic, Josef Halim, Verena Suharman et al.
Low-quality data can cause downstream problems in high-stakes applications. A data-centric approach emphasizes improving dataset quality to enhance model performance. High-quality datasets are needed both for training general-purpose Large Language Models (LLMs) and for domain-specific models. Domain-specific datasets are usually small because it is costly to engage a large number of domain experts to create them, so it is vital to ensure that the domain-specific training data is of high quality. In this paper, we propose a framework for enhancing the data quality of original datasets. We applied the proposed framework to four biomedical datasets and, when using back translation to enhance the original dataset quality, showed relative improvements of up to 33% and 40% for fine-tuning retrieval and reader models, respectively, on the BioASQ dataset.
CL Oct 3, 2023
Hierarchical Evaluation Framework: Best Practices for Human Evaluation
Iva Bojic, Jessica Chen, Si Yuan Chang et al.
Human evaluation plays a crucial role in Natural Language Processing (NLP) as it assesses the quality and relevance of developed systems, thereby facilitating their enhancement. However, the absence of widely accepted human evaluation metrics in NLP hampers fair comparisons among different systems and the establishment of universal assessment standards. Through an extensive analysis of existing literature on human evaluation metrics, we identified several gaps in NLP evaluation methodologies. These gaps served as motivation for developing our own hierarchical evaluation framework. The proposed framework offers notable advantages, particularly in providing a more comprehensive representation of the NLP system's performance. We applied this framework to evaluate the developed Machine Reading Comprehension system, which was utilized within a human-AI symbiosis model. The results highlighted the associations between the quality of inputs and outputs, underscoring the necessity to evaluate both components rather than solely focusing on outputs. In future work, we will investigate the potential time-saving benefits of our proposed framework for evaluators assessing NLP systems.
CL May 31, 2023
Building Extractive Question Answering System to Support Human-AI Health Coaching Model for Sleep Domain
Iva Bojic, Qi Chwen Ong, Shafiq Joty et al.
Non-communicable diseases (NCDs) are a leading cause of global deaths, necessitating a focus on primary prevention and lifestyle behavior change. Health coaching, coupled with Question Answering (QA) systems, has the potential to transform preventive healthcare. This paper presents a human-Artificial Intelligence (AI) health coaching model incorporating a domain-specific extractive QA system. A sleep-focused dataset, SleepQA, was manually assembled and used to fine-tune domain-specific BERT models. The QA system was evaluated using both automatic and human methods. A data-centric framework enhanced the system's performance by improving passage retrieval and question reformulation. Although the system did not outperform the baseline in automatic evaluation, it excelled in the human evaluation of real-world questions. Integration into the human-AI health coaching model was tested in a pilot Randomized Controlled Trial (RCT).
CY Nov 28, 2016
Online tools for public engagement: case studies from Reykjavik
Iva Bojic, Giulia Marra, Vera Naydenova
With the ubiquity of Internet technologies and growing demands for transparency and open data policies, the role of social networking and online deliberation tools for public engagement in decision-making has increased substantially in recent decades. In this paper, we analyze how social media are used by different public bodies to enhance public participation in deliberative democracy. We collected and reviewed published information on the subject and carried out a field-based assessment involving structured interviews with different government representatives and urban policymakers. To compare the collected data, we used the participatory cube, a framework for systematic analysis and comparison of e-participation platforms. Our findings are as follows. Participatory decision-making on matters of public concern justly consumes time and resources; therefore, online tools should be applied with consideration of scale and efficiency, i.e. on burning issues for a majority of citizens or on small-scale local platforms, and in combination with meetings in real time and space. The budget and workforce allocated to managing online engagement tools should be proportionate to the other political and administrative efforts made to execute proposed ideas and act on collected feedback, so as to satisfy the needs expressed by the communities and not undermine their belief in their power to influence decisions.