Nicholas Micallef

CR
h-index13
13papers
263citations
Novelty31%
AI Score43

13 Papers

11.9CLApr 17
SwanNLP at SemEval-2026 Task 5: An LLM-based Framework for Plausibility Scoring in Narrative Word Sense Disambiguation

Deshan Sumanathilaka, Nicholas Micallef, Julian Hough et al.

Recent advances in language models have substantially improved Natural Language Understanding (NLU). Although widely used benchmarks suggest that Large Language Models (LLMs) can effectively disambiguate, their practical applicability in real-world narrative contexts remains underexplored. SemEval-2026 Task 5 addresses this gap by introducing a task that predicts the human-perceived plausibility of a word sense within a short story. In this work, we propose an LLM-based framework for plausibility scoring of homonymous word senses in narrative texts using a structured reasoning mechanism. We examine the impact of fine-tuning low-parameter LLMs with diverse reasoning strategies, alongside dynamic few-shot prompting for large-parameter models, on accurate sense identification and plausibility estimation. Our results show that commercial large-parameter LLMs with dynamic few-shot prompting closely replicate human-like plausibility judgments. Furthermore, model ensembling slightly improves performance, better simulating the agreement patterns of five human annotators compared to single-model predictions

CLMar 5Code
An Exploration-Analysis-Disambiguation Reasoning Framework for Word Sense Disambiguation with Low-Parameter LLMs

Deshan Sumanathilaka, Nicholas Micallef, Julian Hough

Word Sense Disambiguation (WSD) remains a key challenge in Natural Language Processing (NLP), especially when dealing with rare or domain-specific senses that are often misinterpreted. While modern high-parameter Large Language Models (LLMs) such as GPT-4-Turbo have shown state-of-the-art WSD performance, their computational and energy demands limit scalability. This study investigates whether low-parameter LLMs (<4B parameters) can achieve comparable results through fine-tuning strategies that emphasize reasoning-driven sense identification. Using the FEWS dataset augmented with semi-automated, rationale-rich annotations, we fine-tune eight small-scale open-source LLMs (e.g. Gemma and Qwen). Our results reveal that Chain-of-Thought (CoT)-based reasoning combined with neighbour-word analysis achieves performance comparable to GPT-4-Turbo in zero-shot settings. Importantly, Gemma-3-4B and Qwen-3-4B models consistently outperform all medium-parameter baselines and state-of-the-art models on FEWS, with robust generalization to unseen senses. Furthermore, evaluation on the unseen "Fool Me If You Can'' dataset confirms strong cross-domain adaptability without task-specific fine-tuning. This work demonstrates that with carefully crafted reasoning-centric fine-tuning, low-parameter LLMs can deliver accurate WSD while substantially reducing computational and energy demands.

CLJan 10, 2025
IndoNLP 2025: Shared Task on Real-Time Reverse Transliteration for Romanized Indo-Aryan languages

Deshan Sumanathilaka, Isuri Anuradha, Ruvan Weerasinghe et al.

The paper overviews the shared task on Real-Time Reverse Transliteration for Romanized Indo-Aryan languages. It focuses on the reverse transliteration of low-resourced languages in the Indo-Aryan family to their native scripts. Typing Romanized Indo-Aryan languages using ad-hoc transliterals and achieving accurate native scripts are complex and often inaccurate processes with the current keyboard systems. This task aims to introduce and evaluate a real-time reverse transliterator that converts Romanized Indo-Aryan languages to their native scripts, improving the typing experience for users. Out of 11 registered teams, four teams participated in the final evaluation phase with transliteration models for Sinhala, Hindi and Malayalam. These proposed solutions not only solve the issue of ad-hoc transliteration but also empower low-resource language usability in the digital arena.

CLNov 27, 2024
Can LLMs assist with Ambiguity? A Quantitative Evaluation of various Large Language Models on Word Sense Disambiguation

T. G. D. K. Sumanathilaka, Nicholas Micallef, Julian Hough

Ambiguous words are often found in modern digital communications. Lexical ambiguity challenges traditional Word Sense Disambiguation (WSD) methods, due to limited data. Consequently, the efficiency of translation, information retrieval, and question-answering systems is hindered by these limitations. This study investigates the use of Large Language Models (LLMs) to improve WSD using a novel approach combining a systematic prompt augmentation mechanism with a knowledge base (KB) consisting of different sense interpretations. The proposed method incorporates a human-in-loop approach for prompt augmentation where prompt is supported by Part-of-Speech (POS) tagging, synonyms of ambiguous words, aspect-based sense filtering and few-shot prompting to guide the LLM. By utilizing a few-shot Chain of Thought (COT) prompting-based approach, this work demonstrates a substantial improvement in performance. The evaluation was conducted using FEWS test data and sense tags. This research advances accurate word interpretation in social media and digital communication.

CLOct 4, 2025
Prompt Balance Matters: Understanding How Imbalanced Few-Shot Learning Affects Multilingual Sense Disambiguation in LLMs

Deshan Sumanathilaka, Nicholas Micallef, Julian Hough

Recent advances in Large Language Models (LLMs) have significantly reshaped the landscape of Natural Language Processing (NLP). Among the various prompting techniques, few-shot prompting has gained considerable attention for its practicality and effectiveness. This study investigates how few-shot prompting strategies impact the Word Sense Disambiguation (WSD) task, particularly focusing on the biases introduced by imbalanced sample distributions. We use the GLOSSGPT prompting method, an advanced approach for English WSD, to test its effectiveness across five languages: English, German, Spanish, French, and Italian. Our results show that imbalanced few-shot examples can cause incorrect sense predictions in multilingual languages, but this issue does not appear in English. To assess model behavior, we evaluate both the GPT-4o and LLaMA-3.1-70B models and the results highlight the sensitivity of multilingual WSD to sample distribution in few-shot settings, emphasizing the need for balanced and representative prompting strategies.

SINov 11, 2020
The Role of the Crowd in Countering Misinformation: A Case Study of the COVID-19 Infodemic

Nicholas Micallef, Bing He, Srijan Kumar et al.

Fact checking by professionals is viewed as a vital defense in the fight against misinformation.While fact checking is important and its impact has been significant, fact checks could have limited visibility and may not reach the intended audience, such as those deeply embedded in polarized communities. Concerned citizens (i.e., the crowd), who are users of the platforms where misinformation appears, can play a crucial role in disseminating fact-checking information and in countering the spread of misinformation. To explore if this is the case, we conduct a data-driven study of misinformation on the Twitter platform, focusing on tweets related to the COVID-19 pandemic, analyzing the spread of misinformation, professional fact checks, and the crowd response to popular misleading claims about COVID-19. In this work, we curate a dataset of false claims and statements that seek to challenge or refute them. We train a classifier to create a novel dataset of 155,468 COVID-19-related tweets, containing 33,237 false claims and 33,413 refuting arguments.Our findings show that professional fact-checking tweets have limited volume and reach. In contrast, we observe that the surge in misinformation tweets results in a quick response and a corresponding increase in tweets that refute such misinformation. More importantly, we find contrasting differences in the way the crowd refutes tweets, some tweets appear to be opinions, while others contain concrete evidence, such as a link to a reputed source. Our work provides insights into how misinformation is organically countered in social platforms by some of their users and the role they play in amplifying professional fact checks.These insights could lead to development of tools and mechanisms that can empower concerned citizens in combating misinformation. The code and data can be found in http://claws.cc.gatech.edu/covid_counter_misinformation.html.

CRAug 24, 2019
That's Not Me! Designing Fictitious Profiles to Answer Security Questions

Nicholas Micallef, Nalin Asanka Gamagedara Arachchilage

Although security questions are still widely adopted, they still have several limitations. Previous research found that using system-generated information to answer security questions could be more secure than users' own answers. However, using system-generated information has usability limitations. To improve usability, previous research proposed the design of system-generated fictitious profiles. The information from these profiles would be used to answer security questions. However, no research has studied the elements that could influence the design of fictitious profiles or systems that use them to answer security questions. To address this research gap, we conducted an empirical investigation through 20 structured interviews. Our main findings revealed that to improve the design of fictitious profiles, users should be given the option to configure the profiles to make them relatable, interesting and memorable. We also found that the security questions currently provided by websites would need to be enhanced to cater for fictitious profiles.

CROct 11, 2017
Involving Users in the Design of a Serious Game for Security Questions Education

Nicholas Micallef, Nalin Asanka Gamagedara Arachchilage

When using security questions most users still trade-off security for the convenience of memorability. This happens because most users find strong answers to security questions difficult to remember. Previous research in security education was successful in motivating users to change their behaviour towards security issues, through the use of serious games (i.e. games designed for a primary purpose other than pure entertainment). Hence, in this paper we evaluate the design of a serious game, to investigate the features and functionalities that users would find desirable in a game that aims to educate them to provide strong and memorable answers to security questions. Our findings reveal that: (1) even for security education games, rewards seem to motivate users to have a better learning experience; (2) functionalities which contain a social element (e.g. getting help from other players) do not seem appropriate for serious games related to security questions, because users fear that their acquaintances could gain access to their security questions; (3) even users who do not usually play games would seem to prefer to play security education games on a mobile device.

CRSep 24, 2017
A Serious Game Design: Nudging Users' Memorability of Security Questions

Nicholas Micallef, Nalin Asanka Gamagedara Arachchilage

Security questions are one of the techniques used to recover passwords. The main limitation of security questions is that users find strong answers difficult to remember. This leads users to trade-off security for the convenience of an improved memorability. Previous research found that increased fun and enjoyment can lead to an enhanced memorability, which provides a better learning experience. Hence, we empirically investigate whether a serious game has the potential of improving the memorability of strong answers to security questions. For our serious game, we adapted the popular "4 Pics 1 word" mobile game because of its use of pictures and cues, which psychology research found to be important to help with memorability. Our findings indicate that the proposed serious game could potentially improve the memorability of answers to security questions. This potential improvement in memorability, could eventually help reduce the trade-off between usability and security in fall-back authentication.

CRSep 24, 2017
A Model for Enhancing Human Behaviour with Security Questions: A Theoretical Perspective

Nicholas Micallef, Nalin Asanka Gamagedara Arachchilage

Security questions are one of the mechanisms used to recover passwords. Strong answers to security questions (i.e. high entropy) are hard for attackers to guess or obtain using social engineering techniques (e.g. monitoring of social networking profiles), but at the same time are difficult to remember. Instead, weak answers to security questions (i.e. low entropy) are easy to remember, which makes them more vulnerable to cyber-attacks. Convenience leads users to use the same answers to security questions on multiple accounts, which exposes these accounts to numerous cyber-threats. Hence, current security questions implementations rarely achieve the required security and memorability requirements. This research study is the first step in the development of a model which investigates the determinants that influence users' behavioural intentions through motivation to select strong and memorable answers to security questions. This research also provides design recommendations for novel security questions mechanisms.

CRSep 24, 2017
Changing users' security behaviour towards security questions: A game based learning approach

Nicholas Micallef, Nalin Asanka Gamagedara Arachchilage

Fallback authentication is used to retrieve forgotten passwords. Security questions are one of the main techniques used to conduct fallback authentication. In this paper, we propose a serious game design that uses system-generated security questions with the aim of improving the usability of fallback authentication. For this purpose, we adopted the popular picture-based "4 Pics 1 word" mobile game. This game was selected because of its use of pictures and cues, which previous psychology research found to be crucial to aid memorability. This game asks users to pick the word that relates to the given pictures. We then customized this game by adding features which help maximize the following memory retrieval skills: (a) verbal cues - by providing hints with verbal descriptions, (b) spatial cues - by maintaining the same order of pictures, (c) graphical cues - by showing 4 images for each challenge, (d) interactivity/engaging nature of the game.

CRJul 25, 2017
A Gamified Approach to Improve Users' Memorability of Fall-back Authentication

Nicholas Micallef, Nalin Asanka Gamagedara Arachchilage

Security questions are one of the techniques used in fall-back authentication to retrieve forgotten passwords. This paper proposes a game design which aims to improve usability of system-generated security questions. In our game design, we adapted the popular picture-based "4 Pics 1 word" mobile game. This game asks users to pick the word that relates the given pictures. We selected this game because of its use of pictures and cues, in which, psychology research has found to be important to help with memorability. The proposed game design focuses on encoding information to users' long- term memory and to aide memorability by using the follow- ing memory retrieval skills: (a) graphical cues - by using images in each challenge; (b) verbal cues - by using verbal descriptions as hints; (c) spatial cues - by keeping same or- der of pictures; (d) interactivity - engaging nature of the game through the use of persuasive technology principles.

CROct 28, 2014
Data Driven Authentication: On the Effectiveness of User Behaviour Modelling with Mobile Device Sensors

Hilmi Gunes Kayacik, Mike Just, Lynne Baillie et al.

We propose a lightweight, and temporally and spatially aware user behaviour modelling technique for sensor-based authentication. Operating in the background, our data driven technique compares current behaviour with a user profile. If the behaviour deviates sufficiently from the established norm, actions such as explicit authentication can be triggered. To support a quick and lightweight deployment, our solution automatically switches from training mode to deployment mode when the user's behaviour is sufficiently learned. Furthermore, it allows the device to automatically determine a suitable detection threshold. We use our model to investigate practical aspects of sensor-based authentication by applying it to three publicly available data sets, computing expected times for training duration and behaviour drift. We also test our model with scenarios involving an attacker with varying knowledge and capabilities.