AIOct 18, 2023
A Unifying Framework for Learning Argumentation SemanticsZlatina Mileva, Antonis Bikakis, Fabio Aurelio D'Asaro et al.
Argumentation is a very active research field of Artificial Intelligence concerned with the representation and evaluation of arguments used in dialogues between humans and/or artificial agents. Acceptability semantics of formal argumentation systems define the criteria for the acceptance or rejection of arguments. Several software systems, known as argumentation solvers, have been developed to compute the accepted/rejected arguments using such criteria. These include systems that learn to identify the accepted arguments using non-interpretable methods. In this paper we present a novel framework, which uses an Inductive Logic Programming approach to learn the acceptability semantics for several abstract and structured argumentation frameworks in an interpretable way. Through an empirical evaluation we show that our framework outperforms existing argumentation solvers, thus opening up new future research directions in the area of formal argumentation and human-machine dialogues.
AISep 20, 2024
Measuring Error Alignment for Decision-Making SystemsBinxia Xu, Antonis Bikakis, Daniel Onah et al.
Given that AI systems are set to play a pivotal role in future decision-making processes, their trustworthiness and reliability are of critical concern. Due to their scale and complexity, modern AI systems resist direct interpretation, and alternative ways are needed to establish trust in those systems, and determine how well they align with human values. We argue that good measures of the information processing similarities between AI and humans, may be able to achieve these same ends. While Representational alignment (RA) approaches measure similarity between the internal states of two systems, the associated data can be expensive and difficult to collect for human systems. In contrast, Behavioural alignment (BA) comparisons are cheaper and easier, but questions remain as to their sensitivity and reliability. We propose two new behavioural alignment metrics misclassification agreement which measures the similarity between the errors of two systems on the same instances, and class-level error similarity which measures the similarity between the error distributions of two systems. We show that our metrics correlate well with RA metrics, and provide complementary information to another BA metric, within a range of domains, and set the scene for a new approach to value alignment.
AIJun 15, 2023
A Graphical Formalism for Commonsense Reasoning with RecipesAntonis Bikakis, Aissatou Diallo, Luke Dickens et al.
Whilst cooking is a very important human activity, there has been little consideration given to how we can formalize recipes for use in a reasoning framework. We address this need by proposing a graphical formalization that captures the comestibles (ingredients, intermediate food items, and final products), and the actions on comestibles in the form of a labelled bipartite graph. We then propose formal definitions for comparing recipes, for composing recipes from subrecipes, and for deconstructing recipes into subrecipes. We also introduce and compare two formal definitions for substitution into recipes which are required when there are missing ingredients, or some actions are not possible, or because there is a need to change the final product somehow.
CLJan 12, 2024
PizzaCommonSense: Learning to Model Commonsense Reasoning about Intermediate Steps in Cooking RecipesAissatou Diallo, Antonis Bikakis, Luke Dickens et al.
Understanding procedural texts, such as cooking recipes, is essential for enabling machines to follow instructions and reason about tasks, a key aspect of intelligent reasoning. In cooking, these instructions can be interpreted as a series of modifications to a food preparation. For a model to effectively reason about cooking recipes, it must accurately discern and understand the inputs and outputs of intermediate steps within the recipe. We present a new corpus of cooking recipes enriched with descriptions of intermediate steps that describe the input and output for each step. PizzaCommonsense serves as a benchmark for the reasoning capabilities of LLMs because it demands rigorous explicit input-output descriptions to demonstrate the acquisition of implicit commonsense knowledge, which is unlikely to be easily memorized. GPT-4 achieves only 26\% human-evaluated preference for generations, leaving room for future improvements.
CLMar 14, 2025
Rule-Guided Feedback: Enhancing Reasoning by Enforcing Rule Adherence in Large Language ModelsAissatou Diallo, Antonis Bikakis, Luke Dickens et al.
In this paper, we introduce Rule-Guided Feedback (RGF), a framework designed to enhance Large Language Model (LLM) performance through structured rule adherence and strategic information seeking. RGF implements a teacher-student paradigm where rule-following is forced through established guidelines. Our framework employs a Teacher model that rigorously evaluates each student output against task-specific rules, providing constructive guidance rather than direct answers when detecting deviations. This iterative feedback loop serves two crucial purposes: maintaining solutions within defined constraints and encouraging proactive information seeking to resolve uncertainties. We evaluate RGF on diverse tasks including Checkmate-in-One puzzles, Sonnet Writing, Penguins-In-a-Table classification, GSM8k, and StrategyQA. Our findings suggest that structured feedback mechanisms can significantly enhance LLMs' performance across various domains.
CLMar 14, 2025
RESPONSE: Benchmarking the Ability of Language Models to Undertake Commonsense Reasoning in Crisis SituationAissatou Diallo, Antonis Bikakis, Luke Dickens et al.
An interesting class of commonsense reasoning problems arises when people are faced with natural disasters. To investigate this topic, we present \textsf{RESPONSE}, a human-curated dataset containing 1789 annotated instances featuring 6037 sets of questions designed to assess LLMs' commonsense reasoning in disaster situations across different time frames. The dataset includes problem descriptions, missing resources, time-sensitive solutions, and their justifications, with a subset validated by environmental engineers. Through both automatic metrics and human evaluation, we compare LLM-generated recommendations against human responses. Our findings show that even state-of-the-art models like GPT-4 achieve only 37\% human-evaluated correctness for immediate response actions, highlighting significant room for improvement in LLMs' ability for commonsense reasoning in crises.
CLFeb 5, 2025
IAO Prompting: Making Knowledge Flow Explicit in LLMs through Structured Reasoning TemplatesAissatou Diallo, Antonis Bikakis, Luke Dickens et al.
While Large Language Models (LLMs) demonstrate impressive reasoning capabilities, understanding and validating their knowledge utilization remains challenging. Chain-of-thought (CoT) prompting partially addresses this by revealing intermediate reasoning steps, but the knowledge flow and application remain implicit. We introduce IAO (Input-Action-Output) prompting, a structured template-based method that explicitly models how LLMs access and apply their knowledge during complex reasoning tasks. IAO decomposes problems into sequential steps, each clearly identifying the input knowledge being used, the action being performed, and the resulting output. This structured decomposition enables us to trace knowledge flow, verify factual consistency, and identify potential knowledge gaps or misapplications. Through experiments across diverse reasoning tasks, we demonstrate that IAO not only improves zero-shot performance but also provides transparency in how LLMs leverage their stored knowledge. Human evaluation confirms that this structured approach enhances our ability to verify knowledge utilization and detect potential hallucinations or reasoning errors. Our findings provide insights into both knowledge representation within LLMs and methods for more reliable knowledge application.
CLJan 22, 2024
Unsupervised Learning of Graph from RecipesAissatou Diallo, Antonis Bikakis, Luke Dickens et al.
Cooking recipes are one of the most readily available kinds of procedural text. They consist of natural language instructions that can be challenging to interpret. In this paper, we propose a model to identify relevant information from recipes and generate a graph to represent the sequence of actions in the recipe. In contrast with other approaches, we use an unsupervised approach. We iteratively learn the graph structure and the parameters of a $\mathsf{GNN}$ encoding the texts (text-to-graph) one sequence at a time while providing the supervision by decoding the graph into text (graph-to-text) and comparing the generated text to the input. We evaluate the approach by comparing the identified entities with annotated datasets, comparing the difference between the input and output texts, and comparing our generated graphs with those generated by state of the art methods.
AISep 17, 2021
Repurposing of Resources: from Everyday Problem Solving through to Crisis ManagementAntonis Bikakis, Luke Dickens, Anthony Hunter et al.
The human ability to repurpose objects and processes is universal, but it is not a well-understood aspect of human intelligence. Repurposing arises in everyday situations such as finding substitutes for missing ingredients when cooking, or for unavailable tools when doing DIY. It also arises in critical, unprecedented situations needing crisis management. After natural disasters and during wartime, people must repurpose the materials and processes available to make shelter, distribute food, etc. Repurposing is equally important in professional life (e.g. clinicians often repurpose medicines off-license) and in addressing societal challenges (e.g. finding new roles for waste products,). Despite the importance of repurposing, the topic has received little academic attention. By considering examples from a variety of domains such as every-day activities, drug repurposing and natural disasters, we identify some principle characteristics of the process and describe some technical challenges that would be involved in modelling and simulating it. We consider cases of both substitution, i.e. finding an alternative for a missing resource, and exploitation, i.e. identifying a new role for an existing resource. We argue that these ideas could be developed into general formal theory of repurposing, and that this could then lead to the development of AI methods based on commonsense reasoning, argumentation, ontological reasoning, and various machine learning methods, to develop tools to support repurposing in practice.
CLOct 5, 2020
On the Effects of Knowledge-Augmented Data in Word EmbeddingsDiego Ramirez-Echavarria, Antonis Bikakis, Luke Dickens et al.
This paper investigates techniques for knowledge injection into word embeddings learned from large corpora of unannotated data. These representations are trained with word cooccurrence statistics and do not commonly exploit syntactic and semantic information from linguistic knowledge bases, which potentially limits their transferability to domains with differing language distributions or usages. We propose a novel approach for linguistic knowledge injection through data augmentation to learn word embeddings that enforce semantic relationships from the data, and systematically evaluate the impact it has on the resulting representations. We show our knowledge augmentation approach improves the intrinsic characteristics of the learned embeddings while not significantly altering their results on a downstream text classification task.
AIJun 19, 2020
Contextual and Possibilistic Reasoning for Coalition FormationAntonis Bikakis, Patrice Caire
In multiagent systems, agents often have to rely on other agents to reach their goals, for example when they lack a needed resource or do not have the capability to perform a required action. Agents therefore need to cooperate. Then, some of the questions raised are: Which agent(s) to cooperate with? What are the potential coalitions in which agents can achieve their goals? As the number of possibilities is potentially quite large, how to automate the process? And then, how to select the most appropriate coalition, taking into account the uncertainty in the agents' abilities to carry out certain tasks? In this article, we address the question of how to find and evaluate coalitions among agents in multiagent systems using MCS tools, while taking into consideration the uncertainty around the agents' actions. Our methodology is the following: We first compute the solution space for the formation of coalitions using a contextual reasoning approach. Second, we model agents as contexts in Multi-Context Systems (MCS), and dependence relations among agents seeking to achieve their goals, as bridge rules. Third, we systematically compute all potential coalitions using algorithms for MCS equilibria, and given a set of functional and non-functional requirements, we propose ways to select the best solutions. Finally, in order to handle the uncertainty in the agents' actions, we extend our approach with features of possibilistic reasoning. We illustrate our approach with an example from robotics.
AIMar 20, 2017
Foundations for a Probabilistic Event CalculusFabio Aurelio D'Asaro, Antonis Bikakis, Luke Dickens et al.
We present PEC, an Event Calculus (EC) style action language for reasoning about probabilistic causal and narrative information. It has an action language style syntax similar to that of the EC variant Modular-E. Its semantics is given in terms of possible worlds which constitute possible evolutions of the domain, and builds on that of EFEC, an epistemic extension of EC. We also describe an ASP implementation of PEC and show the sense in which this is sound and complete.