AIJun 15, 2023
A Graphical Formalism for Commonsense Reasoning with RecipesAntonis Bikakis, Aissatou Diallo, Luke Dickens et al.
Whilst cooking is a very important human activity, there has been little consideration given to how we can formalize recipes for use in a reasoning framework. We address this need by proposing a graphical formalization that captures the comestibles (ingredients, intermediate food items, and final products), and the actions on comestibles in the form of a labelled bipartite graph. We then propose formal definitions for comparing recipes, for composing recipes from subrecipes, and for deconstructing recipes into subrecipes. We also introduce and compare two formal definitions for substitution into recipes which are required when there are missing ingredients, or some actions are not possible, or because there is a need to change the final product somehow.
CLJan 12, 2024
PizzaCommonSense: Learning to Model Commonsense Reasoning about Intermediate Steps in Cooking RecipesAissatou Diallo, Antonis Bikakis, Luke Dickens et al.
Understanding procedural texts, such as cooking recipes, is essential for enabling machines to follow instructions and reason about tasks, a key aspect of intelligent reasoning. In cooking, these instructions can be interpreted as a series of modifications to a food preparation. For a model to effectively reason about cooking recipes, it must accurately discern and understand the inputs and outputs of intermediate steps within the recipe. We present a new corpus of cooking recipes enriched with descriptions of intermediate steps that describe the input and output for each step. PizzaCommonsense serves as a benchmark for the reasoning capabilities of LLMs because it demands rigorous explicit input-output descriptions to demonstrate the acquisition of implicit commonsense knowledge, which is unlikely to be easily memorized. GPT-4 achieves only 26\% human-evaluated preference for generations, leaving room for future improvements.
CLMar 14, 2025
Rule-Guided Feedback: Enhancing Reasoning by Enforcing Rule Adherence in Large Language ModelsAissatou Diallo, Antonis Bikakis, Luke Dickens et al.
In this paper, we introduce Rule-Guided Feedback (RGF), a framework designed to enhance Large Language Model (LLM) performance through structured rule adherence and strategic information seeking. RGF implements a teacher-student paradigm where rule-following is forced through established guidelines. Our framework employs a Teacher model that rigorously evaluates each student output against task-specific rules, providing constructive guidance rather than direct answers when detecting deviations. This iterative feedback loop serves two crucial purposes: maintaining solutions within defined constraints and encouraging proactive information seeking to resolve uncertainties. We evaluate RGF on diverse tasks including Checkmate-in-One puzzles, Sonnet Writing, Penguins-In-a-Table classification, GSM8k, and StrategyQA. Our findings suggest that structured feedback mechanisms can significantly enhance LLMs' performance across various domains.
CLMar 14, 2025
RESPONSE: Benchmarking the Ability of Language Models to Undertake Commonsense Reasoning in Crisis SituationAissatou Diallo, Antonis Bikakis, Luke Dickens et al.
An interesting class of commonsense reasoning problems arises when people are faced with natural disasters. To investigate this topic, we present \textsf{RESPONSE}, a human-curated dataset containing 1789 annotated instances featuring 6037 sets of questions designed to assess LLMs' commonsense reasoning in disaster situations across different time frames. The dataset includes problem descriptions, missing resources, time-sensitive solutions, and their justifications, with a subset validated by environmental engineers. Through both automatic metrics and human evaluation, we compare LLM-generated recommendations against human responses. Our findings show that even state-of-the-art models like GPT-4 achieve only 37\% human-evaluated correctness for immediate response actions, highlighting significant room for improvement in LLMs' ability for commonsense reasoning in crises.
CLFeb 5, 2025
IAO Prompting: Making Knowledge Flow Explicit in LLMs through Structured Reasoning TemplatesAissatou Diallo, Antonis Bikakis, Luke Dickens et al.
While Large Language Models (LLMs) demonstrate impressive reasoning capabilities, understanding and validating their knowledge utilization remains challenging. Chain-of-thought (CoT) prompting partially addresses this by revealing intermediate reasoning steps, but the knowledge flow and application remain implicit. We introduce IAO (Input-Action-Output) prompting, a structured template-based method that explicitly models how LLMs access and apply their knowledge during complex reasoning tasks. IAO decomposes problems into sequential steps, each clearly identifying the input knowledge being used, the action being performed, and the resulting output. This structured decomposition enables us to trace knowledge flow, verify factual consistency, and identify potential knowledge gaps or misapplications. Through experiments across diverse reasoning tasks, we demonstrate that IAO not only improves zero-shot performance but also provides transparency in how LLMs leverage their stored knowledge. Human evaluation confirms that this structured approach enhances our ability to verify knowledge utilization and detect potential hallucinations or reasoning errors. Our findings provide insights into both knowledge representation within LLMs and methods for more reliable knowledge application.
CLJan 22, 2024
Unsupervised Learning of Graph from RecipesAissatou Diallo, Antonis Bikakis, Luke Dickens et al.
Cooking recipes are one of the most readily available kinds of procedural text. They consist of natural language instructions that can be challenging to interpret. In this paper, we propose a model to identify relevant information from recipes and generate a graph to represent the sequence of actions in the recipe. In contrast with other approaches, we use an unsupervised approach. We iteratively learn the graph structure and the parameters of a $\mathsf{GNN}$ encoding the texts (text-to-graph) one sequence at a time while providing the supervision by decoding the graph into text (graph-to-text) and comparing the generated text to the input. We evaluate the approach by comparing the identified entities with annotated datasets, comparing the difference between the input and output texts, and comparing our generated graphs with those generated by state of the art methods.
AISep 17, 2021
Repurposing of Resources: from Everyday Problem Solving through to Crisis ManagementAntonis Bikakis, Luke Dickens, Anthony Hunter et al.
The human ability to repurpose objects and processes is universal, but it is not a well-understood aspect of human intelligence. Repurposing arises in everyday situations such as finding substitutes for missing ingredients when cooking, or for unavailable tools when doing DIY. It also arises in critical, unprecedented situations needing crisis management. After natural disasters and during wartime, people must repurpose the materials and processes available to make shelter, distribute food, etc. Repurposing is equally important in professional life (e.g. clinicians often repurpose medicines off-license) and in addressing societal challenges (e.g. finding new roles for waste products,). Despite the importance of repurposing, the topic has received little academic attention. By considering examples from a variety of domains such as every-day activities, drug repurposing and natural disasters, we identify some principle characteristics of the process and describe some technical challenges that would be involved in modelling and simulating it. We consider cases of both substitution, i.e. finding an alternative for a missing resource, and exploitation, i.e. identifying a new role for an existing resource. We argue that these ideas could be developed into general formal theory of repurposing, and that this could then lead to the development of AI methods based on commonsense reasoning, argumentation, ontological reasoning, and various machine learning methods, to develop tools to support repurposing in practice.
SPApr 1, 2021
Reservoir Based Edge Training on RF Data To Deliver Intelligent and Efficient IoT Spectrum SensorsSilvija Kokalj-Filipovic, Paul Toliver, William Johnson et al.
Current radio frequency (RF) sensors at the Edge lack the computational resources to support practical, in-situ training for intelligent spectrum monitoring, and sensor data classification in general. We propose a solution via Deep Delay Loop Reservoir Computing (DLR), a processing architecture that supports general machine learning algorithms on compact mobile devices by leveraging delay-loop reservoir computing in combination with innovative electrooptical hardware. With both digital and photonic realizations of our design of the loops, DLR delivers reductions in form factor, hardware complexity and latency, compared to the State-of-the-Art (SoA). The main impact of the reservoir is to project the input data into a higher dimensional space of reservoir state vectors in order to linearly separate the input classes. Once the classes are well separated, traditionally complex, power-hungry classification models are no longer needed for the learning process. Yet, even with simple classifiers based on Ridge regression (RR), the complexity grows at least quadratically with the input size. Hence, the hardware reduction required for training on compact devices is in contradiction with the large dimension of state vectors. DLR employs a RR-based classifier to exceed the SoA accuracy, while further reducing power consumption by leveraging the architecture of parallel (split) loops. We present DLR architectures composed of multiple smaller loops whose state vectors are linearly combined to create a lower dimensional input into Ridge regression. We demonstrate the advantages of using DLR for two distinct applications: RF Specific Emitter Identification (SEI) for IoT authentication, and wireless protocol recognition for IoT situational awareness.
LGApr 1, 2021
Reservoir-Based Distributed Machine Learning for Edge OperationSilvija Kokalj-Filipovic, Paul Toliver, William Johnson et al.
We introduce a novel design for in-situ training of machine learning algorithms built into smart sensors, and illustrate distributed training scenarios using radio frequency (RF) spectrum sensors. Current RF sensors at the Edge lack the computational resources to support practical, in-situ training for intelligent signal classification. We propose a solution using Deepdelay Loop Reservoir Computing (DLR), a processing architecture that supports machine learning algorithms on resource-constrained edge-devices by leveraging delayloop reservoir computing in combination with innovative hardware. DLR delivers reductions in form factor, hardware complexity and latency, compared to the State-ofthe- Art (SoA) neural nets. We demonstrate DLR for two applications: RF Specific Emitter Identification (SEI) and wireless protocol recognition. DLR enables mobile edge platforms to authenticate and then track emitters with fast SEI retraining. Once delay loops separate the data classes, traditionally complex, power-hungry classification models are no longer needed for the learning process. Yet, even with simple classifiers such as Ridge Regression (RR), the complexity grows at least quadratically with the input size. DLR with a RR classifier exceeds the SoA accuracy, while further reducing power consumption by leveraging the architecture of parallel (split) loops. To authenticate mobile devices across large regions, DLR can be trained in a distributed fashion with very little additional processing and a small communication cost, all while maintaining accuracy. We illustrate how to merge locally trained DLR classifiers in use cases of interest.
CLOct 5, 2020
On the Effects of Knowledge-Augmented Data in Word EmbeddingsDiego Ramirez-Echavarria, Antonis Bikakis, Luke Dickens et al.
This paper investigates techniques for knowledge injection into word embeddings learned from large corpora of unannotated data. These representations are trained with word cooccurrence statistics and do not commonly exploit syntactic and semantic information from linguistic knowledge bases, which potentially limits their transferability to domains with differing language distributions or usages. We propose a novel approach for linguistic knowledge injection through data augmentation to learn word embeddings that enforce semantic relationships from the data, and systematically evaluate the impact it has on the resulting representations. We show our knowledge augmentation approach improves the intrinsic characteristics of the learned embeddings while not significantly altering their results on a downstream text classification task.
NIApr 13, 2019
AutoEncoders for Training Compact Deep Learning RF Classifiers for Wireless ProtocolsSilvija Kokalj-Filipovic, Rob Miller, Joshua Morman
We show that compact fully connected (FC) deep learning networks trained to classify wireless protocols using a hierarchy of multiple denoising autoencoders (AEs) outperform reference FC networks trained in a typical way, i.e., with a stochastic gradient based optimization of a given FC architecture. Not only is the complexity of such FC network, measured in number of trainable parameters and scalar multiplications, much lower than the reference FC and residual models, its accuracy also outperforms both models for nearly all tested SNR values (0 dB to 50dB). Such AE-trained networks are suited for in-situ protocol inference performed by simple mobile devices based on noisy signal measurements. Training is based on the data transmitted by real devices, and collected in a controlled environment, and systematically augmented by a policy-based data synthesis process by adding to the signal any subset of impairments commonly seen in a wireless receiver.
SPFeb 16, 2019
Mitigation of Adversarial Examples in RF Deep Classifiers Utilizing AutoEncoder Pre-trainingSilvija Kokalj-Filipovic, Rob Miller, Nicholas Chang et al.
Adversarial examples in machine learning for images are widely publicized and explored. Illustrations of misclassifications caused by slightly perturbed inputs are abundant and commonly known (e.g., a picture of panda imperceptibly perturbed to fool the classifier into incorrectly labeling it as a gibbon). Similar attacks on deep learning (DL) for radio frequency (RF) signals and their mitigation strategies are scarcely addressed in the published work. Yet, RF adversarial examples (AdExs) with minimal waveform perturbations can cause drastic, targeted misclassification results, particularly against spectrum sensing/survey applications (e.g. BPSK is mistaken for 8-PSK). Our research on deep learning AdExs and proposed defense mechanisms are RF-centric, and incorporate physical world, over-the-air (OTA) effects. We herein present defense mechanisms based on pre-training the target classifier using an autoencoder. Our results validate this approach as a viable mitigation method to subvert adversarial attacks against deep learning-based communications and radar sensing systems.
LGFeb 16, 2019
Adversarial Examples in RF Deep Learning: Detection of the Attack and its Physical RobustnessSilvija Kokalj-Filipovic, Rob Miller
While research on adversarial examples in machine learning for images has been prolific, similar attacks on deep learning (DL) for radio frequency (RF) signals and their mitigation strategies are scarcely addressed in the published work, with only one recent publication in the RF domain [1]. RF adversarial examples (AdExs) can cause drastic, targeted misclassification results mostly in spectrum sensing/ survey applications (e.g. BPSK mistaken for 8-PSK) with minimal waveform perturbation. It is not clear if the RF AdExs maintain their effects in the physical world, i.e., when AdExs are delivered over-the-air (OTA). Our research on deep learning AdExs and proposed defense mechanisms are RF-centric, and incorporate physical world, OTA effects. We here present defense mechanisms based on statistical tests. One test to detect AdExs utilizes Peak-to- Average-Power-Ratio (PAPR) of the DL data points delivered OTA, while another statistical test uses the Softmax outputs of the DL classifier, which corresponds to the probabilities the classifier assigns to each of the trained classes. The former test leverages the RF nature of the data, and the latter is universally applicable to AdExs regardless of their origin. Both solutions are shown as viable mitigation methods to subvert adversarial attacks against communications and radar sensing systems.
AIMar 20, 2017
Foundations for a Probabilistic Event CalculusFabio Aurelio D'Asaro, Antonis Bikakis, Luke Dickens et al.
We present PEC, an Event Calculus (EC) style action language for reasoning about probabilistic causal and narrative information. It has an action language style syntax similar to that of the EC variant Modular-E. Its semantics is given in terms of possible worlds which constitute possible evolutions of the domain, and builds on that of EFEC, an epistemic extension of EC. We also describe an ASP implementation of PEC and show the sense in which this is sound and complete.
HCSep 23, 2014
Attendee-Sourcing: Exploring The Design Space of Community-Informed Conference SchedulingAnant Bhardwaj, Juho Kim, Steven Dow et al.
Constructing a good conference schedule for a large multi-track conference needs to take into account the preferences and constraints of organizers, authors, and attendees. Creating a schedule which has fewer conflicts for authors and attendees, and thematically coherent sessions is a challenging task. Cobi introduced an alternative approach to conference scheduling by engaging the community to play an active role in the planning process. The current Cobi pipeline consists of committee-sourcing and author-sourcing to plan a conference schedule. We further explore the design space of community-sourcing by introducing attendee-sourcing -- a process that collects input from conference attendees and encodes them as preferences and constraints for creating sessions and schedule. For CHI 2014, a large multi-track conference in human-computer interaction with more than 3,000 attendees and 1,000 authors, we collected attendees' preferences by making available all the accepted papers at the conference on a paper recommendation tool we built called Confer, for a period of 45 days before announcing the conference program (sessions and schedule). We compare the preferences marked on Confer with the preferences collected from Cobi's author-sourcing approach. We show that attendee-sourcing can provide insights beyond what can be discovered by author-sourcing. For CHI 2014, the results show value in the method and attendees' participation. It produces data that provides more alternatives in scheduling and complements data collected from other methods for creating coherent sessions and reducing conflicts.
AIJul 14, 2014
Non-Monotonic Reasoning and Story ComprehensionIrene-Anna Diakidoy, Antonis Kakas, Loizos Michael et al.
This paper develops a Reasoning about Actions and Change framework integrated with Default Reasoning, suitable as a Knowledge Representation and Reasoning framework for Story Comprehension. The proposed framework, which is guided strongly by existing knowhow from the Psychology of Reading and Comprehension, is based on the theory of argumentation from AI. It uses argumentation to capture appropriate solutions to the frame, ramification and qualification problems and generalizations of these problems required for text comprehension. In this first part of the study the work concentrates on the central problem of integration (or elaboration) of the explicit information from the narrative in the text with the implicit (in the readers mind) common sense world knowledge pertaining to the topic(s) of the story given in the text. We also report on our empirical efforts to gather background common sense world knowledge used by humans when reading a story and to evaluate, through a prototype system, the ability of our approach to capture both the majority and the variability of understanding of a story by the human readers in the experiments.