CRJun 2
High-Precision APT Malware Attribution with Out-of-Scope ResiliencePeter Williams, Adam Sobey, Erisa Karafili
Early attribution of Advanced Persistent Threat (APT) activity can help defenders prioritise investigation, select countermeasures, and reduce the impact of an intrusion. Malware provides useful attribution evidence, but automated APT malware attribution remains difficult in practice. Existing approaches are typically trained and evaluated as closed-set classifiers over a limited number of known APT groups. In operational environments, however, classifiers are likely to encounter samples from groups not represented during training. Closed-set classifiers are then forced to assign such samples to known groups, producing unsupported and potentially misleading attributions. We present a high-precision APT malware attribution method based on ranked binary classifiers with explicit abstention. Rather than training a single multi-class classifier, our approach trains and tunes two binary classifiers per APT group, ranks the classifiers by validation performance, and applies them sequentially. A sample is attributed only when a classifier provides sufficient evidence; otherwise, it abstains. We evaluate the method on the APT Malware dataset and on a larger combined dataset designed to stress-test out-of-scope behaviour. On the APT Malware dataset, the method achieves higher precision than previously published results on the same dataset. In the most challenging setting, where 87% of test samples came from 60 APT groups excluded from training, the method abstained on 94% of out-of-scope samples while maintaining 92% precision and 95% selective accuracy on the samples it classified.
MAOct 5, 2022
From Intelligent Agents to Trustworthy Human-Centred Multiagent SystemsMohammad Divband Soorati, Enrico H. Gerding, Enrico Marchioni et al.
The Agents, Interaction and Complexity research group at the University of Southampton has a long track record of research in multiagent systems (MAS). We have made substantial scientific contributions across learning in MAS, game-theoretic techniques for coordinating agent systems, and formal methods for representation and reasoning. We highlight key results achieved by the group and elaborate on recent work and open research challenges in developing trustworthy autonomous systems and deploying human-centred AI systems that aim to support societal good.
AIMar 31
Computational Hermeneutics: Evaluating generative AI as a cultural technologyCody Kommers, Ruth Ahnert, Maria Antoniak et al.
Generative AI systems are increasingly recognized as cultural technologies, yet current evaluation frameworks often treat culture as a variable to be measured rather than fundamental to the system's operation. Drawing on hermeneutic theory from the humanities, we argue that GenAI systems function as "context machines" that must inherently address three interpretive challenges: situatedness (meaning only emerges in context), plurality (multiple valid interpretations coexist), and ambiguity (interpretations naturally conflict). We present computational hermeneutics as an emerging framework offering an interpretive account of what GenAI systems do, and how they might do it better. We offer three principles for hermeneutic evaluation -- that benchmarks should be iterative, not one-off; include people, not just machines; and measure cultural context, not just model output. This perspective offers a nascent paradigm for designing and evaluating contemporary AI systems: shifting from standardized questions about accuracy to contextual ones about meaning.
LGSep 5, 2024
CHIRPs: Change-Induced Regret Proxy metrics for Lifelong Reinforcement LearningJohn Birkbeck, Adam Sobey, Federico Cerutti et al.
Reinforcement learning (RL) agents are costly to train and fragile to environmental changes. They often perform poorly when there are many changing tasks, prohibiting their widespread deployment in the real world. Many Lifelong RL agent designs have been proposed to mitigate issues such as catastrophic forgetting or demonstrate positive characteristics like forward transfer when change occurs. However, no prior work has established whether the impact on agent performance can be predicted from the change itself. Understanding this relationship will help agents proactively mitigate a change's impact for improved learning performance. We propose Change-Induced Regret Proxy (CHIRP) metrics to link change to agent performance drops and use two environments to demonstrate a CHIRP's utility in lifelong learning. A simple CHIRP-based agent achieved $48\%$ higher performance than the next best method in one benchmark and attained the best success rates in 8 of 10 tasks in a second benchmark which proved difficult for existing lifelong RL agents.
CLMay 23, 2025
Meaning Is Not A Metric: Using LLMs to make cultural context legible at scaleCody Kommers, Drew Hemment, Maria Antoniak et al.
This position paper argues that large language models (LLMs) can make cultural context, and therefore human meaning, legible at an unprecedented scale in AI-based sociotechnical systems. We argue that such systems have previously been unable to represent human meaning because they rely on thin descriptions: numerical representations that enforce standardization and therefore strip human activity of the cultural context that gives it meaning. By contrast, scholars in the humanities and qualitative social sciences have developed frameworks for representing meaning through thick description: verbal representations that accommodate heterogeneity and retain contextual information needed to represent human meaning. While these methods can effectively codify meaning, they are difficult to deploy at scale. However, the verbal capabilities of LLMs now provide a means of (at least partially) automating the generation and processing of thick descriptions, potentially overcoming this bottleneck. We argue that the problem of rendering human meaning legible is not just about selecting better metrics, but about developing new representational formats (based on thick description). We frame this as a crucial direction for the application of generative AI and identify five key challenges: preserving context, maintaining interpretive pluralism, integrating perspectives based on lived experience and critical distance, distinguishing qualitative content from quantitative magnitude, and acknowledging meaning as dynamic rather than static. Furthermore, we suggest that thick description has the potential to serve as a unifying framework to address a number of emerging concerns about the difficulties of representing culture in (or using) LLMs.
LGSep 25, 2025
It's Not You, It's Clipping: A Soft Trust-Region via Probability Smoothing for LLM RLMadeleine Dwyer, Adam Sobey, Adriane Chapman
Training large language models (LLMs) with reinforcement learning (RL) methods such as PPO and GRPO commonly relies on ratio clipping to stabilise updates. While effective at preventing instability, clipping discards information and introduces gradient discontinuities. We propose Probability Smoothing Policy Optimisation (PSPO), which smooths the current policy's probabilities toward the old (behaviour) policy before computing the importance ratio, analogous to label smoothing. Unlike clipping, PSPO preserves gradient signal, while interpolation toward the old policy creates a soft trust region that discourages large, destabilising updates, with formal guarantees. We instantiate PSPO within GRPO (GR-PSPO) and fine-tune Qwen2.5-0.5B and Qwen2.5-1.5B on GSM8K, evaluating on GSM8K test and the cross-dataset generalisation on SVAMP, ASDiv, and MATH-500. Relative to unclipped GRPO (single iteration; no data reuse, ratio always = 1), GR-PSPO achieves similar performance but improves the reasoning leading to clearer and more concise responses which are more logical. Compared to clipped GRPO, GR-PSPO substantially improves performance both the 0.5B and 1.5B models, with a boost of over 20% on GSM8K (39.7% vs. 17.6% for 0.5B, 59.4% vs. 37.8% for 1.5B).
AIMay 21, 2025
HAVA: Hybrid Approach to Value-Alignment through Reward Weighing for Reinforcement LearningKryspin Varys, Federico Cerutti, Adam Sobey et al.
Our society is governed by a set of norms which together bring about the values we cherish such as safety, fairness or trustworthiness. The goal of value-alignment is to create agents that not only do their tasks but through their behaviours also promote these values. Many of the norms are written as laws or rules (legal / safety norms) but even more remain unwritten (social norms). Furthermore, the techniques used to represent these norms also differ. Safety / legal norms are often represented explicitly, for example, in some logical language while social norms are typically learned and remain hidden in the parameter space of a neural network. There is a lack of approaches in the literature that could combine these various norm representations into a single algorithm. We propose a novel method that integrates these norms into the reinforcement learning process. Our method monitors the agent's compliance with the given norms and summarizes it in a quantity we call the agent's reputation. This quantity is used to weigh the received rewards to motivate the agent to become value-aligned. We carry out a series of experiments including a continuous state space traffic problem to demonstrate the importance of the written and unwritten norms and show how our method can find the value-aligned policies. Furthermore, we carry out ablations to demonstrate why it is better to combine these two groups of norms rather than using either separately.