Jim Davies

AI
h-index34
5papers
38citations
Novelty32%
AI Score37

5 Papers

AIMar 28
CounterMoral: Editing Morals in Language Models

Michael Ripa, Jim Davies

Recent advancements in language model technology have significantly enhanced the ability to edit factual information. Yet, the modification of moral judgments, a crucial aspect of aligning models with human values, has garnered less attention. In this work, we introduce CounterMoral, a benchmark dataset crafted to assess how well current model editing techniques modify moral judgments across diverse ethical frameworks. We apply various editing techniques to multiple language models and evaluate their performance. Our findings contribute to the evaluation of language models designed to be ethical.

IVSep 8, 2025
Validation of a CT-brain analysis tool for measuring global cortical atrophy in older patient cohorts

Sukhdeep Bal, Emma Colbourne, Jasmine Gan et al.

Quantification of brain atrophy currently requires visual rating scales which are time consuming and automated brain image analysis is warranted. We validated our automated deep learning (DL) tool measuring the Global Cerebral Atrophy (GCA) score against trained human raters, and associations with age and cognitive impairment, in representative older (>65 years) patients. CT-brain scans were obtained from patients in acute medicine (ORCHARD-EPR), acute stroke (OCS studies) and a legacy sample. Scans were divided in a 60/20/20 ratio for training, optimisation and testing. CT-images were assessed by two trained raters (rater-1=864 scans, rater-2=20 scans). Agreement between DL tool-predicted GCA scores (range 0-39) and the visual ratings was evaluated using mean absolute error (MAE) and Cohen's weighted kappa. Among 864 scans (ORCHARD-EPR=578, OCS=200, legacy scans=86), MAE between the DL tool and rater-1 GCA scores was 3.2 overall, 3.1 for ORCHARD-EPR, 3.3 for OCS and 2.6 for the legacy scans and half had DL-predicted GCA error between -2 and 2. Inter-rater agreement was Kappa=0.45 between the DL-tool and rater-1, and 0.41 between the tool and rater- 2 whereas it was lower at 0.28 for rater-1 and rater-2. There was no difference in GCA scores from the DL-tool and the two raters (one-way ANOVA, p=0.35) or in mean GCA scores between the DL-tool and rater-1 (paired t-test, t=-0.43, p=0.66), the tool and rater-2 (t=1.35, p=0.18) or between rater-1 and rater-2 (t=0.99, p=0.32). DL-tool GCA scores correlated with age and cognitive scores (both p<0.001). Our DL CT-brain analysis tool measured GCA score accurately and without user input in real-world scans acquired from older patients. Our tool will enable extraction of standardised quantitative measures of atrophy at scale for use in health data research and will act as proof-of-concept towards a point-of-care clinically approved tool.

AIAug 19, 2018
Let CONAN tell you a story: Procedural quest generation

Vincent Breault, Sebastien Ouellet, Jim Davies

This work proposes an engine for the Creation Of Novel Adventure Narrative (CONAN), which is a procedural quest generator. It uses a planning approach to story generation. The engine is tested on its ability to create quests, which are sets of actions that must be performed in order to achieve a certain goal, usually for a reward. The engine takes in a world description represented as a set of facts, including characters, locations, and items, and generates quests according to the state of the world and the preferences of the characters. We evaluate quests through the classification of the motivations behind the quests, based on the sequences of actions required to complete the quests. We also compare different world descriptions and analyze the difference in motivations for the quests produced by the engine. Compared against human structural quest analysis, the current engine was found to be able to replicate the quest structures found in commercial video game quests.

CVAug 6, 2017
Image Quality Assessment Techniques Show Improved Training and Evaluation of Autoencoder Generative Adversarial Networks

Michael O. Vertolli, Jim Davies

We propose a training and evaluation approach for autoencoder Generative Adversarial Networks (GANs), specifically the Boundary Equilibrium Generative Adversarial Network (BEGAN), based on methods from the image quality assessment literature. Our approach explores a multidimensional evaluation criterion that utilizes three distance functions: an $l_1$ score, the Gradient Magnitude Similarity Mean (GMSM) score, and a chrominance score. We show that each of the different distance functions captures a slightly different set of properties in image space and, consequently, requires its own evaluation criterion to properly assess whether the relevant property has been adequately learned. We show that models using the new distance functions are able to produce better images than the original BEGAN model in predicted ways.

SEJan 1, 2013
Formal Model-Driven Engineering: Generating Data and Behavioural Components

Chen-Wei Wang, Jim Davies

Model-driven engineering is the automatic production of software artefacts from abstract models of structure and functionality. By targeting a specific class of system, it is possible to automate aspects of the development process, using model transformations and code generators that encode domain knowledge and implementation strategies. Using this approach, questions of correctness for a complex, software system may be answered through analysis of abstract models of lower complexity, under the assumption that the transformations and generators employed are themselves correct. This paper shows how formal techniques can be used to establish the correctness of model transformations used in the generation of software components from precise object models. The source language is based upon existing, formal techniques; the target language is the widely-used SQL notation for database programming. Correctness is established by giving comparable, relational semantics to both languages, and checking that the transformations are semantics-preserving.