Reilly Haskins

LG
h-index: 1
3 papers
2 citations
Novelty: 52%
AI Score: 41

3 Papers

LG, May 14
Training on Documents About Monitoring Leads to CoT Obfuscation

Reilly Haskins, Bilal Chughtai, Joshua Engels

Chain-of-thought (CoT) monitoring is one of the most promising tools we have for detecting model misbehavior, but its effectiveness depends on models faithfully externalizing their reasoning. Motivated by this vulnerability, we study whether monitor-aware models are capable of obfuscating their reasoning to evade detection. We use synthetic document finetuning to expose eight models to realistic pre-training-style documents describing a CoT monitor and find that monitor-aware models consistently achieve higher rates of undetected misbehavior compared to unaware controls. This effect is weaker but still present on a harder agentic task. We also show that CoT controllability, a model's ability to reshape its own reasoning trace under an imposed constraint, is closely correlated with obfuscation success across the eight models studied ($r=0.800$, $p=0.017$). Monitor-aware models placed under equal reinforcement learning optimization pressure also learn to reward-hack without triggering a CoT monitor substantially faster than unaware controls. Together, these results suggest that knowledge of monitoring combined with high CoT controllability poses a risk to CoT-based monitoring.
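As a rough consistency check on the reported correlation, the sketch below recomputes a two-sided p-value from $r=0.800$ with eight data points. It assumes the standard t-test for a Pearson correlation with $n-2$ degrees of freedom, which the abstract does not state explicitly. The function name `pearson_p_value_even_df` is hypothetical, and the closed form used for the Student's t distribution only applies when $n-2$ is even.

```python
import math


def pearson_p_value_even_df(r: float, n: int) -> float:
    """Two-sided p-value for a Pearson correlation r over n samples.

    Assumption: the usual t-test with df = n - 2. Uses the closed-form
    Student's t series that is valid only for even df.
    """
    df = n - 2
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch needs an even, positive n - 2")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")

    # t-statistic for testing r against zero correlation.
    t = abs(r) * math.sqrt(df) / math.sqrt(1.0 - r * r)
    theta = math.atan(t / math.sqrt(df))
    cos2 = math.cos(theta) ** 2

    # Series 1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ... with df/2 terms.
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= (2 * k - 1) / (2 * k) * cos2
        total += term

    # P(|T| <= t) = sin(theta) * total, so the two-sided p is its complement.
    return 1.0 - math.sin(theta) * total


p = pearson_p_value_even_df(0.800, 8)
print(f"p = {p:.3f}")  # approximately 0.017
```

Under these assumptions the result agrees with the reported $p=0.017$.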

CV, Jul 25, 2024
Automated Romberg Test: Leveraging a CNN and Centre of Mass Analysis for Sensory Ataxia Diagnosis

Reilly Haskins, Richard Green

This paper proposes a novel method to diagnose sensory ataxia via an automated Romberg Test, the current de facto medical procedure for diagnosing this condition. The method uses a convolutional neural network to predict joint locations, from which it calculates biomechanical markers such as the subject's center of mass and various joint angles. These markers are combined with data filtering techniques such as Kalman filters and with center of mass analysis to infer the relative weight distribution along the lateral and anterior-posterior axes and to provide an objective, mathematically based diagnosis of the condition. To evaluate the method, testing was performed using dual weight scales and pre-annotated diagnosis videos taken from medical settings. Both provided ground truth for the actual weight distribution on the ground surface and gave a real-world estimate of the accuracy of the proposed method. A mean absolute error of 0.2912 percent was found for the calculated relative weight distribution difference, and an accuracy of 83.33 percent was achieved on diagnoses.
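The abstract does not spell out how the center of mass or the weight distribution is computed, so the following is only a minimal sketch of one plausible approach: a mass-weighted average of predicted joint positions, followed by a static two-support balance to split weight between the feet. The function names, the joint names, and the mass fractions are all hypothetical placeholders, not values or code from the paper.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def center_of_mass(joints: Dict[str, Point],
                   mass_fraction: Dict[str, float]) -> Point:
    """Mass-weighted mean of joint positions (illustrative weights only)."""
    total = sum(mass_fraction[name] for name in joints)
    if total <= 0:
        raise ValueError("mass fractions must sum to a positive value")
    x = sum(mass_fraction[name] * p[0] for name, p in joints.items()) / total
    y = sum(mass_fraction[name] * p[1] for name, p in joints.items()) / total
    return (x, y)


def lateral_weight_split(com_x: float, left_foot_x: float,
                         right_foot_x: float) -> Tuple[float, float]:
    """Fraction of weight on (left, right) foot from a static moment balance.

    Assumption: the body is supported at two points and is not accelerating,
    so the load on each foot is proportional to the distance from the center
    of mass to the other foot.
    """
    span = right_foot_x - left_foot_x
    if span == 0:
        raise ValueError("feet must be at different lateral positions")
    right = (com_x - left_foot_x) / span
    right = min(1.0, max(0.0, right))  # clamp if the COM leaves the base
    return (1.0 - right, right)


# Toy example with made-up joints and weights.
joints = {"left_hip": (0.0, 1.0), "right_hip": (2.0, 1.0)}
weights = {"left_hip": 1.0, "right_hip": 3.0}
com = center_of_mass(joints, weights)
print(com)                                       # (1.5, 1.0)
print(lateral_weight_split(com[0], 0.0, 2.0))    # (0.25, 0.75)
```

In a real pipeline the per-frame estimates would be noisy, which is presumably where the filtering step mentioned in the abstract comes in.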

LG, Jul 5, 2025
KEA Explain: Explanations of Hallucinations using Graph Kernel Analysis

Reilly Haskins, Benjamin Adams

Large Language Models (LLMs) frequently generate hallucinations: statements that are syntactically plausible but lack factual grounding. This research presents KEA (Kernel-Enriched AI) Explain: a neurosymbolic framework that detects and explains such hallucinations by comparing knowledge graphs constructed from LLM outputs with ground truth data from Wikidata or contextual documents. Using graph kernels and semantic clustering, the method provides explanations for detected hallucinations, ensuring both robustness and interpretability. Our framework achieves competitive accuracy in detecting hallucinations across both open- and closed-domain tasks, and is able to generate contrastive explanations, enhancing transparency. This research advances the reliability of LLMs in high-stakes domains and provides a foundation for future work on precision improvements and multi-source knowledge integration.
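To illustrate the general idea of comparing an LLM-derived knowledge graph against a reference graph with a kernel, here is a minimal sketch. It uses a simple edge-histogram style kernel over (subject, relation, object) triples, which is a stand-in chosen for brevity and not necessarily the kernel used in KEA Explain. All function names and the example triples are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple

Triple = Tuple[str, str, str]


def triple_kernel(g1: List[Triple], g2: List[Triple]) -> int:
    """Edge-histogram kernel: dot product of triple-count vectors."""
    c1, c2 = Counter(g1), Counter(g2)
    return sum(c1[t] * c2[t] for t in c1.keys() & c2.keys())


def normalized_kernel(g1: List[Triple], g2: List[Triple]) -> float:
    """Cosine-normalized kernel in [0, 1]; 1.0 means identical histograms."""
    denom = math.sqrt(triple_kernel(g1, g1) * triple_kernel(g2, g2))
    return triple_kernel(g1, g2) / denom if denom else 0.0


def unsupported_triples(llm_graph: List[Triple],
                        truth_graph: List[Triple]) -> List[Triple]:
    """Triples asserted by the LLM output that the reference graph lacks."""
    truth = set(truth_graph)
    return [t for t in llm_graph if t not in truth]


llm_graph = [("Paris", "capital_of", "France"),
             ("Paris", "located_in", "Germany")]
truth_graph = [("Paris", "capital_of", "France"),
               ("Paris", "located_in", "France")]

print(normalized_kernel(llm_graph, truth_graph))    # 0.5
print(unsupported_triples(llm_graph, truth_graph))  # [('Paris', 'located_in', 'Germany')]
```

A low similarity score flags a possible hallucination, and the unsupported triples give a crude contrastive hint about which statement disagrees with the reference.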