CVAug 16, 2024Code
Deep Generative Classification of Blood Cell MorphologySimon Deltadahl, Julian Gilbey, Christine Van Laer et al.
Accurate classification of haematological cells is critical for diagnosing blood disorders, but presents significant challenges for machine automation owing to the complexity of cell morphology, heterogeneities of biological, pathological, and imaging characteristics, and the imbalance of cell type frequencies. We introduce CytoDiffusion, a diffusion-based classifier that effectively models blood cell morphology, combining accurate classification with robust anomaly detection, resistance to distributional shifts, interpretability, data efficiency, and superhuman uncertainty quantification. Our approach outperforms state-of-the-art discriminative models in anomaly detection (AUC 0.990 vs. 0.918), resistance to domain shifts (85.85% vs. 74.38% balanced accuracy), and performance in low-data regimes (95.88% vs. 94.95% balanced accuracy). Notably, our model generates synthetic blood cell images that are nearly indistinguishable from real images, as demonstrated by an authenticity test in which expert haematologists achieved only 52.3% accuracy (95% CI: [50.5%, 54.2%]) in distinguishing real from generated images. Furthermore, we enhance model explainability through the generation of directly interpretable counterfactual heatmaps. Our comprehensive evaluation framework, encompassing these multiple performance dimensions, establishes a new benchmark for medical image analysis in haematology, ultimately enabling improved diagnostic accuracy in clinical settings. Our code is available at https://github.com/CambridgeCIA/CytoDiffusion.
LGFeb 6, 2023
A Strong Baseline for Batch Imitation LearningMatthew Smith, Lucas Maystre, Zhenwen Dai et al.
Imitation of expert behaviour is a highly desirable and safe approach to the problem of sequential decision making. We provide an easy-to-implement, novel algorithm for imitation learning under a strict data paradigm, in which the agent must learn solely from data collected a priori. This paradigm allows our algorithm to be used for environments in which safety or cost are of critical concern. Our algorithm requires no additional hyper-parameter tuning beyond any standard batch reinforcement learning (RL) algorithm, making it an ideal baseline for such data-strict regimes. Furthermore, we provide formal sample complexity guarantees for the algorithm in finite Markov Decision Problems. In doing so, we formally demonstrate an unproven claim from Kearns & Singh (1998). On the empirical side, our contribution is twofold. First, we develop a practical, robust and principled evaluation protocol for offline RL methods, making use of only the dataset provided for model selection. This stands in contrast to the vast majority of previous works in offline RL, which tune hyperparameters on the evaluation environment, limiting the practical applicability when deployed in new, cost-critical environments. As such, we establish precedent for the development and fair evaluation of offline RL algorithms. Second, we evaluate our own algorithm on challenging continuous control benchmarks, demonstrating its practical applicability and competitiveness with state-of-the-art performance, despite being a simpler algorithm.
LGJul 30, 2019Code
Classi-Fly: Inferring Aircraft Categories from Open Data using Machine LearningMartin Strohmeier, Matthew Smith, Vincent Lenders et al.
In recent years, air traffic communication data has become easy to access, enabling novel research in many fields. Exploiting this new data source, a wide range of applications have emerged, from weather forecasting to stock market prediction, or the collection of information about military and government movements. Typically these applications require knowledge about the metadata of the aircraft, specifically its operator and the aircraft category. armasuisse Science + Technology, the R\&D agency for the Swiss Armed Forces, has been developing Classi-Fly, a novel approach to obtain metadata about aircraft based on their movement patterns. We validate Classi-Fly using several hundred thousand flights collected through open source means, in conjunction with ground truth from publicly available aircraft registries containing more than two million aircraft. Classi-Fly obtains the correct aircraft category with an accuracy of over 88%, demonstrating that it can improve the meta data necessary for applications working with air traffic communication. Finally, we show that it is feasible to automatically detect specific flights such as police and surveillance missions.
LGNov 16, 2017Code
Towards Deep Learning Models for Psychological State Prediction using Smartphone Data: Challenges and OpportunitiesGatis Mikelsons, Matthew Smith, Abhinav Mehrotra et al.
There is an increasing interest in exploiting mobile sensing technologies and machine learning techniques for mental health monitoring and intervention. Researchers have effectively used contextual information, such as mobility, communication and mobile phone usage patterns for quantifying individuals' mood and wellbeing. In this paper, we investigate the effectiveness of neural network models for predicting users' level of stress by using the location information collected by smartphones. We characterize the mobility patterns of individuals using the GPS metrics presented in the literature and employ these metrics as input to the network. We evaluate our approach on the open-source StudentLife dataset. Moreover, we discuss the challenges and trade-offs involved in building machine learning models for digital mental health and highlight potential future work in this direction.
CVDec 3, 2025
Real-time Cricket Sorting By SexJuan Manuel Cantarero Angulo, Matthew Smith
The global demand for sustainable protein sources is driving increasing interest in edible insects, with Acheta domesticus (house cricket) identified as one of the most suitable species for industrial production. Current farming practices typically rear crickets in mixed-sex populations without automated sex sorting, despite potential benefits such as selective breeding, optimized reproduction ratios, and nutritional differentiation. This work presents a low-cost, real-time system for automated sex-based sorting of Acheta domesticus, combining computer vision and physical actuation. The device integrates a Raspberry Pi 5 with the official Raspberry AI Camera and a custom YOLOv8 nano object detection model, together with a servo-actuated sorting arm. The model reached a mean Average Precision at IoU 0.5 (mAP@0.5) of 0.977 during testing, and real-world experiments with groups of crickets achieved an overall sorting accuracy of 86.8%. These results demonstrate the feasibility of deploying lightweight deep learning models on resource-constrained devices for insect farming applications, offering a practical solution to improve efficiency and sustainability in cricket production.
LGOct 14, 2025
Influence Dynamics and Stagewise Data AttributionJin Hwa Lee, Matthew Smith, Maxwell Adam et al.
Current training data attribution (TDA) methods treat the influence one sample has on another as static, but neural networks learn in distinct stages that exhibit changing patterns of influence. In this work, we introduce a framework for stagewise data attribution grounded in singular learning theory. We predict that influence can change non-monotonically, including sign flips and sharp peaks at developmental transitions. We first validate these predictions analytically and empirically in a toy model, showing that dynamic shifts in influence directly map to the model's progressive learning of a semantic hierarchy. Finally, we demonstrate these phenomena at scale in language models, where token-level influence changes align with known developmental stages.
CYAug 12, 2025
Toward an African Agenda for AI SafetySamuel T. Segun, Rachel Adams, Ana Florido et al.
This paper maps Africa's distinctive AI risk profile, from deepfake fuelled electoral interference and data colonial dependency to compute scarcity, labour disruption and disproportionate exposure to climate driven environmental costs. While major benefits are promised to accrue, the availability, development and adoption of AI also mean that African people and countries face particular AI safety risks, from large scale labour market disruptions to the nefarious use of AI to manipulate public opinion. To date, African perspectives have not been meaningfully integrated into global debates and processes regarding AI safety, leaving African stakeholders with limited influence over the emerging global AI safety governance agenda. While there are Computer Incident Response Teams on the continent, none hosts a dedicated AI Safety Institute or office. We propose a five-point action plan centred on (i) a policy approach that foregrounds the protection of the human rights of those most vulnerable to experiencing the harmful socio-economic effects of AI; (ii) the establishment of an African AI Safety Institute; (iii) promote public AI literacy and awareness; (iv) development of early warning system with inclusive benchmark suites for 25+ African languages; and (v) an annual AU-level AI Safety & Security Forum.
CVFeb 26, 2025
3D Nephrographic Image Synthesis in CT Urography with the Diffusion Model and Swin TransformerHongkun Yu, Syed Jamal Safdar Gardezi, E. Jason Abel et al.
Purpose: This study aims to develop and validate a method for synthesizing 3D nephrographic phase images in CT urography (CTU) examinations using a diffusion model integrated with a Swin Transformer-based deep learning approach. Materials and Methods: This retrospective study was approved by the local Institutional Review Board. A dataset comprising 327 patients who underwent three-phase CTU (mean $\pm$ SD age, 63 $\pm$ 15 years; 174 males, 153 females) was curated for deep learning model development. The three phases for each patient were aligned with an affine registration algorithm. A custom deep learning model coined dsSNICT (diffusion model with a Swin transformer for synthetic nephrographic phase images in CT) was developed and implemented to synthesize the nephrographic images. Performance was assessed using Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), Mean Absolute Error (MAE), and Fréchet Video Distance (FVD). Qualitative evaluation by two fellowship-trained abdominal radiologists was performed. Results: The synthetic nephrographic images generated by our proposed approach achieved high PSNR (26.3 $\pm$ 4.4 dB), SSIM (0.84 $\pm$ 0.069), MAE (12.74 $\pm$ 5.22 HU), and FVD (1323). Two radiologists provided average scores of 3.5 for real images and 3.4 for synthetic images (P-value = 0.5) on a Likert scale of 1-5, indicating that our synthetic images closely resemble real images. Conclusion: The proposed approach effectively synthesizes high-quality 3D nephrographic phase images. This model can be used to reduce radiation dose in CTU by 33.3\% without compromising image quality, which thereby enhances the safety and diagnostic utility of CT urography.
SDMay 21, 2021
Semi-Supervised Audio Representation Learning for Modeling Beehive StrengthsTony Zhang, Szymon Zmyslony, Sergei Nozdrenkov et al.
Honey bees are critical to our ecosystem and food security as a pollinator, contributing 35% of our global agriculture yield. In spite of their importance, beekeeping is exclusively dependent on human labor and experience-derived heuristics, while requiring frequent human checkups to ensure the colony is healthy, which can disrupt the colony. Increasingly, pollinator populations are declining due to threats from climate change, pests, environmental toxicity, making their management even more critical than ever before in order to ensure sustained global food security. To start addressing this pressing challenge, we developed an integrated hardware sensing system for beehive monitoring through audio and environment measurements, and a hierarchical semi-supervised deep learning model, composed of an audio modeling module and a predictor, to model the strength of beehives. The model is trained jointly on audio reconstruction and prediction losses based on human inspections, in order to model both low-level audio features and circadian temporal dynamics. We show that this model performs well despite limited labels, and can learn an audio embedding that is useful for characterizing different sound profiles of beehives. This is the first instance to our knowledge of applying audio-based deep learning to model beehives and population size in an observational setting across a large number of hives.
HCMar 7, 2021
Do you really code? Designing and Evaluating Screening Questions for Online Surveys with ProgrammersAnastasia Danilova, Alena Naiakshina, Stefan Horstmann et al.
Recruiting professional programmers in sufficient numbers for research studies can be challenging because they often cannot spare the time, or due to their geographical distribution and potentially the cost involved. Online platforms such as Clickworker or Qualtrics do provide options to recruit participants with programming skill; however, misunderstandings and fraud can be an issue. This can result in participants without programming skill taking part in studies and surveys. If these participants are not detected, they can cause detrimental noise in the survey data. In this paper, we develop screener questions that are easy and quick to answer for people with programming skill but difficult to answer correctly for those without. In order to evaluate our questionnaire for efficacy and efficiency, we recruited several batches of participants with and without programming skill and tested the questions. In our batch 42% of Clickworkers stating that they have programming skill did not meet our criteria and we would recommend filtering these from studies. We also evaluated the questions in an adversarial setting. We conclude with a set of recommended questions which researchers can use to recruit participants with programming skill from online platforms.
CROct 2, 2020
Understanding Realistic Attacks on Airborne Collision Avoidance SystemsMatthew Smith, Martin Strohmeier, Vincent Lenders et al.
Airborne collision avoidance systems provide an onboard safety net should normal air traffic control procedures fail to keep aircraft separated. These systems are widely deployed and have been constantly refined over the past three decades, usually in response to near misses or mid-air collisions. Recent years have seen security research increasingly focus on aviation, identifying that key wireless links---some of which are used in collision avoidance---are vulnerable to attack. In this paper, we go one step further to understand whether an attacker can remotely trigger false collision avoidance alarms. Primarily considering the next-generation Airborne Collision Avoidance System X (ACAS X), we adopt a modelling approach to extract attacker constraints from technical standards before simulating collision avoidance attacks against standardized ACAS X code. We find that in 44% of cases, an attacker can successfully trigger a collision avoidance alert which on average results in a 590 ft altitude deviation; when the aircraft is at lower altitudes, this success rate rises considerably to 79%. Furthermore, we show how our simulation approach can be used to help defend against attacks by identifying where attackers are most likely to be successful.
CRMay 20, 2019
Safety vs. Security: Attacking Avionic Systems with Humans in the LoopMatthew Smith, Martin Strohmeier, Jon Harman et al.
Many wireless communications systems found in aircraft lack standard security mechanisms, leaving them fundamentally vulnerable to attack. With affordable software-defined radios available, a novel threat has emerged, allowing a wide range of attackers to easily interfere with wireless avionic systems. Whilst these vulnerabilities are known, concrete attacks that exploit them are still novel and not yet well understood. This is true in particular with regards to their kinetic impact on the handling of the attacked aircraft and consequently its safety. To investigate this, we invited 30 Airbus A320 type-rated pilots to fly simulator scenarios in which they were subjected to attacks on their avionics. We implement and analyse novel wireless attacks on three safety-related systems: Traffic Collision Avoidance System (TCAS), Ground Proximity Warning System (GPWS) and the Instrument Landing System (ILS). We found that all three analysed attack scenarios created significant control impact and cost of disruption through turnarounds, avoidance manoeuvres, and diversions. They further increased workload, distrust in the affected system, and in 38% of cases caused the attacked safety system to be switched off entirely. All pilots felt the scenarios were useful, with 93.3% feeling that simulator training for wireless attacks could be valuable.
CRAug 29, 2017
Why Do Developers Get Password Storage Wrong? A Qualitative Usability StudyAlena Naiakshina, Anastasia Danilova, Christian Tiefenau et al.
Passwords are still a mainstay of various security systems, as well as the cause of many usability issues. For end-users, many of these issues have been studied extensively, highlighting problems and informing design decisions for better policies and motivating research into alternatives. However, end-users are not the only ones who have usability problems with passwords! Developers who are tasked with writing the code by which passwords are stored must do so securely. Yet history has shown that this complex task often fails due to human error with catastrophic results. While an end-user who selects a bad password can have dire consequences, the consequences of a developer who forgets to hash and salt a password database can lead to far larger problems. In this paper we present a first qualitative usability study with 20 computer science students to discover how developers deal with password storage and to inform research into aiding developers in the creation of secure password systems.
CVMay 23, 2017
Her2 Challenge Contest: A Detailed Assessment of Automated Her2 Scoring Algorithms in Whole Slide Images of Breast Cancer TissuesTalha Qaiser, Abhik Mukherjee, Chaitanya Reddy Pb et al.
Evaluating expression of the Human epidermal growth factor receptor 2 (Her2) by visual examination of immunohistochemistry (IHC) on invasive breast cancer (BCa) is a key part of the diagnostic assessment of BCa due to its recognised importance as a predictive and prognostic marker in clinical practice. However, visual scoring of Her2 is subjective and consequently prone to inter-observer variability. Given the prognostic and therapeutic implications of Her2 scoring, a more objective method is required. In this paper, we report on a recent automated Her2 scoring contest, held in conjunction with the annual PathSoc meeting held in Nottingham in June 2016, aimed at systematically comparing and advancing the state-of-the-art Artificial Intelligence (AI) based automated methods for Her2 scoring. The contest dataset comprised of digitised whole slide images (WSI) of sections from 86 cases of invasive breast carcinoma stained with both Haematoxylin & Eosin (H&E) and IHC for Her2. The contesting algorithms automatically predicted scores of the IHC slides for an unseen subset of the dataset and the predicted scores were compared with the 'ground truth' (a consensus score from at least two experts). We also report on a simple Man vs Machine contest for the scoring of Her2 and show that the automated methods could beat the pathology experts on this contest dataset. This paper presents a benchmark for comparing the performance of automated algorithms for scoring of Her2. It also demonstrates the enormous potential of automated algorithms in assisting the pathologist with objective IHC scoring.
CRMay 19, 2017
Analyzing Privacy Breaches in the Aircraft Communications Addressing and Reporting System (ACARS)Matthew Smith, Daniel Moser, Martin Strohmeier et al.
The manner in which Aircraft Communications, Addressing and Reporting System (ACARS) is being used has significantly changed over time. Whilst originally used by commercial airliners to track their flights and provide automated timekeeping on crew, today it serves as a multi-purpose air-ground data link for many aviation stakeholders including private jet owners, state actors and military. Since ACARS messages are still mostly sent in the clear over a wireless channel, any sensitive information sent with ACARS can potentially lead to a privacy breach for users. Naturally, different stakeholders consider different types of data sensitive. In this paper we propose a privacy framework matching aviation stakeholders to a range of sensitive information types and assess the impact for each. Based on more than one million ACARS messages, collected over several months, we then demonstrate that current ACARS usage systematically breaches privacy for all stakeholder groups. We further support our findings with a number of cases of significant privacy issues for each group and analyze the impact of such leaks. While it is well-known that ACARS messages are susceptible to eavesdropping attacks, this work is the first to quantify the extent and impact of privacy leakage in the real world for the relevant aviation stakeholders.