CV | May 20, 2022
Deep transfer learning for image classification: a survey
Jo Plested, Musa Phiri, Tom Gedeon
Deep neural networks such as convolutional neural networks (CNNs) and transformers have achieved many successes in image classification in recent years. It has been consistently demonstrated that the best results in image classification are obtained when large deep models can be trained on abundant labelled data. However, there are many real-world scenarios where the requirement for large amounts of training data to get the best performance cannot be met. In these scenarios transfer learning can help improve performance. To date there have been no surveys that comprehensively review deep transfer learning as it relates to image classification overall. However, several recent general surveys of deep transfer learning and ones that relate to particular specialised target image classification tasks have been published. We believe it is important for the future progress in the field that all current knowledge is collated and the overarching patterns analysed and discussed. In this survey we formally define deep transfer learning and the problem it attempts to solve in relation to image classification. We survey the current state of the field and identify where recent progress has been made. We show where the gaps in current knowledge are and make suggestions for how to progress the field to fill in these knowledge gaps. We present a new taxonomy of the applications of transfer learning for image classification. This taxonomy makes it easier to see overarching patterns of where transfer learning has been effective and where it has failed to fulfill its potential. This also allows us to suggest where the problems lie and how it could be used more effectively. We show that under this new taxonomy, many of the cases where transfer learning has been shown to be ineffective or even to hinder performance are to be expected when taking into account the source and target datasets and the techniques used.
CV | Jul 26, 2022
AMF: Adaptable Weighting Fusion with Multiple Fine-tuning for Image Classification
Xuyang Shen, Jo Plested, Sabrina Caldwell et al.
Fine-tuning is widely applied in image classification tasks as a transfer learning approach. It re-uses the knowledge from a source task to learn and obtain a high performance in target tasks. Fine-tuning is able to alleviate the challenge of insufficient training data and expensive labelling of new data. However, standard fine-tuning has limited performance in complex data distributions. To address this issue, we propose the Adaptable Multi-tuning method, which adaptively determines each data sample's fine-tuning strategy. In this framework, multiple fine-tuning settings and one policy network are defined. The policy network in Adaptable Multi-tuning can dynamically adjust to an optimal weighting to feed different samples into models that are trained using different fine-tuning strategies. Our method outperforms the standard fine-tuning approach by 1.69% and 2.79% on the FGVC-Aircraft and Describable Texture datasets, respectively, while yielding comparable performance on Stanford Cars, CIFAR-10, and Fashion-MNIST.
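As a rough illustration of the per-sample weighting idea described above, the sketch below fuses the class probabilities of several fine-tuned models using weights derived from a policy output. The function names, the softmax normalisation, and the toy numbers are assumptions made for illustration; they are not taken from the paper.

```python
import math


def softmax(scores):
    """Turn raw policy scores into non-negative weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_predictions(policy_scores, model_probs):
    """Weighted sum of per-model class probabilities for one sample.

    policy_scores: one raw score per fine-tuned model (hypothetical policy output).
    model_probs: one list of class probabilities per fine-tuned model.
    """
    weights = softmax(policy_scores)
    n_classes = len(model_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, model_probs))
        for c in range(n_classes)
    ]


# Toy sample: the policy favours the first of two fine-tuned models.
fused = fuse_predictions([2.0, 0.0], [[0.9, 0.1], [0.2, 0.8]])
```

Because the weights sum to 1 and each model outputs a probability distribution, the fused vector is again a valid distribution over classes.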
34.2 | QUANT-PH | Apr 12
Training single-electron and single-photon stochastic physical neural networks
Tong Dou, Shiro Kumara, Josh Burns et al.
The computational demands of deep learning motivate the investigation of alternative approaches to computation. One alternative is physical neural networks (PNNs), in which learning and inference are performed directly via physical processes. Stochastic PNNs arise when the underlying neurons are realized by the dynamics of a stochastic activation switch. Here we propose novel electronic and photonic stochastic neurons. The electronic realization is implemented by single-electron tunneling through a quantum dot. The photonic realization is implemented via a single-photon source driving one of two modes coupled via a controllable beam-splitter-like interaction. In the electronic case, the charge state of the quantum dot forms the basis for the stochastic neuron, whereas in the photonic case the occupation of the undriven mode serves as the basis for the stochastic neuron. Training of stochastic PNNs is performed with models of stochastic neurons, as well as with coherently-driven, single-photon detector stochastic neurons previously introduced. Several training strategies for MNIST handwritten digit classification have been investigated using single-hidden-layer stochastic PNNs, including varying the number of trials in each layer to control forward pass stochasticity and employing either true probability or empirical outputs in the backward pass to evaluate their influence on gradient estimation. We show that when empirical outputs are used in the backward pass, the network achieves more than 97% test accuracy with few trials per layer. Despite the simplicity of the model architecture, high test accuracy is maintained in the presence of a high degree of noise and model uncertainty. The results demonstrate the potential of embracing stochastic PNNs for deep learning.
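The distinction between the "true probability" and the "empirical output" of a stochastic neuron can be sketched as follows. This is a minimal toy model, assuming a sigmoid firing probability and independent Bernoulli trials; the physical neuron models in the paper are not reproduced here, and all names are hypothetical.

```python
import math
import random


def firing_probability(pre_activation):
    """Assumed activation model: the neuron fires with sigmoid probability."""
    return 1.0 / (1.0 + math.exp(-pre_activation))


def empirical_output(pre_activation, n_trials, rng):
    """Fraction of n_trials independent binary trials in which the neuron fired."""
    p = firing_probability(pre_activation)
    fires = sum(1 for _ in range(n_trials) if rng.random() < p)
    return fires / n_trials


rng = random.Random(0)  # fixed seed so the sketch is repeatable
p_true = firing_probability(0.5)
p_few = empirical_output(0.5, 5, rng)       # noisy estimate from few trials
p_many = empirical_output(0.5, 10000, rng)  # approaches p_true as trials grow
```

With few trials the empirical output is a coarse, noisy estimate of the firing probability; increasing the number of trials reduces forward-pass stochasticity.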
CV | Dec 18, 2025
YOLO11-4K: An Efficient Architecture for Real-Time Small Object Detection in 4K Panoramic Images
Huma Hafeez, Matthew Garratt, Jo Plested et al.
The processing of omnidirectional 360-degree images poses significant challenges for object detection due to inherent spatial distortions, wide fields of view, and ultra-high-resolution inputs. Conventional detectors such as YOLO are optimised for standard image sizes (for example, 640x640 pixels) and often struggle with the computational demands of 4K or higher-resolution imagery typical of 360-degree vision. To address these limitations, we introduce YOLO11-4K, an efficient real-time detection framework tailored for 4K panoramic images. The architecture incorporates a novel multi-scale detection head with a P2 layer to improve sensitivity to small objects often missed at coarser scales, and a GhostConv-based backbone to reduce computational complexity without sacrificing representational power. To enable evaluation, we manually annotated the CVIP360 dataset, generating 6,876 frame-level bounding boxes and producing a publicly available, detection-ready benchmark for 4K panoramic scenes. YOLO11-4K achieves 0.95 mAP at 0.50 IoU with 28.3 milliseconds inference per frame, representing a 75 percent latency reduction compared to YOLO11 (112.3 milliseconds), while also improving accuracy (mAP at 0.50 of 0.95 versus 0.908). This balance of efficiency and precision enables robust object detection in expansive 360-degree environments, making the framework suitable for real-world high-resolution panoramic applications. While this work focuses on 4K omnidirectional images, the approach is broadly applicable to high-resolution detection tasks in autonomous navigation, surveillance, and augmented reality.
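The reported figures can be checked with a few lines of arithmetic. The helper name below is hypothetical, and the frames-per-second value assumes frames are processed one at a time with no other overhead.

```python
def relative_reduction(baseline, improved):
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


# Latencies in milliseconds per frame, as quoted in the abstract.
latency_cut = relative_reduction(112.3, 28.3)  # about 0.748, i.e. roughly 75 percent
frames_per_second = 1000.0 / 28.3              # about 35 frames per second (assumption: sequential processing)
map_gain = 0.95 - 0.908                        # absolute gain in mAP at 0.50 IoU
```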
CR | Dec 29, 2025
Prompt-Induced Over-Generation as Denial-of-Service: A Black-Box Attack-Side Benchmark
Manu, Yi Guo, Kanchana Thilakarathna et al.
Large Language Models (LLMs) can be driven into over-generation, emitting thousands of tokens before producing an end-of-sequence (EOS) token. This degrades answer quality, inflates latency and cost, and can be weaponized as a denial-of-service (DoS) attack. Recent work has begun to study DoS-style prompt attacks, but typically focuses on a single attack algorithm or assumes white-box access, without an attack-side benchmark that compares prompt-based attackers in a black-box, query-only regime with a known tokenizer. We introduce such a benchmark and study two prompt-only attackers. The first is an Evolutionary Over-Generation Prompt Search (EOGen) that searches the token space for prefixes that suppress EOS and induce long continuations. The second is a goal-conditioned reinforcement learning attacker (RL-GOAL) that trains a network to generate prefixes conditioned on a target length. To characterize behavior, we introduce Over-Generation Factor (OGF): the ratio of produced tokens to a model's context window, along with stall and latency summaries. EOGen discovers short-prefix attacks that raise Phi-3 to OGF = 1.39 +/- 1.14 (Success@>=2: 25.2%); RL-GOAL nearly doubles severity to OGF = 2.70 +/- 1.43 (Success@>=2: 64.3%) and drives budget-hit non-termination in 46% of trials.
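The Over-Generation Factor defined above is a simple ratio, sketched below. The context window size and token counts are made-up values, and reading "Success@>=2" as the fraction of trials with OGF of at least 2 is an assumption about the metric's definition.

```python
def over_generation_factor(produced_tokens, context_window):
    """OGF: produced tokens divided by the model's context window."""
    return produced_tokens / context_window


def success_rate(ogf_values, threshold=2.0):
    """Fraction of trials whose OGF reaches the threshold (assumed meaning of Success@>=2)."""
    hits = sum(1 for value in ogf_values if value >= threshold)
    return hits / len(ogf_values)


context_window = 4096                         # hypothetical context window
produced_counts = [1024, 8192, 12288, 4096]   # hypothetical tokens produced per trial
ogfs = [over_generation_factor(n, context_window) for n in produced_counts]
rate = success_rate(ogfs)
```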
CR | Jan 27
SHIELD: An Auto-Healing Agentic Defense Framework for LLM Resource Exhaustion Attacks
Nirhoshan Sivaroopan, Kanchana Thilakarathna, Albert Zomaya et al.
Sponge attacks increasingly threaten LLM systems by inducing excessive computation and DoS. Existing defenses either rely on statistical filters that fail on semantically meaningful attacks or use static LLM-based detectors that struggle to adapt as attack strategies evolve. We introduce SHIELD, a multi-agent, auto-healing defense framework centered on a three-stage Defense Agent that integrates semantic similarity retrieval, pattern matching, and LLM-based reasoning. Two auxiliary agents, a Knowledge Updating Agent and a Prompt Optimization Agent, form a closed self-healing loop: when an attack bypasses detection, the system updates an evolving knowledge base and refines its defense instructions. Extensive experiments show that SHIELD consistently outperforms perplexity-based and standalone LLM defenses, achieving high F1 scores across both non-semantic and semantic sponge attacks, demonstrating the effectiveness of agentic self-healing against evolving resource-exhaustion threats.
QUANT-PH | May 6, 2025
Quantum Feature Space of a Qubit Coupled to an Arbitrary Bath
Chris Wise, Akram Youssry, Alberto Peruzzo et al.
Qubit control protocols have traditionally leveraged a characterisation of the qubit-bath coupling via its power spectral density. Previous work proposed the inference of noise operators that characterise the influence of a classical bath using a grey-box approach that combines deep neural networks with physics-encoded layers. This overall structure is complex and poses challenges in scaling and real-time operations. Here, we show that no expensive neural networks are needed and that this noise operator description admits an efficient parameterisation. We refer to the resulting parameter space as the quantum feature space of the qubit dynamics resulting from the coupled bath. We show that the Euclidean distance defined over the quantum feature space provides an effective method for classifying noise processes in the presence of a given set of controls. Using the quantum feature space as the input space for a simple machine learning algorithm (random forest, in this case), we demonstrate that it can effectively classify the stationarity and the broad class of noise processes perturbing a qubit. Finally, we explore how control pulse parameters map to the quantum feature space.
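One simple way to use a Euclidean distance for classification is a nearest-reference rule, sketched below. The feature vectors, labels, and function name are invented for illustration; the paper's actual parameterisation and its random forest classifier are not reproduced here.

```python
import math


def nearest_label(query, labelled_points):
    """Return the label of the reference point closest to `query` in Euclidean distance."""
    label, _point = min(labelled_points, key=lambda item: math.dist(query, item[1]))
    return label


# Hypothetical feature vectors for two reference noise processes.
references = [
    ("stationary", (0.10, 0.20, 0.05)),
    ("non-stationary", (0.80, 0.60, 0.90)),
]
predicted = nearest_label((0.15, 0.25, 0.10), references)
```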
CV | Feb 17, 2022
Developing Imperceptible Adversarial Patches to Camouflage Military Assets From Computer Vision Enabled Technologies
Chris Wise, Jo Plested
Convolutional neural networks (CNNs) have demonstrated rapid progress and a high level of success in object detection. However, recent evidence has highlighted their vulnerability to adversarial attacks. These attacks are calculated image perturbations or adversarial patches that result in object misclassification or detection suppression. Traditional camouflage methods are impractical when applied to disguise aircraft and other large mobile assets from autonomous detection in intelligence, surveillance and reconnaissance technologies and fifth generation missiles. In this paper we present a unique method that produces imperceptible patches capable of camouflaging large military assets from computer vision-enabled technologies. We developed these patches by maximising object detection loss whilst limiting the patch's colour perceptibility. This work also aims to further the understanding of adversarial examples and their effects on object detection algorithms.
NE | Sep 8, 2021
Feature Selection on Thermal-stress Dataset
Xuyang Shen, Jo Plested, Tom Gedeon
Physical symptoms caused by high stress are common in daily life, which makes stress recognition systems important. This study aims to improve stress classification by selecting appropriate features from the thermal-stress dataset ANUstressDB. We explored three different feature selection techniques: correlation analysis, magnitude measure, and genetic algorithm. Support Vector Machine (SVM) and Artificial Neural Network (ANN) models were used to evaluate these three techniques. Our results indicate that the genetic algorithm combined with ANNs can improve the prediction accuracy by 19.1% compared to the baseline. Moreover, the magnitude measure performed best among the three feature selection algorithms regarding the balance of computation time and performance. These findings are likely to improve the accuracy of current stress recognition systems.
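Of the three techniques named above, correlation analysis is the simplest to sketch: rank each feature by the absolute Pearson correlation between its values and the class labels. The feature names and data below are made up, and this ranking rule is an assumption about how such an analysis could be set up, not the study's exact procedure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(columns, labels):
    """Feature names sorted by absolute correlation with the labels, strongest first."""
    return sorted(columns, key=lambda name: abs(pearson(columns[name], labels)), reverse=True)


labels = [0, 0, 1, 1]  # hypothetical stress labels
columns = {
    "region_a": [0.1, 0.2, 0.8, 0.9],  # tracks the label closely
    "region_b": [0.5, 0.4, 0.5, 0.4],  # unrelated to the label
}
ranking = rank_features(columns, labels)
```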
CV | Aug 23, 2021
Exploring Biases and Prejudice of Facial Synthesis via Semantic Latent Space
Xuyang Shen, Jo Plested, Sabrina Caldwell et al.
Deep learning (DL) models are widely used to provide a more convenient and smarter life. However, biased algorithms will negatively influence us. For instance, groups targeted by biased algorithms will feel unfairly treated and even fearful of negative consequences of these biases. This work targets biased generative models' behaviors, identifying the cause of the biases and eliminating them. We can (as expected) conclude that biased data causes biased predictions of face frontalization models. Varying the proportions of male and female faces in the training data can have a substantial effect on behavior on the test data: we found that the seemingly obvious choice of 50:50 proportions was not the best for this dataset to reduce biased behavior on female faces, which was 71% unbiased as compared to our top unbiased rate of 84%. Failure in generation and generating incorrect gender faces are two behaviors of these models. In addition, only some layers in face frontalization models are vulnerable to biased datasets. Optimizing the skip-connections of the generator in face frontalization models can make models less biased. We conclude that it is likely to be impossible to eliminate all training bias without an unlimited size dataset, and our experiments show that the bias can be reduced and quantified. We believe the next best thing to a perfectly unbiased predictor is one that has minimized the remaining known bias.
CV | Jul 19, 2021
Non-binary deep transfer learning for image classification
Jo Plested, Xuyang Shen, Tom Gedeon
The current standard for a variety of computer vision tasks using smaller numbers of labelled training examples is to fine-tune from weights pre-trained on a large image classification dataset such as ImageNet. The application of transfer learning and transfer learning methods tends to be rigidly binary. A model is either pre-trained or not pre-trained. Pre-training a model either increases performance or decreases it, the latter being defined as negative transfer. Either L2-SP regularisation, which decays the weights towards their pre-trained values, is applied, or all weights are decayed towards 0. This paper re-examines these assumptions. Our recommendations are based on extensive empirical evaluation that demonstrates the application of a non-binary approach to achieve optimal results. (1) Achieving best performance on each individual dataset requires careful adjustment of various transfer learning hyperparameters not usually considered, including number of layers to transfer, different learning rates for different layers and different combinations of L2-SP and L2 regularisation. (2) Best practice can be achieved using a number of measures of how well the pre-trained weights fit the target dataset to guide optimal hyperparameters. We present methods for non-binary transfer learning including combining L2-SP and L2 regularisation and performing non-traditional fine-tuning hyperparameter searches. Finally, we suggest heuristics for determining the optimal transfer learning hyperparameters. The benefits of using a non-binary approach are supported by final results that come close to or exceed state of the art performance on a variety of tasks that have traditionally been more difficult for transfer learning.
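The difference between L2-SP and plain L2 regularisation, and one way to combine them, can be sketched as follows. The function names, the 0.5 scaling factor, and the simple additive combination are assumptions for illustration; the paper's exact formulation and hyperparameter values are not given in the abstract.

```python
def l2sp_penalty(weights, pretrained, alpha):
    """L2-SP: penalise squared distance from the pre-trained weight values."""
    return 0.5 * alpha * sum((w - w0) ** 2 for w, w0 in zip(weights, pretrained))


def l2_penalty(weights, beta):
    """Plain L2 (weight decay): penalise squared distance from zero."""
    return 0.5 * beta * sum(w ** 2 for w in weights)


def combined_penalty(weights, pretrained, alpha, beta):
    """Non-binary mix: both penalties applied at once, each with its own strength."""
    return l2sp_penalty(weights, pretrained, alpha) + l2_penalty(weights, beta)


# Toy values: two weights that have drifted slightly from their pre-trained values.
penalty = combined_penalty([1.0, -2.0], [1.0, -1.0], alpha=0.1, beta=0.01)
```

Setting `beta` to 0 recovers pure L2-SP and setting `alpha` to 0 recovers pure L2, so the two binary choices are the endpoints of this mix.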
CV | Sep 13, 2020
Pairwise-GAN: Pose-based View Synthesis through Pair-Wise Training
Xuyang Shen, Jo Plested, Yue Yao et al.
Three-dimensional face reconstruction is one of the popular applications in computer vision. However, even state-of-the-art models still require frontal faces as inputs, which restricts their usage scenarios in the wild. A similar dilemma also arises in face recognition. New research designed to recover the frontal face from a single side-pose facial image has emerged. The state-of-the-art in this area is the Face-Transformation generative adversarial network, which is based on CycleGAN. This inspired our research, which explores the performance of two pixel transformation models, Pix2Pix and CycleGAN, in frontal facial synthesis. We conducted experiments with five different loss functions on Pix2Pix to improve its performance, and then proposed a new network, Pairwise-GAN, for frontal facial synthesis. Pairwise-GAN uses two parallel U-Nets as the generator and PatchGAN as the discriminator. The detailed hyper-parameters are also discussed. Based on quantitative measurement by face similarity comparison, our results showed that Pix2Pix with L1 loss, gradient difference loss, and identity loss yields a 2.72% improvement in average similarity compared to the default Pix2Pix model. Additionally, Pairwise-GAN performs 5.4% better than CycleGAN and 9.1% better than Pix2Pix in average similarity.