AI Apr 20
Training and Agentic Inference Strategies for LLM-based Manim Animation Generation
Ravidu Suien Rammuni Silva, Ahmad Lotfi, Isibor Kennedy Ihianle et al.
Generating programmatic animation using libraries such as Manim presents unique challenges for Large Language Models (LLMs), requiring spatial reasoning, temporal sequencing, and familiarity with domain-specific APIs that are underrepresented in general pre-training data. A systematic study of how training and inference strategies interact in this setting is lacking in current research. This study introduces ManimTrainer, a training pipeline that combines Supervised Fine-tuning (SFT) with Reinforcement Learning (RL)-based Group Relative Policy Optimisation (GRPO) using a unified reward signal that fuses code and visual assessment signals, and ManimAgent, an inference pipeline featuring Renderer-in-the-loop (RITL) and API documentation-augmented RITL (RITL-DOC) strategies. Using these techniques, this study presents the first unified training and inference study for text-to-code-to-video transformation with Manim. It evaluates 17 open-source sub-30B LLMs across nine combinations of training and inference strategies using ManimBench. Results show that SFT generally improves code quality, while GRPO enhances visual outputs and increases the models' responsiveness to extrinsic signals during self-correction at inference time. The Qwen 3 Coder 30B model with GRPO and RITL-DOC achieved the highest overall performance, with a 94% Render Success Rate (RSR) and 85.7% Visual Similarity (VS) to reference videos, surpassing the baseline GPT-4.1 model by 3 percentage points in VS. Additionally, the analysis shows that the correlation between code and visual metrics strengthens with SFT and GRPO but weakens with inference-time enhancements, highlighting the complementary roles of training and agentic inference strategies in Manim animation generation.
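As a rough illustration of the GRPO step mentioned in this abstract, the sketch below shows group-relative advantage normalisation over a group of sampled completions. The `fused_reward` function and its equal weights are hypothetical placeholders: the abstract says only that code and visual signals are fused, not how. Using the population standard deviation is also an assumption; implementations differ on this detail.

```python
from statistics import mean, pstdev


def fused_reward(code_score, visual_score, w_code=0.5, w_visual=0.5):
    """Hypothetical fusion: weighted sum of a code score and a visual score in [0, 1]."""
    return w_code * code_score + w_visual * visual_score


def group_relative_advantages(rewards, eps=1e-8):
    """Normalise each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three sampled completions for one prompt: (code score, visual score).
group = [(1.0, 0.8), (0.0, 0.0), (1.0, 0.4)]
rewards = [fused_reward(c, v) for c, v in group]
advantages = group_relative_advantages(rewards)
```

Completions that score above the group mean receive positive advantages and those below receive negative ones, so no separate value model is needed to provide a baseline.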
CV Jul 22, 2022
Taguchi based Design of Sequential Convolution Neural Network for Classification of Defective Fasteners
Manjeet Kaur, Krishan Kumar Chauhan, Tanya Aggarwal et al.
Fasteners play a critical role in securing various parts of machinery. Deformations such as dents, cracks, and scratches on the surface of fasteners are caused by material properties and incorrect handling of equipment during production processes. As a result, quality control is required to ensure safe and reliable operations. The existing defect inspection method relies on manual examination, which consumes a significant amount of time, money, and other resources; moreover, accuracy cannot be guaranteed due to human error. Automatic defect detection systems have proven more effective than manual inspection for defect analysis. However, computational techniques such as convolutional neural networks (CNN) and deep learning-based approaches are evolutionary methods. By carefully selecting the design parameter values, the full potential of CNN can be realised. Using Taguchi-based design of experiments and analysis, an attempt has been made to develop a robust automatic system in this study. The dataset used to train the system was created manually for M14 size nuts and has two labelled classes: Defective and Non-defective. There are a total of 264 images in the dataset. The proposed sequential CNN achieves a validation accuracy of 96.3% and a validation loss of 0.277 at a learning rate of 0.001.
LG Feb 22, 2023
Mitigating Adversarial Attacks in Deepfake Detection: An Exploration of Perturbation and AI Techniques
Saminder Dhesi, Laura Fontes, Pedro Machado et al.
Deep learning constitutes a pivotal component within the realm of machine learning, offering remarkable capabilities in tasks ranging from image recognition to natural language processing. However, this very strength also renders deep learning models susceptible to adversarial examples, a phenomenon pervasive across a diverse array of applications. These adversarial examples are characterized by subtle perturbations artfully injected into clean images or videos, thereby causing deep learning algorithms to misclassify or produce erroneous outputs. This susceptibility extends beyond the confines of digital domains, as adversarial examples can also be strategically designed to target human cognition, leading to the creation of deceptive media, such as deepfakes. Deepfakes, in particular, have emerged as a potent tool to manipulate public opinion and tarnish the reputations of public figures, underscoring the urgent need to address the security and ethical implications associated with adversarial examples. This article delves into the multifaceted world of adversarial examples, elucidating the underlying principles behind their capacity to deceive deep learning algorithms. We explore the various manifestations of this phenomenon, from their insidious role in compromising model reliability to their impact in shaping the contemporary landscape of disinformation and misinformation. To illustrate progress in combating adversarial examples, we showcase the development of a tailored Convolutional Neural Network (CNN) designed explicitly to detect deepfakes, a pivotal step towards enhancing model robustness in the face of adversarial threats. Impressively, this custom CNN has achieved a precision rate of 76.2% on the DFDC dataset.
AR Jul 13, 2022
Estimating the Power Consumption of Heterogeneous Devices when performing AI Inference
Pedro Machado, Ivica Matic, Francisco de Lemos et al.
Modern-day life is driven by electronic devices connected to the internet. The emerging research field of the Internet-of-Things (IoT) has become popular, just as there has been a steady increase in the number of connected devices. Since many of these devices are utilised to perform computer vision (CV) tasks, it is essential to understand their power consumption against performance. We report the power consumption profile and analysis of the NVIDIA Jetson Nano board while performing object classification. We present an extensive analysis of power consumption per frame and throughput in frames per second (fps) using YOLOv5 models. The results show that YOLOv5n outperforms the other YOLOv5 variants in terms of throughput (12.34 fps) and low power consumption (0.154 mWh/frame).
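To make the per-frame unit concrete, the sketch below relates energy per frame to average power and throughput. It assumes energy per frame is simply average power multiplied by the time spent on one frame; the abstract does not state how the authors measured or derived their figure, so both helper functions are illustrative only.

```python
def energy_per_frame_mwh(avg_power_mw, fps):
    """Energy per frame in mWh, assuming constant average power in mW."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    seconds_per_frame = 1.0 / fps
    return avg_power_mw * seconds_per_frame / 3600.0  # mW*s -> mWh


def implied_power_mw(energy_mwh_per_frame, fps):
    """Inverse relation: average power implied by a per-frame energy figure."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return energy_mwh_per_frame * 3600.0 * fps


# Figures quoted in the abstract for YOLOv5n.
power = implied_power_mw(0.154, 12.34)
```

Under this assumption, 0.154 mWh/frame at 12.34 fps corresponds to an average draw of roughly 6.8 W.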
CV Jul 6, 2022
Deep Learning approach for Classifying Trusses and Runners of Strawberries
Jakub Pomykala, Francisco de Lemos, Isibor Kennedy Ihianle et al.
The use of artificial intelligence in the agricultural sector has been growing at a rapid rate to automate farming activities. Emergent farming technologies focus on mapping and classification of plants, fruits, diseases, and soil types. Although assisted harvesting and pruning applications using deep learning algorithms are in the early development stages, there is a demand for solutions to automate such processes. This paper proposes the use of Deep Learning for the classification of trusses and runners of strawberry plants using semantic segmentation and dataset augmentation. The proposed approach uses noise (i.e. Gaussian, Speckle, Poisson and Salt-and-Pepper) to artificially augment the dataset, compensate for the low number of data samples and increase the overall classification performance. The results are evaluated using the mean average of precision, recall and F1 score. The proposed approach achieved 91%, 95% and 92% on precision, recall and F1 score, respectively, for truss detection using the ResNet101 with dataset augmentation utilising Salt-and-Pepper noise; and 83%, 53% and 65% on precision, recall and F1 score, respectively, for truss detection using the ResNet50 with dataset augmentation utilising Poisson noise.
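The four noise types named in this abstract can be sketched with NumPy as follows. This is a minimal illustration for float images in [0, 1], not the authors' pipeline. The function names and all parameter values (`sigma`, `peak`, `amount`) are assumptions chosen for the example.

```python
import numpy as np


def add_gaussian(img, rng, sigma=0.05):
    """Additive zero-mean Gaussian noise."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)


def add_speckle(img, rng, sigma=0.1):
    """Multiplicative noise: each pixel is perturbed in proportion to its value."""
    return np.clip(img + img * rng.normal(0.0, sigma, img.shape), 0.0, 1.0)


def add_poisson(img, rng, peak=255.0):
    """Shot noise: resample each pixel from a Poisson with mean img * peak."""
    return np.clip(rng.poisson(img * peak) / peak, 0.0, 1.0)


def add_salt_and_pepper(img, rng, amount=0.02):
    """Set a random fraction of pixels to pure black or pure white."""
    out = img.copy()
    mask = rng.random(img.shape[:2])
    out[mask < amount / 2] = 0.0       # pepper
    out[mask > 1 - amount / 2] = 1.0   # salt
    return out


rng = np.random.default_rng(0)
image = np.full((32, 32, 3), 0.5)  # stand-in for a real training image
augmented = [
    add_gaussian(image, rng),
    add_speckle(image, rng),
    add_poisson(image, rng),
    add_salt_and_pepper(image, rng, amount=0.1),
]
```

For a segmentation task, the label masks would be left unchanged, since these transforms alter pixel values but not geometry.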
CV Jul 6, 2022
Real-Time Gesture Recognition with Virtual Glove Markers
Finlay McKinnon, David Ada Adama, Pedro Machado et al.
Gesture recognition technology has developed steadily over the past few decades, driven by the universal, non-verbal and natural form of communication that gestures provide between humans. Many different strategies have been presented in research articles on gesture recognition to try to create an effective system for sending non-verbal natural communication information to computers, using both physical sensors and computer vision. Hyper-accurate real-time systems, on the other hand, have only recently begun to occupy the study field, with each adopting a range of methodologies due to past limits such as usability, cost, speed, and accuracy. A real-time computer vision-based human-computer interaction tool for gesture recognition applications that acts as a natural user interface is proposed. Virtual glove markers on users' hands are created and used as input to a deep learning model for the real-time recognition of gestures. The results obtained show that the proposed system would be effective in real-time applications including social interaction through telepresence and rehabilitation.
CR Jan 15, 2023
Secure Video Streaming Using Dedicated Hardware
Nicholas Murray-Hill, Laura Fontes, Pedro Machado et al.
Purpose: The purpose of this article is to present a system that enhances the security, efficiency, and reconfigurability of an Internet-of-Things (IoT) system used for surveillance and monitoring. Methods: A Multi-Processor System-on-Chip (MPSoC) composed of a Central Processing Unit (CPU) and a Field-Programmable Gate Array (FPGA) is proposed for increasing the security and the frame rate of a smart IoT edge device. The private encryption key is safely embedded in the FPGA unit to avoid being exposed in the Random Access Memory (RAM). This allows the edge device to securely store and authenticate the key, protecting the data transmitted from the same Integrated Circuit (IC). Additionally, the edge device can simultaneously publish and route a camera stream using a lightweight communication protocol, achieving a frame rate of 14 frames per second (fps). The performance of the MPSoC is compared to an NVIDIA Jetson Nano (NJN) and a Raspberry Pi 4 (RPI4). The RPI4 is the most cost-effective solution but has a lower frame rate; the NJN is the fastest, achieving a higher frame rate, but is not secure; and the MPSoC is the optimal solution because it offers a balanced frame rate and never exposes the secure key in memory. Results: The proposed system successfully addresses the challenges of security, scalability, and efficiency in an IoT system used for surveillance and monitoring. The encryption key is securely stored and authenticated, and the edge device is able to simultaneously publish and route a high-definition camera stream at 14 fps.
CR Nov 9, 2025
SteganoSNN: SNN-Based Audio-in-Image Steganography with Encryption
Biswajit Kumar Sahoo, Pedro Machado, Isibor Kennedy Ihianle et al.
Secure data hiding remains a fundamental challenge in digital communication, requiring a careful balance between computational efficiency and perceptual transparency. The balance between security and performance is increasingly fragile with the emergence of generative AI systems capable of autonomously generating and optimising sophisticated cryptanalysis and steganalysis algorithms, thereby accelerating the exposure of vulnerabilities in conventional data-hiding schemes. This work introduces SteganoSNN, a neuromorphic steganographic framework that exploits spiking neural networks (SNNs) to achieve secure, low-power, and high-capacity multimedia data hiding. Digitised audio samples are converted into spike trains using leaky integrate-and-fire (LIF) neurons, encrypted via a modulo-based mapping scheme, and embedded into the least significant bits of RGBA image channels using a dithering mechanism to minimise perceptual distortion. Implemented in Python using NEST and realised on a PYNQ-Z2 FPGA, SteganoSNN attains real-time operation with an embedding capacity of 8 bits per pixel. Experimental evaluations on the DIV2K 2017 dataset demonstrate image fidelity between 40.4 dB and 41.35 dB in PSNR and SSIM values consistently above 0.97, surpassing SteganoGAN in computational efficiency and robustness. SteganoSNN establishes a foundation for neuromorphic steganography, enabling secure, energy-efficient communication for Edge-AI, IoT, and biomedical applications.
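The embedding step described in this abstract can be illustrated with a plain least-significant-bit sketch. It covers only the bit-level hiding. The spike encoding, modulo-based encryption and dithering are omitted, and the split of 2 bits into each of the 4 RGBA channels (giving 8 bits per pixel) is an assumption made to match the stated capacity, not a detail taken from the paper.

```python
def embed_lsb(channels: bytes, payload: bytes, bits: int = 2) -> bytes:
    """Hide payload in the low `bits` of each channel byte (flat RGBA data)."""
    if 8 % bits != 0:
        raise ValueError("bits must divide 8")
    mask = (1 << bits) - 1
    # Split every payload byte into `bits`-wide chunks, most significant first.
    chunks = [
        (byte >> shift) & mask
        for byte in payload
        for shift in range(8 - bits, -1, -bits)
    ]
    if len(chunks) > len(channels):
        raise ValueError("payload too large for cover image")
    out = bytearray(channels)
    for i, chunk in enumerate(chunks):
        out[i] = (out[i] & ~mask & 0xFF) | chunk
    return bytes(out)


def extract_lsb(channels: bytes, n_bytes: int, bits: int = 2) -> bytes:
    """Recover n_bytes of payload from the low `bits` of each channel byte."""
    mask = (1 << bits) - 1
    per_byte = 8 // bits
    result = bytearray()
    for b in range(n_bytes):
        value = 0
        for c in channels[b * per_byte:(b + 1) * per_byte]:
            value = (value << bits) | (c & mask)
        result.append(value)
    return bytes(result)


cover = bytes(range(256))            # 64 RGBA pixels of stand-in image data
stego = embed_lsb(cover, b"spike")
recovered = extract_lsb(stego, 5)
```

With 2 bits per channel, no channel value changes by more than 3, which is why this kind of embedding tends to keep PSNR high.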
AI Dec 2, 2024
ArtBrain: An Explainable end-to-end Toolkit for Classification and Attribution of AI-Generated Art and Style
Ravidu Suien Rammuni Silva, Ahmad Lotfi, Isibor Kennedy Ihianle et al.
Recently, the quality of artworks generated using Artificial Intelligence (AI) has increased significantly, resulting in growing difficulties in detecting synthetic artworks. However, limited studies have been conducted on identifying the authenticity of synthetic artworks and their source. This paper introduces AI-ArtBench, a dataset featuring 185,015 artistic images across 10 art styles. It includes 125,015 AI-generated images and 60,000 pieces of human-created artwork. This paper also outlines a method to accurately detect AI-generated images and trace them to their source model. This work proposes a novel Convolutional Neural Network model based on the ConvNeXt model called AttentionConvNeXt. AttentionConvNeXt was implemented and trained to differentiate between the source of the artwork and its style with an F1-Score of 0.869. The accuracy of attribution to the generative model reaches 0.999. To combine the scientific contributions arising from this study, a web-based application named ArtBrain was developed to enable both technical and non-technical users to interact with the model. Finally, this study presents the results of an Artistic Turing Test conducted with 50 participants. The findings reveal that humans could identify AI-generated images with an accuracy of approximately 58%, while the model itself achieved a significantly higher accuracy of around 99%.
RO May 12, 2024
WeedScout: Real-Time Autonomous blackgrass Classification and Mapping using dedicated hardware
Matthew Gazzard, Helen Hicks, Isibor Kennedy Ihianle et al.
Blackgrass (Alopecurus myosuroides) is a competitive weed that has wide-ranging impacts on food security by reducing crop yields and increasing cultivation costs. In addition to the financial burden on agriculture, the application of herbicides as a preventive measure against blackgrass can negatively affect access to clean water and sanitation. The WeedScout project introduces a Real-Time Autonomous Black-Grass Classification and Mapping (RT-ABGCM) system, a cutting-edge solution tailored for real-time detection of blackgrass, for precision weed management practices. Leveraging Artificial Intelligence (AI) algorithms, the system processes live image feeds, infers blackgrass density, and covers two stages of maturation. The research investigates the deployment of You Only Look Once (YOLO) models, specifically the streamlined YOLOv8 and YOLO-NAS, accelerated at the edge with the NVIDIA Jetson Nano (NJN). By optimising inference speed and model performance, the project advances the integration of AI into agricultural practices, offering potential solutions to challenges such as herbicide resistance and environmental impact. Additionally, two datasets and model weights are made available to the research community, facilitating further advancements in weed detection and precision farming technologies.
CV Jul 23, 2025
Bearded Dragon Activity Recognition Pipeline: An AI-Based Approach to Behavioural Monitoring
Arsen Yermukan, Pedro Machado, Feliciano Domingos et al.
Traditional monitoring of bearded dragon (Pogona vitticeps) behaviour is time-consuming and prone to errors. This project introduces an automated system for real-time video analysis, using You Only Look Once (YOLO) object detection models to identify two key behaviours: basking and hunting. We trained five YOLO variants (v5, v7, v8, v11, v12) on a custom, publicly available dataset of 1200 images, encompassing bearded dragons (600), heating lamps (500), and crickets (100). YOLOv8s was selected as the optimal model due to its superior balance of accuracy (mAP@0.5:0.95 = 0.855) and speed. The system processes video footage by extracting per-frame object coordinates, applying temporal interpolation for continuity, and using rule-based logic to classify specific behaviours. Basking detection proved reliable. However, hunting detection was less accurate, primarily due to weak cricket detection (mAP@0.5 = 0.392). Future improvements will focus on enhancing cricket detection through expanded datasets or specialised small-object detectors. This automated system offers a scalable solution for monitoring reptile behaviour in controlled environments, significantly improving research efficiency and data quality.
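The post-detection stages described in this abstract (temporal interpolation followed by rule-based classification) might look roughly like the sketch below. The abstract does not give the actual rules, so everything here is hypothetical: the representation of detections as box centres, the linear gap filling, the distance-based basking rule and the `max_dist` threshold.

```python
import math


def interpolate_gaps(track):
    """Fill interior None entries of a list of (x, y) centres by linear interpolation."""
    filled = list(track)
    known = [i for i, p in enumerate(track) if p is not None]
    for a, b in zip(known, known[1:]):
        (xa, ya), (xb, yb) = track[a], track[b]
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            filled[i] = (xa + t * (xb - xa), ya + t * (yb - ya))
    return filled


def is_basking(dragon, lamp, max_dist=100.0):
    """Hypothetical rule: the dragon is basking if its centre is near the lamp's centre."""
    if dragon is None or lamp is None:
        return False
    return math.hypot(dragon[0] - lamp[0], dragon[1] - lamp[1]) <= max_dist


# Per-frame dragon centres with one missed detection; the lamp is static.
dragon_track = interpolate_gaps([(0.0, 0.0), None, (20.0, 40.0)])
lamp = (30.0, 40.0)
labels = [is_basking(p, lamp, max_dist=25.0) for p in dragon_track]
```

Leading and trailing gaps are left as `None` because there is no second known point to interpolate towards, so those frames are never labelled as basking.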
CV Dec 21, 2023
UDEEP: Edge-based Computer Vision for In-Situ Underwater Crayfish and Plastic Detection
Dennis Monari, Jack Larkin, Pedro Machado et al.
Invasive signal crayfish have a detrimental impact on ecosystems. They spread the fungal-type crayfish plague disease (Aphanomyces astaci) that is lethal to the white-clawed crayfish, the only native crayfish species in Britain. Invasive signal crayfish burrow extensively, causing habitat destruction, erosion of river banks and adverse changes in water quality, while also competing with native species for resources and leading to declines in native populations. Moreover, pollution exacerbates the vulnerability of white-clawed crayfish, with their populations declining by over 90% in certain English counties, making them highly susceptible to extinction. To safeguard aquatic ecosystems, it is imperative to address the challenges posed by invasive species and discarded plastics in the United Kingdom's river ecosystems. The UDEEP platform can play a crucial role in environmental monitoring by performing on-the-fly classification of signal crayfish and plastic debris while leveraging the efficacy of AI, IoT devices and the power of edge computing (i.e., NJN). By providing accurate data on the presence, spread and abundance of these species, the UDEEP platform can contribute to monitoring efforts and aid in mitigating the spread of invasive species.