ROFeb 26
SignVLA: A Gloss-Free Vision-Language-Action Framework for Real-Time Sign Language-Guided Robotic ManipulationXinyu Tan, Ningwei Bai, Harry Gardener et al.
We present, to our knowledge, the first sign language-driven Vision-Language-Action (VLA) framework for intuitive and inclusive human-robot interaction. Unlike conventional approaches that rely on gloss annotations as intermediate supervision, the proposed system adopts a gloss-free paradigm and directly maps visual sign gestures to semantic instructions. This design reduces annotation cost and avoids the information loss introduced by gloss representations, enabling more natural and scalable multimodal interaction. In this work, we focus on a real-time alphabet-level finger-spelling interface that provides a robust and low-latency communication channel for robotic control. Compared with large-scale continuous sign language recognition, alphabet-level interaction offers improved reliability, interpretability, and deployment feasibility in safety-critical embodied environments. The proposed pipeline transforms continuous gesture streams into coherent language commands through geometric normalization, temporal smoothing, and lexical refinement, ensuring stable and consistent interaction. Furthermore, the framework is designed to support future integration of transformer-based gloss-free sign language models, enabling scalable word-level and sentence-level semantic understanding. Experimental results demonstrate the effectiveness of the proposed system in grounding sign-derived instructions into precise robotic actions under diverse interaction scenarios. These results highlight the potential of the framework to advance accessible, scalable, and multimodal embodied intelligence.
CVJan 28, 2024
Real-time object detection and robotic manipulation for agriculture using a YOLO-based learning approachHongyu Zhao, Zezhi Tang, Zhenhong Li et al.
The optimisation of crop harvesting processes for commonly cultivated crops is of great importance in the aim of agricultural industrialisation. Nowadays, the utilisation of machine vision has enabled the automated identification of crops, leading to the enhancement of harvesting efficiency, but challenges still exist. This study presents a new framework that combines two separate architectures of convolutional neural networks (CNNs) in order to simultaneously accomplish the tasks of crop detection and harvesting (robotic manipulation) inside a simulated environment. Crop images in the simulated environment are subjected to random rotations, cropping, brightness, and contrast adjustments to create augmented images for dataset generation. The you only look once algorithmic framework is employed with traditional rectangular bounding boxes for crop localization. The proposed method subsequently utilises the acquired image data via a visual geometry group model in order to reveal the grasping positions for the robotic manipulation.
LGJul 5, 2025
Hierarchical Testing with Rabbit Optimization for Industrial Cyber-Physical SystemsJinwei Hu, Zezhi Tang, Xin Jin et al.
This paper presents HERO (Hierarchical Testing with Rabbit Optimization), a novel black-box adversarial testing framework for evaluating the robustness of deep learning-based Prognostics and Health Management systems in Industrial Cyber-Physical Systems. Leveraging Artificial Rabbit Optimization, HERO generates physically constrained adversarial examples that align with real-world data distributions via global and local perspective. Its generalizability ensures applicability across diverse ICPS scenarios. This study specifically focuses on the Proton Exchange Membrane Fuel Cell system, chosen for its highly dynamic operational conditions, complex degradation mechanisms, and increasing integration into ICPS as a sustainable and efficient energy solution. Experimental results highlight HERO's ability to uncover vulnerabilities in even state-of-the-art PHM models, underscoring the critical need for enhanced robustness in real-world applications. By addressing these challenges, HERO demonstrates its potential to advance more resilient PHM systems across a wide range of ICPS domains.
LGApr 8, 2025
A temporal scale transformer framework for precise remaining useful life prediction in fuel cellsZezhi Tang, Xiaoyu Chen, Xin Jin et al.
In exploring Predictive Health Management (PHM) strategies for Proton Exchange Membrane Fuel Cells (PEMFC), the Transformer model, widely used in data-driven approaches, excels in many fields but struggles with time series analysis due to its self-attention mechanism, which yields a complexity of the input sequence squared and low computational efficiency. It also faces challenges in capturing both global long-term dependencies and local details effectively. To tackle this, we propose the Temporal Scale Transformer (TSTransformer), an enhanced version of the inverted Transformer (iTransformer). Unlike traditional Transformers that treat each timestep as an input token, TSTransformer maps sequences of varying lengths into tokens at different stages for inter-sequence modeling, using attention to capture multivariate correlations and feed-forward networks (FFN) to encode sequence representations. By integrating a one-dimensional convolutional layer into the multivariate attention for multi-level scaling of K and V matrices, it improves local feature extraction, captures temporal scale characteristics, and reduces token count and computational costs. Experiments comparing TSTransformer with models like Long Short-Term Memory, iTransformer, and Transformer demonstrate its potential as a powerful tool for advancing PHM in renewable energy, effectively addressing the limitations of pure Transformer models in data-driven time series tasks.
OCDec 5, 2025
Deep Reinforcement Learning Optimization for Uncertain Nonlinear Systems via Event-Triggered Robust Adaptive Dynamic ProgrammingNingwei Bai, Chi Pui Chan, Qichen Yin et al.
This work proposes a unified control architecture that couples a Reinforcement Learning (RL)-driven controller with a disturbance-rejection Extended State Observer (ESO), complemented by an Event-Triggered Mechanism (ETM) to limit unnecessary computations. The ESO is utilized to estimate the system states and the lumped disturbance in real time, forming the foundation for effective disturbance compensation. To obtain near-optimal behavior without an accurate system description, a value-iteration-based Adaptive Dynamic Programming (ADP) method is adopted for policy approximation. The inclusion of the ETM ensures that parameter updates of the learning module are executed only when the state deviation surpasses a predefined bound, thereby preventing excessive learning activity and substantially reducing computational load. A Lyapunov-oriented analysis is used to characterize the stability properties of the resulting closed-loop system. Numerical experiments further confirm that the developed approach maintains strong control performance and disturbance tolerance, while achieving a significant reduction in sampling and processing effort compared with standard time-triggered ADP schemes.