LG · May 23
Assessing Region-Level EEG Contributions to Cognitive Workload Prediction
Jacob Wong, Sohan Singh, Prannaya Gupta et al.
Accurate and generalizable estimation of cognitive workload from electroencephalography (EEG) is critical for human-centered and safety-critical systems. Although EEG is widely used for workload assessment, the consistency of region-level EEG contributions across tasks, datasets, and subjects remains unclear. This paper presents a region-level evaluation framework for EEG-based workload prediction in which models are trained and evaluated using features extracted exclusively from electrodes belonging to anatomically defined scalp regions. We perform a large-scale analysis across four publicly available EEG workload datasets spanning diverse task demands, recording hardware, and electrode montages. Region importance is quantified using a model-agnostic, performance-based approach under both mixed-subject and subject-independent evaluation protocols, with results aggregated using a rank-based strategy to ensure robustness across experimental configurations. Across all datasets and subject-independent evaluations, frontal electrode groups outperform the full-scalp baseline by approximately 15-20% in relative rank position while using substantially fewer electrodes. Fronto-central regions exhibit the most stable predictive utility, whereas posterior and occipital regions contribute less consistently across experimental conditions. These findings indicate that workload-relevant EEG information is most consistently retained within frontal and fronto-central electrode groups, supporting the design of efficient and generalizable EEG-based workload monitoring systems.
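The rank-based aggregation mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not specify the exact aggregation rule, so this assumes a plain mean of per-configuration ranks. The region names, the function names `rank_regions` and `aggregate_ranks`, and all scores are hypothetical, made-up values for illustration only, not results from the paper.

```python
from statistics import mean

def rank_regions(scores):
    """Return {region: rank} for one configuration; rank 1 is the best score.
    Tied regions share the average of the ranks they span."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over every region tied with the one at position i.
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        shared = (i + 1 + j + 1) / 2  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k]] = shared
        i = j + 1
    return ranks

def aggregate_ranks(configs):
    """Average each region's rank over all configurations (lower is better).
    Assumption: mean rank; the paper's actual rule may differ."""
    per_config = [rank_regions(s) for s in configs]
    return {r: mean(rc[r] for rc in per_config) for r in per_config[0]}

# Hypothetical accuracies for three configurations (made-up numbers).
configs = [
    {"frontal": 0.71, "fronto_central": 0.73, "occipital": 0.60, "full_scalp": 0.68},
    {"frontal": 0.66, "fronto_central": 0.65, "occipital": 0.58, "full_scalp": 0.62},
    {"frontal": 0.70, "fronto_central": 0.72, "occipital": 0.69, "full_scalp": 0.64},
]
mean_ranks = aggregate_ranks(configs)
for region, r in sorted(mean_ranks.items(), key=lambda kv: kv[1]):
    print(f"{region}: mean rank {r:.2f}")
```

Ranking within each configuration before averaging keeps a dataset with unusually high or low absolute scores from dominating the aggregate, which is one plausible reason to prefer ranks over raw accuracies.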
LG · Oct 5, 2025 · Code
Learning More with Less: A Generalizable, Self-Supervised Framework for Privacy-Preserving Capacity Estimation with EV Charging Data
Anushiya Arunan, Yan Qin, Xiaoli Li et al.
Accurate battery capacity estimation is key to alleviating consumer concerns about battery performance and reliability of electric vehicles (EVs). However, practical data limitations imposed by stringent privacy regulations and labeled data shortages hamper the development of generalizable capacity estimation models that remain robust to real-world data distribution shifts. While self-supervised learning can leverage unlabeled data, existing techniques are not particularly designed to learn effectively from challenging field data -- let alone from privacy-friendly data, which are often less feature-rich and noisier. In this work, we propose a first-of-its-kind capacity estimation model based on self-supervised pre-training, developed on a large-scale dataset of privacy-friendly charging data snippets from real-world EV operations. Our pre-training framework, snippet similarity-weighted masked input reconstruction, is designed to learn rich, generalizable representations even from less feature-rich and fragmented privacy-friendly data. Our key innovation lies in harnessing contrastive learning to first capture high-level similarities among fragmented snippets that otherwise lack meaningful context. With our snippet-wise contrastive learning and subsequent similarity-weighted masked reconstruction, we are able to learn rich representations of both granular charging patterns within individual snippets and high-level associative relationships across different snippets. Bolstered by this rich representation learning, our model consistently outperforms state-of-the-art baselines, achieving 31.9% lower test error than the best-performing benchmark, even under challenging domain-shifted settings affected by both manufacturer and age-induced distribution shifts. Source code is available at https://github.com/en-research/GenEVBattery.
RO · Apr 28, 2025
NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks
Chia-Yu Hung, Qi Sun, Pengfei Hong et al.
Existing Visual-Language-Action (VLA) models have shown promising performance in zero-shot scenarios, demonstrating impressive task execution and reasoning capabilities. However, a significant challenge arises from the limitations of visual encoding, which can result in failures during tasks such as object grasping. Moreover, these models typically suffer from high computational overhead due to their large sizes, often exceeding 7B parameters. While these models excel in reasoning and task planning, the substantial computational overhead they incur makes them impractical for real-time robotic environments, where speed and efficiency are paramount. To address the limitations of existing VLA models, we propose NORA, a 3B-parameter model designed to reduce computational overhead while maintaining strong task performance. NORA adopts the Qwen-2.5-VL-3B multimodal model as its backbone, leveraging its superior visual-semantic understanding to enhance visual reasoning and action grounding. Additionally, NORA is trained on 970k real-world robot demonstrations and equipped with the FAST+ tokenizer for efficient action sequence generation. Experimental results demonstrate that NORA outperforms existing large-scale VLA models, achieving better task performance with significantly reduced computational overhead, making it a more practical solution for real-time robotic autonomy.
RO · Dec 16, 2024
Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning
Qi Sun, Pengfei Hong, Tej Deep Pala et al. · deepmind
Traditional reinforcement learning-based robotic control methods are often task-specific and fail to generalize across diverse environments or unseen objects and instructions. Visual Language Models (VLMs) demonstrate strong scene understanding and planning capabilities but lack the ability to generate actionable policies tailored to specific robotic embodiments. To address this, Visual-Language-Action (VLA) models have emerged, yet they face challenges in long-horizon spatial reasoning and grounded task planning. In this work, we propose the Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning, Emma-X. Emma-X leverages our constructed hierarchical embodiment dataset based on BridgeV2, containing 60,000 robot manipulation trajectories auto-annotated with grounded task reasoning and spatial guidance. Additionally, we introduce a trajectory segmentation strategy based on gripper states and motion trajectories, which can help mitigate hallucination in grounding subtask reasoning generation. Experimental results demonstrate that Emma-X achieves superior performance over competitive baselines, particularly in real-world robotic tasks requiring spatial reasoning.
RO · Dec 9, 2024
A Scalable Decentralized Reinforcement Learning Framework for UAV Target Localization Using Recurrent PPO
Leon Fernando, Billy Pik Lik Lau, Chau Yuen et al.
The rapid advancements in unmanned aerial vehicles (UAVs) have unlocked numerous applications, including environmental monitoring, disaster response, and agricultural surveying. Enhancing the collective behavior of multiple decentralized UAVs can significantly improve these applications through more efficient and coordinated operations. In this study, we explore a Recurrent PPO model for target localization in perceptually degraded environments, such as areas without GNSS/GPS signals. We first developed a single-drone approach for target identification, followed by a decentralized two-drone model. Our approach can utilize two types of sensors on the UAVs: a detection sensor and a target signal sensor. The single-drone model achieved an accuracy of 93%, while the two-drone model achieved an accuracy of 86%, with the latter requiring fewer average steps to locate the target. This demonstrates the potential of our method in UAV swarms, offering efficient and effective localization of radiant targets in complex environmental conditions.
RO · Nov 24, 2025
CNN-Based Camera Pose Estimation and Localisation of Scan Images for Aircraft Visual Inspection
Xueyan Oh, Leonard Loh, Shaohui Foong et al.
General Visual Inspection is a manual inspection process regularly used to detect and localise obvious damage on the exterior of commercial aircraft. There has been increasing demand to perform this process at the boarding gate to minimise the downtime of the aircraft, and automating it is desirable to reduce the reliance on human labour. Automating this typically requires estimating a camera's pose with respect to the aircraft for initialisation, but most existing localisation methods require infrastructure, which is very challenging in uncontrolled outdoor environments and within the limited turnover time (approximately 2 hours) on an airport tarmac. Additionally, many airlines and airports do not allow contact with the aircraft's surface or the use of UAVs for inspection between flights, and restrict access to commercial aircraft. Hence, this paper proposes an on-site method that is infrastructure-free and easy to deploy for estimating a pan-tilt-zoom camera's pose and localising scan images. This method initialises using the same pan-tilt-zoom camera used for the inspection task by utilising a Deep Convolutional Neural Network fine-tuned on only synthetic images to predict its own pose. We apply domain randomisation to generate the dataset for fine-tuning the network and modify its loss function by leveraging aircraft geometry to improve accuracy. We also propose a workflow for initialisation, scan path planning, and precise localisation of images captured from a pan-tilt-zoom camera. We evaluate and demonstrate our approach through experiments with real aircraft, achieving root-mean-square camera pose estimation errors of less than 0.24 m and 2 degrees for all real scenes.
RO · Oct 13, 2021
Collaborative Radio SLAM for Multiple Robots based on WiFi Fingerprint Similarity
Ran Liu, Zhenghong Qin, Hua Zhang et al.
Simultaneous Localization and Mapping (SLAM) enables autonomous robots to navigate and execute their tasks through unknown environments. However, performing SLAM in large environments with a single robot is not efficient, and visual or LiDAR-based SLAM requires feature extraction and matching algorithms, which are computationally expensive. In this paper, we present a collaborative SLAM approach with multiple robots using the pervasive WiFi radio signals. A centralized solution is proposed to optimize the trajectory based on the odometry and radio fingerprints collected from multiple robots. To improve the localization accuracy, a novel similarity model is introduced that combines received signal strength (RSS) and detection likelihood of an access point (AP). We perform extensive experiments to demonstrate the effectiveness of the proposed similarity model and collaborative SLAM framework.
RO · Jul 19, 2021
Relative Localization of Mobile Robots with Multiple Ultra-WideBand Ranging Measurements
Zhiqiang Cao, Ran Liu, Chau Yuen et al.
Relative localization between autonomous robots without infrastructure is crucial to achieve their navigation, path planning, and formation in many applications, such as emergency response, where acquiring prior knowledge of the environment is not possible. The traditional Ultra-WideBand (UWB)-based approach provides a good estimation of the distance between the robots, but obtaining the relative pose (including the displacement and orientation) remains challenging. We propose an approach to estimate the relative pose between a group of robots by equipping each robot with multiple UWB ranging nodes. We determine the pose between two robots by minimizing the residual error of the ranging measurements from all UWB nodes. To improve the localization accuracy, we propose to utilize the odometry constraints through a sliding window-based optimization. The optimized pose is then fused with the odometry in a particle filter for pose tracking among a group of mobile robots. We have conducted extensive experiments to validate the effectiveness of the proposed approach.
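The residual-minimization step described above (finding the pose that best explains the ranges between all node pairs) can be sketched in 2D as follows. This covers only a single snapshot of ranges; the sliding-window odometry constraints and the particle filter are not shown. The abstract does not name an optimizer, so a simple derivative-free coordinate search is swapped in here as an assumption. The node layout, the poses, and the function names `predicted_ranges`, `cost`, and `estimate_pose` are all hypothetical.

```python
import math

def predicted_ranges(pose, nodes_a, nodes_b):
    """Ranges between every node on robot A and every node on robot B.
    pose = (x, y, theta) of robot B expressed in robot A's frame."""
    x, y, th = pose
    c, s = math.cos(th), math.sin(th)
    out = []
    for ax, ay in nodes_a:
        for bx, by in nodes_b:
            # Rotate and translate B's node into A's frame.
            px = x + c * bx - s * by
            py = y + s * bx + c * by
            out.append(math.hypot(px - ax, py - ay))
    return out

def cost(pose, nodes_a, nodes_b, measured):
    """Sum of squared residuals between predicted and measured ranges."""
    pred = predicted_ranges(pose, nodes_a, nodes_b)
    return sum((p - m) ** 2 for p, m in zip(pred, measured))

def estimate_pose(nodes_a, nodes_b, measured, init, step=0.5, tol=1e-6, max_iter=10000):
    """Coordinate search: accept a move along x, y, or theta only if it lowers
    the cost, and halve the step when no move helps (assumed optimizer)."""
    pose = list(init)
    best = cost(pose, nodes_a, nodes_b, measured)
    for _ in range(max_iter):
        if step <= tol:
            break
        improved = False
        for i in range(3):
            for delta in (step, -step):
                trial = pose[:]
                trial[i] += delta
                c = cost(trial, nodes_a, nodes_b, measured)
                if c < best:
                    pose, best, improved = trial, c, True
        if not improved:
            step /= 2
    return tuple(pose), best

# Hypothetical setup: three UWB nodes per robot (metres, body frame).
nodes = [(0.2, 0.2), (0.2, -0.2), (-0.2, 0.0)]
true_pose = (2.0, 1.0, 0.6)
measured = predicted_ranges(true_pose, nodes, nodes)  # noise-free synthetic ranges
init = (1.5, 0.5, 0.0)
start_cost = cost(init, nodes, nodes, measured)
estimate, final_cost = estimate_pose(nodes, nodes, measured, init)
print("estimate:", estimate, "cost:", final_cost)
```

Because a move is accepted only when it lowers the cost, the returned cost never exceeds the cost at the initial guess. Like any local method, the search can settle in a local minimum if the initial guess is poor, which is one reason additional constraints such as odometry are useful.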
NI · Nov 30, 2019
Collaborative SLAM based on Wifi Fingerprint Similarity and Motion Information
Ran Liu, Sumudu Hasala Marakkalage, Madhushanka Padmal et al.
Simultaneous localization and mapping (SLAM) has been extensively researched in past years, particularly with regard to range-based or visual-based sensors. Instead of deploying dedicated devices that use visual features, it is more pragmatic to exploit radio features to achieve this task, due to their ubiquitous nature and the widespread deployment of Wi-Fi wireless networks. This paper presents a novel approach for collaborative simultaneous localization and radio fingerprint mapping (C-SLAM-RF) in large unknown indoor environments. The proposed system uses received signal strengths (RSS) from Wi-Fi access points (APs) in the existing infrastructure and pedestrian dead reckoning (PDR) from a smartphone, without prior knowledge of the map or the distribution of APs in the environment. We claim a loop closure based on the similarity of the two radio fingerprints. To further improve the performance, we incorporate the turning motion and assign a small uncertainty value to a loop closure if a matched turning is identified. The experiment was done in an area of 130 meters by 70 meters and the results show that our proposed system is capable of estimating the tracks of four users with an accuracy of 0.6 meters with Tango-based PDR and 4.76 meters with a step counter-based PDR.
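The idea of declaring a loop closure when two radio fingerprints are similar enough can be sketched as below. The abstract does not give the similarity measure, so cosine similarity over RSS vectors is used here purely as an assumption. The floor value `FLOOR_DBM`, the threshold of 0.9, the access point identifiers, and the function names are hypothetical.

```python
import math

FLOOR_DBM = -100.0  # assumed RSS for access points that were not heard

def fingerprint_similarity(fp_a, fp_b):
    """Cosine similarity of two RSS fingerprints given as {ap_id: rss_dbm}.
    Assumes every RSS value is at or above FLOOR_DBM."""
    aps = sorted(set(fp_a) | set(fp_b))
    # Shift by the floor so unheard APs contribute 0 and stronger signals weigh more.
    va = [fp_a.get(ap, FLOOR_DBM) - FLOOR_DBM for ap in aps]
    vb = [fp_b.get(ap, FLOOR_DBM) - FLOOR_DBM for ap in aps]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm > 0 else 0.0

def is_loop_closure(fp_a, fp_b, threshold=0.9):
    """Declare a loop closure when the similarity reaches the (assumed) threshold."""
    return fingerprint_similarity(fp_a, fp_b) >= threshold

# Hypothetical fingerprints recorded by two users (dBm).
here = {"ap1": -45.0, "ap2": -60.0, "ap3": -80.0}
same_spot = {"ap1": -47.0, "ap2": -58.0, "ap3": -82.0}
elsewhere = {"ap7": -50.0, "ap8": -65.0}

print(is_loop_closure(here, same_spot))   # nearby readings
print(is_loop_closure(here, elsewhere))   # no access point in common
```

In a full system each accepted loop closure would become a constraint between the two poses where the fingerprints were recorded; the abstract notes that a matched turning motion is used to assign a smaller uncertainty to such a constraint.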
RO · Apr 26, 2019
Crowd-sensing Simultaneous Localization and Radio Fingerprint Mapping based on Probabilistic Similarity Models
Ran Liu, Sumudu Hasala Marakkalage, Madhushanka Padmal et al.
Simultaneous localization and mapping (SLAM) has been extensively researched in past years, particularly with regard to range-based or visual-based sensors. Instead of deploying dedicated devices that use visual features, it is more pragmatic to exploit radio features to achieve this task, due to their ubiquitous nature and the wide deployment of Wifi wireless networks. In this paper, we present a novel approach for crowd-sensing simultaneous localization and radio fingerprint mapping (C-SLAM-RF) in large unknown indoor environments. The proposed system makes use of the received signal strength (RSS) from surrounding Wifi access points (APs) and the motion tracking data from a smartphone (Tango as an example). These measurements are captured while multiple users walk through unknown environments, without map information or knowledge of the AP locations. The experiments were done in a university building with a dynamic environment and the results show that the proposed system is capable of estimating the tracks of a group of users with an accuracy of 1.74 meters when compared to the ground truth acquired from a point cloud-based SLAM.