CVMar 21, 2024
Diffusion Attack: Leveraging Stable Diffusion for Naturalistic Image AttackingQianyu Guo, Jiaming Fu, Yawen Lu et al.
In Virtual Reality (VR), adversarial attack remains a significant security threat. Most deep learning-based methods for physical and digital adversarial attacks focus on enhancing attack performance by crafting adversarial examples that contain large printable distortions that are easy for human observers to identify. However, attackers rarely impose limitations on the naturalness and comfort of the appearance of the generated attack image, resulting in a noticeable and unnatural attack. To address this challenge, we propose a framework to incorporate style transfer to craft adversarial inputs of natural styles that exhibit minimal detectability and maximum natural appearance, while maintaining superior attack capabilities.
ROMay 20, 2025
Think, Reflect, Create: Metacognitive Learning for Zero-Shot Robotic Planning with LLMsWenjie Lin, Jin Wei-Kocsis, Jiansong Zhang et al.
While large language models (LLMs) have shown great potential across various domains, their applications in robotics remain largely limited to static prompt-based behaviors and still face challenges in complex tasks under zero-shot or few-shot settings. Inspired by human metacognitive learning and creative problem-solving, we address this limitation by exploring a fundamental question: Can LLMs be empowered with metacognitive capabilities to reason, reflect, and create, thereby enhancing their ability to perform robotic tasks with minimal demonstrations? In this paper, we present a framework that integrates metacognitive learning into LLM-powered multi-robot collaboration. The system equips the LLM-powered robotic agents with a skill decomposition and self-reflection mechanism that identifies modular skills from prior tasks, reflects on failures in unseen task scenarios, and synthesizes effective new solutions. We propose a more challenging robotic benchmark task and evaluate our framework on the existing benchmark and the novel task. Experimental results show that our metacognitive learning framework significantly outperforms existing baselines. Moreover, we observe that the framework can generate solutions that differ from the ground truth yet still successfully complete the tasks. These findings support our hypothesis that metacognitive learning can foster creativity in robotic planning.
ROAug 19, 2025
Toward Deployable Multi-Robot Collaboration via a Symbolically-Guided Decision TransformerRathnam Vidushika Rasanji, Jin Wei-Kocsis, Jiansong Zhang et al.
Reinforcement learning (RL) has demonstrated great potential in robotic operations. However, its data-intensive nature and reliance on the Markov Decision Process (MDP) assumption limit its practical deployment in real-world scenarios involving complex dynamics and long-term temporal dependencies, such as multi-robot manipulation. Decision Transformers (DTs) have emerged as a promising offline alternative by leveraging causal transformers for sequence modeling in RL tasks. However, their applications to multi-robot manipulations still remain underexplored. To address this gap, we propose a novel framework, Symbolically-Guided Decision Transformer (SGDT), which integrates a neuro-symbolic mechanism with a causal transformer to enable deployable multi-robot collaboration. In the proposed SGDT framework, a neuro-symbolic planner generates a high-level task-oriented plan composed of symbolic subgoals. Guided by these subgoals, a goal-conditioned decision transformer (GCDT) performs low-level sequential decision-making for multi-robot manipulation. This hierarchical architecture enables structured, interpretable, and generalizable decision making in complex multi-robot collaboration tasks. We evaluate the performance of SGDT across a range of task scenarios, including zero-shot and few-shot scenarios. To our knowledge, this is the first work to explore DT-based technology for multi-robot manipulation.
ROJul 15, 2021
Real-Time Grasping Strategies Using Event CameraXiaoqian Huang, Mohamad Halwani, Rajkumar Muthusamy et al.
Robotic vision plays a key role for perceiving the environment in grasping applications. However, the conventional framed-based robotic vision, suffering from motion blur and low sampling rate, may not meet the automation needs of evolving industrial requirements. This paper, for the first time, proposes an event-based robotic grasping framework for multiple known and unknown objects in a cluttered scene. Compared with standard frame-based vision, neuromorphic vision has advantages of microsecond-level sampling rate and no motion blur. Building on that, the model-based and model-free approaches are developed for known and unknown objects' grasping respectively. For the model-based approach, event-based multi-view approach is used to localize the objects in the scene, and then point cloud processing allows for the clustering and registering of objects. Differently, the proposed model-free approach utilizes the developed event-based object segmentation, visual servoing and grasp planning to localize, align to, and grasp the targeting object. The proposed approaches are experimentally validated with objects of different sizes, using a UR10 robot with an eye-in-hand neuromorphic camera and a Barrett hand gripper. Moreover, the robustness of the two proposed event-based grasping approaches are validated in a low-light environment. This low-light operating ability shows a great advantage over the grasping using the standard frame-based vision. Furthermore, the developed model-free approach demonstrates the advantage of dealing with unknown object without prior knowledge compared to the proposed model-based approach.
ROApr 15, 2020
Neuromorphic Eye-in-Hand Visual ServoingRajkumar Muthusamy, Abdulla Ayyad, Mohamad Halwani et al.
Robotic vision plays a major role in factory automation to service robot applications. However, the traditional use of frame-based camera sets a limitation on continuous visual feedback due to their low sampling rate and redundant data in real-time image processing, especially in the case of high-speed tasks. Event cameras give human-like vision capabilities such as observing the dynamic changes asynchronously at a high temporal resolution ($1μs$) with low latency and wide dynamic range. In this paper, we present a visual servoing method using an event camera and a switching control strategy to explore, reach and grasp to achieve a manipulation task. We devise three surface layers of active events to directly process stream of events from relative motion. A purely event based approach is adopted to extract corner features, localize them robustly using heat maps and generate virtual features for tracking and alignment. Based on the visual feedback, the motion of the robot is controlled to make the temporal upcoming event features converge to the desired event in spatio-temporal space. The controller switches its strategy based on the sequence of operation to establish a stable grasp. The event based visual servoing (EVBS) method is validated experimentally using a commercial robot manipulator in an eye-in-hand configuration. Experiments prove the effectiveness of the EBVS method to track and grasp objects of different shapes without the need for re-tuning.
ROApr 15, 2020
Neuromorphic Event-Based Slip Detection and suppression in Robotic Grasping and ManipulationRajkumar Muthusamy, Xiaoqian Huang, Yahya Zweiri et al.
Slip detection is essential for robots to make robust grasping and fine manipulation. In this paper, a novel dynamic vision-based finger system for slip detection and suppression is proposed. We also present a baseline and feature based approach to detect object slips under illumination and vibration uncertainty. A threshold method is devised to autonomously sample noise in real-time to improve slip detection. Moreover, a fuzzy based suppression strategy using incipient slip feedback is proposed for regulating the grip force. A comprehensive experimental study of our proposed approaches under uncertainty and system for high-performance precision manipulation are presented. We also propose a slip metric to evaluate such performance quantitatively. Results indicate that the system can effectively detect incipient slip events at a sampling rate of 2kHz ($Δt = 500μs$) and suppress them before a gross slip occurs. The event-based approach holds promises to high precision manipulation task requirement in industrial manufacturing and household services.