Shiyu Jin

RO
h-index7
8papers
173citations
Novelty51%
AI Score32

8 Papers

ROMar 9, 2023
GOATS: Goal Sampling Adaptation for Scooping with Curriculum Reinforcement Learning

Yaru Niu, Shiyu Jin, Zeqing Zhang et al.

In this work, we first formulate the problem of robotic water scooping using goal-conditioned reinforcement learning. This task is particularly challenging due to the complex dynamics of fluids and the need to achieve multi-modal goals. The policy is required to successfully reach both position goals and water amount goals, which leads to a large convoluted goal state space. To overcome these challenges, we introduce Goal Sampling Adaptation for Scooping (GOATS), a curriculum reinforcement learning method that can learn an effective and generalizable policy for robot scooping tasks. Specifically, we use a goal-factorized reward formulation and interpolate position goal distributions and amount goal distributions to create curriculum throughout the learning process. As a result, our proposed method can outperform the baselines in simulation and achieves 5.46% and 8.71% amount errors on bowl scooping and bucket scooping tasks, respectively, under 1000 variations of initial water states in the tank and a large goal state space. Besides being effective in simulation environments, our method can efficiently adapt to noisy real-robot water-scooping scenarios with diverse physical configurations and unseen settings, demonstrating superior efficacy and generalizability. The videos of this work are available on our project page: https://sites.google.com/view/goatscooping.

AISep 1, 2024
Hound: Hunting Supervision Signals for Few and Zero Shot Node Classification on Text-attributed Graph

Yuxiang Wang, Xiao Yan, Shiyu Jin et al.

Text-attributed graph (TAG) is an important type of graph structured data with text descriptions for each node. Few- and zero-shot node classification on TAGs have many applications in fields such as academia and social networks. However, the two tasks are challenging due to the lack of supervision signals, and existing methods only use the contrastive loss to align graph-based node embedding and language-based text embedding. In this paper, we propose Hound to improve accuracy by introducing more supervision signals, and the core idea is to go beyond the node-text pairs that come with data. Specifically, we design three augmentation techniques, i.e., node perturbation, text matching, and semantics negation to provide more reference nodes for each text and vice versa. Node perturbation adds/drops edges to produce diversified node embeddings that can be matched with a text. Text matching retrieves texts with similar embeddings to match with a node. Semantics negation uses a negative prompt to construct a negative text with the opposite semantics, which is contrasted with the original node and text. We evaluate Hound on 5 datasets and compare with 13 state-of-the-art baselines. The results show that Hound consistently outperforms all baselines, and its accuracy improvements over the best-performing baseline are usually over 5%.

ROMar 11, 2024
RLingua: Improving Reinforcement Learning Sample Efficiency in Robotic Manipulations With Large Language Models

Liangliang Chen, Yutian Lei, Shiyu Jin et al.

Reinforcement learning (RL) has demonstrated its capability in solving various tasks but is notorious for its low sample efficiency. In this paper, we propose RLingua, a framework that can leverage the internal knowledge of large language models (LLMs) to reduce the sample complexity of RL in robotic manipulations. To this end, we first present a method for extracting the prior knowledge of LLMs by prompt engineering so that a preliminary rule-based robot controller for a specific task can be generated in a user-friendly manner. Despite being imperfect, the LLM-generated robot controller is utilized to produce action samples during rollouts with a decaying probability, thereby improving RL's sample efficiency. We employ TD3, the widely-used RL baseline method, and modify the actor loss to regularize the policy learning towards the LLM-generated controller. RLingua also provides a novel method of improving the imperfect LLM-generated robot controllers by RL. We demonstrate that RLingua can significantly reduce the sample complexity of TD3 in four robot tasks of panda_gym and achieve high success rates in 12 sampled sparsely rewarded robot tasks in RLBench, where the standard TD3 fails. Additionally, We validated RLingua's effectiveness in real-world robot experiments through Sim2Real, demonstrating that the learned policies are effectively transferable to real robot tasks. Further details about our work are available at our project website https://rlingua.github.io.

CLMay 13, 2025
Exploiting Text Semantics for Few and Zero Shot Node Classification on Text-attributed Graph

Yuxiang Wang, Xiao Yan, Shiyu Jin et al.

Text-attributed graph (TAG) provides a text description for each graph node, and few- and zero-shot node classification on TAGs have many applications in fields such as academia and social networks. Existing work utilizes various graph-based augmentation techniques to train the node and text embeddings, while text-based augmentations are largely unexplored. In this paper, we propose Text Semantics Augmentation (TSA) to improve accuracy by introducing more text semantic supervision signals. Specifically, we design two augmentation techniques, i.e., positive semantics matching and negative semantics contrast, to provide more reference texts for each graph node or text description. Positive semantic matching retrieves texts with similar embeddings to match with a graph node. Negative semantic contrast adds a negative prompt to construct a text description with the opposite semantics, which is contrasted with the original node and text. We evaluate TSA on 5 datasets and compare with 13 state-of-the-art baselines. The results show that TSA consistently outperforms all baselines, and its accuracy improvements over the best-performing baseline are usually over 5%.

ROMay 9, 2024
ExACT: An End-to-End Autonomous Excavator System Using Action Chunking With Transformers

Liangliang Chen, Shiyu Jin, Haoyu Wang et al.

Excavators are crucial for diverse tasks such as construction and mining, while autonomous excavator systems enhance safety and efficiency, address labor shortages, and improve human working conditions. Different from the existing modularized approaches, this paper introduces ExACT, an end-to-end autonomous excavator system that processes raw LiDAR, camera data, and joint positions to control excavator valves directly. Utilizing the Action Chunking with Transformers (ACT) architecture, ExACT employs imitation learning to take observations from multi-modal sensors as inputs and generate actionable sequences. In our experiment, we build a simulator based on the captured real-world data to model the relations between excavator valve states and joint velocities. With a few human-operated demonstration data trajectories, ExACT demonstrates the capability of completing different excavation tasks, including reaching, digging and dumping through imitation learning in validations with the simulator. To the best of our knowledge, ExACT represents the first instance towards building an end-to-end autonomous excavator system via imitation learning methods with a minimal set of human demonstrations. The video about this work can be accessed at https://youtu.be/NmzR_Rf-aEk.

ROOct 25, 2021
Learning Insertion Primitives with Discrete-Continuous Hybrid Action Space for Robotic Assembly Tasks

Xiang Zhang, Shiyu Jin, Changhao Wang et al.

This paper introduces a discrete-continuous action space to learn insertion primitives for robotic assembly tasks. Primitive is a sequence of elementary actions with certain exit conditions, such as "pushing down the peg until contact". Since the primitive is an abstraction of robot control commands and encodes human prior knowledge, it reduces the exploration difficulty and yields better learning efficiency. In this paper, we learn robot assembly skills via primitives. Specifically, we formulate insertion primitives as parameterized actions: hybrid actions consisting of discrete primitive types and continuous primitive parameters. Compared with the previous work using a set of discretized parameters for each primitive, the agent in our method can freely choose primitive parameters from a continuous space, which is more flexible and efficient. To learn these insertion primitives, we propose Twin-Smoothed Multi-pass Deep Q-Network (TS-MP-DQN), an advanced version of MP-DQN with twin Q-network to reduce the Q-value over-estimation. Extensive experiments are conducted in the simulation and real world for validation. From experiment results, our approach achieves higher success rates than three baselines: MP-DQN with parameterized actions, primitives with discrete parameters, and continuous velocity control. Furthermore, learned primitives are robust to sim-to-real transfer and can generalize to challenging assembly tasks such as tight round peg-hole and complex shaped electric connectors with promising success rates. Experiment videos are available at https://msc.berkeley.edu/research/insertion-primitives.html.

ROJun 2, 2021
Trajectory Optimization for Manipulation of Deformable Objects: Assembly of Belt Drive Units

Shiyu Jin, Diego Romeres, Arvind Ragunathan et al.

This paper presents a novel trajectory optimization formulation to solve the robotic assembly of the belt drive unit. Robotic manipulations involving contacts and deformable objects are challenging in both dynamic modeling and trajectory planning. For modeling, variations in the belt tension and contact forces between the belt and the pulley could dramatically change the system dynamics. For trajectory planning, it is computationally expensive to plan trajectories for such hybrid dynamical systems as it usually requires planning for discrete modes separately. In this work, we formulate the belt drive unit assembly task as a trajectory optimization problem with complementarity constraints to avoid explicitly imposing contact mode sequences. The problem is solved as a mathematical program with complementarity constraints (MPCC) to obtain feasible and efficient assembly trajectories. We validate the proposed method both in simulations with a physics engine and in real-world experiments with a robotic manipulator.

ROJan 29, 2021
Contact Pose Identification for Peg-in-Hole Assembly under Uncertainties

Shiyu Jin, Xinghao Zhu, Changhao Wang et al.

Peg-in-hole assembly is a challenging contact-rich manipulation task. There is no general solution to identify the relative position and orientation between the peg and the hole. In this paper, we propose a novel method to classify the contact poses based on a sequence of contact measurements. When the peg contacts the hole with pose uncertainties, a tilt-then-rotate strategy is applied, and the contacts are measured as a group of patterns to encode the contact pose. A convolutional neural network (CNN) is trained to classify the contact poses according to the patterns. In the end, an admittance controller guides the peg towards the error direction and finishes the peg-in-hole assembly. Simulations and experiments are provided to show that the proposed method can be applied to the peg-in-hole assembly of different geometries. We also demonstrate the ability to alleviate the sim-to-real gap.