Nguyễn Thanh Hải

10.4ROFeb 28, 2024Code

Symmetry-aware Reinforcement Learning for Robotic Assembly under Partial Observability with a Soft Wrist

Hai Nguyen, Tadashi Kozuno, Cristian C. Beltran-Hernandez et al.

This study tackles the representative yet challenging contact-rich peg-in-hole task of robotic assembly, using a soft wrist that can operate more safely and tolerate lower-frequency control signals than a rigid one. Previous studies often use a fully observable formulation, requiring external setups or estimators for the peg-to-hole pose. In contrast, we use a partially observable formulation and deep reinforcement learning from demonstrations to learn a memory-based agent that acts purely on haptic and proprioceptive signals. Moreover, previous works do not incorporate potential domain symmetry and thus must search for solutions in a bigger space. Instead, we propose to leverage the symmetry for sample efficiency by augmenting the training data and constructing auxiliary losses to force the agent to adhere to the symmetry. Results in simulation with five different symmetric peg shapes show that our proposed agent can be comparable to or even outperform a state-based agent. In particular, the sample efficiency also allows us to learn directly on the real robot within 3 hours.

15.7IRMay 29, 2014

Cold-start Problems in Recommendation Systems via Contextual-bandit Algorithms

Hai Thanh Nguyen, Jérémie Mary, Philippe Preux

In this paper, we study a cold-start problem in recommendation systems where we have completely new users entered the systems. There is not any interaction or feedback of the new users with the systems previoustly, thus no ratings are available. Trivial approaches are to select ramdom items or the most popular ones to recommend to the new users. However, these methods perform poorly in many case. In this research, we provide a new look of this cold-start problem in recommendation systems. In fact, we cast this cold-start problem as a contextual-bandit problem. No additional information on new users and new items is needed. We consider all the past ratings of previous users as contextual information to be integrated into the recommendation framework. To solve this type of the cold-start problems, we propose a new efficient method which is based on the LinUCB algorithm for contextual-bandit problems. The experiments were conducted on three different publicly-available data sets, namely Movielens, Netflix and Yahoo!Music. The new proposed methods were also compared with other state-of-the-art techniques. Experiments showed that our new method significantly improves upon all these methods.

Nguyễn Thanh Hải

2 Papers