RO Jan 10, 2022
Task planning and explanation with virtual actions
Guowei Cui, Xiaoping Chen
One of the challenges of task planning is to find out what causes a planning failure and how to handle the failure intelligently. This paper shows how to achieve this. The idea is inspired by the connected graph: each vertex represents a set of compatible \textit{states}, and each edge represents an \textit{action}. For any given initial states and goals, we construct virtual actions to ensure that we always get a plan via task planning. This paper shows how to introduce virtual actions to extend action models so that the graph becomes connected: i) explicitly define static predicates (type, permanent properties, etc.) or dynamic predicates (state); ii) construct a full virtual action or a semi-virtual action for each state; iii) find the cause of the planning failure through a progressive planning approach. The implementation was evaluated in three typical scenarios.
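The virtual-action idea can be illustrated with a small sketch. It is not the authors' implementation: the STRIPS-style action tuples, the function names `bfs_plan` and `explain_failure`, and the "try one virtual action at a time" loop are all assumptions made for illustration, as is the toy cup-fetching domain. A virtual action here has no preconditions and simply asserts one fact; any fact whose virtual action makes the goal reachable is reported as a candidate cause of the failure.

```python
from collections import deque

# An action is (name, preconditions, add_effects, delete_effects); each is a frozenset of facts.
def successors(state, actions):
    for name, pre, add, delete in actions:
        if pre <= state:
            yield name, frozenset((state - delete) | add)

def bfs_plan(init, goal, actions):
    """Return a shortest list of action names that reaches `goal`, or None."""
    start = frozenset(init)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if goal <= state:
            return path
        for name, nxt in successors(state, actions):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [name]))
    return None

def explain_failure(init, goal, actions, facts):
    """Hypothetical helper: add one virtual action per fact, one at a time,
    and record the facts whose virtual action makes planning succeed."""
    causes = {}
    for fact in sorted(facts):
        virtual = ("virtual:" + fact, frozenset(), frozenset([fact]), frozenset())
        plan = bfs_plan(init, goal, actions + [virtual])
        if plan is not None:
            causes[fact] = plan
    return causes

# Toy domain (assumed): the robot cannot reach the table because no action opens the door.
actions = [
    ("goto_table", frozenset({"door_open"}), frozenset({"at_table"}), frozenset()),
    ("pick_cup", frozenset({"at_table", "cup_on_table"}),
     frozenset({"holding_cup"}), frozenset({"cup_on_table"})),
]
init = {"cup_on_table"}
goal = frozenset({"holding_cup"})
facts = {"door_open", "at_table", "cup_on_table", "holding_cup"}

causes = explain_failure(init, goal, actions, facts)
```

In this toy domain planning with the real actions alone fails, while the plan found with `virtual:door_open` points at the closed door as one possible explanation.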
RO Jan 5, 2022
Control of a Soft Robotic Arm Using a Piecewise Universal Joint Model
Zhanchi Wang, Gaotian Wang, Xiaoping Chen et al.
The 'infinite' passive degrees of freedom of soft robotic arms render their control especially challenging. In this paper, we leverage a previously developed model, which draws an equivalence between the soft arm and a series of universal joints, to design two closed-loop controllers: a configuration space controller for trajectory tracking and a task space controller for position control of the end effector. Extensive experiments and simulations on a four-segment soft arm attest to substantial improvement in terms of: a) superior tracking accuracy of the configuration space controller and b) reduced settling time and steady-state error of the task space controller. The task space controller is also verified to be effective in the presence of interactions between the soft arm and the environment.
RO Sep 13, 2021
A Q-learning Control Method for a Soft Robotic Arm Utilizing Training Data from a Rough Simulator
Peijin Li, Gaotian Wang, Hao Jiang et al.
It is challenging to control a soft robot, where reinforcement learning methods have been applied with promising results. However, due to the poor sample efficiency, reinforcement learning methods require a large collection of training data, which limits their applications. In this paper, we propose a Q-learning controller for a physical soft robot, in which pre-trained models using data from a rough simulator are applied to improve the performance of the controller. We implement the method on our soft robot, i.e., Honeycomb Pneumatic Network (HPN) arm. The experiments show that the usage of pre-trained models can not only reduce the amount of the real-world training data, but also greatly improve its accuracy and convergence rate.
RO Sep 6, 2021
Crowd-Aware Robot Navigation for Pedestrians with Multiple Collision Avoidance Strategies via Map-based Deep Reinforcement Learning
Shunyi Yao, Guangda Chen, Quecheng Qiu et al.
It is challenging for a mobile robot to navigate through human crowds. Existing approaches usually assume that pedestrians follow a predefined collision avoidance strategy, like social force model (SFM) or optimal reciprocal collision avoidance (ORCA). However, their performances commonly need to be further improved for practical applications, where pedestrians follow multiple different collision avoidance strategies. In this paper, we propose a map-based deep reinforcement learning approach for crowd-aware robot navigation with various pedestrians. We use the sensor map to represent the environmental information around the robot, including its shape and observable appearances of obstacles. We also introduce the pedestrian map that specifies the movements of pedestrians around the robot. By applying both maps as inputs of the neural network, we show that a navigation policy can be trained to better interact with pedestrians following different collision avoidance strategies. We evaluate our approach under multiple scenarios both in the simulator and on an actual robot. The results show that our approach allows the robot to successfully interact with various pedestrians and outperforms compared methods in terms of the success rate.
RO Nov 2, 2020
NEARL: Non-Explicit Action Reinforcement Learning for Robotic Control
Nan Lin, Yuxuan Li, Yujun Zhu et al.
Traditionally, reinforcement learning methods predict the next action based on the current state. However, in many situations, directly applying actions to control systems or robots is dangerous and may lead to unexpected behaviors because action is rather low-level. In this paper, we propose a novel hierarchical reinforcement learning framework without explicit action. Our meta policy tries to manipulate the next optimal state and actual action is produced by the inverse dynamics model. To stabilize the training process, we integrate adversarial learning and information bottleneck into our framework. Under our framework, widely available state-only demonstrations can be exploited effectively for imitation learning. Also, prior knowledge and constraints can be applied to meta policy. We test our algorithm in simulation tasks and its combination with imitation learning. The experimental results show the reliability and robustness of our algorithms.
RO Nov 1, 2020
Semantic Task Planning for Service Robots in Open World
Guowei Cui, Wei Shuai, Xiaoping Chen
In this paper, we present a planning system based on semantic reasoning for a general-purpose service robot, which is aimed at behaving more intelligently in domains that contain incomplete information, under-specified goals, and dynamic changes. First, two kinds of data are generated from speech by the Natural Language Processing module: (i) action frames and their relationships; (ii) modifiers that indicate some property or characteristic of a variable in the action frame. Next, the goals of the task are generated from these action frames and modifiers. These goals are represented as AI symbols, combining world state and domain knowledge, which are used to generate plans by an Answer Set Programming solver. Finally, the actions of the plan are executed one by one, and continuous sensing grounds useful information, which enables the robot to use contingent knowledge to adapt to dynamic changes and faults. For each action in the plan, the planner gets its preconditions and effects from domain knowledge, so during the execution of the task, environmental changes, especially those that conflict with the action being performed or with subsequent actions, can be detected and handled as early as possible. A series of case studies is used to evaluate the system and verify its ability to acquire knowledge through dialogue with users, solve problems with the acquired causal knowledge, and plan for complex tasks autonomously in the open world.
RO Jul 8, 2020
Design, Control, and Applications of a Soft Robotic Arm
Hao Jiang, Zhanchi Wang, Yusong Jin et al.
This paper presents the design, control, and applications of a multi-segment soft robotic arm. In order to design a soft arm with large load capacity, several design principles are proposed by analyzing two kinds of buckling issues, under which we present a novel structure named Honeycomb Pneumatic Networks (HPN). A parameter optimization method, based on the finite element method (FEM), is proposed to optimize the HPN arm design parameters. Through a quick fabrication process, several prototypes with different performance are made, one of which can achieve a transverse load capacity of 3 kg under 3 bar pressure. Next, considering different internal and external conditions, we develop three controllers according to different model precision. Specifically, based on an accurate model, an open-loop controller is realized by combining the piece-wise constant curvature (PCC) modeling method with a machine learning method. Based on an inaccurate model, a feedback controller, using an estimated Jacobian, is realized in 3D space. A model-free controller, using reinforcement learning to learn a control policy rather than a model, is realized in a 2D plane, with minimal training data. Then, these three control methods are compared on the same experimental platform to explore the applicability of different methods under different conditions. Lastly, we find that the soft arm can greatly simplify the perception, planning, and control of interaction tasks through its compliance, which is its main advantage over a rigid arm. Through extensive experiments in three interaction application scenarios (human-robot interaction, a free space interaction task, and a confined space interaction task), we demonstrate the potential applications of the soft arm.
AI May 20, 2020
Learning and Reasoning for Robot Dialog and Navigation Tasks
Keting Lu, Shiqi Zhang, Peter Stone et al.
Reinforcement learning and probabilistic reasoning algorithms aim at learning from interaction experiences and reasoning with probabilistic contextual knowledge respectively. In this research, we develop algorithms for robot task completions, while looking into the complementary strengths of reinforcement learning and probabilistic reasoning techniques. The robots learn from trial-and-error experiences to augment their declarative knowledge base, and the augmented knowledge can be used for speeding up the learning process in potentially different tasks. We have implemented and evaluated the developed algorithms using mobile robots conducting dialog and navigation tasks. From the results, we see that our robot's performance can be improved by both reasoning with human knowledge and learning from task-completion experience. More interestingly, the robot was able to learn from navigation tasks to improve its dialog strategies.
AI May 7, 2020
Adaptive Dialog Policy Learning with Hindsight and User Modeling
Yan Cao, Keting Lu, Xiaoping Chen et al.
Reinforcement learning methods have been used to compute dialog policies from language-based interaction experiences. Efficiency is of particular importance in dialog policy learning, because of the considerable cost of interacting with people, and the very poor user experience from low-quality conversations. Aiming at improving the efficiency of dialog policy learning, we develop algorithm LHUA (Learning with Hindsight, User modeling, and Adaptation) that, for the first time, enables dialog agents to adaptively learn with hindsight from both simulated and real users. Simulation and hindsight provide the dialog agent with more experience and more (positive) reinforcements respectively. Experimental results suggest that, in success rate and policy quality, LHUA outperforms competitive baselines from the literature, including its no-simulation, no-adaptation, and no-hindsight counterparts.
LG Apr 22, 2020
AutoEG: Automated Experience Grafting for Off-Policy Deep Reinforcement Learning
Keting Lu, Shiqi Zhang, Xiaoping Chen
Deep reinforcement learning (RL) algorithms frequently require prohibitive interaction experience to ensure the quality of learned policies. The limitation is partly because the agent cannot learn much from the many low-quality trials in the early learning phase, which results in a low learning rate. Focusing on addressing this limitation, this paper makes a twofold contribution. First, we develop an algorithm, called Experience Grafting (EG), to enable RL agents to reorganize segments of the few high-quality trajectories from the experience pool to generate many synthetic trajectories while retaining the quality. Second, building on EG, we further develop an AutoEG agent that automatically learns to adjust the grafting-based learning strategy. Results collected from a set of six robotic control environments show that, in comparison to a standard deep RL algorithm (DDPG), AutoEG increases the speed of the learning process by at least 30%.
CV Apr 5, 2020
Attentive One-Dimensional Heatmap Regression for Facial Landmark Detection and Tracking
Shi Yin, Shangfei Wang, Xiaoping Chen et al.
Although heatmap regression is considered a state-of-the-art method to locate facial landmarks, it suffers from huge spatial complexity and is prone to quantization error. To address this, we propose a novel attentive one-dimensional heatmap regression method for facial landmark localization. First, we predict two groups of 1D heatmaps to represent the marginal distributions of the x and y coordinates. These 1D heatmaps reduce spatial complexity significantly compared to current heatmap regression methods, which use 2D heatmaps to represent the joint distributions of x and y coordinates. With much lower spatial complexity, the proposed method can output high-resolution 1D heatmaps despite limited GPU memory, significantly alleviating the quantization error. Second, a co-attention mechanism is adopted to model the inherent spatial patterns existing in x and y coordinates, and therefore the joint distributions on the x and y axes are also captured. Third, based on the 1D heatmap structures, we propose a facial landmark detector capturing spatial patterns for landmark detection on an image; and a tracker further capturing temporal patterns with a temporal refinement mechanism for landmark tracking. Experimental results on four benchmark databases demonstrate the superiority of our method.
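The relationship between a 2D heatmap and its two 1D marginal heatmaps can be sketched as follows. This is only an illustration: the paper's network predicts the 1D heatmaps directly, whereas the hypothetical helpers `marginal_heatmaps` and `decode_landmark` below derive them from a small hand-written 2D map to show why storage drops from width × height values to width + height values.

```python
def marginal_heatmaps(heatmap_2d):
    """Sum a 2D heatmap (a list of rows) into x and y marginal 1D heatmaps."""
    height, width = len(heatmap_2d), len(heatmap_2d[0])
    hx = [sum(heatmap_2d[r][c] for r in range(height)) for c in range(width)]
    hy = [sum(row) for row in heatmap_2d]
    return hx, hy

def decode_landmark(hx, hy):
    """Take the arg-max of each 1D heatmap as the landmark's (x, y) coordinate."""
    x = max(range(len(hx)), key=hx.__getitem__)
    y = max(range(len(hy)), key=hy.__getitem__)
    return x, y

# A 3-row by 4-column toy heatmap with its peak at x=2, y=1 (assumed data).
heatmap = [
    [0.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.6, 0.1],
    [0.0, 0.0, 0.1, 0.0],
]
hx, hy = marginal_heatmaps(heatmap)
landmark = decode_landmark(hx, hy)
```

Here the two marginals hold 4 + 3 = 7 values instead of the 12 values of the 2D map; the gap widens quickly at higher resolutions.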
RO Feb 11, 2020
Robot Navigation with Map-Based Deep Reinforcement Learning
Guangda Chen, Lifan Pan, Yu'an Chen et al.
This paper proposes an end-to-end deep reinforcement learning approach for mobile robot navigation with dynamic obstacle avoidance. Using experience collected in a simulation environment, a convolutional neural network (CNN) is trained to predict proper steering actions of a robot from its egocentric local occupancy maps, which accommodate various sensors and fusion algorithms. The trained neural network is then transferred and executed on a real-world mobile robot to guide its local path planning. The new approach is evaluated both qualitatively and quantitatively in simulation and real-world robot experiments. The results show that the map-based end-to-end navigation model is easy to deploy on a robotic platform, robust to sensor noise, and outperforms other existing DRL-based models on many indicators.
AI Sep 28, 2018
Robot Representation and Reasoning with Knowledge from Reinforcement Learning
Keting Lu, Shiqi Zhang, Peter Stone et al.
Reinforcement learning (RL) agents aim at learning by interacting with an environment, and are not designed for representing or reasoning with declarative knowledge. Knowledge representation and reasoning (KRR) paradigms are strong in declarative KRR tasks, but are ill-equipped to learn from such experiences. In this work, we integrate logical-probabilistic KRR with model-based RL, enabling agents to simultaneously reason with declarative knowledge and learn from interaction experiences. The knowledge from humans and RL is unified and used for dynamically computing task-specific planning models under potentially new environments. Experiments were conducted using a mobile robot working on dialog, navigation, and delivery tasks. Results show significant improvements, in comparison to existing model-based RL methods.
CL Aug 28, 2018
KDSL: a Knowledge-Driven Supervised Learning Framework for Word Sense Disambiguation
Shi Yin, Yi Zhou, Chenguang Li et al.
We propose KDSL, a new word sense disambiguation (WSD) framework that utilizes knowledge to automatically generate sense-labeled data for supervised learning. First, from WordNet, we automatically construct a semantic knowledge base called DisDict, which provides refined feature words that highlight the differences among word senses, i.e., synsets. Second, we automatically generate new sense-labeled data by DisDict from unlabeled corpora. Third, these generated data, together with manually labeled data and unlabeled data, are fed to a neural framework conducting supervised and unsupervised learning jointly to model the semantic relations among synsets, feature words and their contexts. The experimental results show that KDSL outperforms several representative state-of-the-art methods on various major benchmarks. Interestingly, it performs relatively well even when manually labeled data is unavailable, thus providing a potential solution for similar tasks that lack manual annotations.
AI Aug 20, 2018
Goal-oriented Dialogue Policy Learning from Failures
Keting Lu, Shiqi Zhang, Xiaoping Chen
Reinforcement learning methods have been used for learning dialogue policies. However, learning an effective dialogue policy frequently requires prohibitively many conversations. This is partly because of the sparse rewards in dialogues, and the very few successful dialogues in the early learning phase. Hindsight experience replay (HER) enables learning from failures, but the vanilla HER is inapplicable to dialogue learning due to the implicit goals. In this work, we develop two complex HER methods providing different trade-offs between complexity and performance, and, for the first time, enable HER-based dialogue policy learning. Experiments using a realistic user simulator show that our HER methods perform better than existing experience replay methods (as applied to deep Q-networks) in learning rate.
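For readers unfamiliar with HER, the vanilla relabeling step in a generic goal-conditioned setting can be sketched as below. This is not the dialogue-specific method of the paper (which the abstract does not detail); the tuple layout, the function name `her_relabel`, and the "final state becomes the goal" strategy are assumptions chosen for illustration. It also shows why explicit goals matter: the relabeling needs a goal that can be read off from a visited state.

```python
def her_relabel(episode):
    """Relabel a failed episode with the goal it actually achieved.

    `episode` is a list of (state, action, next_state, goal) tuples.
    Returns (state, action, next_state, new_goal, reward) tuples in which the
    final next_state is treated as the goal, so the last step earns a reward.
    """
    achieved = episode[-1][2]
    relabeled = []
    for state, action, next_state, _original_goal in episode:
        reward = 1.0 if next_state == achieved else 0.0
        relabeled.append((state, action, next_state, achieved, reward))
    return relabeled

# Toy 1D example (assumed): the agent wanted to reach 5 but stopped at 2.
failed_episode = [(0, "right", 1, 5), (1, "right", 2, 5)]
replay_samples = her_relabel(failed_episode)
```

The relabeled samples can then be added to the replay buffer alongside the original transitions.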
CV Apr 28, 2018
Precise Box Score: Extract More Information from Datasets to Improve the Performance of Face Detection
Ce Qi, Xiaoping Chen, Pingyu Wang et al.
When training a face detection network based on the R-CNN framework, anchors are assigned as positive samples if their intersection-over-unions (IoUs) with the ground truth are higher than a first threshold (such as 0.7), and as negative samples if their IoUs are lower than a second threshold (such as 0.3). The face detection model is then trained with these labels. However, anchors with IoUs between the two thresholds are not used. We propose a novel training strategy, Precise Box Score (PBS), to train object detection models. The proposed training strategy uses the anchors with IoUs between the first and second thresholds, which can consistently improve the performance of face detection. Our proposed training strategy extracts more information from datasets, making better use of existing datasets. In addition, we introduce a simple but effective model compression method (SEMCM), which can further boost the performance of face detectors. Experimental results show that the performance of face detection networks can be consistently improved by our proposed scheme.
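The conventional labeling rule described above can be sketched in a few lines. This shows only the baseline thresholding, not PBS itself, since the abstract does not say how PBS scores the in-between anchors; the function names `iou` and `label_anchor` and the `(x1, y1, x2, y2)` box layout are assumptions.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def label_anchor(anchor, ground_truth, pos_thr=0.7, neg_thr=0.3):
    """Baseline rule: positive above pos_thr, negative below neg_thr, otherwise unused."""
    score = iou(anchor, ground_truth)
    if score > pos_thr:
        return "positive"
    if score < neg_thr:
        return "negative"
    return "ignored"  # the anchors that the baseline discards and PBS aims to exploit

face = (0, 0, 10, 10)
labels = [
    label_anchor((0, 0, 10, 10), face),    # perfect overlap
    label_anchor((0, 0, 10, 5), face),     # half overlap, IoU = 0.5
    label_anchor((20, 20, 30, 30), face),  # no overlap
]
```

The middle anchor, with an IoU of 0.5, falls between the thresholds and contributes nothing to training under the baseline rule.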
CV Dec 26, 2016
Signature of Geometric Centroids for 3D Local Shape Description and Partial Shape Matching
Keke Tang, Peng Song, Xiaoping Chen
Depth scans acquired from different views may contain nuisances such as noise, occlusion, and varying point density. We propose a novel Signature of Geometric Centroids descriptor, supporting direct shape matching on the scans, without requiring any preprocessing such as scan denoising or converting into a mesh. First, we construct the descriptor by voxelizing the local shape within a uniquely defined local reference frame and concatenating geometric centroid and point density features extracted from each voxel. Second, we compare two descriptors by employing only corresponding voxels that are both non-empty, thus supporting the matching of incomplete local shapes such as those close to the scan boundary. Third, we propose a descriptor saliency measure and compute it from a descriptor-graph to improve shape matching performance. We demonstrate the descriptor's robustness and effectiveness for shape matching by comparing it with three state-of-the-art descriptors, and applying it to object/scene reconstruction and 3D object recognition.
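The second step, comparing descriptors only over voxels that are non-empty in both, might look like the sketch below. The representation (a list with one centroid tuple per voxel, `None` for an empty voxel) and the mean Euclidean distance are assumptions; the abstract does not specify the actual metric, and the point density feature is omitted here.

```python
import math

def descriptor_distance(desc_a, desc_b):
    """Mean centroid distance over voxels that are non-empty in both descriptors.

    Each descriptor is a list with one entry per voxel: a centroid tuple,
    or None when the voxel contains no points.
    """
    pairs = [(a, b) for a, b in zip(desc_a, desc_b) if a is not None and b is not None]
    if not pairs:
        return math.inf  # no shared evidence, so treat the descriptors as unmatched
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

# Two partial views (assumed data): each misses a different voxel.
view_a = [(0.0, 0.0, 0.0), None, (1.0, 1.0, 1.0)]
view_b = [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0), None]
view_c = [(3.0, 4.0, 0.0), None, None]

same_shape = descriptor_distance(view_a, view_b)
different_shape = descriptor_distance(view_a, view_c)
```

Skipping voxels that are empty in either descriptor keeps missing data near a scan boundary from being counted as a mismatch.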
RO Jun 9, 2016
Understanding User Instructions by Utilizing Open Knowledge for Service Robots
Dongcai Lu, Feng Wu, Xiaoping Chen
Understanding user instructions in natural language is an active research topic in AI and robotics. Typically, natural user instructions are high-level and can be reduced into low-level tasks expressed in common verbs (e.g., `take', `get', `put'). For robots understanding such instructions, one of the key challenges is to process high-level user instructions and achieve the specified tasks with the robots' primitive actions. To address this, we propose novel algorithms by utilizing semantic roles of common verbs defined in semantic dictionaries and integrating multiple sources of open knowledge to generate task plans. Specifically, we present a new method for matching and recovering semantics of user instructions and a novel task planner that exploits functional knowledge of the robot's action model. To verify and evaluate our approach, we implemented a prototype system using knowledge from several open resources. Experiments on our system confirmed the correctness and efficiency of our algorithms. Notably, our system has been deployed in the KeJia robot, which participated in the annual RoboCup@Home competitions in the past three years and achieved encouragingly high scores in the benchmark tests.
AI Oct 16, 2012
FHHOP: A Factored Hybrid Heuristic Online Planning Algorithm for Large POMDPs
Zhongzhang Zhang, Xiaoping Chen
Planning in partially observable Markov decision processes (POMDPs) remains a challenging topic in the artificial intelligence community, in spite of recent impressive progress in approximation techniques. Previous research has indicated that online planning approaches are promising in handling large-scale POMDP domains efficiently as they make decisions "on demand" instead of proactively for the entire state space. We present a Factored Hybrid Heuristic Online Planning (FHHOP) algorithm for large POMDPs. FHHOP gets its power by combining a novel hybrid heuristic search strategy with a recently developed factored state representation. On several benchmark problems, FHHOP substantially outperformed state-of-the-art online heuristic search approaches in terms of both scalability and quality.
AI Mar 15, 2012
Rollout Sampling Policy Iteration for Decentralized POMDPs
Feng Wu, Shlomo Zilberstein, Xiaoping Chen
We present decentralized rollout sampling policy iteration (DecRSPI) - a new algorithm for multi-agent decision problems formalized as DEC-POMDPs. DecRSPI is designed to improve scalability and tackle problems that lack an explicit model. The algorithm uses Monte-Carlo methods to generate a sample of reachable belief states. Then it computes a joint policy for each belief state based on the rollout estimations. A new policy representation allows us to represent solutions compactly. The key benefits of the algorithm are its linear time complexity over the number of agents, its bounded memory usage and good solution quality. It can solve larger problems that are intractable for existing planning algorithms. Experimental results confirm the effectiveness and scalability of the approach.