Hankz Hankui Zhuo

h-index19

38papers

738citations

Novelty46%

AI Score49

Ranked #26,103 of 194,257 authors (top 13%)#1,367 in AI (top 11%)

38 Papers

15.5LGJan 20, 2023Code

Plan To Predict: Learning an Uncertainty-Foreseeing Model for Model-Based Reinforcement Learning

Zifan Wu, Chao Yu, Chen Chen et al.

In Model-based Reinforcement Learning (MBRL), model learning is critical since an inaccurate model can bias policy learning via generating misleading samples. However, learning an accurate model can be difficult since the policy is continually updated and the induced distribution over visited states used for model learning shifts accordingly. Prior methods alleviate this issue by quantifying the uncertainty of model-generated samples. However, these methods only quantify the uncertainty passively after the samples were generated, rather than foreseeing the uncertainty before model trajectories fall into those highly uncertain regions. The resulting low-quality samples can induce unstable learning targets and hinder the optimization of the policy. Moreover, while being learned to minimize one-step prediction errors, the model is generally used to predict for multiple steps, leading to a mismatch between the objectives of model learning and model usage. To this end, we propose \emph{Plan To Predict} (P2P), an MBRL framework that treats the model rollout process as a sequential decision making problem by reversely considering the model as a decision maker and the current policy as the dynamics. In this way, the model can quickly adapt to the current policy and foresee the multi-step future uncertainty when generating trajectories. Theoretically, we show that the performance of P2P can be guaranteed by approximately optimizing a lower bound of the true environment return. Empirical results demonstrate that P2P achieves state-of-the-art performance on several challenging benchmark tasks.

2.1AIJun 14, 2023

Hierarchical Task Network Planning for Facilitating Cooperative Multi-Agent Reinforcement Learning

Xuechen Mu, Hankz Hankui Zhuo, Chen Chen et al.

Exploring sparse reward multi-agent reinforcement learning (MARL) environments with traps in a collaborative manner is a complex task. Agents typically fail to reach the goal state and fall into traps, which affects the overall performance of the system. To overcome this issue, we present SOMARL, a framework that uses prior knowledge to reduce the exploration space and assist learning. In SOMARL, agents are treated as part of the MARL environment, and symbolic knowledge is embedded using a tree structure to build a knowledge hierarchy. The framework has a two-layer hierarchical structure, comprising a hybrid module with a Hierarchical Task Network (HTN) planning and meta-controller at the higher level, and a MARL-based interactive module at the lower level. The HTN module and meta-controller use Hierarchical Domain Definition Language (HDDL) and the option framework to formalize symbolic knowledge and obtain domain knowledge and a symbolic option set, respectively. Moreover, the HTN module leverages domain knowledge to guide low-level agent exploration by assisting the meta-controller in selecting symbolic options. The meta-controller further computes intrinsic rewards of symbolic options to limit exploration behavior and adjust HTN planning solutions as needed. We evaluate SOMARL on two benchmarks, FindTreasure and MoveBox, and report superior performance over state-of-the-art MARL and subgoal-based baselines for MARL environments significantly.

11.7AIApr 24, 2023

Reinforcement Learning with Knowledge Representation and Reasoning: A Brief Survey

Chao Yu, Shicheng Ye, Hankz Hankui Zhuo

Reinforcement Learning (RL) has achieved tremendous development in recent years, but still faces significant obstacles in addressing complex real-life problems due to the issues of poor system generalization, low sample efficiency as well as safety and interpretability concerns. The core reason underlying such dilemmas can be attributed to the fact that most of the work has focused on the computational aspect of value functions or policies using a representational model to describe atomic components of rewards, states and actions etc, thus neglecting the rich high-level declarative domain knowledge of facts, relations and rules that can be either provided a priori or acquired through reasoning over time. Recently, there has been a rapidly growing interest in the use of Knowledge Representation and Reasoning (KRR) methods, usually using logical languages, to enable more abstract representation and efficient learning in RL. In this survey, we provide a preliminary overview on these endeavors that leverage the strengths of KRR to help solving various problems in RL, and discuss the challenging open problems and possible directions for future work in this area.

4.3CLApr 10

Prototype-Regularized Federated Learning for Cross-Domain Aspect Sentiment Triplet Extraction

Zongming Cai, Jianhang Tang, Zhenyong Zhang et al.

Aspect Sentiment Triplet Extraction (ASTE) aims to extract all sentiment triplets of aspect terms, opinion terms, and sentiment polarities from a sentence. Existing methods are typically trained on individual datasets in isolation, failing to jointly capture the common feature representations shared across domains. Moreover, data privacy constraints prevent centralized data aggregation. To address these challenges, we propose Prototype-based Cross-Domain Span Prototype extraction (PCD-SpanProto), a prototype-regularized federated learning framework to enable distributed clients to exchange class-level prototypes instead of full model parameters. Specifically, we design a weighted performance-aware aggregation strategy and a contrastive regularization module to improve the global prototype under domain heterogeneity and the promotion between intra-class compactness and inter-class separability across clients. Extensive experiments on four ASTE datasets demonstrate that our method outperforms baselines and reduces communication costs, validating the effectiveness of prototype-based cross-domain knowledge transfer.

2.1CLAug 26, 2023

Planning with Logical Graph-based Language Model for Instruction Generation

Fan Zhang, Kebing Jin, Hankz Hankui Zhuo

Despite the superior performance of large language models to generate natural language texts, it is hard to generate texts with correct logic according to a given task, due to the difficulties for neural models to capture implied rules from free-form texts. In this paper, we propose a novel graph-based language model, Logical-GLM, to infuse logic into language models for more valid text generation and interpretability. Specifically, we first capture information from natural language instructions and construct logical bayes graphs that generally describe domains. Next, we generate logical skeletons to guide language model training, infusing domain knowledge into language models. Finally, we alternately optimize the searching policy of graphs and language models until convergence. The experimental results show that Logical-GLM is both effective and efficient compared with traditional language models, despite using smaller-scale training data and fewer parameters. Our approach can generate instructional texts with more correct logic owing to the internalized domain knowledge. Moreover, the usage of logical graphs reflects the inner mechanism of the language models, which improves the interpretability of black-box models.

0.5CLJul 26, 2023

DPBERT: Efficient Inference for BERT based on Dynamic Planning

Weixin Wu, Hankz Hankui Zhuo

Large-scale pre-trained language models such as BERT have contributed significantly to the development of NLP. However, those models require large computational resources, making it difficult to be applied to mobile devices where computing power is limited. In this paper we aim to address the weakness of existing input-adaptive inference methods which fail to take full advantage of the structure of BERT. We propose Dynamic Planning in BERT, a novel fine-tuning strategy that can accelerate the inference process of BERT through selecting a subsequence of transformer layers list of backbone as a computational path for an input sample. To do this, our approach adds a planning module to the original BERT model to determine whether a layer is included or bypassed during inference. Experimental results on the GLUE benchmark exhibit that our method reduces latency to 75\% while maintaining 98\% accuracy, yielding a better accuracy-speed trade-off compared to state-of-the-art input-adaptive methods.

2.4AIMar 2

LLM-assisted Semantic Option Discovery for Facilitating Adaptive Deep Reinforcement Learning

Chang Yao, Jinghui Qin, Kebing Jin et al.

Despite achieving remarkable success in complex tasks, Deep Reinforcement Learning (DRL) is still suffering from critical issues in practical applications, such as low data efficiency, lack of interpretability, and limited cross-environment transferability. However, the learned policy generating actions based on states are sensitive to the environmental changes, struggling to guarantee behavioral safety and compliance. Recent research shows that integrating Large Language Models (LLMs) with symbolic planning is promising in addressing these challenges. Inspired by this, we introduce a novel LLM-driven closed-loop framework, which enables semantic-driven skill reuse and real-time constraint monitoring by mapping natural language instructions into executable rules and semantically annotating automatically created options. The proposed approach utilizes the general knowledge of LLMs to facilitate exploration efficiency and adapt to transferable options for similar environments, and provides inherent interpretability through semantic annotations. To validate the effectiveness of this framework, we conduct experiments on two domains, Office World and Montezuma's Revenge, respectively. The results demonstrate superior performance in data efficiency, constraint compliance, and cross-task transferability.

2.5AIDec 11, 2022

A Hierarchical Temporal Planning-Based Approach for Dynamic Hoist Scheduling Problems

Kebing Jin, Yingkai Xiao, Hankz Hankui Zhuo et al.

Hoist scheduling has become a bottleneck in electroplating industry applications with the development of autonomous devices. Although there are a few approaches proposed to target at the challenging problem, they generally cannot scale to large-scale scheduling problems. In this paper, we formulate the hoist scheduling problem as a new temporal planning problem in the form of adapted PDDL, and propose a novel hierarchical temporal planning approach to efficiently solve the scheduling problem. Additionally, we provide a collection of real-life benchmark instances that can be used to evaluate solution methods for the problem. We exhibit that the proposed approach is able to efficiently find solutions of high quality for large-scale real-life benchmark instances, with comparison to state-of-the-art baselines.

2.1AIMay 23, 2023Code

XRoute Environment: A Novel Reinforcement Learning Environment for Routing

Zhanwen Zhou, Hankz Hankui Zhuo, Xiaowu Zhang et al.

Routing is a crucial and time-consuming stage in modern design automation flow for advanced technology nodes. Great progress in the field of reinforcement learning makes it possible to use those approaches to improve the routing quality and efficiency. However, the scale of the routing problems solved by reinforcement learning-based methods in recent studies is too small for these methods to be used in commercial EDA tools. We introduce the XRoute Environment, a new reinforcement learning environment where agents are trained to select and route nets in an advanced, end-to-end routing framework. Novel algorithms and ideas can be quickly tested in a safe and reproducible manner in it. The resulting environment is challenging, easy to use, customize and add additional scenarios, and it is available under a permissive open-source license. In addition, it provides support for distributed deployment and multi-instance experiments. We propose two tasks for learning and build a full-chip test bed with routing benchmarks of various region sizes. We also pre-define several static routing regions with different pin density and number of nets for easier learning and testing. For net ordering task, we report baseline results for two widely used reinforcement learning algorithms (PPO and DQN) and one searching-based algorithm (TritonRoute). The XRoute Environment will be available at https://github.com/xplanlab/xroute_env.

2.3AIFeb 18, 2024

On the Roles of LLMs in Planning: Embedding LLMs into Planning Graphs

Hankz Hankui Zhuo, Xin Chen, Rong Pan

Plan synthesis aims to generate a course of actions or policies to transit given initial states to goal states, provided domain models that could be designed by experts or learnt from training data or interactions with the world. Intrigued by the claims of emergent planning capabilities in large language models (LLMs), works have been proposed to investigate the planning effectiveness of LLMs, without considering any utilization of off-the-shelf planning techniques in LLMs. In this paper, we aim to further study the insight of the planning capability of LLMs by investigating the roles of LLMs in off-the-shelf planning frameworks. To do this, we investigate the effectiveness of embedding LLMs into one of the well-known planning frameworks, graph-based planning, proposing a novel LLMs-based planning framework with LLMs embedded in two levels of planning graphs, i.e., mutual constraints generation level and constraints solving level. We empirically exhibit the effectiveness of our proposed framework in various planning domains.

5.4AIDec 26, 2023

BalMCTS: Balancing Objective Function and Search Nodes in MCTS for Constraint Optimization Problems

Yingkai Xiao, Jingjin Liu, Hankz Hankui Zhuo

Constraint Optimization Problems (COP) pose intricate challenges in combinatorial problems usually addressed through Branch and Bound (B\&B) methods, which involve maintaining priority queues and iteratively selecting branches to search for solutions. However, conventional approaches take a considerable amount of time to find optimal solutions, and it is also crucial to quickly identify a near-optimal feasible solution in a shorter time. In this paper, we aim to investigate the effectiveness of employing a depth-first search algorithm for solving COP, specifically focusing on identifying optimal or near-optimal solutions within top $n$ solutions. Hence, we propose a novel heuristic neural network algorithm based on MCTS, which, by simultaneously conducting search and training, enables the neural network to effectively serve as a heuristic during Backtracking. Furthermore, our approach incorporates encoding COP problems and utilizing graph neural networks to aggregate information about variables and constraints, offering more appropriate variables for assignments. Experimental results on stochastic COP instances demonstrate that our method identifies feasible solutions with a gap of less than 17.63% within the initial 5 feasible solutions. Moreover, when applied to attendant Constraint Satisfaction Problem (CSP) instances, our method exhibits a remarkable reduction of less than 5% in searching nodes compared to state-of-the-art approaches.

3.3AIAug 15, 2025

Inspire or Predict? Exploring New Paradigms in Assisting Classical Planners with Large Language Models

Wenkai Yu, Jianhang Tang, Yang Zhang et al.

Addressing large-scale planning problems has become one of the central challenges in the planning community, deriving from the state-space explosion caused by growing objects and actions. Recently, researchers have explored the effectiveness of leveraging Large Language Models (LLMs) to generate helpful actions and states to prune the search space. However, prior works have largely overlooked integrating LLMs with domain-specific knowledge to ensure valid plans. In this paper, we propose a novel LLM-assisted planner integrated with problem decomposition, which first decomposes large planning problems into multiple simpler sub-tasks. Then we explore two novel paradigms to utilize LLMs, i.e., LLM4Inspire and LLM4Predict, to assist problem decomposition, where LLM4Inspire provides heuristic guidance according to general knowledge and LLM4Predict employs domain-specific knowledge to infer intermediate conditions. We empirically validate the effectiveness of our planner across multiple domains, demonstrating the ability of search space partition when solving large-scale planning problems. The experimental results show that LLMs effectively locate feasible solutions when pruning the search space, where infusing domain-specific knowledge into LLMs, i.e., LLM4Predict, holds particular promise compared with LLM4Inspire, which offers general knowledge within LLMs.

5.4AIMay 29, 2023

Sequential Condition Evolved Interaction Knowledge Graph for Traditional Chinese Medicine Recommendation

Jingjin Liu, Hankz Hankui Zhuo, Kebing Jin et al.

Traditional Chinese Medicine (TCM) has a rich history of utilizing natural herbs to treat a diversity of illnesses. In practice, TCM diagnosis and treatment are highly personalized and organically holistic, requiring comprehensive consideration of the patient's state and symptoms over time. However, existing TCM recommendation approaches overlook the changes in patient status and only explore potential patterns between symptoms and prescriptions. In this paper, we propose a novel Sequential Condition Evolved Interaction Knowledge Graph (SCEIKG), a framework that treats the model as a sequential prescription-making problem by considering the dynamics of the patient's condition across multiple visits. In addition, we incorporate an interaction knowledge graph to enhance the accuracy of recommendations by considering the interactions between different herbs and the patient's condition. Experimental results on a real-world dataset demonstrate that our approach outperforms existing TCM recommendation methods, achieving state-of-the-art performance.

1.8LGFeb 15, 2022

Text-Based Action-Model Acquisition for Planning

Kebing Jin, Huaixun Chen, Hankz Hankui Zhuo

Although there have been approaches that are capable of learning action models from plan traces, there is no work on learning action models from textual observations, which is pervasive and much easier to collect from real-world applications compared to plan traces. In this paper we propose a novel approach to learning action models from natural language texts by integrating Constraint Satisfaction and Natural Language Processing techniques. Specifically, we first build a novel language model to extract plan traces from texts, and then build a set of constraints to generate action models based on the extracted plan traces. After that, we iteratively improve the language model and constraints until we achieve the convergent language model and action models. We empirically exhibit that our approach is both effective and efficient.

6.2AIFeb 15, 2022

Integrating AI Planning with Natural Language Processing: A Combination of Explicit and Tacit Knowledge

Kebing Jin, Hankz Hankui Zhuo

Natural language processing (NLP) aims at investigating the interactions between agents and humans, processing and analyzing large amounts of natural language data. Large-scale language models play an important role in current natural language processing. However, the challenges of explainability and complexity come along with the developments of language models. One way is to introduce logical relations and rules into natural language processing models, such as making use of Automated Planning. Automated planning (AI planning) focuses on building symbolic domain models and synthesizing plans to transit initial states to goals based on domain models. Recently, there have been plenty of works related to these two fields, which have the abilities to generate explicit knowledge, e.g., preconditions and effects of action models, and learn from tacit knowledge, e.g., neural models, respectively. Integrating AI planning and natural language processing effectively improves the communication between human and intelligent agents. This paper outlines the commons and relations between AI planning and natural language processing, argues that each of them can effectively impact on the other one by five areas: (1) planning-based text understanding, (2) planning-based natural language processing, (3) planning-based explainability, (4) text-based human-robot interaction, and (5) applications. We also explore some potential future issues between AI planning and natural language processing. To the best of our knowledge, this survey is the first work that addresses the deep connections between AI planning and Natural language processing.

2.5AIJan 19, 2022

Introduction to The Dynamic Pickup and Delivery Problem Benchmark -- ICAPS 2021 Competition

Jianye Hao, Jiawen Lu, Xijun Li et al.

The Dynamic Pickup and Delivery Problem (DPDP) is an essential problem within the logistics domain. So far, research on this problem has mainly focused on using artificial data which fails to reflect the complexity of real-world problems. In this draft, we would like to introduce a new benchmark from real business scenarios as well as a simulator supporting the dynamic evaluation. The benchmark and simulator have been published and successfully supported the ICAPS 2021 Dynamic Pickup and Delivery Problem competition participated by 152 teams.

4.5AIDec 18, 2021

Creativity of AI: Hierarchical Planning Model Learning for Facilitating Deep Reinforcement Learning

Hankz Hankui Zhuo, Shuting Deng, Mu Jin et al.

Despite of achieving great success in real-world applications, Deep Reinforcement Learning (DRL) is still suffering from three critical issues, i.e., data efficiency, lack of the interpretability and transferability. Recent research shows that embedding symbolic knowledge into DRL is promising in addressing those challenges. Inspired by this, we introduce a novel deep reinforcement learning framework with symbolic options. Our framework features a loop training procedure, which enables guiding the improvement of policy by planning with planning models (including action models and hierarchical task network models) and symbolic options learned from interactive trajectories automatically. The learned symbolic options alleviate the dense requirement of expert domain knowledge and provide inherent interpretability of policies. Moreover, the transferability and data efficiency can be further improved by planning with the symbolic planning models. To validate the effectiveness of our framework, we conduct experiments on two domains, Montezuma's Revenge and Office World, respectively. The results demonstrate the comparable performance, improved data efficiency, interpretability and transferability.

16.0AIDec 11, 2021Code

Retrosynthetic Planning with Experience-Guided Monte Carlo Tree Search

Siqi Hong, Hankz Hankui Zhuo, Kebing Jin et al.

In retrosynthetic planning, the huge number of possible routes to synthesize a complex molecule using simple building blocks leads to a combinatorial explosion of possibilities. Even experienced chemists often have difficulty to select the most promising transformations. The current approaches rely on human-defined or machine-trained score functions which have limited chemical knowledge or use expensive estimation methods for guiding. Here we an propose experience-guided Monte Carlo tree search (EG-MCTS) to deal with this problem. Instead of rollout, we build an experience guidance network to learn knowledge from synthetic experiences during the search. Experiments on benchmark USPTO datasets show that, EG-MCTS gains significant improvement over state-of-the-art approaches both in efficiency and effectiveness. In a comparative experiment with the literature, our computer-generated routes mostly matched the reported routes. Routes designed for real drug compounds exhibit the effectiveness of EG-MCTS on assisting chemists performing retrosynthetic analysis.

11.1AINov 18, 2021

Lifelong Reinforcement Learning with Temporal Logic Formulas and Reward Machines

Xuejing Zheng, Chao Yu, Chen Chen et al.

Continuously learning new tasks using high-level ideas or knowledge is a key capability of humans. In this paper, we propose Lifelong reinforcement learning with Sequential linear temporal logic formulas and Reward Machines (LSRM), which enables an agent to leverage previously learned knowledge to fasten learning of logically specified tasks. For the sake of more flexible specification of tasks, we first introduce Sequential Linear Temporal Logic (SLTL), which is a supplement to the existing Linear Temporal Logic (LTL) formal language. We then utilize Reward Machines (RM) to exploit structural reward functions for tasks encoded with high-level events, and propose automatic extension of RM and efficient knowledge transfer over tasks for continuous learning in lifetime. Experimental results show that LSRM outperforms the methods that learn the target tasks from scratch by taking advantage of the task decomposition using SLTL and knowledge transfer over RM during the lifelong learning process.

13.8AINov 7, 2021Code

Coordinated Proximal Policy Optimization

Zifan Wu, Chao Yu, Deheng Ye et al.

We present Coordinated Proximal Policy Optimization (CoPPO), an algorithm that extends the original Proximal Policy Optimization (PPO) to the multi-agent setting. The key idea lies in the coordinated adaptation of step size during the policy update process among multiple agents. We prove the monotonicity of policy improvement when optimizing a theoretically-grounded joint objective, and derive a simplified optimization objective based on a set of approximations. We then interpret that such an objective in CoPPO can achieve dynamic credit assignment among agents, thereby alleviating the high variance issue during the concurrent update of agent policies. Finally, we demonstrate that CoPPO outperforms several strong baselines and is competitive with the latest multi-agent PPO method (i.e. MAPPO) under typical multi-agent settings, including cooperative matrix games and the StarCraft II micromanagement tasks.

10.1AIOct 19, 2021

Gradient-Based Mixed Planning with Symbolic and Numeric Action Parameters

Kebing Jin, Hankz Hankui Zhuo, Zhanhao Xiao et al.

Dealing with planning problems with both logical relations and numeric changes in real-world dynamic environments is challenging. Existing numeric planning systems for the problem often discretize numeric variables or impose convex constraints on numeric variables, which harms the performance when solving problems. In this paper, we propose a novel algorithm framework to solve numeric planning problems mixed with logical relations and numeric changes based on gradient descent. We cast the numeric planning with logical relations and numeric changes as an optimization problem. Specifically, we extend syntax to allow parameters of action models to be either objects or real-valued numbers, which enhances the ability to model real-world numeric effects. Based on the extended modeling language, we propose a gradient-based framework to simultaneously optimize numeric parameters and compute appropriate actions to form candidate plans. The gradient-based framework is composed of an algorithmic heuristic module based on propositional operations to select actions and generate constraints for gradient descent, an algorithmic transition module to update states to next ones, and a loss module to compute loss. We repeatedly minimize loss by updating numeric parameters and compute candidate plans until it converges into a valid plan for the planning problem. In the empirical study, we exhibit that our algorithm framework is both effective and efficient in solving planning problems mixed with logical relations and numeric changes, especially when the problems contain obstacles and non-linear numeric effects.

12.1AIMar 15, 2021

Learning Symbolic Rules for Interpretable Deep Reinforcement Learning

Zhihao Ma, Yuzheng Zhuang, Paul Weng et al.

Recent progress in deep reinforcement learning (DRL) can be largely attributed to the use of neural networks. However, this black-box approach fails to explain the learned policy in a human understandable way. To address this challenge and improve the transparency, we propose a Neural Symbolic Reinforcement Learning framework by introducing symbolic logic into DRL. This framework features a fertilization of reasoning and learning modules, enabling end-to-end learning with prior symbolic knowledge. Moreover, interpretability is achieved by extracting the logical rules learned by the reasoning module in a symbolic rule space. The experimental results show that our framework has better interpretability, along with competing performance in comparison to state-of-the-art approaches.

1.2LGFeb 25, 2020

Dual Graph Representation Learning

Huiling Zhu, Xin Luo, Hankz Hankui Zhuo

Graph representation learning embeds nodes in large graphs as low-dimensional vectors and is of great benefit to many downstream applications. Most embedding frameworks, however, are inherently transductive and unable to generalize to unseen nodes or learn representations across different graphs. Although inductive approaches can generalize to unseen nodes, they neglect different contexts of nodes and cannot learn node embeddings dually. In this paper, we present a context-aware unsupervised dual encoding framework, \textbf{CADE}, to generate representations of nodes by combining real-time neighborhoods with neighbor-attentioned representation, and preserving extra memory of known nodes. We exhibit that our approach is effective by comparing to state-of-the-art methods.

4.8LGNov 11, 2019

Transfer Value Iteration Networks

Junyi Shen, Hankz Hankui Zhuo, Jin Xu et al.

Value iteration networks (VINs) have been demonstrated to have a good generalization ability for reinforcement learning tasks across similar domains. However, based on our experiments, a policy learned by VINs still fail to generalize well on the domain whose action space and feature space are not identical to those in the domain where it is trained. In this paper, we propose a transfer learning approach on top of VINs, termed Transfer VINs (TVINs), such that a learned policy from a source domain can be generalized to a target domain with only limited training data, even if the source domain and the target domain have domain-specific actions and features. We empirically verify that our proposed TVINs outperform VINs when the source and the target domains have similar but not identical action and feature spaces. Furthermore, we show that the performance improvement is consistent across different environments, maze sizes, dataset sizes as well as different values of hyperparameters such as number of iteration and kernel size.

5.1AISep 20, 2019

Repositioning Bikes with Carrier Vehicles and Bike Trailers in Bike Sharing Systems

Xinghua Zheng, Ming Tang, Hankz Hankui Zhuo et al.

Bike Sharing Systems (BSSs) have been adopted in many major cities of the world due to traffic congestion and carbon emissions. Although there have been approaches to exploiting either bike trailers via crowdsourcing or carrier vehicles to reposition bikes in the ``right'' stations in the ``right'' time, they do not jointly consider the usage of both bike trailers and carrier vehicles. In this paper, we aim to take advantage of both bike trailers and carrier vehicles to reduce the loss of demand with regard to the crowdsourcing of bike trailers and the fuel cost of carrier vehicles. In the experiment, we exhibit that our approach outperforms baselines in several datasets from bike sharing companies.

12.6AIAug 26, 2019

Learning Action Models from Disordered and Noisy Plan Traces

Hankz Hankui Zhuo, Jing Peng, Subbarao Kambhampati

There is increasing awareness in the planning community that the burden of specifying complete domain models is too high, which impedes the applicability of planning technology in many real-world domains. Although there have many learning systems that help automatically learning domain models, most existing work assumes that the input traces are completely correct. A more realistic situation is that the plan traces are disordered and noisy, such as plan traces described by natural language. In this paper we propose and evaluate an approach for doing this. Our approach takes as input a set of plan traces with disordered actions and noise and outputs action models that can best explain the plan traces. We use a MAX-SAT framework for learning, where the constraints are derived from the given plan traces. Unlike traditional action models learners, the states in plan traces can be partially observable and noisy as well as the actions in plan traces can be disordered and parallel. We demonstrate the effectiveness of our approach through a systematic empirical evaluation with both IPC domains and the real-world dataset extracted from natural language documents.

6.6IRJun 3, 2019

Federated Hierarchical Hybrid Networks for Clickbait Detection

Feng Liao, Hankz Hankui Zhuo, Xiaoling Huang et al.

Online media outlets adopt clickbait techniques to lure readers to click on articles in a bid to expand their reach and subsequently increase revenue through ad monetization. As the adverse effects of clickbait attract more and more attention, researchers have started to explore machine learning techniques to automatically detect clickbaits. Previous work on clickbait detection assumes that all the training data is available locally during training. In many real-world applications, however, training data is generally distributedly stored by different parties (e.g., different parties maintain data with different feature spaces), and the parties cannot share their data with each other due to data privacy issues. It is challenging to build models of high-quality federally for detecting clickbaits effectively without data sharing. In this paper, we propose a federated training framework, which is called federated hierarchical hybrid networks, to build clickbait detection models, where the titles and contents are stored by different parties, whose relationships must be exploited for clickbait detection. We empirically demonstrate that our approach is effective by comparing our approach to the state-of-the-art approaches using datasets from social media.

22.7LGJan 24, 2019

Federated Deep Reinforcement Learning

Hankz Hankui Zhuo, Wenfeng Feng, Yufeng Lin et al.

In deep reinforcement learning, building policies of high-quality is challenging when the feature space of states is small and the training data is limited. Despite the success of previous transfer learning approaches in deep reinforcement learning, directly transferring data or models from an agent to another agent is often not allowed due to the privacy of data and/or models in many privacy-aware applications. In this paper, we propose a novel deep reinforcement learning framework to federatively build models of high-quality for agents with consideration of their privacies, namely Federated deep Reinforcement Learning (FedRL). To protect the privacy of data and models, we exploit Gausian differentials on the information shared with each other when updating their local models. In the experiment, we evaluate our FedRL framework in two diverse domains, Grid-world and Text2Action domains, by comparing to various baselines.

1.7AIApr 19, 2018

An Integrated Development Environment for Planning Domain Modeling

Yuncong Li, Hankz Hankui Zhuo

In order to make the task, description of planning domains and problems, more comprehensive for non-experts in planning, the visual representation has been used in planning domain modeling in recent years. However, current knowledge engineering tools with visual modeling, like itSIMPLE (Vaquero et al. 2012) and VIZ (Vodrážka and Chrpa 2010), are less efficient than the traditional method of hand-coding by a PDDL expert using a text editor, and rarely involved in finetuning planning domains depending on the plan validation. Aim at this, we present an integrated development environment KAVI for planning domain modeling inspired by itSIMPLE and VIZ. KAVI using an abstract domain knowledge base to improve the efficiency of planning domain visual modeling. By integrating planners and a plan validator, KAVI proposes a method to fine-tune planning domains based on the plan validation.

20.6AIMar 7, 2018

Extracting Action Sequences from Texts Based on Deep Reinforcement Learning

Wenfeng Feng, Hankz Hankui Zhuo, Subbarao Kambhampati

Extracting action sequences from natural language texts is challenging, as it requires commonsense inferences based on world knowledge. Although there has been work on extracting action scripts, instructions, navigation actions, etc., they require that either the set of candidate actions be provided in advance, or that action descriptions are restricted to a specific form, e.g., description templates. In this paper, we aim to extract action sequences from texts in free natural language, i.e., without any restricted templates, provided the candidate set of actions is unknown. We propose to extract action sequences from texts based on the deep reinforcement learning framework. Specifically, we view "selecting" or "eliminating" words from texts as "actions", and the texts associated with actions as "states". We then build Q-networks to learn the policy of extracting actions and extract plans from the labeled texts. We demonstrate the effectiveness of our approach on several datasets with comparison to state-of-the-art approaches, including online experiments interacting with humans.

12.7AIMar 4, 2018

Discovering Underlying Plans Based on Shallow Models

Hankz Hankui Zhuo, Yantian Zha, Subbarao Kambhampati

Plan recognition aims to discover target plans (i.e., sequences of actions) behind observed actions, with history plan libraries or domain models in hand. Previous approaches either discover plans by maximally "matching" observed actions to plan libraries, assuming target plans are from plan libraries, or infer plans by executing domain models to best explain the observed actions, assuming that complete domain models are available. In real world applications, however, target plans are often not from plan libraries, and complete domain models are often not available, since building complete sets of plans and complete domain models are often difficult or expensive. In this paper we view plan libraries as corpora and learn vector representations of actions using the corpora, we then discover target plans based on the vector representations. Specifically, we propose two approaches, DUP and RNNPlanner, to discover target plans based on vector representations of actions. DUP explores the EM-style framework to capture local contexts of actions and discover target plans by optimizing the probability of target plans, while RNNPlanner aims to leverage long-short term contexts of actions based on RNNs (recurrent neural networks) framework to help recognize target plans. In the experiments, we empirically show that our approaches are capable of discovering underlying plans that are not from plan libraries, without requiring domain models provided. We demonstrate the effectiveness of our approaches by comparing its performance to traditional plan recognition approaches in three planning domains. We also compare DUP and RNNPlanner to see their advantages and disadvantages.

9.3IRMar 20, 2017

Paper2vec: Citation-Context Based Document Distributed Representation for Scholar Recommendation

Han Tian, Hankz Hankui Zhuo

Due to the availability of references of research papers and the rich information contained in papers, various citation analysis approaches have been proposed to identify similar documents for scholar recommendation. Despite of the success of previous approaches, they are, however, based on co-occurrence of items. Once there are no co-occurrence items available in documents, they will not work well. Inspired by distributed representations of words in the literature of natural language processing, we propose a novel approach to measuring the similarity of papers based on distributed representations learned from the citation context of papers. We view the set of papers as the vocabulary, define the weighted citation context of papers, and convert it to weight matrix similar to the word-word cooccurrence matrix in natural language processing. After that we explore a variant of matrix factorization approach to train distributed representations of papers on the matrix, and leverage the distributed representations to measure similarities of papers. In the experiment, we exhibit that our approach outperforms state-of-theart citation-based approaches by 25%, and better than other distributed representation based methods.

8.2IRMar 15, 2017

Distributed-Representation Based Hybrid Recommender System with Short Item Descriptions

Junhua He, Hankz Hankui Zhuo, Jarvan Law

Collaborative filtering (CF) aims to build a model from users' past behaviors and/or similar decisions made by other users, and use the model to recommend items for users. Despite of the success of previous collaborative filtering approaches, they are all based on the assumption that there are sufficient rating scores available for building high-quality recommendation models. In real world applications, however, it is often difficult to collect sufficient rating scores, especially when new items are introduced into the system, which makes the recommendation task challenging. We find that there are often "short" texts describing features of items, based on which we can approximate the similarity of items and make recommendation together with rating scores. In this paper we "borrow" the idea of vector representation of words to capture the information of short texts and embed it into a matrix factorization framework. We empirically show that our approach is effective by comparing it with state-of-the-art approaches.

1.3CLFeb 23, 2017

LTSG: Latent Topical Skip-Gram for Mutually Learning Topic Model and Vector Representations

Jarvan Law, Hankz Hankui Zhuo, Junhua He et al.

Topic models have been widely used in discovering latent topics which are shared across documents in text mining. Vector representations, word embeddings and topic embeddings, map words and topics into a low-dimensional and dense real-value vector space, which have obtained high performance in NLP tasks. However, most of the existing models assume the result trained by one of them are perfect correct and used as prior knowledge for improving the other model. Some other models use the information trained from external large corpus to help improving smaller corpus. In this paper, we aim to build such an algorithm framework that makes topic models and vector representations mutually improve each other within the same corpus. An EM-style algorithm framework is employed to iteratively optimize both topic model and vector representations. Experimental results show that our model outperforms state-of-art methods on various NLP tasks.

31.9AINov 25, 2015

Plan Explicability and Predictability for Robot Task Planning

Yu Zhang, Sarath Sreedharan, Anagha Kulkarni et al.

Intelligent robots and machines are becoming pervasive in human populated environments. A desirable capability of these agents is to respond to goal-oriented commands by autonomously constructing task plans. However, such autonomy can add significant cognitive load and potentially introduce safety risks to humans when agents behave unexpectedly. Hence, for such agents to be helpful, one important requirement is for them to synthesize plans that can be easily understood by humans. While there exists previous work that studied socially acceptable robots that interact with humans in "natural ways", and work that investigated legible motion planning, there lacks a general solution for high level task planning. To address this issue, we introduce the notions of plan {\it explicability} and {\it predictability}. To compute these measures, first, we postulate that humans understand agent plans by associating abstract tasks with agent actions, which can be considered as a labeling process. We learn the labeling scheme of humans for agent plans from training examples using conditional random fields (CRFs). Then, we use the learned model to label a new plan to compute its explicability and predictability. These measures can be used by agents to proactively choose or directly synthesize plans that are more explicable and predictable to humans. We provide evaluations on a synthetic domain and with human subjects using physical robots to show the effectiveness of our approach

15.2AINov 18, 2015

Discovering Underlying Plans Based on Distributed Representations of Actions

Xin Tian, Hankz Hankui Zhuo, Subbarao Kambhampati

5.6AIJul 29, 2013

Herding the Crowd: Automated Planning for Crowdsourced Planning

Kartik Talamadupula, Subbarao Kambhampati

There has been significant interest in crowdsourcing and human computation. One subclass of human computation applications are those directed at tasks that involve planning (e.g. travel planning) and scheduling (e.g. conference scheduling). Much of this work appears outside the traditional automated planning forums, and at the outset it is not clear whether automated planning has much of a role to play in these human computation systems. Interestingly however, work on these systems shows that even primitive forms of automated oversight of the human planner does help in significantly improving the effectiveness of the humans/crowd. In this paper, we will argue that the automated oversight used in these systems can be viewed as a primitive automated planner, and that there are several opportunities for more sophisticated automated planning in effectively steering crowdsourced planning. Straightforward adaptation of current planning technology is however hampered by the mismatch between the capabilities of human workers and automated planners. We identify two important challenges that need to be overcome before such adaptation of planning technology can occur: (i) interpreting the inputs of the human workers (and the requester) and (ii) steering or critiquing the plans being produced by the human workers armed only with incomplete domain and preference models. In this paper, we discuss approaches for handling these challenges, and characterize existing human computation systems in terms of the specific choices they make in handling these challenges.

9.9AIJul 28, 2012

Model-Lite Case-Based Planning

Hankz Hankui Zhuo, Subbarao Kambhampati, Tuan Nguyen

There is increasing awareness in the planning community that depending on complete models impedes the applicability of planning technology in many real world domains where the burden of specifying complete domain models is too high. In this paper, we consider a novel solution for this challenge that combines generative planning on incomplete domain models with a library of plan cases that are known to be correct. While this was arguably the original motivation for case-based planning, most existing case-based planners assume (and depend on) from-scratch planners that work on complete domain models. In contrast, our approach views the plan generated with respect to the incomplete model as a "skeletal plan" and augments it with directed mining of plan fragments from library cases. We will present the details of our approach and present an empirical evaluation of our method in comparison to a state-of-the-art case-based planner that depends on complete domain models.