Gang Chen

LG
h-index28
36papers
2,830citations
Novelty51%
AI Score44

36 Papers

14.9LGFeb 13, 2023Code
Byzantine-Robust Learning on Heterogeneous Data via Gradient Splitting

Yuchen Liu, Chen Chen, Lingjuan Lyu et al.

Federated learning has exhibited vulnerabilities to Byzantine attacks, where the Byzantine attackers can send arbitrary gradients to a central server to destroy the convergence and performance of the global model. A wealth of robust AGgregation Rules (AGRs) have been proposed to defend against Byzantine attacks. However, Byzantine clients can still circumvent robust AGRs when data is non-Identically and Independently Distributed (non-IID). In this paper, we first reveal the root causes of performance degradation of current robust AGRs in non-IID settings: the curse of dimensionality and gradient heterogeneity. In order to address this issue, we propose GAS, a \shorten approach that can successfully adapt existing robust AGRs to non-IID settings. We also provide a detailed convergence analysis when the existing robust AGRs are combined with GAS. Experiments on various real-world datasets verify the efficacy of our proposed GAS. The implementation code is provided in https://github.com/YuchenLiu-a/byzantine-gas.

22.1CLNov 27, 2023Code
FreeAL: Towards Human-Free Active Learning in the Era of Large Language Models

Ruixuan Xiao, Yiwen Dong, Junbo Zhao et al.

Collecting high-quality labeled data for model training is notoriously time-consuming and labor-intensive for various NLP tasks. While copious solutions, such as active learning for small language models (SLMs) and prevalent in-context learning in the era of large language models (LLMs), have been proposed and alleviate the labeling burden to some extent, their performances are still subject to human intervention. It is still underexplored how to reduce the annotation cost in the LLMs era. To bridge this, we revolutionize traditional active learning and propose an innovative collaborative learning framework FreeAL to interactively distill and filter the task-specific knowledge from LLMs. During collaborative training, an LLM serves as an active annotator inculcating its coarse-grained knowledge, while a downstream SLM is incurred as a student to filter out high-quality in-context samples to feedback LLM for the subsequent label refinery. Extensive experiments on eight benchmark datasets demonstrate that FreeAL largely enhances the zero-shot performances for both SLM and LLM without any human supervision. The code is available at https://github.com/Justherozen/FreeAL .

6.9LGOct 3, 2022
MSRL: Distributed Reinforcement Learning with Dataflow Fragments

Huanzhou Zhu, Bo Zhao, Gang Chen et al.

Reinforcement learning (RL) trains many agents, which is resource-intensive and must scale to large GPU clusters. Different RL training algorithms offer different opportunities for distributing and parallelising the computation. Yet, current distributed RL systems tie the definition of RL algorithms to their distributed execution: they hard-code particular distribution strategies and only accelerate specific parts of the computation (e.g. policy network updates) on GPU workers. Fundamentally, current systems lack abstractions that decouple RL algorithms from their execution. We describe MindSpore Reinforcement Learning (MSRL), a distributed RL training system that supports distribution policies that govern how RL training computation is parallelised and distributed on cluster resources, without requiring changes to the algorithm implementation. MSRL introduces the new abstraction of a fragmented dataflow graph, which maps Python functions from an RL algorithm's training loop to parallel computational fragments. Fragments are executed on different devices by translating them to low-level dataflow representations, e.g. computational graphs as supported by deep learning engines, CUDA implementations or multi-threaded CPU processes. We show that MSRL subsumes the distribution strategies of existing systems, while scaling RL training to 64 GPUs.

1.2DCFeb 9, 2023
FLAC: A Robust Failure-Aware Atomic Commit Protocol for Distributed Transactions

Hexiang Pan, Quang-Trung Ta, Meihui Zhang et al.

In distributed transaction processing, atomic commit protocol (ACP) is used to ensure database consistency. With the use of commodity compute nodes and networks, failures such as system crashes and network partitioning are common. It is therefore important for ACP to dynamically adapt to the operating condition for efficiency while ensuring the consistency of the database. Existing ACPs often assume stable operating conditions, hence, they are either non-generalizable to different environments or slow in practice. In this paper, we propose a novel and practical ACP, called Failure-Aware Atomic Commit (FLAC). In essence, FLAC includes three protocols, which are specifically designed for three different environments: (i) no failure occurs, (ii) participant nodes might crash but there is no delayed connection, or (iii) both crashed nodes and delayed connection can occur. It models these environments as the failure-free, crash-failure, and network-failure robustness levels. During its operation, FLAC can monitor if any failure occurs and dynamically switch to operate the most suitable protocol, using a robustness level state machine, whose parameters are fine-tuned by reinforcement learning. Consequently, it improves both the response time and throughput, and effectively handles nodes distributed across the Internet where crash and network failures might occur. We implement FLAC in a distributed transactional key-value storage system based on Google Percolator and evaluate its performance with both a micro benchmark and a macro benchmark of real workload. The results show that FLAC achieves up to 2.22x throughput improvement and 2.82x latency speedup, compared to existing ACPs for high-contention workloads.

0.5CLJul 17, 2023Code
A mixed policy to improve performance of language models on math problems

Gang Chen

When to solve math problems, most language models take a sampling strategy to predict next word according conditional probabilities. In the math reasoning step, it may generate wrong answer. Considering math problems are deterministic, we propose a mixed policy exploration approach to solve math problems with reinforcement learning. In peculiar, we propose a two level token exploration policy: the abstract level explores next token with probability and the second level is deterministic. Specifically, the abstract level policy will decide whether the token is operator or operand with probability sampling, while the second level is deterministic to select next token with the highest score in a greedy way. We test our method on GSM8K dataset with GPT-2 model, and demonstrate more than $2\%$ performance gain. Our implementation is available at https://github.com/vividitytech/math_lm_rl.

21.1CLOct 28, 2023Code
TLM: Token-Level Masking for Transformers

Yangjun Wu, Kebin Fang, Dongxiang Zhang et al.

Structured dropout approaches, such as attention dropout and DropHead, have been investigated to regularize the multi-head attention mechanism in Transformers. In this paper, we propose a new regularization scheme based on token-level rather than structure-level to reduce overfitting. Specifically, we devise a novel Token-Level Masking (TLM) training strategy for Transformers to regularize the connections of self-attention, which consists of two masking techniques that are effective and easy to implement. The underlying idea is to manipulate the connections between tokens in the multi-head attention via masking, where the networks are forced to exploit partial neighbors' information to produce a meaningful representation. The generality and effectiveness of TLM are thoroughly evaluated via extensive experiments on 4 diversified NLP tasks across 18 datasets, including natural language understanding benchmark GLUE, ChineseGLUE, Chinese Grammatical Error Correction, and data-to-text generation. The results indicate that TLM can consistently outperform attention dropout and DropHead, e.g., it increases by 0.5 points relative to DropHead with BERT-large on GLUE. Moreover, TLM can establish a new record on the data-to-text benchmark Rotowire (18.93 BLEU). Our code will be publicly available at https://github.com/Young1993/tlm.

8.5AISep 27, 2024
Cost-Aware Dynamic Cloud Workflow Scheduling using Self-Attention and Evolutionary Reinforcement Learning

Ya Shen, Gang Chen, Hui Ma et al.

The Cost-aware Dynamic Multi-Workflow Scheduling (CDMWS) in the cloud is a kind of cloud workflow management problem, which aims to assign virtual machine (VM) instances to execute tasks in workflows so as to minimize the total costs, including both the penalties for violating Service Level Agreement (SLA) and the VM rental fees. Powered by deep neural networks, Reinforcement Learning (RL) methods can construct effective scheduling policies for solving CDMWS problems. Traditional policy networks in RL often use basic feedforward architectures to separately determine the suitability of assigning any VM instances, without considering all VMs simultaneously to learn their global information. This paper proposes a novel self-attention policy network for cloud workflow scheduling (SPN-CWS) that captures global information from all VMs. We also develop an Evolution Strategy-based RL (ERL) system to train SPN-CWS reliably and effectively. The trained SPN-CWS can effectively process all candidate VM instances simultaneously to identify the most suitable VM instance to execute every workflow task. Comprehensive experiments show that our method can noticeably outperform several state-of-the-art algorithms on multiple benchmark CDMWS problems.

6.6LGOct 18, 2023
Learning to Optimise Climate Sensor Placement using a Transformer

Chen Wang, Victoria Huang, Gang Chen et al.

The optimal placement of sensors for environmental monitoring and disaster management is a challenging problem due to its NP-hard nature. Traditional methods for sensor placement involve exact, approximation, or heuristic approaches, with the latter being the most widely used. However, heuristic methods are limited by expert intuition and experience. Deep learning (DL) has emerged as a promising approach for generating heuristic algorithms automatically. In this paper, we introduce a novel sensor placement approach focused on learning improvement heuristics using deep reinforcement learning (RL) methods. Our approach leverages an RL formulation for learning improvement heuristics, driven by an actor-critic algorithm for training the policy network. We compare our method with several state-of-the-art approaches by conducting comprehensive experiments, demonstrating the effectiveness and superiority of our proposed approach in producing high-quality solutions. Our work presents a promising direction for applying advanced DL and RL techniques to challenging climate sensor placement problems.

3.3LGSep 29, 2022
Ensemble Reinforcement Learning in Continuous Spaces -- A Hierarchical Multi-Step Approach for Policy Training

Gang Chen, Victoria Huang

Actor-critic deep reinforcement learning (DRL) algorithms have recently achieved prominent success in tackling various challenging reinforcement learning (RL) problems, particularly complex control tasks with high-dimensional continuous state and action spaces. Nevertheless, existing research showed that actor-critic DRL algorithms often failed to explore their learning environments effectively, resulting in limited learning stability and performance. To address this limitation, several ensemble DRL algorithms have been proposed lately to boost exploration and stabilize the learning process. However, most of existing ensemble algorithms do not explicitly train all base learners towards jointly optimizing the performance of the ensemble. In this paper, we propose a new technique to train an ensemble of base learners based on an innovative multi-step integration method. This training technique enables us to develop a new hierarchical learning algorithm for ensemble DRL that effectively promotes inter-learner collaboration through stable inter-learner parameter sharing. The design of our new algorithm is verified theoretically. The algorithm is also shown empirically to outperform several state-of-the-art DRL algorithms on multiple benchmark RL problems.

5.1DBMay 18, 2025Code
HAKES: Scalable Vector Database for Embedding Search Service

Guoyu Hu, Shaofeng Cai, Tien Tuan Anh Dinh et al.

Modern deep learning models capture the semantics of complex data by transforming them into high-dimensional embedding vectors. Emerging applications, such as retrieval-augmented generation, use approximate nearest neighbor (ANN) search in the embedding vector space to find similar data. Existing vector databases provide indexes for efficient ANN searches, with graph-based indexes being the most popular due to their low latency and high recall in real-world high-dimensional datasets. However, these indexes are costly to build, suffer from significant contention under concurrent read-write workloads, and scale poorly to multiple servers. Our goal is to build a vector database that achieves high throughput and high recall under concurrent read-write workloads. To this end, we first propose an ANN index with an explicit two-stage design combining a fast filter stage with highly compressed vectors and a refine stage to ensure recall, and we devise a novel lightweight machine learning technique to fine-tune the index parameters. We introduce an early termination check to dynamically adapt the search process for each query. Next, we add support for writes while maintaining search performance by decoupling the management of the learned parameters. Finally, we design HAKES, a distributed vector database that serves the new index in a disaggregated architecture. We evaluate our index and system against 12 state-of-the-art indexes and three distributed vector databases, using high-dimensional embedding datasets generated by deep learning models. The experimental results show that our index outperforms index baselines in the high recall region and under concurrent read-write workloads. Furthermore, \namesys{} is scalable and achieves up to $16\times$ higher throughputs than the baselines. The HAKES project is open-sourced at https://www.comp.nus.edu.sg/~dbsystem/hakes/.

3.6CVOct 16, 2025Code
Unifying Environment Perception and Route Choice Modeling for Trajectory Representation Learning

Ji Cao, Yu Wang, Tongya Zheng et al.

Trajectory Representation Learning (TRL) aims to encode raw trajectories into low-dimensional vectors, which can then be leveraged in various downstream tasks, including travel time estimation, location prediction, and trajectory similarity analysis. However, existing TRL methods suffer from a key oversight: treating trajectories as isolated spatio-temporal sequences, without considering the external environment and internal route choice behavior that govern their formation. To bridge this gap, we propose a novel framework that unifies comprehensive environment \textbf{P}erception and explicit \textbf{R}oute choice modeling for effective \textbf{Traj}ectory representation learning, dubbed \textbf{PRTraj}. Specifically, PRTraj first introduces an Environment Perception Module to enhance the road network by capturing multi-granularity environmental semantics from surrounding POI distributions. Building on this environment-aware backbone, a Route Choice Encoder then captures the route choice behavior inherent in each trajectory by modeling its constituent road segment transitions as a sequence of decisions. These route-choice-aware representations are finally aggregated to form the global trajectory embedding. Extensive experiments on 3 real-world datasets across 5 downstream tasks validate the effectiveness and generalizability of PRTraj. Moreover, PRTraj demonstrates strong data efficiency, maintaining robust performance under few-shot scenarios. Our code is available at: https://anonymous.4open.science/r/PRTraj.

5.8AIMay 18, 2025Code
GATES: Cost-aware Dynamic Workflow Scheduling via Graph Attention Networks and Evolution Strategy

Ya Shen, Gang Chen, Hui Ma et al.

Cost-aware Dynamic Workflow Scheduling (CADWS) is a key challenge in cloud computing, focusing on devising an effective scheduling policy to efficiently schedule dynamically arriving workflow tasks, represented as Directed Acyclic Graphs (DAG), to suitable virtual machines (VMs). Deep reinforcement learning (DRL) has been widely employed for automated scheduling policy design. However, the performance of DRL is heavily influenced by the design of the problem-tailored policy network and is highly sensitive to hyperparameters and the design of reward feedback. Considering the above-mentioned issues, this study proposes a novel DRL method combining Graph Attention Networks-based policy network and Evolution Strategy, referred to as GATES. The contributions of GATES are summarized as follows: (1) GATES can capture the impact of current task scheduling on subsequent tasks by learning the topological relationships between tasks in a DAG. (2) GATES can assess the importance of each VM to the ready task, enabling it to adapt to dynamically changing VM resources. (3) Utilizing Evolution Strategy's robustness, exploratory nature, and tolerance for delayed rewards, GATES achieves stable policy learning in CADWS. Extensive experimental results demonstrate the superiority of the proposed GATES in CADWS, outperforming several state-of-the-art algorithms. The source code is available at: https://github.com/YaShen998/GATES.

22.9LGMay 15, 2023Code
Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility

Wentao Ye, Mingfeng Ou, Tianyi Li et al.

The recent popularity of large language models (LLMs) has brought a significant impact to boundless fields, particularly through their open-ended ecosystem such as the APIs, open-sourced models, and plugins. However, with their widespread deployment, there is a general lack of research that thoroughly discusses and analyzes the potential risks concealed. In that case, we intend to conduct a preliminary but pioneering study covering the robustness, consistency, and credibility of LLMs systems. With most of the related literature in the era of LLM uncharted, we propose an automated workflow that copes with an upscaled number of queries/responses. Overall, we conduct over a million queries to the mainstream LLMs including ChatGPT, LLaMA, and OPT. Core to our workflow consists of a data primitive, followed by an automated interpreter that evaluates these LLMs under different adversarial metrical systems. As a result, we draw several, and perhaps unfortunate, conclusions that are quite uncommon from this trendy community. Briefly, they are: (i)-the minor but inevitable error occurrence in the user-generated query input may, by chance, cause the LLM to respond unexpectedly; (ii)-LLMs possess poor consistency when processing semantically similar query input. In addition, as a side finding, we find that ChatGPT is still capable to yield the correct answer even when the input is polluted at an extreme level. While this phenomenon demonstrates the powerful memorization of the LLMs, it raises serious concerns about using such data for LLM-involved evaluation in academic development. To deal with it, we propose a novel index associated with a dataset that roughly decides the feasibility of using such data for LLM-involved evaluation. Extensive empirical studies are tagged to support the aforementioned claims.

1.2SIMay 15, 2025Code
Advancing Community Detection with Graph Convolutional Neural Networks: Bridging Topological and Attributive Cohesion

Anjali de Silva, Gang Chen, Hui Ma et al.

Community detection, a vital technology for real-world applications, uncovers cohesive node groups (communities) by leveraging both topological and attribute similarities in social networks. However, existing Graph Convolutional Networks (GCNs) trained to maximize modularity often converge to suboptimal solutions. Additionally, directly using human-labeled communities for training can undermine topological cohesiveness by grouping disconnected nodes based solely on node attributes. We address these issues by proposing a novel Topological and Attributive Similarity-based Community detection (TAS-Com) method. TAS-Com introduces a novel loss function that exploits the highly effective and scalable Leiden algorithm to detect community structures with global optimal modularity. Leiden is further utilized to refine human-labeled communities to ensure connectivity within each community, enabling TAS-Com to detect community structures with desirable trade-offs between modularity and compliance with human labels. Experimental results on multiple benchmark networks confirm that TAS-Com can significantly outperform several state-of-the-art algorithms.

3.4CLJan 20, 2024
Evaluating and Enhancing Large Language Models Performance in Domain-specific Medicine: Osteoarthritis Management with DocOA

Xi Chen, MingKe You, Li Wang et al.

The efficacy of large language models (LLMs) in domain-specific medicine, particularly for managing complex diseases such as osteoarthritis (OA), remains largely unexplored. This study focused on evaluating and enhancing the clinical capabilities of LLMs in specific domains, using osteoarthritis (OA) management as a case study. A domain specific benchmark framework was developed, which evaluate LLMs across a spectrum from domain-specific knowledge to clinical applications in real-world clinical scenarios. DocOA, a specialized LLM tailored for OA management that integrates retrieval-augmented generation (RAG) and instruction prompts, was developed. The study compared the performance of GPT-3.5, GPT-4, and a specialized assistant, DocOA, using objective and human evaluations. Results showed that general LLMs like GPT-3.5 and GPT-4 were less effective in the specialized domain of OA management, particularly in providing personalized treatment recommendations. However, DocOA showed significant improvements. This study introduces a novel benchmark framework which assesses the domain-specific abilities of LLMs in multiple aspects, highlights the limitations of generalized LLMs in clinical contexts, and demonstrates the potential of tailored approaches for developing domain-specific medical LLMs.

2.0LGMay 18, 2023
Deep Metric Tensor Regularized Policy Gradient

Gang Chen, Victoria Huang

Policy gradient algorithms are an important family of deep reinforcement learning techniques. Many past research endeavors focused on using the first-order policy gradient information to train policy networks. Different from these works, we conduct research in this paper driven by the believe that properly utilizing and controlling Hessian information associated with the policy gradient can noticeably improve the performance of policy gradient algorithms. One key Hessian information that attracted our attention is the Hessian trace, which gives the divergence of the policy gradient vector field in the Euclidean policy parametric space. We set the goal to generalize this Euclidean policy parametric space into a general Riemmanian manifold by introducing a metric tensor field $g_ab$ in the parametric space. This is achieved through newly developed mathematical tools, deep learning algorithms, and metric tensor deep neural networks (DNNs). Armed with these technical developments, we propose a new policy gradient algorithm that learns to minimize the absolute divergence in the Riemannian manifold as an important regularization mechanism, allowing the Riemannian manifold to smoothen its policy gradient vector field. The newly developed algorithm is experimentally studied on several benchmark reinforcement learning problems. Our experiments clearly show that the new metric tensor regularized algorithm can significantly outperform its counterpart that does not use our regularization technique. Additional experimental analysis further suggests that the trained metric tensor DNN and the corresponding metric tensor $g_{ab}$ can effectively reduce the absolute divergence towards zero in the Riemannian manifold.

1.2NIFeb 6, 2021
Multi-Agent Deep Reinforcement Learning for Request Dispatching in Distributed-Controller Software-Defined Networking

Victoria Huang, Gang Chen, Qiang Fu

Recently, distributed controller architectures have been quickly gaining popularity in Software-Defined Networking (SDN). However, the use of distributed controllers introduces a new and important Request Dispatching (RD) problem with the goal for every SDN switch to properly dispatch their requests among all controllers so as to optimize network performance. This goal can be fulfilled by designing an RD policy to guide distribution of requests at each switch. In this paper, we propose a Multi-Agent Deep Reinforcement Learning (MA-DRL) approach to automatically design RD policies with high adaptability and performance. This is achieved through a new problem formulation in the form of a Multi-Agent Markov Decision Process (MA-MDP), a new adaptive RD policy design and a new MA-DRL algorithm called MA-PPO. Extensive simulation studies show that our MA-DRL technique can effectively train RD policies to significantly outperform man-made policies, model-based policies, as well as RD policies learned via single-agent DRL algorithms.

13.8SEOct 17, 2020
MLCask: Efficient Management of Component Evolution in Collaborative Data Analytics Pipelines

Zhaojing Luo, Sai Ho Yeung, Meihui Zhang et al.

With the ever-increasing adoption of machine learning for data analytics, maintaining a machine learning pipeline is becoming more complex as both the datasets and trained models evolve with time. In a collaborative environment, the changes and updates due to pipeline evolution often cause cumbersome coordination and maintenance work, raising the costs and making it hard to use. Existing solutions, unfortunately, do not address the version evolution problem, especially in a collaborative environment where non-linear version control semantics are necessary to isolate operations made by different user roles. The lack of version control semantics also incurs unnecessary storage consumption and lowers efficiency due to data duplication and repeated data pre-processing, which are avoidable. In this paper, we identify two main challenges that arise during the deployment of machine learning pipelines, and address them with the design of versioning for an end-to-end analytics system MLCask. The system supports multiple user roles with the ability to perform Git-like branching and merging operations in the context of the machine learning pipelines. We define and accelerate the metric-driven merge operation by pruning the pipeline search tree using reusable history records and pipeline compatibility information. Further, we design and implement the prioritized pipeline search, which gives preference to the pipelines that probably yield better performance. The effectiveness of MLCask is evaluated through an extensive study over several real-world deployment cases. The performance evaluation shows that the proposed merge operation is up to 7.8x faster and saves up to 11.9x storage space than the baseline method that does not utilize history records.

4.2LGJun 12, 2020
Decorrelated Double Q-learning

Gang Chen

Q-learning with value function approximation may have the poor performance because of overestimation bias and imprecise estimate. Specifically, overestimation bias is from the maximum operator over noise estimate, which is exaggerated using the estimate of a subsequent state. Inspired by the recent advance of deep reinforcement learning and Double Q-learning, we introduce the decorrelated double Q-learning (D2Q). Specifically, we introduce the decorrelated regularization item to reduce the correlation between value function approximators, which can lead to less biased estimation and low variance. The experimental results on a suite of MuJoCo continuous control tasks demonstrate that our decorrelated double Q-learning can effectively improve the performance.

1.5CLDec 5, 2019
Learning to Predict Explainable Plots for Neural Story Generation

Gang Chen, Yang Liu, Huanbo Luan et al.

Story generation is an important natural language processing task that aims to generate coherent stories automatically. While the use of neural networks has proven effective in improving story generation, how to learn to generate an explainable high-level plot still remains a major challenge. In this work, we propose a latent variable model for neural story generation. The model treats an outline, which is a natural language sentence explainable to humans, as a latent variable to represent a high-level plot that bridges the input and output. We adopt an external summarization model to guide the latent variable model to learn how to generate outlines from training data. Experiments show that our approach achieves significant improvements over state-of-the-art methods in both automatic and human evaluations.

2.7LGNov 24, 2019
Merging Deterministic Policy Gradient Estimations with Varied Bias-Variance Tradeoff for Effective Deep Reinforcement Learning

Gang Chen

Deep reinforcement learning (DRL) on Markov decision processes (MDPs) with continuous action spaces is often approached by directly training parametric policies along the direction of estimated policy gradients (PGs). Previous research revealed that the performance of these PG algorithms depends heavily on the bias-variance tradeoffs involved in estimating and using PGs. A notable approach towards balancing this tradeoff is to merge both on-policy and off-policy gradient estimations. However existing PG merging methods can be sample inefficient and are not suitable to train deterministic policies directly. To address these issues, this paper introduces elite PGs and strengthens their variance reduction effect by adopting elitism and policy consolidation techniques to regularize policy training based on policy behavioral knowledge extracted from elite trajectories. Meanwhile, we propose a two-step method to merge elite PGs and conventional PGs as a new extension of the conventional interpolation merging method. At both the theoretical and experimental levels, we show that both two-step merging and interpolation merging can induce varied bias-variance tradeoffs during policy training. They enable us to effectively use elite PGs and mitigate their performance impact on trained policies. Our experiments also show that two-step merging can outperform interpolation merging and several state-of-the-art algorithms on six benchmark control tasks.

1.0LGNov 11, 2019
Context-aware Active Multi-Step Reinforcement Learning

Gang Chen, Dingcheng Li, Ran Xu

Reinforcement learning has attracted great attention recently, especially policy gradient algorithms, which have been demonstrated on challenging decision making and control tasks. In this paper, we propose an active multi-step TD algorithm with adaptive stepsizes to learn actor and critic. Specifically, our model consists of two components: active stepsize learning and adaptive multi-step TD algorithm. Firstly, we divide the time horizon into chunks and actively select state and action inside each chunk. Then given the selected samples, we propose the adaptive multi-step TD, which generalizes TD($λ$), but adaptively switch on/off the backups from future returns of different steps. Particularly, the adaptive multi-step TD introduces a context-aware mechanism, here a binary classifier, which decides whether or not to turn on its future backups based on the context changes. Thus, our model is kind of combination of active learning and multi-step TD algorithm, which has the capacity for learning off-policy without the need of importance sampling. We evaluate our approach on both discrete and continuous space tasks in an off-policy setting respectively, and demonstrate competitive results compared to other reinforcement learning baselines.

7.1LGOct 21, 2019
A New Framework for Multi-Agent Reinforcement Learning -- Centralized Training and Exploration with Decentralized Execution via Policy Distillation

Gang Chen

Deep reinforcement learning (DRL) is a booming area of artificial intelligence. Many practical applications of DRL naturally involve more than one collaborative learners, making it important to study DRL in a multi-agent context. Previous research showed that effective learning in complex multi-agent systems demands for highly coordinated environment exploration among all the participating agents. Many researchers attempted to cope with this challenge through learning centralized value functions. However, the common strategy for every agent to learn their local policies directly often fail to nurture strong inter-agent collaboration and can be sample inefficient whenever agents alter their communication channels. To address these issues, we propose a new framework known as centralized training and exploration with decentralized execution via policy distillation. Guided by this framework and the maximum-entropy learning technique, we will first train agents' policies with shared global component to foster coordinated and effective learning. Locally executable policies will be derived subsequently from the trained global policies via policy distillation. Experiments show that our new framework and algorithm can achieve significantly better performance and higher sample efficiency than a cutting-edge baseline on several multi-agent DRL benchmarks.

2.0AIJun 19, 2019
Memetic EDA-Based Approaches to Comprehensive Quality-Aware Automated Semantic Web Service Composition

Chen Wang, Hui Ma, Gang Chen et al.

Comprehensive quality-aware automated semantic web service composition is an NP-hard problem, where service composition workflows are unknown, and comprehensive quality, i.e., Quality of services (QoS) and Quality of semantic matchmaking (QoSM) are simultaneously optimized. The objective of this problem is to find a solution with optimized or near-optimized overall QoS and QoSM within polynomial time over a service request. In this paper, we proposed novel memetic EDA-based approaches to tackle this problem. The proposed method investigates the effectiveness of several neighborhood structures of composite services by proposing domain-dependent local search operators. Apart from that, a joint strategy of the local search procedure is proposed to integrate with a modified EDA to reduce the overall computation time of our memetic approach. To better demonstrate the effectiveness and scalability of our approach, we create a more challenging, augmented version of the service composition benchmark based on WSC-08 \cite{bansal2008wsc} and WSC-09 \cite{kona2009wsc}. Experimental results on this benchmark show that one of our proposed memetic EDA-based approach (i.e., MEEDA-LOP) significantly outperforms existing state-of-the-art algorithms.

11.9AIFeb 18, 2019
Evolutionary Multitasking for Semantic Web Service Composition

Chen Wang, Hui Ma, Gang Chen et al.

Web services are basic functions of a software system to support the concept of service-oriented architecture. They are often composed together to provide added values, known as web service composition. Researchers often employ Evolutionary Computation techniques to efficiently construct composite services with near-optimized functional quality (i.e., Quality of Semantic Matchmaking) or non-functional quality (i.e., Quality of Service) or both due to the complexity of this problem. With a significant increase in service composition requests, many composition requests have similar input and output requirements but may vary due to different preferences from different user segments. This problem is often treated as a multi-objective service composition so as to cope with different preferences from different user segments simultaneously. Without taking a multi-objective approach that gives rise to a solution selection challenge, we perceive multiple similar service composition requests as jointly forming an evolutionary multi-tasking problem in this work. We propose an effective permutation-based evolutionary multi-tasking approach that can simultaneously generate a set of solutions, with one for each service request. We also introduce a neighborhood structure over multiple tasks to allow newly evolved solutions to be evaluated on related tasks. Our proposed method can perform better at the cost of only a fraction of time, compared to one state-of-art single-tasking EC-based method. We also found that the use of the proper neighborhood structure can enhance the effectiveness of our approach.

2.3NIFeb 14, 2019
Optimizing Controller Placement for Software-Defined Networks

Victoria Huang, Gang Chen, Qiang Fu et al.

Controller placement problem (CPP) is a key issue for Software-Defined Networking (SDN) with distributed controller architectures. This problem aims to determine a suitable number of controllers deployed in important locations so as to optimize the overall network performance. In comparison to communication delay, existing literature on the CPP assumes that the influence of controller workload distribution on network performance is negligible. In this paper, we tackle the CPP that simultaneously considers the communication delay, the control plane utilization, and the controller workload distribution. Due to this reason, our CPP is intrinsically different from and clearly more difficult than any previously studied CPPs that are NP-hard. To tackle this challenging issue, we develop a new algorithm that seamlessly integrates the genetic algorithm (GA) and the gradient descent (GD) optimization method. Particularly, GA is used to search for suitable CPP solutions. The quality of each solution is further evaluated through GD. Simulation results on two representative network scenarios (small-scale and large-scale) show that our algorithm can effectively strike the trade-off between the control plane utilization and the network response time.

5.4LGFeb 14, 2019
Off-Policy Actor-Critic in an Ensemble: Achieving Maximum General Entropy and Effective Environment Exploration in Deep Reinforcement Learning

Gang Chen, Yiming Peng

We propose a new policy iteration theory as an important extension of soft policy iteration and Soft Actor-Critic (SAC), one of the most efficient model free algorithms for deep reinforcement learning. Supported by the new theory, arbitrary entropy measures that generalize Shannon entropy, such as Tsallis entropy and Renyi entropy, can be utilized to properly randomize action selection while fulfilling the goal of maximizing expected long-term rewards. Our theory gives birth to two new algorithms, i.e., Tsallis entropy Actor-Critic (TAC) and Renyi entropy Actor-Critic (RAC). Theoretical analysis shows that these algorithms can be more effective than SAC. Moreover, they pave the way for us to develop a new Ensemble Actor-Critic (EAC) algorithm in this paper that features the use of a bootstrap mechanism for deep environment exploration as well as a new value-function based mechanism for high-level action selection. Empirically we show that TAC, RAC and EAC can achieve state-of-the-art performance on a range of benchmark control tasks, outperforming SAC and several cutting-edge learning algorithms in terms of both sample efficiency and effectiveness.

2.0AIJan 26, 2019
Composing Distributed Data-intensive Web Services Using a Flexible Memetic Algorithm

Soheila Sadeghiram, Hui Ma, Gang Chen

Web Service Composition (WSC) is a particularly promising application of Web services, where multiple individual services with specific functionalities are composed to accomplish a more complex task, which must fulfil functional requirements and optimise Quality of Service (QoS) attributes, simultaneously. Additionally, large quantities of data, produced by technological advances, need to be exchanged between services. Data-intensive Web services, which manipulate and deal with those data, are of great interest to implement data-intensive processes, such as distributed Data-intensive Web Service Composition (DWSC). Researchers have proposed Evolutionary Computing (EC) fully-automated WSC techniques that meet all the above factors. Some of these works employed Memetic Algorithms (MAs) to enhance the performance of EC through increasing its exploitation ability of in searching neighbourhood area of a solution. However, those works are not efficient or effective. This paper proposes an MA-based approach to solving the problem of distributed DWSC in an effective and efficient manner. In particular, we develop an MA that hybridises EC with a flexible local search technique incorporating distance of services. An evaluation using benchmark datasets is carried out, comparing existing state-of-the-art methods. Results show that our proposed method has the highest quality and an acceptable execution time overall.

8.3LGSep 2, 2018
Effective Exploration for Deep Reinforcement Learning via Bootstrapped Q-Ensembles under Tsallis Entropy Regularization

Gang Chen, Yiming Peng, Mengjie Zhang

Recently deep reinforcement learning (DRL) has achieved outstanding success on solving many difficult and large-scale RL problems. However the high sample cost required for effective learning often makes DRL unaffordable in resource-limited applications. With the aim of improving sample efficiency and learning performance, we will develop a new DRL algorithm in this paper that seamless integrates entropy-induced and bootstrap-induced techniques for efficient and deep exploration of the learning environment. Specifically, a general form of Tsallis entropy regularizer will be utilized to drive entropy-induced exploration based on efficient approximation of optimal action-selection policies. Different from many existing works that rely on action dithering strategies for exploration, our algorithm is efficient in exploring actions with clear exploration value. Meanwhile, by employing an ensemble of Q-networks under varied Tsallis entropy regularization, the diversity of the ensemble can be further enhanced to enable effective bootstrap-induced exploration. Experiments on Atari game playing tasks clearly demonstrate that our new algorithm can achieve more efficient and effective exploration for DRL, in comparison to recently proposed exploration methods including Bootstrapped Deep Q-Network and UCB Q-Ensemble.

5.6AIApr 26, 2018
PANDA: Facilitating Usable AI Development

Jinyang Gao, Wei Wang, Meihui Zhang et al.

Recent advances in artificial intelligence (AI) and machine learning have created a general perception that AI could be used to solve complex problems, and in some situations over-hyped as a tool that can be so easily used. Unfortunately, the barrier to realization of mass adoption of AI on various business domains is too high because most domain experts have no background in AI. Developing AI applications involves multiple phases, namely data preparation, application modeling, and product deployment. The effort of AI research has been spent mostly on new AI models (in the model training stage) to improve the performance of benchmark tasks such as image recognition. Many other factors such as usability, efficiency and security of AI have not been well addressed, and therefore form a barrier to democratizing AI. Further, for many real world applications such as healthcare and autonomous driving, learning via huge amounts of possibility exploration is not feasible since humans are involved. In many complex applications such as healthcare, subject matter experts (e.g. Clinicians) are the ones who appreciate the importance of features that affect health, and their knowledge together with existing knowledge bases are critical to the end results. In this paper, we take a new perspective on developing AI solutions, and present a solution for making AI usable. We hope that this resolution will enable all subject matter experts (eg. Clinicians) to exploit AI like data scientists.

12.0LGApr 17, 2018
An Adaptive Clipping Approach for Proximal Policy Optimization

Gang Chen, Yiming Peng, Mengjie Zhang

Very recently proximal policy optimization (PPO) algorithms have been proposed as first-order optimization methods for effective reinforcement learning. While PPO is inspired by the same learning theory that justifies trust region policy optimization (TRPO), PPO substantially simplifies algorithm design and improves data efficiency by performing multiple epochs of \emph{clipped policy optimization} from sampled data. Although clipping in PPO stands for an important new mechanism for efficient and reliable policy update, it may fail to adaptively improve learning performance in accordance with the importance of each sampled state. To address this issue, a new surrogate learning objective featuring an adaptive clipping mechanism is proposed in this paper, enabling us to develop a new algorithm, known as PPO-$λ$. PPO-$λ$ optimizes policies repeatedly based on a theoretical target for adaptive policy improvement. Meanwhile, destructively large policy update can be effectively prevented through both clipping and adaptive control of a hyperparameter $λ$ in PPO-$λ$, ensuring high learning reliability. PPO-$λ$ enjoys the same simple and efficient design as PPO. Empirically on several Atari game playing tasks and benchmark control tasks, PPO-$λ$ also achieved clearly better performance than PPO.

22.1DBApr 17, 2018Code
Rafiki: Machine Learning as an Analytics Service System

Wei Wang, Sheng Wang, Jinyang Gao et al.

Big data analytics is gaining massive momentum in the last few years. Applying machine learning models to big data has become an implicit requirement or an expectation for most analysis tasks, especially on high-stakes applications.Typical applications include sentiment analysis against reviews for analyzing on-line products, image classification in food logging applications for monitoring user's daily intake and stock movement prediction. Extending traditional database systems to support the above analysis is intriguing but challenging. First, it is almost impossible to implement all machine learning models in the database engines. Second, expertise knowledge is required to optimize the training and inference procedures in terms of efficiency and effectiveness, which imposes heavy burden on the system users. In this paper, we develop and present a system, called Rafiki, to provide the training and inference service of machine learning models, and facilitate complex analytics on top of cloud platforms. Rafiki provides distributed hyper-parameter tuning for the training service, and online ensemble modeling for the inference service which trades off between latency and accuracy. Experimental results confirm the efficiency, effectiveness, scalability and usability of Rafiki.

14.9DBFeb 14, 2018
ForkBase: An Efficient Storage Engine for Blockchain and Forkable Applications

Sheng Wang, Tien Tuan Anh Dinh, Qian Lin et al.

Existing data storage systems offer a wide range of functionalities to accommodate an equally diverse range of applications. However, new classes of applications have emerged, e.g., blockchain and collaborative analytics, featuring data versioning, fork semantics, tamper-evidence or any combination thereof. They present new opportunities for storage systems to efficiently support such applications by embedding the above requirements into the storage. In this paper, we present ForkBase, a storage engine specifically designed to provide efficient support for blockchain and forkable applications. By integrating the core application properties into the storage, ForkBase not only delivers high performance but also reduces development effort. Data in ForkBase is multi-versioned, and each version uniquely identifies the data content and its history. Two variants of fork semantics are supported in ForkBase to facilitate any collaboration workflows. A novel index structure is introduced to efficiently identify and eliminate duplicate content across data objects. Consequently, ForkBase is not only efficient in performance, but also in space requirement. We demonstrate the performance of ForkBase using three applications: a blockchain platform, a wiki engine and a collaborative analytics application. We conduct extensive experimental evaluation of these applications against respective state-of-the-art system. The results show that ForkBase achieves superior performance while significantly lowering the development cost.

21.4DBAug 17, 2017Code
Untangling Blockchain: A Data Processing View of Blockchain Systems

Tien Tuan Anh Dinh, Rui Liu, Meihui Zhang et al.

Blockchain technologies are gaining massive momentum in the last few years. Blockchains are distributed ledgers that enable parties who do not fully trust each other to maintain a set of global states. The parties agree on the existence, values and histories of the states. As the technology landscape is expanding rapidly, it is both important and challenging to have a firm grasp of what the core technologies have to offer, especially with respect to their data processing capabilities. In this paper, we first survey the state of the art, focusing on private blockchains (in which parties are authenticated). We analyze both in-production and research systems in four dimensions: distributed ledger, cryptography, consensus protocol and smart contract. We then present BLOCKBENCH, a benchmarking framework for understanding performance of private blockchains against data processing workloads. We conduct a comprehensive evaluation of three major blockchain systems based on BLOCKBENCH, namely Ethereum, Parity and Hyperledger Fabric. The results demonstrate several trade-offs in the design space, as well as big performance gaps between blockchain and database systems. Drawing from design principles of database systems, we discuss several research directions for bringing blockchain performance closer to the realm of databases.

22.6DBMar 12, 2017Code
BLOCKBENCH: A Framework for Analyzing Private Blockchains

Tien Tuan Anh Dinh, Ji Wang, Gang Chen et al.

Blockchain technologies are taking the world by storm. Public blockchains, such as Bitcoin and Ethereum, enable secure peer-to-peer applications like crypto-currency or smart contracts. Their security and performance are well studied. This paper concerns recent private blockchain systems designed with stronger security (trust) assumption and performance requirement. These systems target and aim to disrupt applications which have so far been implemented on top of database systems, for example banking, finance applications. Multiple platforms for private blockchains are being actively developed and fine tuned. However, there is a clear lack of a systematic framework with which different systems can be analyzed and compared against each other. Such a framework can be used to assess blockchains' viability as another distributed data processing platform, while helping developers to identify bottlenecks and accordingly improve their platforms. In this paper, we first describe BlockBench, the first evaluation framework for analyzing private blockchains. It serves as a fair means of comparison for different platforms and enables deeper understanding of different system design choices. Any private blockchain can be integrated to BlockBench via simple APIs and benchmarked against workloads that are based on real and synthetic smart contracts. BlockBench measures overall and component-wise performance in terms of throughput, latency, scalability and fault-tolerance. Next, we use BlockBench to conduct comprehensive evaluation of three major private blockchains: Ethereum, Parity and Hyperledger Fabric. The results demonstrate that these systems are still far from displacing current database systems in traditional data processing workloads. Furthermore, there are gaps in performance among the three systems which are attributed to the design choices at different layers of the software stack.

14.3LGJan 13, 2015
Deep Learning with Nonparametric Clustering

Gang Chen

Clustering is an essential problem in machine learning and data mining. One vital factor that impacts clustering performance is how to learn or design the data representation (or features). Fortunately, recent advances in deep learning can learn unsupervised features effectively, and have yielded state of the art performance in many classification problems, such as character recognition, object recognition and document categorization. However, little attention has been paid to the potential of deep learning for unsupervised clustering problems. In this paper, we propose a deep belief network with nonparametric clustering. As an unsupervised method, our model first leverages the advantages of deep learning for feature representation and dimension reduction. Then, it performs nonparametric clustering under a maximum margin framework -- a discriminative clustering model and can be trained online efficiently in the code space. Lastly model parameters are refined in the deep belief network. Thus, this model can learn features for clustering and infer model complexity in an unified framework. The experimental results show the advantage of our approach over competitive baselines.