Weihua Gui

OC
8 papers
289 citations
Novelty 43%
AI Score 42

8 Papers

RO · Jun 2 · Code
Preference-Calibrated Human-in-the-Loop Reinforcement Learning for Robotic Manipulation

Zeyi Liu, Guangyao Liu, Yinuo Qu et al.

Human-in-the-loop reinforcement learning (HIL-RL) improves sample efficiency in real-robot manipulation through online human intervention. However, successful trajectories may include suboptimal actions that deviate from the desired task-execution path and force human intervention. Existing HIL-RL methods typically apply the consistent credit assignment principle to all transitions, uniformly propagating discounted terminal rewards through suboptimal segments, ignoring the actual contribution of each transition to task success. This overestimates Q-values for critic learning and indirectly misguides actor updates toward suboptimal behavior patterns. To this end, we propose PACT, a Preference-calibrated Actor-Critic Training framework that leverages the implicit preference signals induced by intervention to perform credit reassignment on identified suboptimal segments while directly guiding policy training for unbiased critic-actor learning. Specifically, we first design a progress model that learns from human demonstration and identifies suboptimal segments for credit correction. Then, from the human action and resampled policy action at the intervention state, we build preference pairs to define a counterfactual advantage that penalizes Bellman targets of the identified suboptimal segment, enabling directional credit calibration. Moreover, we directly align the policy with human corrective actions in the bounded mean space, providing an additional signal beyond critic-guided updates. Across five real-robot manipulation tasks, PACT improves the average success rate by 24.5% and achieves 1.3 times faster convergence, thereby improving both RL sample efficiency and performance. Code is available at https://anonymous.4open.science/r/HILRL-A1X-BC05.
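The abstract describes two training signals: Bellman targets on an identified suboptimal segment are penalized by a counterfactual advantage derived from a (human action, resampled policy action) preference pair, and the policy mean is aligned directly with the human corrective action in a bounded space. The NumPy sketch below illustrates those two ideas under loud assumptions and is not the PACT implementation. The segment mask is simply passed in (standing in for the progress model), the counterfactual advantage is assumed to be `max(Q(s, a_human) - Q(s, a_policy), 0)`, the penalty weight `lam` is hypothetical, and "bounded mean space" is assumed to mean a tanh-squashed policy mean. All function names are hypothetical.

```python
import numpy as np

def calibrated_targets(rewards, next_q, dones, suboptimal_mask,
                       q_human, q_policy, gamma=0.99, lam=1.0):
    """Bellman targets with a penalty on transitions flagged as suboptimal.

    q_human / q_policy: critic values of the human action and of a resampled
    policy action at the intervention state (assumed scalars).
    suboptimal_mask: 1.0 for transitions in the identified segment, else 0.0
    (hypothetical stand-in for the progress model's output).
    """
    base = rewards + gamma * (1.0 - dones) * next_q        # standard TD target
    advantage = max(q_human - q_policy, 0.0)               # assumed counterfactual advantage
    return base - lam * advantage * suboptimal_mask        # penalize only the flagged segment

def bounded_mean_alignment(policy_mean, human_action):
    """MSE between the tanh-squashed policy mean and the human action in [-1, 1] (assumed)."""
    return float(np.mean((np.tanh(policy_mean) - human_action) ** 2))

rewards = np.array([0.0, 0.0, 0.0, 1.0])
next_q = np.array([0.5, 0.5, 0.5, 0.0])
dones = np.array([0.0, 0.0, 0.0, 1.0])
mask = np.array([0.0, 1.0, 1.0, 0.0])
targets = calibrated_targets(rewards, next_q, dones, mask, q_human=0.8, q_policy=0.3, gamma=0.9)
```

Only the masked transitions receive the penalty; the terminal reward is still propagated through the rest of the trajectory as usual.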

SY · Jul 22, 2019
Categorization Problem on Controllability of Boolean Control Networks

Qunxi Zhu, Zuguang Gao, Yang Liu et al.

A Boolean control network (BCN) is a discrete-time dynamical system whose variables take values from the binary set $\{0,1\}$. At each time step, every variable of the BCN updates its value simultaneously according to a Boolean function that takes the state and control of the previous time step as its input. Given an ordered pair of states of a BCN, we define the set of reachable time steps as the set of positive integers $k$ for which some control sequence steers the BCN from one state to the other in exactly $k$ time steps, and the set of unreachable time steps as the set of positive integers $k$ for which no control sequence does so. In this paper we consider the so-called categorization problem of a BCN: using an algebraic graph-theoretic approach, we develop a method to determine whether the set of reachable time steps and the set of unreachable time steps associated with a given pair of states are finite or infinite. Our results can be used to classify all ordered pairs of states into four categories, depending on whether the set of reachable (unreachable) time steps is finite or not.
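To make the two sets concrete, the sketch below decides the same finite/infinite question by brute force rather than by the paper's algebraic graph-theoretic method. Let $R_k$ be the set of states reachable from $x_0$ in exactly $k$ steps; then $R_{k+1}$ is the image of $R_k$ under all controls, so the sequence $R_1, R_2, \dots$ is deterministic over a finite set of subsets and must become periodic. The target $x_d$ has infinitely many reachable (unreachable) time steps exactly when it lies (fails to lie) in some $R_k$ within that period. The update functions in the usage example are hypothetical toy BCNs; this enumeration is exponential in the number of variables and is meant only for small illustrations.

```python
from itertools import product

def categorize(update, m, x0, xd):
    """Return (reachable_is_infinite, unreachable_is_infinite) for the pair (x0, xd).

    update(x, u) -> next state tuple; m is the number of control bits.
    Brute-force sketch, not the algebraic graph-theoretic method of the paper.
    """
    controls = list(product((0, 1), repeat=m))

    def successors(states):
        # States reachable in one step from any state in `states` under any control.
        return frozenset(update(x, u) for x in states for u in controls)

    seen = {}       # set of states -> first k at which it appeared as R_k
    history = []    # history[k - 1] == R_k
    current = successors(frozenset([x0]))
    k = 1
    while current not in seen:
        seen[current] = k
        history.append(current)
        current = successors(current)
        k += 1
    start = seen[current]          # R_start == R_k, so the sequence repeats from here
    cycle = history[start - 1:]    # one full period R_start, ..., R_{k-1}
    reach_inf = any(xd in R for R in cycle)
    unreach_inf = any(xd not in R for R in cycle)
    return reach_inf, unreach_inf

# Toy BCN (hypothetical): x1' = x2, x2' = u.
shift = lambda x, u: (x[1], u[0])
print(categorize(shift, 1, (0, 0), (1, 1)))
```

For this shift-register example the target is unreachable only at $k = 1$, so the reachable set is infinite and the unreachable set is finite.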

LG · May 11, 2022
Spatial-temporal associations representation and application for process monitoring using graph convolution neural network

Hao Ren, Xiaojun Liang, Chunhua Yang et al.

Thank you very much for the attention and interest of colleagues and scholars in this work. With the comments and guidance of experts, editors, and reviewers, this work has been accepted for publication in the journal "Process Safety and Environmental Protection". The paper addresses the spatial-temporal associations among the many variables of the same industrial process: variables collected from dynamic industrial processes exhibit spatial-temporal correlation, i.e., they are not only highly correlated in time but also interrelated in space. To handle this problem, three key issues must be addressed: modeling and representation of variable characteristics, graph network construction (temporal information), and graph characteristics perception. The first issue is handled by assuming that the data follow an improved Gaussian distribution, while the graph network is defined by the monitored variables as nodes, with edges computed from their characteristics over time. Finally, the networks corresponding to process states at different times are fed into a graph convolutional neural network, which performs graph classification to achieve process monitoring. A benchmark experiment (the Tennessee Eastman chemical process) and an application study (cobalt purification from zinc solution) demonstrate the feasibility and applicability of the approach.
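The pipeline above (variables as nodes, edges from temporal characteristics, graph classification with a GCN) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the paper's construction: edges are assumed to come from thresholded absolute Pearson correlation over a time window, node features are assumed to be the per-variable mean and standard deviation (a plain Gaussian fit rather than the paper's improved distribution), and the GCN layer uses the common symmetric normalization with self-loops followed by mean pooling. The threshold and all function names are hypothetical.

```python
import numpy as np

def variable_graph(window, threshold=0.6):
    """Adjacency between process variables for one time window.

    window: (T, N) array of T samples of N monitored variables.
    Assumption: edge i-j exists if |Pearson correlation| >= threshold.
    """
    corr = np.corrcoef(window, rowvar=False)
    adj = (np.abs(corr) >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)
    return adj

def node_features(window):
    """Per-variable (mean, std) as node features (simplified Gaussian assumption)."""
    return np.stack([window.mean(axis=0), window.std(axis=0)], axis=1)

def gcn_graph_embedding(adj, feats, weight):
    """One GCN layer with self-loops and symmetric normalization, then mean pooling."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    h = np.maximum(a_norm @ feats @ weight, 0.0)   # ReLU activation
    return h.mean(axis=0)                          # graph-level readout

rng = np.random.default_rng(0)
t = rng.standard_normal((100, 1))
window = np.hstack([t, 2 * t + 0.01 * rng.standard_normal((100, 1)),
                    rng.standard_normal((100, 2))])
adj = variable_graph(window)
emb = gcn_graph_embedding(adj, node_features(window), rng.standard_normal((2, 8)))
```

In a full monitor, one such embedding per window would be passed to a classifier (for example a softmax layer) that predicts the process state; training that classifier is omitted here.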

LG · Sep 28, 2024
Canonical Correlation Guided Deep Neural Network

Zhiwen Chen, Siwen Mo, Haobin Ke et al.

Learning representations of two views of data such that the resulting representations are highly linearly correlated is appealing in machine learning. In this paper, we present a canonical correlation guided learning framework, which can be realized by deep neural networks (CCDNN), to learn such correlated representations. It is also a novel merging of multivariate analysis (MVA) and machine learning, which can be viewed as transforming MVA into end-to-end architectures with the aid of neural networks. Unlike linear canonical correlation analysis (CCA), kernel CCA, and deep CCA, the proposed method does not restrict the optimization formulation to maximizing correlation; instead, canonical correlation is imposed as a constraint, which preserves the ability to learn correlated representations while focusing on the engineering task defined by the optimization formulation, such as reconstruction, classification, or prediction. Furthermore, a redundancy filter is designed to reduce the redundancy induced by correlation. We illustrate the performance of CCDNN on various tasks. In experiments on the MNIST dataset, CCDNN achieves better reconstruction performance than DCCA and DCCAE in terms of mean squared error and mean absolute error. We also apply the proposed network to industrial fault diagnosis (classification) and remaining useful life estimation (prediction), where it outperforms existing methods in both tasks. An extension of CCDNN to much deeper networks using residual connections is presented in the appendix.
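The key idea is to treat canonical correlation as a constraint on a task loss rather than as the objective itself. The sketch below computes the canonical correlations between two views with standard linear CCA (whiten each view's covariance, then take singular values of the whitened cross-covariance) and adds a hinge penalty when the mean correlation drops below a target. The penalty form, `rho_min`, and `lam` are hypothetical choices for illustration; the paper's exact constraint handling and redundancy filter are not reproduced here.

```python
import numpy as np

def canonical_correlations(x, y, eps=1e-8):
    """Canonical correlations between views x (N, p) and y (N, q) via linear CCA."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = x.T @ x / (n - 1) + eps * np.eye(x.shape[1])   # regularized covariances
    cyy = y.T @ y / (n - 1) + eps * np.eye(y.shape[1])
    cxy = x.T @ y / (n - 1)

    def inv_sqrt(c):
        vals, vecs = np.linalg.eigh(c)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    t = inv_sqrt(cxx) @ cxy @ inv_sqrt(cyy)
    return np.linalg.svd(t, compute_uv=False)   # singular values = canonical correlations

def constrained_loss(task_loss, hx, hy, rho_min=0.9, lam=10.0):
    """Task loss plus a hinge penalty if mean canonical correlation < rho_min (hypothetical form)."""
    corr = canonical_correlations(hx, hy).mean()
    return task_loss + lam * max(0.0, rho_min - corr)

rng = np.random.default_rng(0)
x = rng.standard_normal((200, 3))
y = x @ np.array([[2.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 3.0]])  # invertible linear map
rho = canonical_correlations(x, y)
```

Because `y` is an invertible linear transform of `x`, all canonical correlations are (numerically) 1, so the penalty vanishes and only the task loss remains.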

SY · Apr 25, 2019
Tracking Performance Limitations of MIMO Networked Control Systems with Multiple Communication Constraints

Chao-Yang Chen, Weihua Gui, Lianghong Wu et al.

In this paper, the tracking performance limitation of networked control systems (NCSs) is studied. The NCSs are modeled as continuous-time linear multi-input multi-output (MIMO) systems with random reference noises. The controlled plants include unstable poles and non-minimum phase (NMP) zeros. The output feedback path is affected by multiple communication constraints; we focus on some basic ones, including additive white noise (AWN), quantization noise, bandwidth, and encoder-decoder. System performance is measured by the tracking error energy under a two-degree-of-freedom (2DOF) controller, and an explicit expression for the tracking performance limitation is derived. The results indicate that the tracking performance limitation depends on the internal characteristics of the plant (unstable poles and NMP zeros), the reference noises (the reference noise power distribution (RNPD) and its directions), and the characteristics of the communication constraints. Moreover, the limitation is also affected by the angles between each transformed NMP zero direction and the RNPD direction, and by the angles between each transformed unstable pole direction and the direction of the communication constraint distribution/allocation. In addition, for MIMO NCSs in which no two channels are identical, bandwidth always affects the direction of the unstable poles, and allocating bandwidth and encoder-decoder resources across channels may provide a feasible way to allocate performance among the channels. Finally, an example is given to verify the effectiveness of the theoretical results.
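The directional dependence mentioned above (performance depends on the angle between each transformed zero direction and the reference direction) can be illustrated with a toy cost. The sketch below is an assumed illustration, not the explicit expression derived in the paper: each real NMP zero $z_i > 0$ contributes $(2/z_i)\cos^2\angle(\eta_i, v)$, in the spirit of classical cheap-control tracking bounds, where $\eta_i$ is the zero direction and $v$ the reference direction. Unstable poles, noise, and communication-constraint terms are deliberately omitted, and the function names are hypothetical.

```python
import numpy as np

def cos2_angle(u, v):
    """Squared cosine of the angle between (possibly complex) direction vectors u and v."""
    u = np.asarray(u, dtype=complex)
    v = np.asarray(v, dtype=complex)
    return abs(np.vdot(u, v)) ** 2 / (np.vdot(u, u).real * np.vdot(v, v).real)

def directional_zero_cost(zeros, zero_dirs, ref_dir):
    """Toy cost: sum_i (2 / z_i) * cos^2(angle(eta_i, v)) for real NMP zeros z_i > 0 (assumed form)."""
    return sum(2.0 / z * cos2_angle(eta, ref_dir) for z, eta in zip(zeros, zero_dirs))

zeros = [1.0, 2.0]
dirs = [[1.0, 0.0], [0.0, 1.0]]
aligned = directional_zero_cost(zeros, dirs, [1.0, 0.0])
diagonal = directional_zero_cost(zeros, dirs, [1.0, 1.0])
```

A zero whose direction is orthogonal to the reference direction contributes nothing, which is the qualitative effect the abstract attributes to these angles.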

OC · Apr 29, 2013
A Discrete State Transition Algorithm for Generalized Traveling Salesman Problem

Xiaolin Tang, Chunhua Yang, Xiaojun Zhou et al.

The generalized traveling salesman problem (GTSP) is an NP-hard combinatorial optimization problem that extends the classical traveling salesman problem (TSP). In this paper, an efficient discrete state transition algorithm (DSTA) for GTSP is proposed, in which a new local search operator named K-circle, directed by neighborhood information in space, is introduced into DSTA to shrink the search space and strengthen search ability. A novel robust update mechanism, restore in probability and risk in probability (Double R-Probability), is used to escape from local minima. The proposed algorithm is tested on a set of GTSP instances. Compared with other heuristics, the experimental results demonstrate the effectiveness and strong adaptability of DSTA and show that DSTA has better search ability than its competitors.
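In GTSP the nodes are partitioned into clusters and a tour must visit exactly one node per cluster. The sketch below shows one common representation (a cyclic list holding one node per cluster), a greedy node-reselection pass, segment reversal as a perturbation, and a probabilistic accept/restore step. This is an assumed illustration only: the node reselection stands in for the paper's K-circle operator, whose details are not given here, and `double_r_step` is a hypothetical reading of "restore in probability and risk in probability" (accept a worse state with probability `p_risk`, jump back to the best state with probability `p_restore`). All names and probabilities are hypothetical.

```python
import math
import random

def tour_length(tour, coords):
    """Length of the closed tour visiting the given node ids in order."""
    m = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % m]]) for i in range(m))

def reselect_nodes(tour, clusters, cluster_of, coords):
    """Greedy pass: replace each node with the best node of its own cluster, neighbors fixed."""
    tour = list(tour)
    m = len(tour)
    for i in range(m):
        prev_node, next_node = tour[i - 1], tour[(i + 1) % m]
        tour[i] = min(clusters[cluster_of[tour[i]]],
                      key=lambda v: math.dist(coords[prev_node], coords[v])
                      + math.dist(coords[v], coords[next_node]))
    return tour

def double_r_step(current, best, candidate, length, p_risk=0.1, p_restore=0.05, rng=random):
    """Hypothetical accept/restore rule: risk a worse move, sometimes restore the best."""
    if length(candidate) < length(current) or rng.random() < p_risk:
        current = candidate
    if length(current) < length(best):
        best = current
    elif rng.random() < p_restore:
        current = best
    return current, best

def solve_gtsp(clusters, coords, iters=500, seed=0):
    rng = random.Random(seed)
    cluster_of = {v: c for c, members in enumerate(clusters) for v in members}

    def length(t):
        return tour_length(t, coords)

    tour = [rng.choice(members) for members in clusters]
    rng.shuffle(tour)
    current = best = reselect_nodes(tour, clusters, cluster_of, coords)
    for _ in range(iters):
        i, j = sorted(rng.sample(range(len(current)), 2))
        candidate = current[:i] + current[i:j + 1][::-1] + current[j + 1:]  # reverse a segment
        candidate = reselect_nodes(candidate, clusters, cluster_of, coords)
        current, best = double_r_step(current, best, candidate, length, rng=rng)
    return best

coords = [(0, 0), (0, 1), (5, 0), (5, 1), (5, 5), (6, 5), (0, 5), (1, 5)]
clusters = [[0, 1], [2, 3], [4, 5], [6, 7]]
best = solve_gtsp(clusters, coords, iters=200)
```

The reselection pass never lengthens a tour, because each replacement minimizes the two edges it touches while the rest of the tour is fixed.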

OC · Aug 1, 2012
Initial Version of State Transition Algorithm

Xiaojun Zhou, Chunhua Yang, Weihua Gui

Based on the concepts of state and state transition, a new algorithm, the State Transition Algorithm (STA), is proposed in order to probe into classical and intelligent optimization algorithms. Because it is built on state and state transition, it is simple and easy to understand. For continuous function optimization problems, three special operators named rotation, translation, and expansion are presented, while for discrete function optimization problems, an operator called general elementary transformation is introduced. Finally, STA is tested on four common benchmark continuous functions and one discrete problem, and the experiments show that STA is a promising algorithm due to its good search capability.
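The three continuous operators can be sketched as candidate generators around the current state $x$. The forms below follow those commonly reported for STA and are assumed here rather than taken from this abstract: rotation $x + \alpha R_r x / (n\lVert x\rVert_2)$ with $R_r$ uniform in $[-1,1]^{n\times n}$; translation $x + \beta R_t (x - x_{\text{old}})/\lVert x - x_{\text{old}}\rVert_2$ with scalar $R_t$ uniform in $[0,1]$; expansion $x + \gamma R_e x$ with $R_e$ diagonal standard normal. The search-enforcement size `se`, the step function, and its operator ordering are hypothetical choices for this sketch.

```python
import numpy as np

def rotation(x, alpha, se, rng):
    """Candidates x + alpha * R_r x / (n * ||x||), R_r uniform in [-1, 1]^(n x n)."""
    n = x.size
    r = rng.uniform(-1.0, 1.0, size=(se, n, n))
    return x + alpha * (r @ x) / (n * max(np.linalg.norm(x), 1e-12))

def translation(x, x_old, beta, se, rng):
    """Candidates along the improving direction x - x_old with random step in [0, beta]."""
    d = x - x_old
    d = d / max(np.linalg.norm(d), 1e-12)
    return x + beta * rng.uniform(0.0, 1.0, size=(se, 1)) * d

def expansion(x, gamma, se, rng):
    """Candidates x + gamma * R_e x, R_e diagonal with N(0, 1) entries."""
    return x + gamma * rng.standard_normal((se, x.size)) * x

def sta_step(f, x, alpha, beta=1.0, gamma=1.0, se=20, rng=None):
    """One greedy iteration: expansion then rotation, each followed by translation on success."""
    rng = rng if rng is not None else np.random.default_rng()
    for op in (lambda z: expansion(z, gamma, se, rng), lambda z: rotation(z, alpha, se, rng)):
        cands = op(x)
        best = cands[np.argmin([f(c) for c in cands])]
        if f(best) < f(x):
            x_old, x = x, best
            cands = translation(x, x_old, beta, se, rng)   # exploit the improving direction
            best = cands[np.argmin([f(c) for c in cands])]
            if f(best) < f(x):
                x = best
    return x

def sphere(z):
    return float(np.sum(z ** 2))

rng = np.random.default_rng(0)
x0 = np.array([3.0, -2.0, 1.0])
x = x0.copy()
for _ in range(100):
    x = sta_step(sphere, x, alpha=1.0, rng=rng)
```

Each operator proposes `se` candidates and only an improvement is accepted, so the objective value is non-increasing across iterations.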

OC · May 30, 2012
State Transition Algorithm

Xiaojun Zhou, Chunhua Yang, Weihua Gui

Based on the concepts of state and state transition, a new heuristic random search algorithm named the state transition algorithm is proposed. For continuous function optimization problems, four special transformation operators called rotation, translation, expansion, and axesion are designed. Strategies for adjusting these transformations are studied to keep a balance between exploration and exploitation, and a convergence analysis of the algorithm based on random search theory is also given. Meanwhile, to strengthen the search ability in high-dimensional spaces, a communication strategy is introduced into the basic algorithm, and intermittent exchange is proposed to prevent premature convergence. Finally, the algorithms are tested on 10 common benchmark unconstrained continuous functions, and the results show that, compared with some popular algorithms, state transition algorithms are promising due to their good global search capability and convergence properties.
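Two ingredients named above are the axesion operator and adjusting the transformations to balance exploration and exploitation. The sketch below shows both in a reduced loop that uses only rotation and axesion. It assumes the commonly reported forms: axesion perturbs a single randomly chosen coordinate, $x + \delta R_a x$ with $R_a$ diagonal and one nonzero $N(0,1)$ entry, and the rotation factor $\alpha$ is divided by `fc` each iteration and reset to `alpha_max` once it falls below `alpha_min`. The parameter values are assumed defaults, and the communication strategy and intermittent exchange are not shown.

```python
import numpy as np

def rotation(x, alpha, se, rng):
    """Candidates x + alpha * R_r x / (n * ||x||), R_r uniform in [-1, 1]^(n x n)."""
    n = x.size
    r = rng.uniform(-1.0, 1.0, size=(se, n, n))
    return x + alpha * (r @ x) / (n * max(np.linalg.norm(x), 1e-12))

def axesion(x, delta, se, rng):
    """Candidates x + delta * R_a x, R_a diagonal with a single nonzero N(0, 1) entry."""
    cands = np.tile(x, (se, 1))
    axes = rng.integers(0, x.size, size=se)          # one random coordinate per candidate
    cands[np.arange(se), axes] += delta * rng.standard_normal(se) * x[axes]
    return cands

def sta_minimize(f, x0, alpha_max=1.0, alpha_min=1e-4, fc=2.0, delta=1.0,
                 se=30, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    alpha = alpha_max
    for _ in range(iters):
        for op in (lambda z: axesion(z, delta, se, rng), lambda z: rotation(z, alpha, se, rng)):
            cands = op(x)
            best = cands[np.argmin([f(c) for c in cands])]
            if f(best) < f(x):
                x = best
        alpha /= fc                  # shrink the rotation radius: more exploitation
        if alpha < alpha_min:
            alpha = alpha_max        # restart the radius: renewed exploration
    return x

def sphere(z):
    return float(np.sum(z ** 2))

x0 = np.array([2.0, -1.0, 0.5, 3.0])
x = sta_minimize(sphere, x0, iters=100)
```

Cycling $\alpha$ from large to small lets rotation alternate between wide exploration and fine local refinement, while axesion searches along individual coordinate axes.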