Minghui Zhu

LG
h-index: 34
23 papers
403 citations
Novelty: 50%
AI Score: 53

23 Papers

SY Jan 19, 2017
An Integrated Design of Optimization and Physical Dynamics for Energy Efficient Buildings: A Passivity Approach

Takeshi Hatanaka, Xuan Zhang, Wenbo Shi et al.

In this paper, we address energy management for heating, ventilation, and air-conditioning (HVAC) systems in buildings, and present a novel combined optimization and control approach. We first formulate the thermal dynamics and an associated optimization problem. An optimization dynamics is then designed based on a standard primal-dual algorithm, and its strict passivity is proved. We then design a local controller and prove that the physical dynamics with the controller is passivity-short. Based on these passivity results, we interconnect the optimization and physical dynamics, and prove convergence of the room temperatures to the optimal ones defined for unmeasurable disturbances. Finally, we demonstrate the presented algorithms through simulation.
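As a rough illustration of what an optimization dynamics based on a primal-dual algorithm looks like, the sketch below runs a forward-Euler discretization of primal-dual gradient dynamics on a toy problem. The problem (minimize 0.5(x - 3)^2 subject to x <= 2), the step size, and the function name `primal_dual_flow` are assumptions made for illustration; this is not the paper's HVAC model or its controller.

```python
def primal_dual_flow(step=0.01, iters=5000):
    """Euler-discretized primal-dual gradient dynamics on a toy problem.

    Toy problem (an assumption, not from the paper):
        minimize 0.5 * (x - 3)^2  subject to  x - 2 <= 0
    Lagrangian: L(x, lam) = 0.5 * (x - 3)^2 + lam * (x - 2)
    """
    x, lam = 0.0, 0.0
    for _ in range(iters):
        grad_x = (x - 3.0) + lam   # dL/dx: descend in the primal variable
        grad_lam = x - 2.0         # dL/dlam: ascend in the dual variable
        # Simultaneous update; the dual variable is projected onto lam >= 0.
        x, lam = x - step * grad_x, max(0.0, lam + step * grad_lam)
    return x, lam


x_opt, lam_opt = primal_dual_flow()
print(x_opt, lam_opt)
```

For this toy problem the iterates approach the constrained optimum x = 2 with multiplier 1, which is the saddle point of the Lagrangian.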

LG Feb 3, 2023
Efficient Gradient Approximation Method for Constrained Bilevel Optimization

Siyuan Xu, Minghui Zhu

Bilevel optimization has been developed for many machine learning tasks with large-scale and high-dimensional data. This paper considers a constrained bilevel optimization problem, where the lower-level optimization problem is convex with equality and inequality constraints and the upper-level optimization problem is non-convex. The overall objective function is non-convex and non-differentiable. To solve the problem, we develop a gradient-based approach, called gradient approximation method, which determines the descent direction by computing several representative gradients of the objective function inside a neighborhood of the current estimate. We show that the algorithm asymptotically converges to the set of Clarke stationary points, and demonstrate the efficacy of the algorithm by the experiments on hyperparameter optimization and meta-learning.

SY Jan 9, 2020
Distributed Robust Adaptive Frequency Control of Power Systems with Dynamic Loads

Hunmin Kim, Minghui Zhu, Jianming Lian

This paper investigates the frequency control of multi-machine power systems subject to uncertain and dynamic net loads. We propose distributed internal model controllers that coordinate synchronous generators and demand response to tackle the unpredictable nature of net loads. Frequency stability is formally guaranteed via Lyapunov analysis. Numerical simulations on the IEEE 68-bus test system demonstrate the effectiveness of the controllers.

LG Dec 15, 2025
Explainable reinforcement learning from human feedback to improve alignment

Shicheng Liu, Siyuan Xu, Wenjie Qiu et al.

A common and effective strategy for humans to improve an unsatisfactory outcome in daily life is to find a cause of this outcome and correct the cause. In this paper, we investigate whether this human improvement strategy can be applied to improving reinforcement learning from human feedback (RLHF) for alignment of language models (LMs). In particular, it is observed in the literature that LMs tuned by RLHF can still output unsatisfactory responses. This paper proposes a method to improve the unsatisfactory responses by correcting their causes. Our method has two parts. The first part proposes a post-hoc explanation method to explain why an unsatisfactory response is generated to a prompt by identifying the training data that lead to this response. We formulate this problem as a constrained combinatorial optimization problem where the objective is to find a set of training data closest to this prompt-response pair in a feature representation space, and the constraint is that the prompt-response pair can be decomposed as a convex combination of this set of training data in the feature space. We propose an efficient iterative data selection algorithm to solve this problem. The second part proposes an unlearning method that improves unsatisfactory responses to some prompts by unlearning the training data that lead to these unsatisfactory responses and, meanwhile, does not significantly degrade satisfactory responses to other prompts. Experimental results demonstrate that our algorithm can improve RLHF.

CL Apr 22, 2025
IPBench: Benchmarking the Knowledge of Large Language Models in Intellectual Property

Qiyao Wang, Guhong Chen, Hongbo Wang et al.

Intellectual Property (IP) is a highly specialized domain that integrates technical and legal knowledge, making it inherently complex and knowledge-intensive. Recent advancements in LLMs have demonstrated their potential to handle IP-related tasks, enabling more efficient analysis, understanding, and generation of IP-related content. However, existing datasets and benchmarks focus narrowly on patents or cover limited aspects of the IP field, lacking alignment with real-world scenarios. To bridge this gap, we introduce IPBench, the first comprehensive IP task taxonomy and a large-scale bilingual benchmark encompassing 8 IP mechanisms and 20 distinct tasks, designed to evaluate LLMs in real-world IP scenarios. We benchmark 17 main LLMs, ranging from general purpose to domain-specific, including chat-oriented and reasoning-focused models, under zero-shot, few-shot, and chain-of-thought settings. Our results show that even the top-performing model, DeepSeek-V3, achieves only 75.8% accuracy, indicating significant room for improvement. Notably, open-source IP and law-oriented models lag behind closed-source general-purpose models. To foster future research, we publicly release IPBench, and will expand it with additional tasks to better reflect real-world complexities and support model advancements in the IP domain. We provide the data and code in the supplementary URLs.

DB May 2
Write-Read Decoupling in Modern Large-Scale Search Engines: Architectures, Techniques, and Emerging Approaches

Xin Liang, Qing Yang, Wenru Qiu et al.

Large-scale search engines face a fundamental tension: the index must be updated frequently to maintain freshness, yet updates create resource contention that inflates query latency. In the dominant Lucene-based architecture, segment merges triggered by writes compete with concurrent queries for CPU cycles, disk I/O bandwidth, and operating-system page cache, a problem we term write-read contention. This survey systematically examines the architectural solutions that industry and academia have developed to decouple write pressure from read latency. We identify five principal patterns: (i) node-level read-write separation; (ii) compute-storage separation; (iii) full in-memory indexing; (iv) log-structured write paths; and (v) in-place partial updates. We survey representative systems including Elasticsearch, LinkedIn Galene, Uber Sia, Quickwit, Alibaba Havenask, Algolia, Milvus, and Vespa, and discuss an emerging synthesis, the ScaleSearch architecture, that combines compute-storage separation with full in-memory indexing and dedicated write nodes. A key contribution of ScaleSearch is per-field update routing: each field is assigned its own Kafka topic and update path, allowing scalar fields (price, stock, tags) to be updated in-place in $O(1)$ RAM with immediate visibility while full-text fields follow the segment-based compute-storage path. We conclude with open challenges in hybrid vector-and-full-text retrieval, serverless deployments, and AI-integrated search.

LG May 1
Interactive Inverse Reinforcement Learning of Interaction Scenarios via Bi-level Optimization

Yue Mao, Shicheng Liu, Siyuan Xu et al.

Inverse reinforcement learning (IRL) learns a reward function and a corresponding policy that best fit the demonstration data of an expert. However, in the current IRL setting, the learner is isolated from the expert and can only passively observe the expert demonstrations. This limits the applicability of IRL to interactive settings, where the learner actively interacts with the expert and needs to infer the expert's reward function from the interactions. To bridge the gap, this paper studies interactive IRL (IIRL) where a learner aims to learn the reward function of an expert and a policy to interact with the expert during its interactions with the expert. We formulate IIRL as a stochastic bi-level optimization problem where the lower level learns a reward function to explain the behaviors of the expert, and the upper level learns a policy to interact with the expert. We develop a double-loop algorithm, Bi-level Interactive Scenarios Inverse Reinforcement Learning (BISIRL), which solves the lower-level problem in the inner loop and the upper-level problem in the outer loop. We formally guarantee that BISIRL converges and validate our algorithm through extensive experiments.

SY Mar 20, 2024
Federated reinforcement learning for robot motion planning with zero-shot generalization

Zhenyuan Yuan, Siyuan Xu, Minghui Zhu

This paper considers the problem of learning a control policy for robot motion planning with zero-shot generalization, i.e., no data collection or policy adaptation is needed when the learned policy is deployed in new environments. We develop a federated reinforcement learning framework that enables collaborative learning of multiple learners and a central server, i.e., the Cloud, without sharing their raw data. In each iteration, each learner uploads its local control policy and the corresponding estimated normalized arrival time to the Cloud, which then computes the global optimum among the learners and broadcasts the optimal policy to the learners. Each learner then selects between its local control policy and that from the Cloud for the next iteration. The proposed framework leverages the derived zero-shot generalization guarantees on arrival time and safety. Theoretical guarantees on almost-sure convergence, almost consensus, Pareto improvement and optimality gap are also provided. Monte Carlo simulations are conducted to evaluate the proposed framework.

LG Oct 21, 2024
In-Trajectory Inverse Reinforcement Learning: Learn Incrementally Before An Ongoing Trajectory Terminates

Shicheng Liu, Minghui Zhu

Inverse reinforcement learning (IRL) aims to learn a reward function and a corresponding policy that best fit the demonstrated trajectories of an expert. However, current IRL works cannot learn incrementally from an ongoing trajectory because they have to wait to collect at least one complete trajectory to learn. To bridge the gap, this paper considers the problem of learning a reward function and a corresponding policy while observing the initial state-action pair of an ongoing trajectory and then continually updating the learned reward and policy as new state-action pairs of the ongoing trajectory are observed. We formulate this problem as an online bi-level optimization problem where the upper level dynamically adjusts the learned reward according to the newly observed state-action pairs with the help of a meta-regularization term, and the lower level learns the corresponding policy. We propose a novel algorithm to solve this problem and guarantee that the algorithm achieves sub-linear local regret $O(\sqrt{T}+\log T+\sqrt{T}\log T)$. If the reward function is linear, we prove that the proposed algorithm achieves sub-linear regret $O(\log T)$. Experiments are used to validate the proposed algorithm.
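A short check of why the stated local regret bound is sub-linear may help. The derivation below uses only the bound quoted in the abstract and assumes the natural logarithm; the constant 3 and the threshold $T \ge 3$ are choices made here for the sketch, not values from the paper.

```latex
\[
\sqrt{T} + \log T + \sqrt{T}\log T \;\le\; 3\,\sqrt{T}\log T
\quad \text{for } T \ge 3,
\qquad\text{so}\qquad
O\!\left(\sqrt{T}+\log T+\sqrt{T}\log T\right) = O\!\left(\sqrt{T}\log T\right),
\]
\[
\lim_{T\to\infty} \frac{\sqrt{T}\log T}{T}
= \lim_{T\to\infty} \frac{\log T}{\sqrt{T}} = 0 .
\]
```

The first inequality holds because $\log T \ge 1$ and $\sqrt{T} \ge 1$ for $T \ge 3$, so each of the three terms is at most $\sqrt{T}\log T$. The vanishing ratio means the average local regret per step goes to zero.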

LG Oct 13, 2024
Meta-Reinforcement Learning with Universal Policy Adaptation: Provable Near-Optimality under All-task Optimum Comparator

Siyuan Xu, Minghui Zhu

Meta-reinforcement learning (Meta-RL) has attracted attention due to its capability to enhance reinforcement learning (RL) algorithms, in terms of data efficiency and generalizability. In this paper, we develop a bilevel optimization framework for meta-RL (BO-MRL) to learn the meta-prior for task-specific policy adaptation, which implements multiple-step policy optimization on one-time data collection. Beyond existing meta-RL analyses, we provide upper bounds of the expected optimality gap over the task distribution. This metric measures the distance of the policy adaptation from the learned meta-prior to the task-specific optimum, and quantifies the model's generalizability to the task distribution. We empirically validate the correctness of the derived upper bounds and demonstrate the superior effectiveness of the proposed algorithm over benchmarks.

LG Jul 18, 2025
Byzantine-resilient federated online learning for Gaussian process regression

Xu Zhang, Zhenyuan Yuan, Minghui Zhu

In this paper, we study Byzantine-resilient federated online learning for Gaussian process regression (GPR). We develop a Byzantine-resilient federated GPR algorithm that allows a cloud and a group of agents to collaboratively learn a latent function and improve the learning performances where some agents exhibit Byzantine failures, i.e., arbitrary and potentially adversarial behavior. Each agent-based local GPR sends potentially compromised local predictions to the cloud, and the cloud-based aggregated GPR computes a global model by a Byzantine-resilient product of experts aggregation rule. Then the cloud broadcasts the current global model to all the agents. Agent-based fused GPR refines local predictions by fusing the received global model with that of the agent-based local GPR. Moreover, we quantify the learning accuracy improvements of the agent-based fused GPR over the agent-based local GPR. Experiments on a toy example and two medium-scale real-world datasets are conducted to demonstrate the performances of the proposed algorithm.
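To make the aggregation step more concrete, the sketch below shows the standard product-of-experts fusion of Gaussian predictions (precision-weighted averaging) and a trimmed variant. The trimming rule is a stand-in chosen here for illustration, not the Byzantine-resilient rule from the paper, and the function names, the parameter `num_trim`, and the numbers are hypothetical.

```python
def product_of_experts(means, variances):
    """Standard product-of-experts fusion of scalar Gaussian predictions."""
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    mean = sum(m * p for m, p in zip(means, precisions)) / total_precision
    return mean, 1.0 / total_precision


def trimmed_product_of_experts(means, variances, num_trim):
    """Drop the num_trim smallest and largest predicted means, then fuse.

    Assumption: this trimming step is only a simple stand-in for a
    Byzantine-resilient rule; it is not taken from the paper.
    """
    order = sorted(range(len(means)), key=lambda i: means[i])
    kept = order[num_trim:len(order) - num_trim]
    if not kept:
        raise ValueError("num_trim removes every agent")
    return product_of_experts([means[i] for i in kept],
                              [variances[i] for i in kept])


# Hypothetical local predictions; the fourth agent reports a wildly wrong
# mean with an overconfident (tiny) variance.
means = [1.0, 1.2, 0.9, 50.0, 1.1]
variances = [0.5, 0.5, 0.5, 0.01, 0.5]

naive_mean, _ = product_of_experts(means, variances)
robust_mean, robust_var = trimmed_product_of_experts(means, variances, num_trim=1)
print(naive_mean, robust_mean, robust_var)
```

In this example the untrimmed fusion is dragged toward the faulty agent because its tiny reported variance gives it a large weight, while the trimmed fusion stays near the honest predictions.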

LG May 11, 2021
Lightweight Distributed Gaussian Process Regression for Online Machine Learning

Zhenyuan Yuan, Minghui Zhu

In this paper, we study the problem where a group of agents aim to collaboratively learn a common static latent function through streaming data. We propose a lightweight distributed Gaussian process regression (GPR) algorithm that is cognizant of agents' limited capabilities in communication, computation and memory. Each agent independently runs agent-based GPR using local streaming data to predict test points of interest; then the agents collaboratively execute distributed GPR to obtain global predictions over a common sparse set of test points; finally, each agent fuses results from distributed GPR with agent-based GPR to refine its predictions. By quantifying the transient and steady-state performances in predictive variance and error, we show that limited inter-agent communication improves learning performances in the sense of Pareto. Monte Carlo simulation is conducted to evaluate the developed algorithm.

RO Sep 22, 2020
Data-Driven Distributed State Estimation and Behavior Modeling in Sensor Networks

Rui Yu, Zhenyuan Yuan, Minghui Zhu et al.

Nowadays, the prevalence of sensor networks has enabled tracking of the states of dynamic objects for a wide spectrum of applications from autonomous driving to environmental monitoring and urban planning. However, tracking real-world objects often faces two key challenges: First, due to the limitation of individual sensors, state estimation needs to be solved in a collaborative and distributed manner. Second, the objects' movement behavior is unknown, and needs to be learned using sensor observations. In this work, for the first time, we formally formulate the problem of simultaneous state estimation and behavior learning in a sensor network. We then propose a simple yet effective solution to this new problem by extending the Gaussian process-based Bayes filters (GP-BayesFilters) to an online, distributed setting. The effectiveness of the proposed method is evaluated on tracking objects with unknown movement behaviors using both synthetic data and data collected from a multi-robot platform.

CR Aug 1, 2020
Transactive Energy System Deployment over Insecure Communication Links

Yang Lu, Jianming Lian, Minghui Zhu et al.

In this paper, the privacy and security issues associated with the transactive energy system (TES) deployment over insecure communication links are addressed. In particular, it is ensured that (1) individual agents' bidding information is kept private throughout hierarchical market-based interactions; and (2) any extraneous data injection attack can be quickly and easily detected. An implementation framework is proposed to enable the cryptography-based enhancement of privacy and security for the deployment of general hierarchical systems, including TESs. Under the proposed framework, a unified cryptography-based approach is developed to achieve both privacy and security simultaneously. Specifically, privacy preservation is realized by an enhanced Paillier encryption scheme, where a block design is proposed to significantly improve computational efficiency. Attack detection is further achieved by an enhanced Paillier digital signature scheme, where a stamp-concatenation mechanism is proposed to enable detection of data replacement and reordering attacks. Simulation results verify the effectiveness of the proposed cyber-resilient design for transactive energy systems.

OC Dec 16, 2019
On privacy preserving data release of linear dynamic networks

Yang Lu, Minghui Zhu

Distributed data sharing in dynamic networks is ubiquitous. It raises the concern that the private information of dynamic networks could be leaked when data receivers are malicious or communication channels are insecure. In this paper, we propose to intentionally perturb the inputs and outputs of a linear dynamic system to protect the privacy of target initial states and inputs from released outputs. We formulate the problem of perturbation design as an optimization problem which minimizes the cost caused by the added perturbations while maintaining system controllability and ensuring the privacy. We analyze the computational complexity of the formulated optimization problem. To minimize the $\ell_0$ and $\ell_2$ norms of the added perturbations, we derive their convex relaxations which can be efficiently solved. The efficacy of the proposed techniques is verified by a case study on a heating, ventilation, and air conditioning system.

CR Jul 29, 2018
ROPNN: Detection of ROP Payloads Using Deep Neural Networks

Xusheng Li, Zhisheng Hu, Haizhou Wang et al.

Return-oriented programming (ROP) is a code reuse attack that chains short snippets of existing code to perform arbitrary operations on target machines. Existing detection methods against ROP exhibit unsatisfactory detection accuracy and/or have high runtime overhead. In this paper, we present ROPNN, which innovatively combines address space layout guided disassembly and deep neural networks to detect ROP payloads. The disassembler treats application input data as code pointers and aims to find any potential gadget chains, which are then classified by a deep neural network as benign or malicious. Our experiments show that ROPNN has a high detection rate (99.3%) and a very low false positive rate (0.01%). ROPNN successfully detects all of the 100 real-world ROP exploits that are collected in the wild, created manually or created by ROP exploit generation tools. Additionally, ROPNN detects all 10 ROP exploits that can bypass Bin-CFI. ROPNN is non-intrusive and does not incur any runtime overhead to the protected program.

CR May 1, 2018
Privacy preserving distributed optimization using homomorphic encryption

Yang Lu, Minghui Zhu

This paper studies how a system operator and a set of agents securely execute a distributed projected gradient-based algorithm. In particular, each participant holds a set of problem coefficients and/or states whose values are private to the data owner. The concerned problem raises two questions: how to securely compute given functions; and which functions should be computed in the first place. For the first question, by using the techniques of homomorphic encryption, we propose novel algorithms which can achieve secure multiparty computation with perfect correctness. For the second question, we identify a class of functions which can be securely computed. The correctness and computational efficiency of the proposed algorithms are verified by two case studies of power systems, one on a demand response problem and the other on an optimal power flow problem.
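The kind of computation that homomorphic encryption makes possible can be illustrated with a toy additively homomorphic scheme. The sketch below is textbook Paillier encryption with tiny primes, chosen here as an assumption since the abstract does not name the scheme; it is insecure at this key size and is not the authors' algorithm. All function names and parameters are hypothetical.

```python
import math
import random


def keygen(p=61, q=53):
    """Toy Paillier key generation with small primes (illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under public key n (with g = n + 1)."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    return (c1 * c2) % (n * n)


n, private_key = keygen()
c_sum = add_encrypted(n, encrypt(n, 17), encrypt(n, 25))
total = decrypt(n, private_key, c_sum)
print(total)
```

Here a party holding only the public key can combine two encrypted values, and only the key holder can read the sum, which is the basic property that secure aggregation of private coefficients or states relies on.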

SY Apr 9, 2018
Nonlinear Unknown Input and State Estimation Algorithm in Mobile Robots

Pinyao Guo, Hunmin Kim, Nurali Virani et al.

This technical report provides the description and the derivation of a novel nonlinear unknown input and state estimation algorithm (NUISE) for mobile robots. The algorithm is designed for real-world robots with nonlinear dynamic models and subject to stochastic noises on sensing and actuation. Leveraging sensor readings and planned control commands, the algorithm detects and quantifies anomalies on both sensors and actuators. Later, we elaborate the dynamic models of two distinctive mobile robots for the purpose of demonstrating the application of NUISE. This report serves as a supplementary document for [1].

OC Feb 25, 2018
Pareto optimal multi-robot motion planning

Guoxiang Zhao, Minghui Zhu

This paper studies a class of multi-robot coordination problems where a team of robots aim to reach their goal regions with minimum time and avoid collisions with obstacles and other robots. A novel numerical algorithm is proposed to identify the Pareto optimal solutions where no robot can unilaterally reduce its traveling time without extending others'. The consistent approximation of the algorithm in the epigraphical profile sense is guaranteed using set-valued numerical analysis. Experiments on an indoor multi-robot platform and computer simulations show the anytime property of the proposed algorithm; i.e., it is able to quickly return a feasible control policy that safely steers the robots to their goal regions and it keeps improving policy optimality if more time is given.
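The Pareto optimality notion used above can be shown on a finite set of candidate outcomes. The sketch below filters a list of per-robot travel-time vectors down to the non-dominated ones. The candidate values and function names are hypothetical, and the paper's algorithm works on continuous robot dynamics rather than on a finite list like this.

```python
def dominates(a, b):
    """True if a is at least as fast as b for every robot and strictly
    faster for at least one robot."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(candidates):
    """Keep the travel-time vectors that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical travel times (robot 1, robot 2) for five candidate plans.
times = [(4.0, 6.0), (5.0, 5.0), (6.0, 4.0), (6.0, 6.0), (5.0, 7.0)]
front = pareto_front(times)
print(front)
```

In this example (6.0, 6.0) is dominated by (5.0, 5.0) and (5.0, 7.0) is dominated by (4.0, 6.0), so the remaining three plans trade one robot's time against the other's.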

CR Aug 6, 2017
Exploiting Physical Dynamics to Detect Actuator and Sensor Attacks in Mobile Robots

Pinyao Guo, Hunmin Kim, Nurali Virani et al.

Mobile robots are cyber-physical systems where the cyberspace and the physical world are strongly coupled. Attacks against mobile robots can transcend cyber defenses and escalate into disastrous consequences in the physical world. In this paper, we focus on the detection of active attacks that are capable of directly influencing robot mission operation. Through leveraging physical dynamics of mobile robots, we develop RIDS, a novel robot intrusion detection system that can detect actuator attacks as well as sensor attacks for nonlinear mobile robots subject to stochastic noises. We implement and evaluate RIDS on a Khepera mobile robot against concrete attack scenarios via various attack channels including signal interference, sensor spoofing, logic bomb, and physical damage. An evaluation over 20 experiments shows that the averages of false positive rates and false negative rates are both below 1%. The average detection delay for each attack remains within 0.40 s.

OC Jul 22, 2017
Switching and Data Injection Attacks on Stochastic Cyber-Physical Systems: Modeling, Resilient Estimation and Attack Mitigation

Sze Zheng Yong, Minghui Zhu, Emilio Frazzoli

In this paper, we consider the problem of attack-resilient state estimation, that is to reliably estimate the true system states despite two classes of attacks: (i) attacks on the switching mechanisms and (ii) false data injection attacks on actuator and sensor signals, in the presence of unbounded stochastic process and measurement noise signals. We model the systems under attack as hidden mode stochastic switched linear systems with unknown inputs and propose the use of a multiple-model inference algorithm to tackle these security issues. Moreover, we characterize fundamental limitations to resilient estimation (e.g., upper bound on the number of tolerable signal attacks) and discuss the topics of attack detection, identification and mitigation under this framework. Simulation examples of switching and false data injection attacks on a benchmark system and an IEEE 68-bus test system show the efficacy of our approach to recover resilient (i.e., asymptotically unbiased) state estimates as well as to identify and mitigate the attacks.

OC Jun 27, 2016
Simultaneous Mode, Input and State Estimation for Switched Linear Stochastic Systems

Sze Zheng Yong, Minghui Zhu, Emilio Frazzoli

In this paper, we propose a filtering algorithm for simultaneously estimating the mode, input and state of hidden mode switched linear stochastic systems with unknown inputs. Using a multiple-model approach with a bank of linear input and state filters for each mode, our algorithm relies on the ability to find the most probable model as a mode estimate, which we show is possible with input and state filters by identifying a key property, that a particular residual signal we call generalized innovation is a Gaussian white noise. We also provide an asymptotic analysis for the proposed algorithm and provide sufficient conditions for asymptotically achieving convergence to the true model (consistency), or to the 'closest' model according to an information-theoretic measure (convergence). A simulation example of intention-aware vehicles at an intersection is given to demonstrate the effectiveness of our approach.
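The idea of picking the most probable model from a bank of filters can be sketched with a generic Bayes update of mode probabilities. The code below assumes scalar residuals that are zero-mean Gaussian under the correct mode, and it omits mode transitions and the filters themselves. It is a textbook multiple-model update, not the authors' filter, and all names and numbers are hypothetical.

```python
import math


def gaussian_pdf(residual, variance):
    """Density of a zero-mean scalar Gaussian evaluated at the residual."""
    return (math.exp(-0.5 * residual ** 2 / variance)
            / math.sqrt(2.0 * math.pi * variance))


def update_mode_probabilities(prior, residuals, variances):
    """Bayes update: posterior_j is proportional to
    prior_j * N(residual_j; 0, variance_j)."""
    weighted = [p * gaussian_pdf(r, v)
                for p, r, v in zip(prior, residuals, variances)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Two hypothetical modes; the filter for mode 0 produces a small residual,
# the filter for mode 1 a large one.
posterior = update_mode_probabilities(prior=[0.5, 0.5],
                                      residuals=[0.1, 2.0],
                                      variances=[1.0, 1.0])
mode_estimate = max(range(len(posterior)), key=posterior.__getitem__)
print(posterior, mode_estimate)
```

The mode whose filter explains the measurement best (smallest residual relative to its variance) receives the largest posterior weight and is returned as the mode estimate.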

OC May 11, 2011
On distributed convex optimization under inequality and equality constraints via primal-dual subgradient methods

Minghui Zhu, Sonia Martinez

We consider a general multi-agent convex optimization problem where the agents are to collectively minimize a global objective function subject to a global inequality constraint, a global equality constraint, and a global constraint set. The objective function is defined by a sum of local objective functions, while the global constraint set is produced by the intersection of local constraint sets. In particular, we study two cases: one where the equality constraint is absent, and the other where the local constraint sets are identical. We devise two distributed primal-dual subgradient algorithms which are based on the characterization of the primal-dual optimal solutions as the saddle points of the Lagrangian and penalty functions. These algorithms can be implemented over networks whose topologies change but satisfy a standard connectivity property, and allow the agents to asymptotically agree on optimal solutions and optimal values of the optimization problem under Slater's condition.