Keyou You

h-index 35

18 papers

409 citations

Novelty 49%

AI Score 55

Ranked #8,278 of 194,257 authors (top 4%) · #28 in SY (top 2%)

18 Papers

1.2 · SY · May 8, 2020

Bayesian Filtering with Unknown Sensor Measurement Losses

Jiaqi Zhang, Keyou You, Lihua Xie

This work studies the state estimation problem of a stochastic nonlinear system with unknown sensor measurement losses. If the estimator knows the sensor measurement losses of a linear Gaussian system, the minimum variance estimate is easily computed by the celebrated intermittent Kalman filter (IKF). However, this is no longer the case when the measurement losses are unknown and/or the system is nonlinear or non-Gaussian. By exploiting the binary property of the measurement loss process and the IKF, we design three suboptimal filters for state estimation, namely BKF-I, BKF-II, and RBPF. The BKF-I is based on the MAP estimator of the measurement loss process, and the BKF-II is derived by estimating the conditional loss probability. The RBPF is a particle-filter-based algorithm that marginalizes out the loss process to increase the efficiency of particles. All the proposed filters can be easily implemented in recursive forms. Finally, a linear system, a target tracking system, and a quadrotor's path control problem are included to illustrate their effectiveness and to show the tradeoff between computational complexity and estimation accuracy of the proposed filters.
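For reference, the intermittent Kalman filter that the proposed filters build on can be sketched for a scalar system $x_{k+1} = a x_k + w_k$, $y_k = c x_k + v_k$ whose losses are known to the estimator. This is a minimal sketch under assumptions not stated in the abstract: the system is scalar, the noises are zero-mean with variances `q` and `r`, and a lost measurement is passed as `None`. It does not implement BKF-I, BKF-II, or RBPF, which address the harder case where the losses are unknown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScalarIKF:
    """Intermittent Kalman filter for a scalar linear system with KNOWN losses (illustrative)."""
    a: float        # state transition coefficient
    c: float        # measurement coefficient
    q: float        # process noise variance
    r: float        # measurement noise variance
    x: float = 0.0  # current state estimate
    p: float = 1.0  # current error variance

    def step(self, y: Optional[float]) -> float:
        # Time update (always performed)
        x_pred = self.a * self.x
        p_pred = self.a * self.a * self.p + self.q
        if y is None:
            # Measurement lost (gamma_k = 0): keep the prediction
            self.x, self.p = x_pred, p_pred
        else:
            # Measurement received (gamma_k = 1): standard Kalman correction
            k = p_pred * self.c / (self.c * self.c * p_pred + self.r)
            self.x = x_pred + k * (y - self.c * x_pred)
            self.p = (1.0 - k * self.c) * p_pred
        return self.x
```

A lost measurement skips the correction, so the error variance grows by the prediction step alone; the filters in the paper replace this known on/off switch with an estimate of the loss process.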

6.9 · AI · May 11 · Code

LLM4Branch: Large Language Model for Discovering Efficient Branching Policies of Integer Programs

Zhinan Hou, Xingchen Li, Yankai Zhang et al.

Efficient branching policies are essential for accelerating Mixed Integer Linear Programming (MILP) solvers. Their design has long relied on hand-crafted heuristics, and machine learning has now emerged as a promising paradigm to automate this process. However, existing learning-based methods are often hindered by their dependence on expensive expert demonstrations and by the gap between training objectives and the solver's end-to-end performance. In this work, we propose LLM4Branch, a novel framework that leverages Large Language Models (LLMs) to automate the discovery of efficient branching policies. Specifically, the discovered policy is an executable program consisting of a program skeleton generated by the LLM and a parameter vector, which is optimized via a zeroth-order method over a few instances using their end-to-end performance feedback. Extensive experiments on standard MILP benchmarks demonstrate that LLM4Branch establishes a new state of the art among CPU-based methods and achieves performance competitive with advanced GPU-based models. Code is available at https://github.com/hzn18/LLM4Branch.
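The parameter-tuning step can be illustrated with a generic two-point zeroth-order method, which needs only function evaluations and no gradients. In this sketch the black-box objective `f` is a hypothetical stand-in for the solver's end-to-end performance on a few instances; the Gaussian-direction estimator, step size `lr`, and smoothing radius `mu` are illustrative choices, not necessarily the zeroth-order method used in LLM4Branch.

```python
import numpy as np


def zo_minimize(f, theta0, steps=500, lr=0.05, mu=1e-3, seed=0):
    """Minimize a black-box f using two-point random-direction gradient estimates."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(steps):
        # Random Gaussian search direction
        u = rng.standard_normal(theta.shape)
        # Central finite difference of f along u, scaled back onto u
        g = (f(theta + mu * u) - f(theta - mu * u)) / (2.0 * mu) * u
        theta -= lr * g
    return theta
```

Each step costs two evaluations of `f`, which is why this family of methods suits settings where the only feedback is a scalar performance score.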

1.2 · SY · Jul 31, 2018

Parallel Optimal Control for Cooperative Automation of Large-scale Connected Vehicles via ADMM

Zhitao Wang, Yang Zheng, Shengbo Eben Li et al.

This paper proposes a parallel optimization algorithm for cooperative automation of large-scale connected vehicles. The task of cooperative automation is formulated as a centralized optimization problem taking the whole decision space of all vehicles into account. Considering the uncertainty of the environment, the problem is solved in a receding horizon fashion. Then, we employ the alternating direction method of multipliers (ADMM) to solve the centralized optimization in a parallel way, which scales more favorably to large-scale instances. In addition, a Taylor series expansion is used to linearize the nonconvex constraints arising from coupled collision avoidance constraints among interacting vehicles. Simulations of two typical multi-vehicle traffic scenarios demonstrate the effectiveness and efficiency of our method.
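The parallel structure that ADMM provides can be seen on a toy consensus problem in which each agent $i$ holds a local cost $\tfrac{1}{2}w_i(x - a_i)^2$ and all agents must agree on a common $x$. This is a minimal sketch of scaled-form consensus ADMM under that assumption; it omits the receding-horizon vehicle model and the linearized collision avoidance constraints, and the names `w`, `a`, and `rho` are illustrative.

```python
import numpy as np


def consensus_admm(w, a, rho=1.0, iters=500):
    """Scaled-form consensus ADMM for min_x sum_i 0.5 * w_i * (x - a_i)^2."""
    w = np.asarray(w, dtype=float)
    a = np.asarray(a, dtype=float)
    x = np.zeros_like(a)  # local copies, one per agent
    u = np.zeros_like(a)  # scaled dual variables
    z = 0.0               # consensus variable
    for _ in range(iters):
        # Local x-updates: independent closed-form minimizers (parallelizable)
        x = (w * a + rho * (z - u)) / (w + rho)
        # Global step: average of local copies plus duals
        z = float(np.mean(x + u))
        # Dual update penalizes disagreement with the consensus value
        u = u + x - z
    return z, x
```

Each entry of `x` is updated independently, which is what lets the local steps run in parallel; only the averaging step for `z` couples the agents. The iterates converge to the weighted mean $\sum_i w_i a_i / \sum_i w_i$.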

1.2 · SY · Feb 22, 2020

Range-based Coordinate Alignment for Cooperative Mobile Sensor Network Localization

Keyou You, Qizhu Chen, Pei Xie et al.

This paper studies a coordinate alignment problem for cooperative mobile sensor network localization with range-based measurements. The network consists of target nodes, each of which only has access to position information in its local fixed coordinate frame, and anchor nodes with GPS position information. To localize target nodes, we aim to align their coordinate frames, which leads to a non-convex optimization problem over the rotation group $\text{SO}(3)$. Then, we reformulate it as an optimization problem with a convex objective function over spherical surfaces. We explicitly design both iterative and recursive algorithms for localizing a target node with an anchor node, and extend them to the case with multiple target nodes. Finally, the advantages of our algorithms over those in the literature are validated via simulations.

1.2 · DC · Jan 14, 2018

Distributed Algorithms for Robust Convex Optimization via the Scenario Approach

Keyou You, Roberto Tempo, Pei Xie

This paper proposes distributed algorithms to solve robust convex optimization (RCO) when the constraints are affected by nonlinear uncertainty. We adopt a scenario approach by randomly sampling the uncertainty set. To facilitate the computational task, instead of using a single centralized processor to obtain a "global solution" of the scenario problem (SP), we resort to multiple interconnected processors that are distributed among different nodes of a network to simultaneously solve the SP. Then, we propose a primal-dual sub-gradient algorithm and a random projection algorithm to solve the SP in a distributed manner over undirected and directed graphs, respectively. Both algorithms are given in an explicit recursive form with simple iterations, which makes them especially suited for processors with limited computational capability. We show that, if the underlying graph is strongly connected, each node asymptotically computes a common optimal solution to the SP with a convergence rate $O(1/(\sum_{t=1}^k\zeta^t))$, where $\{\zeta^t\}$ is a sequence of appropriately decreasing stepsizes. That is, the RCO is effectively solved in a distributed way. The relations with the existing literature on robust convex programs are thoroughly discussed, and an example of robust system identification is included to validate the effectiveness of our distributed algorithms.

9.2 · SY · Jun 2

APX-Hardness of Computing Lipschitz Constants for Multi-Parametric Quadratic Programs

Xingchen Li, Kunpeng Liu, Keyou You

Computing the Lipschitz constant of the solution map of a multi-parametric quadratic program is important for the analysis of optimization-based control. This problem is governed by three factors: the parameter dimension, the number of decision variables, and the number of constraints. While empirical evidence has long suggested exponential complexity, a rigorous complexity-theoretic proof has been lacking. In this paper, we fill this gap by proving that this problem is not only NP-hard but also APX-hard. Furthermore, we reveal that: (a) the problem becomes polynomial-time solvable when the number of constraints or decision variables is fixed; and (b) both NP-hardness and APX-hardness persist even in the scalar parameter case. These results confirm that the complexity stems from the number of constraints and variables, rather than the parameter dimension. Numerical experiments further validate these theoretical findings.

6.3 · SY · May 15

Direct Data-Driven Linear Quadratic Tracking via Policy Optimization

Shubo Kang, Keyou You

Direct data-driven optimal control provides an elegant end-to-end paradigm, yet its real-time applicability is often hindered by the growing dimensionality of online decision variables. Recent breakthroughs, notably Data-EnablEd Policy Optimization (DeePO), overcome this bottleneck for the Linear Quadratic Regulator (LQR) through sample-covariance parameterization; however, extending this paradigm to Linear Quadratic Tracking (LQT) poses a fundamental challenge. The core difficulty stems from the intricate coupling between time-varying references and the feedback-feedforward policy structure, which prevents a direct application of constant-dimension parameterization. We first introduce a reference-decoupled reformulation of LQT that naturally accommodates the covariance parameterization, guaranteeing a fixed dimension of decision variables independent of the data horizon. This formulation is proven to be exactly equivalent to the indirect certainty-equivalence LQT solution. Leveraging this characterization, we develop offline and online DeePO algorithms. Theoretically, we prove global linear convergence for the offline algorithm using local gradient dominance and smoothness, and show that in the online setting the optimality gap decays linearly up to a bias term that scales inversely with the signal-to-noise ratio (SNR). Numerical simulations verify the theoretical results and illustrate the superior tracking performance of the proposed method.

1.2 · SY · Jan 23

ReLU Networks for Model Predictive Control: Network Complexity and Performance Guarantees

Xingchen Li, Keyou You

Recent years have witnessed a resurgence in using ReLU neural networks (NNs) to represent model predictive control (MPC) policies. However, determining the network complexity required to ensure closed-loop performance remains a fundamental open problem. This involves a critical precision-complexity trade-off: undersized networks may fail to capture the MPC policy, while oversized ones may incur costs that outweigh the benefits of ReLU network approximation. In this work, we propose a projection-based method to enforce hard constraints and establish a state-dependent Lipschitz continuity property for the optimal MPC cost function, which enables sharp convergence analysis of the closed-loop system. For the first time, we derive explicit bounds on ReLU network width and depth for approximating MPC policies with guaranteed closed-loop performance. To further reduce network complexity and enhance closed-loop performance, we propose a non-uniform error framework with a state-aware scaling function to adaptively adjust both the input and output of the ReLU network. Our contributions provide a foundational step toward certifiable ReLU NN-based MPC.
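One simple way to enforce hard input constraints on a network-based policy is to project its output onto the feasible set. The sketch below assumes box constraints $u_{\min} \le u \le u_{\max}$, for which the Euclidean projection reduces to element-wise clipping; the layer format and function names are hypothetical, and neither the paper's projection method nor its width and depth bounds are reproduced here.

```python
import numpy as np


def relu_net(x, layers):
    """Evaluate a ReLU MLP given [(W, b), ...]; the last layer is affine (no ReLU)."""
    h = np.asarray(x, dtype=float)
    for W, b in layers[:-1]:
        h = np.maximum(W @ h + b, 0.0)  # hidden layer with ReLU activation
    W, b = layers[-1]
    return W @ h + b


def projected_policy(x, layers, u_min, u_max):
    """Project the network output onto the box [u_min, u_max] so input constraints always hold."""
    return np.clip(relu_net(x, layers), u_min, u_max)
```

With a two-neuron hidden layer computing $2\,\mathrm{ReLU}(x) - 2\,\mathrm{ReLU}(-x) = 2x$ and bounds $[-1, 1]$, an input of $0.3$ passes through unchanged as $0.6$, while an input of $3$ is clipped to $1$. For general polytopic constraints the projection becomes a small quadratic program rather than a clip.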

7.6 · OC · Mar 8

Compressed Proximal Federated Learning for Non-Convex Composite Optimization on Heterogeneous Data

Pu Qiu, Chen Ouyang, Yongyang Xiong et al.

Federated Composite Optimization (FCO) has emerged as a promising framework for training models with structural constraints (e.g., sparsity) in distributed edge networks. However, simultaneously achieving communication efficiency and convergence robustness remains a significant challenge, particularly when dealing with non-smooth regularizers, statistical heterogeneity, and the restrictions of biased compression. To address these issues, we propose FedCEF (Federated Composite Error Feedback), a novel algorithm tailored for non-convex FCO. FedCEF introduces a decoupled proximal update scheme that separates the proximal operator from communication, enabling clients to handle non-smooth terms locally while transmitting compressed information. To mitigate the noise from aggressive quantization and the bias from non-IID data, FedCEF integrates a rigorous error feedback mechanism with control variates. Furthermore, we design a communication-efficient pre-proximal downlink strategy that allows clients to exactly reconstruct global control variables without explicit transmission. We theoretically establish that FedCEF achieves sublinear convergence to a bounded residual error under general non-convexity, which is controllable via the step size and batch size. Extensive experiments on real datasets validate that FedCEF maintains competitive model accuracy even under extreme compression ratios (e.g., 1%), significantly reducing the total communication volume compared to uncompressed baselines.
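The error feedback idea that FedCEF builds on can be sketched for a single client with a biased top-$k$ compressor: whatever the compressor drops in one round is added back before compressing the next. This is a generic sketch, not FedCEF itself; it omits the control variates, the decoupled proximal update, and the downlink strategy, and the class and function names are hypothetical.

```python
import numpy as np


def top_k(v, k):
    """Keep the k largest-magnitude entries of v and zero the rest (a biased compressor)."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out


class ErrorFeedback:
    """Client-side error accumulator for a compressed uplink (illustrative)."""

    def __init__(self, dim, k):
        self.err = np.zeros(dim)  # residual not yet transmitted
        self.k = k

    def compress(self, g):
        p = g + self.err       # add back what was dropped in earlier rounds
        c = top_k(p, self.k)   # the only part actually transmitted
        self.err = p - c       # remember the dropped part for next round
        return c
```

Because the dropped mass is carried forward, the sum of everything transmitted plus the current residual always equals the sum of the inputs, so compression delays information rather than discarding it.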

5.6 · OC · Apr 3, 2024

Deep Reinforcement Learning for Traveling Purchaser Problems

Haofeng Yuan, Rongping Zhu, Wanlu Yang et al.

The traveling purchaser problem (TPP) is an important combinatorial optimization problem with broad applications. Due to the coupling between routing and purchasing, existing works on TPPs commonly address route construction and purchase planning simultaneously, which, however, leads to exact methods with high computational cost and heuristics with sophisticated design but limited performance. In sharp contrast, we propose a novel approach based on deep reinforcement learning (DRL), which addresses route construction and purchase planning separately, while evaluating and optimizing the solution from a global perspective. The key components of our approach include a bipartite graph representation for TPPs to capture the market-product relations, and a policy network that extracts information from the bipartite graph and uses it to sequentially construct the route. One significant advantage of our framework is that we can efficiently construct the route using the policy network, and once the route is determined, the associated purchasing plan can be easily derived through linear programming, while, by leveraging DRL, we can train the policy network towards optimizing the global solution objective. Furthermore, by introducing a meta-learning strategy, the policy network can be trained stably on large-sized TPP instances, and generalize well across instances of varying sizes and distributions, even to much larger instances that are never seen during training. Experiments on various synthetic TPP instances and the TPPLIB benchmark demonstrate that our DRL-based approach can significantly outperform well-established TPP heuristics, reducing the optimality gap by 40%-90%, and also showing an advantage in runtime, especially on large-sized instances.
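To illustrate why the purchasing plan is easy once the route is fixed: in the standard TPP formulation with linear prices, per-market supply limits, and per-product demand, the purchase linear program separates by product, and each per-product problem is solved by buying from the cheapest visited market first. The sketch below assumes that formulation; the data layout `offers[market][product] = (unit_price, supply)` and the function name are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Plan = Dict[Tuple[str, str], float]


def purchase_plan(route: List[str],
                  offers: Dict[str, Dict[str, Tuple[float, float]]],
                  demand: Dict[str, float]) -> Optional[Tuple[float, Plan]]:
    """Cheapest purchases along a fixed route; returns (cost, plan) or None if infeasible."""
    total, plan = 0.0, {}
    for prod, need in demand.items():
        # Offers for this product at markets on the route, cheapest first
        cands = sorted((offers[m][prod][0], m, offers[m][prod][1])
                       for m in route if prod in offers.get(m, {}))
        for price, m, supply in cands:
            if need <= 0:
                break
            qty = min(need, supply)
            plan[(m, prod)] = qty
            total += price * qty
            need -= qty
        if need > 1e-12:
            return None  # the route cannot satisfy this product's demand
    return total, plan
```

Since the purchase cost of a given route is computed exactly and cheaply, a learned policy only has to decide the route, which matches the separation between route construction and purchase planning described above.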

13.6 · OC · May 14, 2021

Innovation Compression for Communication-efficient Distributed Optimization with Linear Convergence

Jiaqi Zhang, Keyou You, Lihua Xie

Information compression is essential to reduce communication cost in distributed optimization over peer-to-peer networks. This paper proposes a communication-efficient linearly convergent distributed (COLD) algorithm to solve strongly convex optimization problems. By compressing innovation vectors, which are the differences between decision vectors and their estimates, COLD is able to achieve linear convergence for a class of $δ$-contracted compressors. We explicitly quantify how the compression affects the convergence rate and show that COLD matches the rate of its uncompressed version. To accommodate a wider class of compressors that includes the binary quantizer, we further design a novel dynamical scaling mechanism and obtain the linearly convergent Dyna-COLD. Importantly, our results strictly improve existing results for the quantized consensus problem. Numerical experiments demonstrate the advantages of both algorithms under different compressors.
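The innovation idea can be seen in isolation: instead of compressing the decision vector itself, the sender compresses the difference between it and an estimate that sender and receiver both maintain, and both apply the same update. This sketch uses top-$k$ sparsification, which satisfies $\|C(v)-v\|^2 \le (1-k/d)\|v\|^2$ and is therefore $\delta$-contracted with $\delta = k/d$; it tracks a single vector between two parties and does not implement the COLD consensus iteration. Function names are hypothetical.

```python
import numpy as np


def top_k(v, k):
    """Top-k sparsifier: ||C(v) - v||^2 <= (1 - k/d) ||v||^2 for v in R^d."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out


def track_with_innovations(xs, k):
    """Sender transmits only C(x_t - x_hat); both sides add it to their copy of x_hat."""
    d = len(xs[0])
    x_hat_sender = np.zeros(d)
    x_hat_receiver = np.zeros(d)
    errors = []
    for x in xs:
        q = top_k(x - x_hat_sender, k)  # compressed innovation (the only message sent)
        x_hat_sender += q
        x_hat_receiver += q             # receiver mirrors the sender's update
        errors.append(float(np.linalg.norm(x - x_hat_receiver)))
    return x_hat_receiver, errors
```

For a fixed target, the estimation error shrinks every round even though only $k$ entries are sent each time, because each message carries the largest remaining part of the innovation.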

5.9 · SY · Nov 22, 2020

Primal-dual Learning for the Model-free Risk-constrained Linear Quadratic Regulator

Feiran Zhao, Keyou You

Risk-aware control, though promising for handling unexpected events, requires an exact dynamical model to be known. In this work, we propose a model-free framework to learn a risk-aware controller with a focus on linear systems. We formulate it as a discrete-time infinite-horizon LQR problem with a state predictive variance constraint. To solve it, we parameterize the policy with a feedback gain pair and leverage primal-dual methods to optimize it using only data. We first study the optimization landscape of the Lagrangian function and establish strong duality in spite of its non-convex nature. Along the way, we find that the Lagrangian function enjoys an important local gradient dominance property, which is then exploited to develop a convergent random search algorithm to learn the dual function. Furthermore, we propose a primal-dual algorithm with global convergence to learn the optimal policy-multiplier pair. Finally, we validate our results via simulations.

5.1 · CE · Mar 6, 2020

Smart Train Operation Algorithms based on Expert Knowledge and Reinforcement Learning

Kaichen Zhou, Shiji Song, Anke Xue et al.

In recent decades, the automatic train operation (ATO) system has been gradually adopted in many subway systems for its low cost and intelligence. This paper proposes two smart train operation algorithms by integrating expert knowledge with reinforcement learning algorithms. Compared with previous works, the proposed algorithms can realize continuous-action control of the subway system and optimize multiple critical objectives without using an offline speed profile. First, by learning from historical data of experienced subway drivers, we extract expert knowledge rules and build inference methods to guarantee the riding comfort, punctuality, and safety of the subway system. Then we develop two algorithms for optimizing the energy efficiency of train operation: one is the smart train operation (STO) algorithm based on deep deterministic policy gradient (STOD), and the other is the smart train operation algorithm based on normalized advantage function (STON). Finally, we verify the performance of the proposed algorithms via numerical simulations with real field data from the Yizhuang Line of the Beijing Subway, and show that the developed smart train operation algorithms outperform expert manual driving and existing ATO algorithms in terms of energy efficiency. Moreover, STOD and STON can adapt to different trip times and different resistance conditions.

5.0 · LG · Mar 1, 2020

Fully Asynchronous Policy Evaluation in Distributed Reinforcement Learning over Networks

Xingyu Sha, Jiaqi Zhang, Keyou You et al.

This paper proposes a fully asynchronous scheme for the policy evaluation problem of distributed reinforcement learning (DisRL) over directed peer-to-peer networks. Without waiting for any other node of the network, each node can locally update its value function at any time by using (possibly delayed) information from its neighbors. This is in sharp contrast to the gossip-based scheme, where a pair of nodes update concurrently. Though the fully asynchronous setting involves a difficult multi-timescale decision problem, we design a novel stochastic average gradient (SAG) based distributed algorithm and develop a push-pull augmented graph approach to prove its exact convergence at a linear rate of $\mathcal{O}(c^k)$, where $c\in(0,1)$ and $k$ increases by one whenever any node updates. Finally, numerical experiments validate that our method speeds up linearly with respect to the number of nodes and is robust to straggler nodes.
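The stochastic average gradient (SAG) update at the core of the method keeps the most recent gradient of every sample and steps along their average, so each iteration evaluates only one new gradient. The sketch below is a centralized, synchronous SAG for least squares; it does not model the network, the delays, or the push-pull analysis. A step size of $1/(16L)$ with $L = \max_i \|a_i\|^2$ is assumed here as a conservative choice from the classical SAG analysis, not a value taken from the paper.

```python
import numpy as np


def sag_least_squares(A, b, step, iters, seed=0):
    """Minimize (1/2n) * sum_i (a_i^T x - b_i)^2 with stochastic average gradient (SAG)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    table = np.zeros((n, d))  # most recent gradient stored for each sample
    avg = np.zeros(d)         # running average of the table
    for _ in range(iters):
        i = rng.integers(n)
        g_new = (A[i] @ x - b[i]) * A[i]  # fresh gradient of sample i only
        avg += (g_new - table[i]) / n     # swap sample i's stale gradient for the fresh one
        table[i] = g_new
        x -= step * avg
    return x
```

For example, with a random 20-by-3 matrix `A` and `b = A @ x_true`, running the loop for a few tens of thousands of iterations at step $1/(16L)$ should recover `x_true` to high accuracy, illustrating the linear rate that gradient memory enables.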

3.5 · RO · Nov 27, 2019

A selected review on reinforcement learning based control for autonomous underwater vehicles

Yachu Hsu, Hui Wu, Keyou You et al.

Recently, reinforcement learning (RL) has been extensively studied and has achieved promising results in a wide range of control tasks. Meanwhile, autonomous underwater vehicles (AUVs) are important tools for executing complex and challenging underwater tasks. Advances in RL offer ample opportunities for developing intelligent AUVs. This paper provides a selected review of RL-based control for AUVs, focusing on applications of RL to low-level control tasks for underwater regulation and tracking. To this end, we first present a concise introduction to the RL-based control framework. Then, we provide an overview of RL methods for AUV control problems, where the main challenges and recent progress are discussed. Finally, two representative cases of model-free RL-based controllers for AUVs are presented in detail.

15.6 · LG · Sep 6, 2019

Decentralized Stochastic Gradient Tracking for Non-convex Empirical Risk Minimization

Jiaqi Zhang, Keyou You

This paper studies a decentralized stochastic gradient tracking (DSGT) algorithm for non-convex empirical risk minimization problems over a peer-to-peer network of nodes, in sharp contrast to existing DSGT work, which addresses only convex problems. To ensure exact convergence and handle the variance among decentralized datasets, each node performs a stochastic gradient (SG) tracking step using a mini-batch of samples, where the batch size is designed to be proportional to the size of the local dataset. We explicitly evaluate the convergence rate of DSGT with respect to the number of iterations in terms of the algebraic connectivity of the network, mini-batch size, gradient variance, etc. Under certain conditions, we further show that DSGT has a network independence property in the sense that the network topology only affects the convergence rate up to a constant factor. Hence, the convergence rate of DSGT can be comparable to that of the centralized SGD method. Moreover, a linear speedup of DSGT with respect to the number of nodes is achievable in some scenarios. Finally, numerical experiments on neural networks and logistic regression problems on CIFAR-10 illustrate the advantages of DSGT.
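Gradient tracking can be sketched on a toy problem where node $i$ holds $f_i(x) = \tfrac{1}{2}(x - a_i)^2$, so the network-wide minimizer is the average of the $a_i$. To keep the output reproducible, this sketch replaces DSGT's mini-batch stochastic gradients with exact local gradients; `W` is assumed to be a symmetric doubly stochastic mixing matrix matching the network, and the step size `alpha` is illustrative.

```python
import numpy as np


def gradient_tracking(W, a, alpha=0.1, iters=500):
    """Nodes minimize sum_i 0.5 * (x - a_i)^2 using only neighbor mixing through W."""
    a = np.asarray(a, dtype=float)

    def grad(x):
        # Stacked local gradients, one entry per node
        return x - a

    x = np.zeros_like(a)
    y = grad(x)  # gradient tracker initialized with the local gradients
    for _ in range(iters):
        x_new = W @ x - alpha * y
        # Tracking step: keeps mean(y) equal to the mean of the current local gradients
        y = W @ y + grad(x_new) - grad(x)
        x = x_new
    return x
```

With a 4-node ring and uniform weights of $1/3$ on each node and its two neighbors, every node's iterate approaches the mean of `a`, even though no node ever sees another node's data directly.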

7.6 · RO · Nov 22, 2017

Depth Control of Model-Free AUVs via Reinforcement Learning

Hui Wu, Shiji Song, Keyou You et al.

In this paper, we consider depth control problems of an autonomous underwater vehicle (AUV) for tracking desired depth trajectories. Because the dynamical model of the AUV is unknown, these problems cannot be solved by most model-based controllers. To this end, we formulate the depth control problems of the AUV as continuous-state, continuous-action Markov decision processes (MDPs) with unknown transition probabilities. Based on deterministic policy gradient (DPG) and neural network approximation, we propose a model-free reinforcement learning (RL) algorithm that learns a state-feedback controller from sampled trajectories of the AUV. To improve the performance of the RL algorithm, we further propose a batch-learning scheme that replays prioritized previous trajectories. We illustrate with simulations that our model-free method is even comparable to model-based controllers such as LQI and NMPC. Moreover, we validate the effectiveness of the proposed RL algorithm on a seafloor data set sampled from the South China Sea.

1.2 · SY · Jul 27, 2015

Likelihood Ratio Based Scheduler for Secure Detection in Cyber Physical Systems

Jian-Ya Ding, Keyou You, Shiji Song et al.

This paper is concerned with a binary detection problem over a non-secure network. To satisfy the communication rate constraint and defend against possible cyber attacks, which are modeled as deceptive signals injected into the network, a likelihood ratio based (LRB) scheduler is designed on the sensor side to smartly select sensor measurements for transmission. By exploiting the scheduler, some sensor measurements are successfully retrieved from the attacked data at the decision center. We show that even under a moderate communication rate constraint of secure networks, an optimal LRB scheduler can achieve an asymptotic detection performance comparable to that of the standard Neyman-Pearson (N-P) test using the full set of measurements, and is strictly better than the random scheduler. For non-secure networks, the LRB scheduler can also maintain the detection functionality but suffers graceful performance degradation under different attack intensities. Finally, we perform simulations to validate our theoretical results.
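For Gaussian measurements, the log-likelihood ratio (LLR) is affine in the measurement, which makes a likelihood-ratio-based selection rule cheap to evaluate on the sensor side. The rule below, which transmits only measurements with $|\mathrm{LLR}| \ge \tau$, and the naive fusion rule at the decision center are hypothetical illustrations under assumed hypotheses $H_0: y \sim \mathcal{N}(\mu_0, \sigma^2)$ and $H_1: y \sim \mathcal{N}(\mu_1, \sigma^2)$; they are not the optimal LRB scheduler from the paper and do not model attacks.

```python
def gaussian_llr(y, mu0, mu1, sigma):
    """log p1(y)/p0(y) for N(mu1, sigma^2) versus N(mu0, sigma^2)."""
    return (mu1 - mu0) / sigma ** 2 * (y - 0.5 * (mu0 + mu1))


def schedule(ys, mu0, mu1, sigma, tau):
    """Hypothetical sensor-side rule: send only measurements whose |LLR| >= tau."""
    return [y for y in ys if abs(gaussian_llr(y, mu0, mu1, sigma)) >= tau]


def decide(received, mu0, mu1, sigma):
    """Naive fusion: sum the LLRs of received measurements; return 1 (H1) if positive."""
    return int(sum(gaussian_llr(y, mu0, mu1, sigma) for y in received) > 0)
```

With $\mu_0 = 0$, $\mu_1 = 1$, $\sigma = 1$, and $\tau = 1$, the measurements $0.2, 3.0, -1.0, 0.6$ have LLRs $-0.3, 2.5, -1.5, 0.1$, so only $3.0$ and $-1.0$ are sent and their summed LLR of $1.0$ favors $H_1$. Measurements near the decision boundary carry little evidence, which is the intuition behind spending the limited communication budget on high-$|\mathrm{LLR}|$ samples.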