Insoon Yang

h-index: 21

16 papers

388 citations

Novelty: 53%

AI Score: 39

Ranked #78,472 of 194,257 authors (top 40%); #2,351 in RO (top 35%)

16 Papers

3.3 | OC | Jan 23, 2017

Optimal Control of Conditional Value-at-Risk in Continuous Time

Christopher W. Miller, Insoon Yang

We consider continuous-time stochastic optimal control problems featuring Conditional Value-at-Risk (CVaR) in the objective. The major difficulty in these problems arises from time-inconsistency, which prevents us from directly using dynamic programming. To resolve this challenge, we convert to an equivalent bilevel optimization problem in which the inner optimization problem is standard stochastic control. Furthermore, we provide conditions under which the outer objective function is convex and differentiable. We compute the outer objective's value via a Hamilton-Jacobi-Bellman equation and its gradient via the viscosity solution of a linear parabolic equation, which allows us to perform gradient descent. The significance of this result is that we provide an efficient dynamic programming-based algorithm for optimal control of CVaR without lifting the state-space. To broaden the applicability of the proposed algorithm, we propose convergent approximation schemes in cases where our key assumptions do not hold and characterize relevant suboptimality bounds. In addition, we extend our method to a more general class of risk metrics, which includes mean-variance and median-deviation. We also demonstrate a concrete application to portfolio optimization under CVaR constraints. Our results contribute an efficient framework for solving time-inconsistent CVaR-based sequential optimization.
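The bilevel structure mentioned in this abstract can be illustrated with the well-known Rockafellar-Uryasev extremal formula, CVaR_α(X) = min_y { y + E[(X − y)^+] / (1 − α) }, in which an outer minimization over a scalar y wraps an ordinary expectation. The sketch below works on a static sample of losses only; it is not the paper's continuous-time algorithm, and the function names and the convention that α is the confidence level are assumptions made for illustration.

```python
import numpy as np

def cvar_tail_average(losses, alpha):
    """Empirical CVaR as the mean of the worst (1 - alpha) fraction of losses."""
    x = np.sort(np.asarray(losses, dtype=float))
    k = int(round((1 - alpha) * len(x)))  # number of samples in the tail
    return x[-k:].mean()

def cvar_extremal(losses, alpha):
    """Empirical CVaR via min over y of y + mean((x - y)^+) / (1 - alpha).

    The objective is convex and piecewise linear in y with kinks at the
    sample values, so it is enough to search over the samples themselves.
    """
    x = np.asarray(losses, dtype=float)
    candidates = np.unique(x)
    values = [y + np.maximum(x - y, 0.0).mean() / (1 - alpha) for y in candidates]
    return min(values)

# Hypothetical data: ten equally likely losses 1, 2, ..., 10.
losses = np.arange(1, 11)
print(cvar_tail_average(losses, 0.8), cvar_extremal(losses, 0.8))
```

On this sample both functions return approximately 9.5, the average of the two largest losses. In a control setting, the expectation inside the outer minimization would be replaced by the value of a standard stochastic control problem for each fixed y.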

2.4 | OC | Feb 22, 2018

Safety-Aware Optimal Control of Stochastic Systems Using Conditional Value-at-Risk

Samantha Samuelson, Insoon Yang

In this paper, we consider a multi-objective control problem for stochastic systems that seeks to minimize a cost of interest while ensuring safety. We introduce a novel measure of safety risk using the conditional value-at-risk and a set distance to formulate a safety risk-constrained optimal control problem. Our reformulation method using an extremal representation of the safety risk measure provides a computationally tractable dynamic programming solution. A useful byproduct of the proposed solution is the notion of a risk-constrained safe set, which is a new stochastic safety verification tool. We also establish useful connections between the risk-constrained safe sets and the popular probabilistic safe sets. The tradeoff between the risk tolerance and the mean performance of our controller is examined through an inventory control problem.

1.2 | SY | Apr 1, 2017

Online Combinatorial Optimization for Interconnected Refrigeration Systems: Linear Approximation and Submodularity

Insoon Yang

Commercial refrigeration systems account for 7% of total commercial energy consumption in the United States. Improving their energy efficiency contributes to the sustainability of global energy systems and the supermarket business sector. This paper proposes a new control method that can reduce the energy consumption of multi-case supermarket refrigerators by explicitly taking into account their interconnected and switched system dynamics. Its novelty is a bilevel combinatorial optimization formulation to generate ON/OFF control actions for expansion valves and compressors. The inner optimization module keeps display case temperatures in a desirable range and the outer optimization module minimizes energy consumption. In addition to its energy-saving capability, the proposed controller significantly reduces the frequency of compressor switchings by employing a conservative compressor control strategy. However, solving this bilevel optimization problem associated with interconnected and switched systems is a computationally challenging task. To solve the problem in near real time, we propose two approximation algorithms that can solve both the inner and outer optimization problems at once. The first algorithm uses a linear approximation, and the second is based on the submodular structure of the optimization problem. Both are (polynomial-time) scalable algorithms and generate near-optimal solutions with performance guarantees. Our work complements existing optimization-based control methods (e.g., MPC) for supermarket refrigerators, as our algorithms can be adopted as a tool for solving combinatorial optimization problems arising in these methods.

3.2 | OC | Sep 2, 2024

Generalized Continuous-Time Models for Nesterov's Accelerated Gradient Methods

Chanwoong Park, Youngchae Cho, Insoon Yang

Recent research has indicated a substantial rise in interest in understanding Nesterov's accelerated gradient methods via their continuous-time models. However, most existing studies focus on specific classes of Nesterov's methods, which hinders the attainment of an in-depth understanding and a unified perspective. To address this deficit, we present generalized continuous-time models that cover a broad range of Nesterov's methods, including those previously studied under existing continuous-time frameworks. Our key contributions are as follows. First, we identify the convergence rates of the generalized models, eliminating the need to determine the convergence rate for any specific continuous-time model derived from them. Second, we show that six existing continuous-time models are special cases of our generalized models, thereby positioning our framework as a unifying tool for analyzing and understanding these models. Third, we design a restart scheme for Nesterov's methods based on our generalized models and show that it ensures a monotonic decrease in objective function values. Owing to the broad applicability of our models, this scheme can be applied to a broader class of Nesterov's methods compared to the original restart scheme. Fourth, we uncover a connection between our generalized models and gradient flow in continuous time, showing that the accelerated convergence rates of our generalized models can be attributed to a time reparametrization in gradient flow. Numerical experiment results are provided to support our theoretical analyses and results.
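To make the idea of restarting an accelerated method concrete, here is a minimal sketch of a discrete Nesterov iteration with a function-value restart: the momentum is reset whenever the objective would increase. This is the familiar heuristic from the adaptive-restart literature, not the scheme derived in this paper; the momentum coefficient k/(k+3), the function name `nag_with_restart`, and the quadratic test problem are all assumptions chosen for illustration.

```python
import numpy as np

def nag_with_restart(grad, f, x0, step, iters):
    """Nesterov's accelerated gradient with a function-value restart.

    Assumes step <= 1/L for an L-smooth objective, so that the plain
    gradient step taken on restart cannot increase f.
    """
    x = y = np.asarray(x0, dtype=float)
    k = 0
    history = [f(x)]
    for _ in range(iters):
        x_new = y - step * grad(y)          # gradient step from the extrapolated point
        if f(x_new) > f(x):
            # Restart: drop the momentum and step from the last iterate instead.
            k = 0
            x_new = x - step * grad(x)
        y = x_new + (k / (k + 3)) * (x_new - x)  # extrapolation (momentum)
        x = x_new
        k += 1
        history.append(f(x))
    return x, history

# Hypothetical ill-conditioned quadratic with smoothness constant L = 100.
A = np.diag([1.0, 100.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
x_final, hist = nag_with_restart(grad, f, [1.0, 1.0], step=0.01, iters=200)
print(hist[0], hist[-1])
```

With the step size at most 1/L, every accepted iterate has an objective value no larger than the previous one, which is the monotonicity property that restart schemes of this kind aim for.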

3.8 | LG | Dec 9, 2023

On Task-Relevant Loss Functions in Meta-Reinforcement Learning and Online LQR

Jaeuk Shin, Giho Kim, Howon Lee et al.

Designing a competent meta-reinforcement learning (meta-RL) algorithm in terms of data usage remains a central challenge to be tackled for its successful real-world applications. In this paper, we propose a sample-efficient meta-RL algorithm that learns a model of the system or environment at hand in a task-directed manner. As opposed to the standard model-based approaches to meta-RL, our method exploits the value information in order to rapidly capture the decision-critical part of the environment. The key component of our method is the loss function for learning the task inference module and the system model that systematically couples the model discrepancy and the value estimate, thereby facilitating the learning of the policy and the task inference module with a significantly smaller amount of data compared to the existing meta-RL algorithms. The idea is also extended to a non-meta-RL setting, namely an online linear quadratic regulator (LQR) problem, where our method can be simplified to reveal the essence of the strategy. The proposed method is evaluated in high-dimensional robotic control and online LQR problems, empirically verifying its effectiveness in extracting information indispensable for solving the tasks from observations in a sample-efficient manner.

7.1 | LG | Sep 19, 2025

KoopCast: Trajectory Forecasting via Koopman Operators

Jungjin Lee, Jaeuk Shin, Gihwan Kim et al.

We present KoopCast, a lightweight yet efficient model for trajectory forecasting in general dynamic environments. Our approach leverages Koopman operator theory, which enables a linear representation of nonlinear dynamics by lifting trajectories into a higher-dimensional space. The framework follows a two-stage design: first, a probabilistic neural goal estimator predicts plausible long-term targets, specifying where to go; second, a Koopman operator-based refinement module incorporates intention and history into a nonlinear feature space, enabling linear prediction that dictates how to go. This dual structure not only ensures strong predictive accuracy but also inherits the favorable properties of linear operators while faithfully capturing nonlinear dynamics. As a result, our model offers three key advantages: (i) competitive accuracy, (ii) interpretability grounded in Koopman spectral theory, and (iii) low-latency deployment. We validate these benefits on ETH/UCY, the Waymo Open Motion Dataset, and nuScenes, which feature rich multi-agent interactions and map-constrained nonlinear motion. Across benchmarks, KoopCast consistently delivers high predictive accuracy together with mode-level interpretability and practical efficiency.
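The lifting idea behind Koopman-based prediction can be shown on a toy system: nonlinear dynamics become exactly linear once the state is augmented with suitable observables, and the linear operator can then be fitted by least squares. The sketch below is not KoopCast's architecture; the two-dimensional system, the hand-picked observables (x, y, x²), and all function names are assumptions chosen so that the lifted dynamics close exactly.

```python
import numpy as np

def step(state, a=0.9, b=0.5, c=0.3):
    """Toy nonlinear dynamics: x' = a*x, y' = b*y + c*x^2."""
    x, y = state
    return np.array([a * x, b * y + c * x**2])

def lift(state):
    """Map the state to observables in which the dynamics are linear."""
    x, y = state
    return np.array([x, y, x**2])

def fit_koopman(states, next_states):
    """Least-squares fit of K such that lift(next) ≈ K @ lift(current)."""
    Z = np.stack([lift(s) for s in states])            # shape (N, 3)
    Z_next = np.stack([lift(s) for s in next_states])  # shape (N, 3)
    K_T, *_ = np.linalg.lstsq(Z, Z_next, rcond=None)   # solves Z @ K.T ≈ Z_next
    return K_T.T

rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(200, 2))
next_states = np.array([step(s) for s in states])
K = fit_koopman(states, next_states)
print(np.round(K, 3))
```

For this system the fitted matrix matches the exact lifted dynamics, with rows (0.9, 0, 0), (0, 0.5, 0.3), and (0, 0, 0.81), so multi-step prediction reduces to repeated multiplication by K. Real trajectory data rarely admits such a finite closed set of observables, which is why learned feature maps are typically used instead.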

22.6 | ML | Nov 5, 2021

Improved Regret Analysis for Variance-Adaptive Linear Bandits and Horizon-Free Linear Mixture MDPs

Yeoneung Kim, Insoon Yang, Kwang-Sung Jun

In online learning problems, exploiting low variance plays an important role in obtaining tight performance guarantees yet is challenging because variances are often not known a priori. Recently, considerable progress has been made by Zhang et al. (2021) where they obtain a variance-adaptive regret bound for linear bandits without knowledge of the variances and a horizon-free regret bound for linear mixture Markov decision processes (MDPs). In this paper, we present novel analyses that improve their regret bounds significantly. For linear bandits, we achieve $\tilde O(\min\{d\sqrt{K}, d^{1.5}\sqrt{\sum_{k=1}^K \sigma_k^2}\} + d^2)$ where $d$ is the dimension of the features, $K$ is the time horizon, and $\sigma_k^2$ is the noise variance at time step $k$, and $\tilde O$ ignores polylogarithmic dependence, which is a factor of $d^3$ improvement. For linear mixture MDPs with the assumption of maximum cumulative reward in an episode being in $[0,1]$, we achieve a horizon-free regret bound of $\tilde O(d \sqrt{K} + d^2)$ where $d$ is the number of base models and $K$ is the number of episodes. This is a factor of $d^{3.5}$ improvement in the leading term and $d^7$ in the lower order term. Our analysis critically relies on a novel peeling-based regret analysis that leverages the elliptical potential `count' lemma.

3.1 | LG | Oct 27, 2021

Training Wasserstein GANs without gradient penalties

Dohyun Kwon, Yeoneung Kim, Guido Montúfar et al.

We propose a stable method to train Wasserstein generative adversarial networks. In order to enhance stability, we consider two objective functions using the $c$-transform based on Kantorovich duality which arises in the theory of optimal transport. We experimentally show that this algorithm can effectively enforce the Lipschitz constraint on the discriminator while other standard methods fail to do so. As a consequence, our method yields an accurate estimation for the optimal discriminator and also for the Wasserstein distance between the true distribution and the generated one. Our method requires neither gradient penalties nor the corresponding hyperparameter tuning and is computationally more efficient than other methods. At the same time, it yields competitive generators of synthetic images based on the MNIST, F-MNIST, and CIFAR-10 datasets.

10.4 | RO | Sep 15, 2021 | Code

Infusing model predictive control into meta-reinforcement learning for mobile robots in dynamic environments

Jaeuk Shin, Astghik Hakobyan, Mingyu Park et al.

The successful operation of mobile robots requires them to adapt rapidly to environmental changes. To develop an adaptive decision-making tool for mobile robots, we propose a novel algorithm that combines meta-reinforcement learning (meta-RL) with model predictive control (MPC). Our method employs an off-policy meta-RL algorithm as a baseline to train a policy using transition samples generated by MPC when the robot detects certain events that can be effectively handled by MPC, with its explicit use of robot dynamics. The key idea of our method is to switch between the meta-learned policy and the MPC controller in a randomized and event-triggered fashion to make up for suboptimal MPC actions caused by the limited prediction horizon. During meta-testing, the MPC module is deactivated to significantly reduce computation time in motion control. We further propose an online adaptation scheme that enables the robot to infer and adapt to a new task within a single trajectory. The performance of our method has been demonstrated through simulations using a nonlinear car-like vehicle model with (i) synthetic movements of obstacles, and (ii) real-world pedestrian motion data. The simulation results indicate that our method outperforms other algorithms in terms of learning efficiency and navigation quality.

11.6 | RO | May 3, 2021

Distributionally robust risk map for learning-based motion planning and control: A semidefinite programming approach

Astghik Hakobyan, Insoon Yang

This paper proposes a novel safety specification tool, called the distributionally robust risk map (DR-risk map), for a mobile robot operating in a learning-enabled environment. Given the robot's position, the map aims to reliably assess the conditional value-at-risk (CVaR) of collision with obstacles whose movements are inferred by Gaussian process regression (GPR). Unfortunately, the inferred distribution is subject to errors, making it difficult to accurately evaluate the CVaR of collision. To overcome this challenge, this tool measures the risk under the worst-case distribution in a so-called ambiguity set that characterizes allowable distribution errors. To resolve the infinite-dimensionality issue inherent in the construction of the DR-risk map, we derive a tractable semidefinite programming formulation that provides an upper bound of the risk, exploiting techniques from modern distributionally robust optimization. As a concrete application for motion planning, a distributionally robust RRT* algorithm is considered using the risk map that addresses distribution errors caused by GPR. Furthermore, a motion control method is devised using the DR-risk map in a learning-based model predictive control (MPC) formulation. In particular, a neural network approximation of the risk map is proposed to reduce the computational cost in solving the MPC problem. The performance and utility of the proposed risk map are demonstrated through simulation studies that show its ability to ensure the safety of mobile robots despite learning errors.

13.6 | LG | Oct 27, 2020 | Code

Hamilton-Jacobi Deep Q-Learning for Deterministic Continuous-Time Systems with Lipschitz Continuous Controls

Jeongho Kim, Jaeuk Shin, Insoon Yang

In this paper, we propose Q-learning algorithms for continuous-time deterministic optimal control problems with Lipschitz continuous controls. Our method is based on a new class of Hamilton-Jacobi-Bellman (HJB) equations derived from applying the dynamic programming principle to continuous-time Q-functions. A novel semi-discrete version of the HJB equation is proposed to design a Q-learning algorithm that uses data collected in discrete time without discretizing or approximating the system dynamics. We identify the condition under which the Q-function estimated by this algorithm converges to the optimal Q-function. For practical implementation, we propose the Hamilton-Jacobi DQN, which extends the idea of deep Q-networks (DQN) to our continuous control setting. This approach does not require actor networks or numerical solutions to optimization problems for greedy actions since the HJB equation provides a simple characterization of optimal controls via ordinary differential equations. We empirically demonstrate the performance of our method through benchmark tasks and high-dimensional linear-quadratic problems.

7.0 | RO | Mar 5, 2020

Learning-based distributionally robust motion control with Gaussian processes

Astghik Hakobyan, Insoon Yang

Safety is a critical issue in learning-based robotic and autonomous systems as learned information about their environments is often unreliable and inaccurate. In this paper, we propose a risk-aware motion control tool that is robust against errors in learned distributional information about obstacles moving with unknown dynamics. The salient feature of our model predictive control (MPC) method is its capability of limiting the risk of unsafety even when the true distribution deviates from the distribution estimated by Gaussian process (GP) regression, within an ambiguity set. Unfortunately, the distributionally robust MPC problem with GP is intractable because the worst-case risk constraint involves an infinite-dimensional optimization problem over the ambiguity set. To remove the infinite-dimensionality issue, we develop a systematic reformulation approach exploiting modern distributionally robust optimization techniques. The performance and utility of our method are demonstrated through simulations using a nonlinear car-like vehicle model for autonomous driving.

15.1 | RO | Feb 24, 2020

Safe reinforcement learning for probabilistic reachability and safety specifications: A Lyapunov-based approach

Subin Huh, Insoon Yang

Emerging applications in robotics and autonomous systems, such as autonomous driving and robotic surgery, often involve critical safety constraints that must be satisfied even when information about system models is limited. In this regard, we propose a model-free safety specification method that learns the maximal probability of safe operation by carefully combining probabilistic reachability analysis and safe reinforcement learning (RL). Our approach constructs a Lyapunov function with respect to a safe policy to restrain each policy improvement stage. As a result, it yields a sequence of safe policies that determine the range of safe operation, called the safe set, which monotonically expands and gradually converges. We also develop an efficient safe exploration scheme that accelerates the process of identifying the safety of unexamined states. Exploiting the Lyapunov shielding, our method regulates the exploratory policy to avoid dangerous states with high confidence. To handle high-dimensional systems, we further extend our approach to deep RL by introducing a Lagrangian relaxation technique to establish a tractable actor-critic algorithm. The empirical performance of our method is demonstrated through continuous control benchmark problems, such as a reaching task on a planar robot arm.

17.3 | RO | Jan 14, 2020

Wasserstein Distributionally Robust Motion Control for Collision Avoidance Using Conditional Value-at-Risk

Astghik Hakobyan, Insoon Yang

In this paper, a risk-aware motion control scheme is considered for mobile robots to avoid randomly moving obstacles when the true probability distribution of uncertainty is unknown. We propose a novel model predictive control (MPC) method for limiting the risk of unsafety even when the true distribution of the obstacles' movements deviates, within an ambiguity set, from the empirical distribution obtained using a limited amount of sample data. By choosing the ambiguity set as a statistical ball with its radius measured by the Wasserstein metric, we achieve a probabilistic guarantee of the out-of-sample risk, evaluated using new sample data generated independently of the training data. To resolve the infinite-dimensionality issue inherent in the distributionally robust MPC problem, we reformulate it as a finite-dimensional nonlinear program using modern distributionally robust optimization techniques based on the Kantorovich duality principle. To find a globally optimal solution in the case of affine dynamics and output equations, a spatial branch-and-bound algorithm is designed using McCormick relaxation. The performance of the proposed method is demonstrated and analyzed through simulation studies using a nonlinear car-like vehicle model and a linearized quadrotor model.

12.8 | OC | Dec 23, 2019

Hamilton-Jacobi-Bellman Equations for Q-Learning in Continuous Time

Jeongho Kim, Insoon Yang

In this paper, we introduce Hamilton-Jacobi-Bellman (HJB) equations for Q-functions in continuous time optimal control problems with Lipschitz continuous controls. The standard Q-function used in reinforcement learning is shown to be the unique viscosity solution of the HJB equation. A necessary and sufficient condition for optimality is provided using the viscosity solution framework. By using the HJB equation, we develop a Q-learning method for continuous-time dynamical systems. A DQN-like algorithm is also proposed for high-dimensional state and control spaces. The performance of the proposed Q-learning algorithm is demonstrated using 1-, 10- and 20-dimensional dynamical systems.

3.3 | OC | Jun 17, 2015

Optimal Dynamic Contracts for a Large-Scale Principal-Agent Hierarchy: A Concavity-Preserving Approach

Christopher W. Miller, Insoon Yang

We present a continuous-time contract whereby a top-level player can incentivize a hierarchy of players below him to act in his best interest despite only observing the output of his direct subordinate. This paper extends Sannikov's approach from a situation of asymmetric information between a principal and an agent to one of hierarchical information between several players. We develop an iterative algorithm for constructing an incentive compatible contract and define the correct notion of concavity which must be preserved during iteration. We identify conditions under which a dynamic programming construction of an optimal dynamic contract can be reduced to only a one-dimensional state space and one-dimensional control set, independent of the size of the hierarchy. In this sense, our results contribute to the applicability of dynamic programming on dynamic contracts for a large-scale principal-agent hierarchy.