13.0 SY May 21
A Learning With Errors based encryption scheme for dynamic controllers that discloses residue signal for anomaly detection
Yeongjun Jang, Joowon Lee, Junsoo Kim et al.
Although encrypted control systems ensure confidentiality of private data, it is challenging to detect anomalies without the secret key as all signals remain encrypted. To address this issue, we propose a homomorphic encryption scheme for dynamic controllers that automatically discloses the residue signal for anomaly detection, while keeping all other signals private. To this end, we characterize the zero-dynamics of an encrypted dynamic system over a finite field of integers and incorporate it into a Learning With Errors (LWE) based scheme. We then present a method to further utilize the disclosed residue signal for implementing dynamic controllers over encrypted data, which does not involve re-encryption even when they have non-integer state matrices.
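To make the LWE building block concrete, here is a minimal sketch of a toy symmetric LWE encryption with additive homomorphism. It is not the scheme proposed in the paper: the parameters `Q`, `DELTA`, `N`, the error range, and all function names are assumptions chosen for illustration, and the residue-disclosure mechanism based on zero-dynamics is not reproduced.

```python
import random

Q = 2**32       # ciphertext modulus (toy value, an assumption)
DELTA = 2**16   # scaling factor that separates the message from the error
N = 16          # secret dimension (far too small for real security)

def keygen(rng):
    """Sample a secret key uniformly from Z_Q^N."""
    return [rng.randrange(Q) for _ in range(N)]

def encrypt(m, s, rng):
    """Encrypt integer m as (a, b) with b = <a, s> + DELTA*m + e mod Q."""
    a = [rng.randrange(Q) for _ in range(N)]
    e = rng.randint(-4, 4)  # small error term
    b = (sum(ai * si for ai, si in zip(a, s)) + DELTA * m + e) % Q
    return a, b

def decrypt(ct, s):
    """Remove the mask <a, s>, center modulo Q, and round away the error."""
    a, b = ct
    noisy = (b - sum(ai * si for ai, si in zip(a, s))) % Q
    if noisy >= Q // 2:
        noisy -= Q  # map into [-Q/2, Q/2)
    return round(noisy / DELTA)

def add(ct1, ct2):
    """Homomorphic addition: add ciphertexts componentwise modulo Q."""
    (a1, b1), (a2, b2) = ct1, ct2
    return [(x + y) % Q for x, y in zip(a1, a2)], (b1 + b2) % Q

def scale(k, ct):
    """Homomorphic multiplication by a public integer k."""
    a, b = ct
    return [(k * x) % Q for x in a], (k * b) % Q
```

Decryption stays correct only while the accumulated error is below `DELTA / 2` and the message magnitude is below `Q / (2 * DELTA)`, which is why integer-only operations matter in this setting.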
RO Sep 19, 2024
Fast End-to-End Generation of Belief Space Paths for Minimum Sensing Navigation
Lukas Taus, Vrushabh Zinage, Takashi Tanaka et al.
We revisit the problem of motion planning in the Gaussian belief space. Motivated by the fact that most existing sampling-based planners suffer from high computational costs due to the high-dimensional nature of the problem, we propose an approach that leverages a deep learning model to predict optimal path candidates directly from the problem description. Our proposed approach consists of three steps. First, we prepare a training dataset comprising a large number of input-output pairs: the input image encodes the problem to be solved (e.g., start states, goal states, and obstacle locations), whereas the output image encodes the solution (i.e., the ground truth of the shortest path). Any existing planner can be used to generate this training dataset. Next, we leverage the U-Net architecture to learn the dependencies between the input and output data. Finally, a trained U-Net model is applied to a new problem encoded as an input image. From the U-Net's output image, which is interpreted as a distribution of paths, an optimal path candidate is reconstructed. The proposed method significantly reduces computation time compared to the sampling-based baseline algorithm.
3.0 SY May 12
Experimental Examination of Secure Two-Party Controller Computation
Kaoru Teranishi, Jihoon Suh, Takashi Tanaka
A secure two-party computation protocol for running dynamic controllers over secret sharing has recently been proposed. Unlike encrypted control schemes based on homomorphic encryption, this protocol enables operating dynamic controllers for an infinite time horizon without controller-state decryption, controller-state reset, or input re-encryption. However, the two-party setting introduces additional online communication between the computing parties, which may hinder real-time feasibility. In this study, we demonstrate the feasibility of the protocol through implementation on a commercial cloud platform with an inverted pendulum testbed. Experimental results show that the proposed protocol successfully stabilized the pendulum despite the online communication overhead.
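As a rough illustration of the secret-sharing primitive, the sketch below splits integer signals into two additive shares modulo `Q` and lets each party apply a public integer gain locally. This is an assumption-laden toy, not the protocol examined in the paper: the modulus, the function names, and the use of public gains are all hypothetical, and products of two shared values, which typically require interaction between the parties, are not shown.

```python
import random

Q = 2**61 - 1  # share modulus (hypothetical choice)

def share(x, rng):
    """Split integer x into two additive shares with s1 + s2 = x (mod Q)."""
    r = rng.randrange(Q)
    return r, (x - r) % Q

def reconstruct(s1, s2):
    """Recombine two shares and map the result back to a signed integer."""
    v = (s1 + s2) % Q
    return v - Q if v > Q // 2 else v

def local_linear(gains, shares):
    """Each party applies public integer gains to its own shares only."""
    return sum(k * s for k, s in zip(gains, shares)) % Q

# Example: u = 2*x0 + 7*x1 computed on shares of x = (5, -3).
rng = random.Random(1)
x = [5, -3]
gains = [2, 7]
pairs = [share(xi, rng) for xi in x]
u1 = local_linear(gains, [p[0] for p in pairs])  # computed by party 1
u2 = local_linear(gains, [p[1] for p in pairs])  # computed by party 2
u = reconstruct(u1, u2)
```

Neither `u1` nor `u2` alone reveals `u`, since each is masked by a uniformly random value; only their sum modulo `Q` does.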
21.3 SY Apr 21
Path Integral Control for Partially Observed Systems with Controlled Sensing
Goutam Das, Takashi Tanaka
Path integral control in Gaussian belief space requires a structural matching condition between the observation-driven diffusion of the belief mean and the actuation authority, which a fixed observation matrix cannot enforce. We treat the observation matrix as a control variable and show that constraining the sensing control to a measurable selector from the resulting matching set reduces the Hamilton-Jacobi-Bellman equation for the belief mean and covariance to a linear PDE with a Feynman-Kac representation.
9.9 SY Apr 14
Path Integral Control in Gaussian Belief Space for Partially Observed Systems
Goutam Das, Takashi Tanaka
This paper extends path integral control (PIC) to partially observed systems by formulating the problem in Gaussian belief space. PIC relies on the diffusion being proportional to the control channel -- the so-called matching condition -- to linearize the Hamilton-Jacobi-Bellman equation via the Cole-Hopf transform; we show that this condition fails in infinite-dimensional belief space under non-affine observations. Restricting to Gaussian beliefs yields a finite-dimensional approximation with deterministic covariance evolution, reducing the problem to stochastic control of the belief mean. We derive necessary and sufficient conditions for matching in this reduced space, obtain an exact Cole-Hopf linearization with a Feynman-Kac representation, and develop the MPPI-Belief algorithm. Numerical experiments on a navigation task with state-dependent observation noise demonstrate the effectiveness of MPPI-Belief relative to certainty-equivalent and particle-filter-based baselines.
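For readers unfamiliar with the matching condition, the following is the standard fully observed derivation, written as a sketch under assumed notation (the symbols f, G, B, R, q, \phi, \lambda, \psi are not taken from the paper, and the belief-space version differs in its state and diffusion terms).

```latex
% Assumed generic setting:
%   dx = f(x) dt + G(x) u dt + B(x) dw,
%   cost = E[ \phi(x_T) + \int_t^T ( q(x_s) + \tfrac12 u_s^\top R u_s ) ds ].
\begin{align*}
% HJB equation after minimizing over u, with u^* = -R^{-1} G^\top \nabla V
-\partial_t V &= q + f^\top \nabla V
  - \tfrac12 \nabla V^\top G R^{-1} G^\top \nabla V
  + \tfrac12 \operatorname{tr}\!\big(B B^\top \nabla^2 V\big),
  \qquad V(T,x) = \phi(x), \\
% Matching condition and Cole-Hopf transform
B B^\top &= \lambda\, G R^{-1} G^\top, \qquad V = -\lambda \log \psi, \\
% The quadratic terms in \nabla\psi cancel, leaving a linear PDE
-\partial_t \psi &= -\tfrac{q}{\lambda}\,\psi + f^\top \nabla \psi
  + \tfrac12 \operatorname{tr}\!\big(B B^\top \nabla^2 \psi\big),
  \qquad \psi(T,x) = e^{-\phi(x)/\lambda}, \\
% Feynman-Kac representation under the uncontrolled dynamics dx = f dt + B dw
\psi(t,x) &= \mathbb{E}\Big[\exp\Big(-\tfrac{1}{\lambda}\Big(\phi(x_T)
  + \int_t^T q(x_s)\,ds\Big)\Big) \,\Big|\, x_t = x\Big].
\end{align*}
```

The cancellation in the third line is exactly where the matching condition is used: without it, a term quadratic in \nabla\psi survives and the PDE stays nonlinear.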
44.0 SY Mar 19
Variational Encrypted Model Predictive Control
Jihoon Suh, Yeongjun Jang, Junsoo Kim et al.
We develop a variational encrypted model predictive control (VEMPC) protocol whose online execution relies only on encrypted polynomial operations. The proposed approach reformulates the MPC problem into a sampling-based estimator, in which the computation of the quadratic cost is naturally handled by tilting the sampling distribution, thus reducing online encrypted computation. The resulting protocol requires no additional communication rounds or intermediate decryption, and scales efficiently through two complementary levels of parallelism. We analyze the effect of encryption-induced errors on optimality, and simulation results demonstrate the practical applicability of the proposed method.
6.0 OC Apr 8
Linearly Solvable Continuous-Time General-Sum Stochastic Differential Games
Monika Tomar, Takashi Tanaka
This paper introduces a class of continuous-time, finite-player stochastic general-sum differential games that admit solutions through an exact linear PDE system. We formulate a distribution planning game utilizing the cross-log-likelihood ratio to naturally model multi-agent spatial conflicts, such as congestion avoidance. By applying a generalized multivariate Cole-Hopf transformation, we decouple the associated non-linear Hamilton-Jacobi-Bellman (HJB) equations into a system of linear partial differential equations. This reduction enables the efficient, grid-free computation of feedback Nash equilibrium strategies via the Feynman-Kac path integral method, effectively overcoming the curse of dimensionality.
LG Jun 14, 2025
Relative Entropy Regularized Reinforcement Learning for Efficient Encrypted Policy Synthesis
Jihoon Suh, Yeongjun Jang, Kaoru Teranishi et al.
We propose an efficient encrypted policy synthesis to develop privacy-preserving model-based reinforcement learning. We first demonstrate that the relative-entropy-regularized reinforcement learning (RERL) framework offers a computationally convenient linear and "min-free" structure for value iteration, enabling a direct and efficient integration of fully homomorphic encryption (FHE) with bootstrapping into policy synthesis. We analyze convergence and derive error bounds that account for how encryption-induced errors, including those from quantization and bootstrapping, propagate through the encrypted policy synthesis. Theoretical analysis is validated by numerical simulations. Results demonstrate the effectiveness of the RERL framework in integrating FHE for encrypted policy synthesis.
LG Apr 12, 2025
Efficient Implementation of Reinforcement Learning over Homomorphic Encryption
Jihoon Suh, Takashi Tanaka
We investigate encrypted control policy synthesis over the cloud. While encrypted control implementations have been studied previously, we focus on the less explored paradigm of privacy-preserving control synthesis, which can involve heavier computations ideal for cloud outsourcing. We classify control policy synthesis into model-based, simulator-driven, and data-driven approaches and examine their implementation over fully homomorphic encryption (FHE) for privacy enhancements. A key challenge arises from comparison operations (min or max) in standard reinforcement learning algorithms, which are difficult to execute over encrypted data. This observation motivates our focus on relative-entropy-regularized reinforcement learning (RL) problems, which simplify encrypted evaluation of synthesis algorithms thanks to their comparison-free structure. We demonstrate how linearly solvable value iteration, path integral control, and Z-learning can be readily implemented over FHE. We conduct a case study of our approach through numerical simulations of encrypted Z-learning in a grid world environment using the CKKS encryption scheme, showing convergence with acceptable approximation error. Our work suggests the potential for secure and efficient cloud-based reinforcement learning.
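The comparison-free structure can be seen in a plaintext sketch of linearly solvable value iteration for a first-exit problem, in the style of linearly solvable MDPs. Everything below is an illustrative assumption (the five-state chain, the passive dynamics `P`, the state costs `q`, and the function name); no encryption is involved, and the paper's grid world is not reproduced.

```python
import math

def z_iteration(P, q, terminal, iters=500):
    """Iterate z <- exp(-q) * (P z) on interior states; no min or max needed.

    z is the desirability function; the value function is v = -log z.
    """
    n = len(q)
    z = [1.0] * n
    for _ in range(iters):
        z = [
            math.exp(-q[i]) if i in terminal
            else math.exp(-q[i]) * sum(P[i][j] * z[j] for j in range(n))
            for i in range(n)
        ]
    return z

# Toy chain: states 0..4, goal (terminal) state 4, random-walk passive dynamics.
n = 5
P = [[0.0] * n for _ in range(n)]
P[0][0] = P[0][1] = 0.5                  # reflecting left boundary
for i in range(1, n - 1):
    P[i][i - 1] = P[i][i + 1] = 0.5
P[n - 1][n - 1] = 1.0                    # terminal state (row unused in update)
q = [0.1] * (n - 1) + [0.0]              # state cost per step, zero at the goal
terminal = {n - 1}

z = z_iteration(P, q, terminal)
v = [-math.log(zi) for zi in z]          # cost-to-go

# Optimal transition probabilities reweight the passive dynamics by z.
def optimal_row(i):
    w = [P[i][j] * z[j] for j in range(n)]
    total = sum(w)
    return [wj / total for wj in w]
```

Each update uses only multiplications and additions on `z`, which is the property that makes this family of algorithms attractive for evaluation over homomorphic encryption; the exponentials of the costs can be precomputed in plaintext.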
RO Oct 29, 2021
Upper and Lower Bounds for End-to-End Risks in Stochastic Robot Navigation
Apurva Patil, Takashi Tanaka
We present novel upper and lower bounds to estimate the collision probability of motion plans for autonomous agents with discrete-time linear Gaussian dynamics. Motion plans generated by planning algorithms cannot be perfectly executed by autonomous agents in reality due to the inherent uncertainties in the real world. Estimating collision probability is crucial to characterize the safety of trajectories and plan risk optimal trajectories. Our approach is an application of standard results in probability theory including the inequalities of Hunter, Kounias, Frechet, and Dawson. Using a ground robot navigation example, we numerically demonstrate that our method is considerably faster than the naive Monte Carlo sampling method and the proposed bounds are significantly less conservative than Boole's bound commonly used in the literature.
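To show what bounding an end-to-end collision probability looks like, here is a minimal sketch for a scalar Gaussian random walk, comparing Boole's upper bound and the simplest Fréchet-type lower bound (the largest per-step probability) against Monte Carlo. The random-walk model, the threshold `d`, and all function names are assumptions for illustration. The tighter bounds named in the abstract, such as Hunter's and Kounias's, additionally use pairwise joint probabilities and are not reproduced here.

```python
import math
import random

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def marginal_collision_probs(T, sigma, d):
    """P(x_t > d) for x_t ~ N(0, t*sigma^2), t = 1..T (random walk from 0)."""
    return [1.0 - normal_cdf(d / (sigma * math.sqrt(t))) for t in range(1, T + 1)]

def boole_upper(p):
    """Union bound: P(any collision) <= sum of per-step probabilities."""
    return min(1.0, sum(p))

def frechet_lower(p):
    """P(any collision) >= largest per-step probability."""
    return max(p)

def monte_carlo(T, sigma, d, n, rng):
    """Fraction of sampled trajectories that exceed d at some step."""
    hits = 0
    for _ in range(n):
        x = 0.0
        for _ in range(T):
            x += rng.gauss(0.0, sigma)
            if x > d:
                hits += 1
                break
    return hits / n

T, sigma, d = 10, 1.0, 5.0
p = marginal_collision_probs(T, sigma, d)
lower, upper = frechet_lower(p), boole_upper(p)
estimate = monte_carlo(T, sigma, d, 20000, random.Random(0))
```

The bounds need only the `T` marginal probabilities, whereas the Monte Carlo estimate simulates every trajectory step, which is the cost trade-off the abstract refers to.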
RO Sep 28, 2021
Gaussian Belief Space Path Planning for Minimum Sensing Navigation
Ali Reza Pedram, Riku Funada, Takashi Tanaka
We propose a path planning methodology for a mobile robot navigating through an obstacle-filled environment to generate a reference path that is traceable with moderate sensing efforts. The desired reference path is characterized as the shortest path in an obstacle-filled Gaussian belief manifold equipped with a novel information-geometric distance function. The distance function we introduce is shown to be an asymmetric quasi-pseudometric and can be interpreted as the minimum information gain required to steer the Gaussian belief. An RRT*-based numerical solution algorithm is presented to solve the formulated shortest-path problem. To gain insight into the asymptotic optimality of the proposed algorithm, we show that the considered path length function is continuous with respect to the topology of total variation. Simulation results demonstrate that the proposed method is effective in various robot navigation scenarios to reduce sensing costs, such as the required frequency of sensor measurements and the number of sensors that must be operated simultaneously.
RO Sep 27, 2021
Dynamic Allocation of Visual Attention for Vision-based Autonomous Navigation under Data Rate Constraints
Ali Reza Pedram, Riku Funada, Takashi Tanaka
This paper considers the problem of task-dependent (top-down) attention allocation for vision-based autonomous navigation using known landmarks. Unlike the existing paradigm in which landmark selection is formulated as a combinatorial optimization problem, we model it as a resource allocation problem where the decision-maker (DM) is granted extra freedom to control the degree of attention to each landmark. The total resource available to DM is expressed in terms of the capacity limit of the in-take information flow, which is quantified by the directed information from the state of the environment to the DM's observation. We consider a receding horizon implementation of such a controlled sensing scheme in the Linear-Quadratic-Gaussian (LQG) regime. The convex-concave procedure is applied in each time step, whose time complexity is shown to be linear in the horizon length if the alternating direction method of multipliers (ADMM) is used. Numerical studies show that the proposed formulation is sparsity-promoting in the sense that it tends to allocate zero data rate to uninformative landmarks.
AI Sep 10, 2021
Simultaneous Perception-Action Design via Invariant Finite Belief Sets
Michael Hibbard, Takashi Tanaka, Ufuk Topcu
Although perception is an increasingly dominant portion of the overall computational cost for autonomous systems, only a fraction of the information perceived is likely to be relevant to the current task. To alleviate these perception costs, we develop a novel simultaneous perception-action design framework wherein an agent senses only the task-relevant information. This formulation differs from that of a partially observable Markov decision process, since the agent is free to synthesize not only its policy for action selection but also its belief-dependent observation function. The method enables the agent to balance its perception costs with those incurred by operating in its environment. To obtain a computationally tractable solution, we approximate the value function using a novel method of invariant finite belief sets, wherein the agent acts exclusively on a finite subset of the continuous belief space. We solve the approximate problem through value iteration in which a linear program is solved individually for each belief state in the set, in each iteration. Finally, we prove that the value functions, under an assumption on their structure, converge to their continuous state-space values as the sample density increases.
CR Mar 20, 2021
Encrypted Value Iteration and Temporal Difference Learning over Leveled Homomorphic Encryption
Jihoon Suh, Takashi Tanaka
We consider an architecture of confidential cloud-based control synthesis based on Homomorphic Encryption (HE). Our study is motivated by the recent surge of data-driven control such as deep reinforcement learning, whose heavy computational requirements often necessitate outsourcing to a third-party server. To achieve more flexibility than Partially Homomorphic Encryption (PHE) and less computational overhead than Fully Homomorphic Encryption (FHE), we consider a Reinforcement Learning (RL) architecture over Leveled Homomorphic Encryption (LHE). We first show that the impact of the encryption noise under the Cheon-Kim-Kim-Song (CKKS) encryption scheme on the convergence of the model-based tabular Value Iteration (VI) can be analytically bounded. We also consider secure implementations of TD(0), SARSA(0) and Z-learning algorithms over the CKKS scheme, where we numerically demonstrate that the effects of the encryption noise on these algorithms are also minimal.
SY Apr 6, 2020
Scalable Synthesis of Minimum-Information Linear-Gaussian Control by Distributed Optimization
Murat Cubuktepe, Takashi Tanaka, Ufuk Topcu
We consider a discrete-time linear-quadratic Gaussian control problem in which we minimize a weighted sum of the directed information from the state of the system to the control input and the control cost. The optimal control and sensing policies can be synthesized jointly by solving a semidefinite programming problem. However, the existing solutions typically scale cubically with the horizon length. We leverage the structure in the problem to develop a distributed algorithm that decomposes the synthesis problem into a set of smaller problems, one for each time step. We prove that the algorithm runs in time linear in the horizon length. As an application of the algorithm, we consider a path-planning problem in a state space with obstacles under the presence of stochastic disturbances. The algorithm computes a locally optimal solution that jointly minimizes the perception and control cost while ensuring the safety of the path. The numerical examples show that the algorithm can scale to horizon lengths in the thousands and compute locally optimal solutions.
SY Mar 27, 2020
Closed-loop Parameter Identification of Linear Dynamical Systems through the Lens of Feedback Channel Coding Theory
Ali Reza Pedram, Takashi Tanaka
This paper considers the problem of closed-loop identification of linear scalar systems with Gaussian process noise, where the system input is determined by a deterministic state feedback policy. The regularized least-square estimate (LSE) algorithm is adopted, seeking to find the best estimate of unknown model parameters based on noiseless measurements of the state. We are interested in the fundamental limitation of the rate at which unknown parameters can be learned, in the sense of the D-optimality scalarization criterion subject to a quadratic control cost. We first establish a novel connection between a closed-loop identification problem of interest and a channel coding problem involving an additive white Gaussian noise (AWGN) channel with feedback and a certain structural constraint. Based on this connection, we show that the learning rate is fundamentally upper bounded by the capacity of the corresponding AWGN channel. Although the optimal design of the feedback policy remains challenging, we derive conditions under which the upper bound is achieved. Finally, we show that the obtained upper bound implies that super-linear convergence is unattainable for any choice of the policy.
RO Feb 28, 2020
Rationally Inattentive Path-Planning via RRT*
Jeb Stefan, Ali Reza Pedram, Riku Funada et al.
We consider a path-planning scenario for a mobile robot traveling in a configuration space with obstacles under the presence of stochastic disturbances. A novel path length metric is proposed on the uncertain configuration space and then integrated with the existing RRT* algorithm. The metric is a weighted sum of two terms which capture both the Euclidean distance traveled by the robot and the perception cost, i.e., the amount of information the robot must perceive about the environment to follow the path safely. The continuity of the path length function with respect to the topology of the total variation metric is shown and the optimality of the Rationally Inattentive RRT* algorithm is discussed. Three numerical studies are presented which display the utility of the new algorithm.
SY Feb 2, 2020
SARSA(0) Reinforcement Learning over Fully Homomorphic Encryption
Jihoon Suh, Takashi Tanaka
We consider a cloud-based control architecture in which the local plants outsource the control synthesis task to the cloud. In particular, we consider cloud-based reinforcement learning (RL), where updating the value function is outsourced to the cloud. To achieve confidentiality, we implement computations over Fully Homomorphic Encryption (FHE). We use the CKKS encryption scheme and a modified SARSA(0) reinforcement learning algorithm to incorporate the encryption-induced delays. We then give a convergence result for the delayed update rule of SARSA(0) with a blocking mechanism. Finally, we present a numerical demonstration on a classical pole-balancing problem.
SY Sep 18, 2018
Transfer Entropy in MDPs with Temporal Logic Specifications
Suda Bharadwaj, Mohamadreza Ahmadi, Takashi Tanaka et al.
Emerging applications in autonomy require control techniques that take into account uncertain environments, communication and sensing constraints, while satisfying high-level mission specifications. Motivated by this need, we consider a class of Markov decision processes (MDPs), along with a transfer entropy cost function. In this context, we study high-level mission specifications as co-safe linear temporal logic (LTL) formulae. We provide a method to synthesize a policy that minimizes the weighted sum of the transfer entropy and the probability of failure to satisfy the specification. We derive a set of coupled non-linear equations that an optimal policy must satisfy. We then use a modified Arimoto-Blahut algorithm to solve the non-linear equations. Finally, we demonstrate the proposed method on a navigation and path planning scenario of a Mars rover.