OCNov 17, 2015
Optimal load-side control for frequency regulation in smart gridsEnrique Mallada, Changhong Zhao, Steven H. Low
Frequency control rebalances supply and demand while maintaining the network state within operational margins. It is implemented using fast ramping reserves that are expensive and wasteful, and which are expected to grow with the increasing penetration of renewables. The most promising solution to this problem is the use of demand response, i.e. load participation in frequency control. Yet it is still unclear how to efficiently integrate load participation without introducing instabilities and violating operational constraints. In this paper we present a comprehensive load-side frequency control mechanism that can maintain the grid within operational constraints. In particular, our controllers can rebalance supply and demand after disturbances, restore the frequency to its nominal value and preserve inter-area power flows. Furthermore, our controllers are distributed (unlike the currently implemented frequency control), can allocate load updates optimally, and can maintain line flows within thermal limits. We prove that such a distributed load-side control is globally asymptotically stable and robust to unknown load parameters. We illustrate its effectiveness through simulations.
OCJul 28, 2014
Skewless Network Clock Synchronization Without Discontinuity: Convergence and PerformanceEnrique Mallada, Xiaoqiao Meng, Michel Hack et al.
This paper examines synchronization of computer clocks connected via a data network and proposes a skewless algorithm to synchronize them. Unlike existing solutions, which either estimate and compensate the frequency difference (skew) among clocks or introduce offset corrections that can generate jitter and possibly even backward jumps, our solution achieves synchronization without these problems. We first analyze the convergence property of the algorithm and provide explicit necessary and sufficient conditions on the parameters to guarantee synchronization. We then study the effect of noisy measurements (jitter) and frequency drift (wander) on the offsets and synchronization frequency, and further optimize the parameter values to minimize their variance. Our study reveals a few insights, for example, we show that our algorithm can converge even in the presence of timing loops and noise, provided that there is a well defined leader. This marks a clear contrast with current standards such as NTP and PTP, where timing loops are specifically avoided. Furthermore, timing loops can even be beneficial in our scheme as it is demonstrated that highly connected subnetworks can collectively outperform individual clients when the time source has large jitter. The results are supported by experiments running on a cluster of IBM BladeCenter servers with Linux.
43.0SYMay 8
Learning Reachability of Energy Storage ArbitrageTomás Tapia, Agustin Castellano, Enrique Mallada et al.
Power systems face increasing weather-driven variability and, therefore, increasingly rely on flexible but energy-limited storage resources. Energy storage can buffer this variability, but its value depends on intertemporal decisions under uncertain prices. Without accounting for the future reliability value of stored energy, batteries may act myopically, discharging too early or failing to preserve reserves during critical hours. This paper introduces a stopping-time reward that, together with a state-of-charge (SoC) range target penalty, aligns arbitrage incentives with system reliability by rewarding storage that maintains sufficient SoC before critical hours. We formulate the problem as an online optimization with a chance-constrained terminal SoC and embed it in an end-to-end (E2E) learning framework, jointly training the price predictor and control policy. The proposed design enhances reachability of target SoC ranges, improves profit under volatile conditions, and reduces its standard deviation.
OCAug 19, 2013
Skewless Network Clock SynchronizationEnrique Mallada, Xiaoqiao Meng, Michel Hack et al.
This paper examines synchronization of computer clocks connected via a data network and proposes a skewless algorithm to synchronize them. Unlike existing solutions, which either estimate and compensate the frequency difference (skew) among clocks or introduce offset corrections that can generate jitter and possibly even backward jumps, our algorithm achieves synchronization without these problems. We first analyze the convergence property of the algorithm and provide necessary and sufficient conditions on the parameters to guarantee synchronization. We then implement our solution on a cluster of IBM BladeCenter servers running Linux and study its performance. In particular, both analytically and experimentally, we show that our algorithm can converge in the presence of timing loops. This marks a clear contrast with current standards such as NTP and PTP, where timing loops are specifically avoided. Furthermore, timing loops can even be beneficial in our scheme. For example, it is demonstrated that highly connected subnetworks can collectively outperform individual clients when the time source has large jitter. It is also experimentally demonstrated that our algorithm outperforms other well-established software-based solutions such as the NTPv4 and IBM Coordinated Cluster Time (IBM CCT).
OCOct 27, 2017
An integral quadratic constraint framework for real-time steady-state optimization of linear time-invariant systemsZachary E. Nelson, Enrique Mallada
Achieving optimal steady-state performance in real-time is an increasingly necessary requirement of many critical infrastructure systems. In pursuit of this goal, this paper builds a systematic design framework of feedback controllers for Linear Time-Invariant (LTI) systems that continuously track the optimal solution of some predefined optimization problem. The proposed solution can be logically divided into three components. The first component estimates the system state from the output measurements. The second component uses the estimated state and computes a drift direction based on an optimization algorithm. The third component computes an input to the LTI system that aims to drive the system toward the optimal steady-state. We analyze the equilibrium characteristics of the closed-loop system and provide conditions for optimality and stability. Our analysis shows that the proposed solution guarantees optimal steady-state performance, even in the presence of constant disturbances. Furthermore, by leveraging recent results on the analysis of optimization algorithms using integral quadratic constraints (IQCs), the proposed framework is able to translate input-output properties of our optimization component into sufficient conditions, based on linear matrix inequalities (LMIs), for global exponential asymptotic stability of the closed loop system. We illustrate the versatility of our framework using several examples.
OCDec 17, 2016
iDroop: A dynamic droop controller to decouple power grid's steady-state and dynamic performanceEnrique Mallada
This paper presents a novel Dynam-i-c Droop (iDroop) control mechanism to perform primary frequency control with gird-connected inverters that improves the network dynamic performance. The work is motivated by the dynamic degradation experienced by the power grid due to the increase in asynchronous inverted-based generation. We show that the widely suggested virtual inertia solution suffers from unbounded noise amplification (infinite $\mathcal H_2$ norm) when measurement noise is considered. This suggests that virtual inertia could potentially further degrade the grid performance once broadly deployed. This motivates the proposed solution in this paper that overcomes the limitations of virtual inertia controllers while sharing the same advantages of traditional droop control. In particular, our iDroop controllers are decentralized, rebalance supply and demand, and provide power sharing. Furthermore, our solution can improve the dynamic performance without affecting the steady state solution. Our algorithm can be incrementally deployed and can be guaranteed to be stable using a decentralized sufficient stability condition on the parameter values. We illustrate several features of our solution using numerical simulations.
SYMay 1, 2017
Performance tradeoffs of dynamically controlled grid-connected inverters in low inertia power systemsYan Jiang, Richard Pates, Enrique Mallada
Implementing frequency response using grid-connected inverters is one of the popular proposed alternatives to mitigate the dynamic degradation experienced in low inertia power systems. However, such solution faces several challenges as inverters do not intrinsically possess the natural response to power fluctuations that synchronous generators have. Thus, to synthetically generate this response, inverters need to take frequency measurements, which are usually noisy, and subsequently make changes in the output power, which are therefore delayed. This paper explores the system-wide performance tradeoffs that arise when measurement noise, power disturbances, and delayed actions are considered in the design of dynamic controllers for grid-connected inverters. Using a recently proposed dynamic droop (iDroop) control for grid-connected inverters, which is inspired by classical first order lead-lag compensation, we show that the sets of parameters that result in highest noise attenuation, power disturbance mitigation, and delay robustness do not necessarily have a common intersection. In particular, lead compensation is desired in systems where power disturbances are the predominant source of degradation, while lag compensation is a better alternative when the system is dominated by delays or frequency noise. Our analysis further shows that iDroop can outperform the standard droop alternative in both joint noise and disturbance mitigation, and delay robustness.
LGJul 24, 2023
Early Neuron Alignment in Two-layer ReLU Networks with Small InitializationHancheng Min, Enrique Mallada, René Vidal
This paper studies the problem of training a two-layer ReLU network for binary classification using gradient flow with small initialization. We consider a training dataset with well-separated input vectors: Any pair of input data with the same label are positively correlated, and any pair with different labels are negatively correlated. Our analysis shows that, during the early phase of training, neurons in the first layer try to align with either the positive data or the negative data, depending on its corresponding weight on the second layer. A careful analysis of the neurons' directional dynamics allows us to provide an $\mathcal{O}(\frac{\log n}{\sqrtμ})$ upper bound on the time it takes for all neurons to achieve good alignment with the input data, where $n$ is the number of data points and $μ$ measures how well the data are separated. After the early alignment phase, the loss converges to zero at a $\mathcal{O}(\frac{1}{t})$ rate, and the weight matrix on the first layer is approximately low-rank. Numerical experiments on the MNIST dataset illustrate our theoretical findings.
SYSep 12, 2019
Dynamics Concentration of Large-Scale Tightly-Connected NetworksHancheng Min, Enrique Mallada
The ability to achieve coordinated behavior --engineered or emergent-- on networked systems has attracted widespread interest over several fields. This has led to remarkable advances on the development of a theoretical understanding of the conditions under which agents within a network can reach agreement (consensus) or develop coordinated behaviors such as synchronization. However, fewer advances have been made toward explaining another commonly observed phenomena in tightly-connected networks systems: output responses of nodes in the networks are almost identical to each other despite heterogeneity in their individual dynamics. In this paper, we leverage tools from high-dimensional probability to provide an initial answer to this phenomena. More precisely, we show that for linear networks of nodal random transfer functions, as the network size and connectivity grows, every node in the network follows the same response to an input or disturbance --irrespectively of the source of this input. We term this behavior as dynamics concentration since it stems from the fact that the network transfer matrix uniformly converges in probability, i.e., it concentrates, to a unique dynamic response determined by the distribution of the random transfer function of each node. We further discuss the implications of our analysis in the context of model reduction and robustness, and provide numerical evidence that similar phenomena occur in small deterministic networks over a properly defined frequency band.
LGApr 21, 2022
Model-free Learning of Regions of Attraction via Recurrent SetsYue Shen, Maxim Bichuch, Enrique Mallada
We consider the problem of learning an inner approximation of the region of attraction (ROA) of an asymptotically stable equilibrium point without an explicit model of the dynamics. Rather than leveraging approximate models with bounded uncertainty to find a (robust) invariant set contained in the ROA, we propose to learn sets that satisfy a more relaxed notion of containment known as recurrence. We define a set to be $τ$-recurrent (resp. $k$-recurrent) if every trajectory that starts within the set, returns to it after at most $τ$ seconds (resp. $k$ steps). We show that under mild assumptions a $τ$-recurrent set containing a stable equilibrium must be a subset of its ROA. We then leverage this property to develop algorithms that compute inner approximations of the ROA using counter-examples of recurrence that are obtained by sampling finite-length trajectories. Our algorithms process samples sequentially, which allow them to continue being executed even after an initial offline training stage. We further provide an upper bound on the number of counter-examples used by the algorithm, and almost sure convergence guarantees.
SYOct 26, 2018
Voltage Collapse Stabilization: A Game Theory ViewpointCharalampos Avraam, Jesse Rines, Aurik Sarker et al.
Voltage collapse is a type of blackout-inducing dynamic instability that occurs when the power demand exceeds the maximum power that can be transferred through the network. The traditional (preventive) approach to avoid voltage collapse is based on ensuring that the network never reaches its maximum capacity. However, such an approach leads to inefficiencies as it prevents operators to fully utilize the network resources and does not account for unprescribed events. To overcome this limitation, this paper seeks to initiate the study of voltage collapse stabilization. More precisely, for a DC network, we formulate the problem of voltage stability as a dynamic problem where each load seeks to achieve a constant power consumption by updating its conductance as the voltage changes. We show that such a system can be interpreted as a dynamic game, where each player (load) seeks to myopically maximize their utility, and where every stable power flow solution amounts to a Local Nash Equilibrium. Using this framework, we show that voltage collapse is equivalent to the non-existence of a Local Nash Equilibrium in the game and, as a result, it is caused by the lack of cooperation between loads. Finally, we propose a Voltage Collapse Stabilizer (VCS) controller that uses (flexible) loads that are willing to cooperate and provides a fair allocation of the curtailed demand. Our solution stabilizes voltage collapse even in the presence of non-cooperative loads. Numerical simulations validate several features of our controllers.
LGDec 3, 2022
Constrained Reinforcement Learning via Dissipative Saddle Flow DynamicsTianqi Zheng, Pengcheng You, Enrique Mallada
In constrained reinforcement learning (C-RL), an agent seeks to learn from the environment a policy that maximizes the expected cumulative reward while satisfying minimum requirements in secondary cumulative reward constraints. Several algorithms rooted in sampled-based primal-dual methods have been recently proposed to solve this problem in policy space. However, such methods are based on stochastic gradient descent ascent algorithms whose trajectories are connected to the optimal policy only after a mixing output stage that depends on the algorithm's history. As a result, there is a mismatch between the behavioral policy and the optimal one. In this work, we propose a novel algorithm for constrained RL that does not suffer from these limitations. Leveraging recent results on regularized saddle-flow dynamics, we develop a novel stochastic gradient descent-ascent algorithm whose trajectories converge to the optimal policy almost surely.
39.6SYApr 3
Safety-Critical Control via Recurrent Tracking FunctionsJixian Liu, Enrique Mallada
This paper addresses the challenge of synthesizing safety-critical controllers for high-order nonlinear systems, where constructing valid Control Barrier Functions (CBFs) remains computationally intractable. Leveraging layered control, we design CBFs in reduced-order models (RoMs) while regulating full-order models' (FoMs) dynamics at the same time. Traditional Lyapunov tracking functions are required to decrease monotonically, and systematic synthesis methods for such functions exist only for fully-actuated systems. To overcome this limitation, we introduce Recurrent Tracking Functions (RTFs), which replace the monotonic decay requirement with a weaker finite-time recurrence condition. This relaxation permits transient deviations of tracking errors while ensuring safety. By integrating CBFs for RoMs with RTFs, we construct recurrent CBFs (RCBFs) whose zero-superlevel set is control $Ï$-recurrent, and guarantee safety for all initial states in such a set when RTFs are satisfied. We establish theoretical safety guarantees and validate the approach through a proof-of-concept numerical experiment, demonstrating RTFs' effectiveness and the safety of FoMs.
SYNov 28, 2022
Learning Coherent Clusters in Weakly-Connected Network SystemsHancheng Min, Enrique Mallada
We propose a structure-preserving model-reduction methodology for large-scale dynamic networks with tightly-connected components. First, the coherent groups are identified by a spectral clustering algorithm on the graph Laplacian matrix that models the network feedback. Then, a reduced network is built, where each node represents the aggregate dynamics of each coherent group, and the reduced network captures the dynamic coupling between the groups. We provide an upper bound on the approximation error when the network graph is randomly generated from a weight stochastic block model. Finally, numerical experiments align with and validate our theoretical findings.
51.9OCApr 19
Symplectic Inductive Bias for Data-Driven Target Reachability in Hamiltonian SystemsZhuo Ouyang, Jixian Liu, Enrique Mallada
Inductive bias refers to restrictions on the hypothesis class that enable a learning method to generalize effectively from limited data. A canonical example in control is linearity, which underpins low sample-complexity guarantees for stabilization and optimal control. For general nonlinear dynamics, by contrast, guarantees often rely on smoothness assumptions (e.g., Lipschitz continuity) which, when combined with covering arguments, can lead to data requirements that grow exponentially with the ambient dimension. In this paper we argue that data-efficient nonlinear control demands exploiting inductive bias embedded in nature itself, namely, structure imposed by physical laws. Focusing on Hamiltonian systems, we leverage symplectic geometry and intrinsic recurrence on energy level sets to solve target reachability problems. Our approach combines the recurrence property with a recently proposed class of policies, called chain policies, which composes locally certified trajectory segments extracted from demonstrations to achieve target reachability. We provide sufficient conditions for reachability under this construction and show that the resulting data requirements depend on explicit geometric and recurrence properties of the Hamiltonian rather than the state dimension.
LGJan 23, 2024
Learning safety critics via a non-contractive binary bellman operatorAgustin Castellano, Hancheng Min, Juan Andrés Bazerque et al.
The inability to naturally enforce safety in Reinforcement Learning (RL), with limited failures, is a core challenge impeding its use in real-world applications. One notion of safety of vast practical relevance is the ability to avoid (unsafe) regions of the state space. Though such a safety goal can be captured by an action-value-like function, a.k.a. safety critics, the associated operator lacks the desired contraction and uniqueness properties that the classical Bellman operator enjoys. In this work, we overcome the non-contractiveness of safety critic operators by leveraging that safety is a binary property. To that end, we study the properties of the binary safety critic associated with a deterministic dynamical system that seeks to avoid reaching an unsafe region. We formulate the corresponding binary Bellman equation (B2E) for safety and study its properties. While the resulting operator is still non-contractive, we fully characterize its fixed points representing--except for a spurious solution--maximal persistently safe regions of the state space that can always avoid failure. We provide an algorithm that, by design, leverages axiomatic knowledge of safe data to avoid spurious fixed points.
LGMar 10, 2025
Understanding the Learning Dynamics of LoRA: A Gradient Flow Perspective on Low-Rank Adaptation in Matrix FactorizationZiqing Xu, Hancheng Min, Lachlan Ewen MacDonald et al.
Despite the empirical success of Low-Rank Adaptation (LoRA) in fine-tuning pre-trained models, there is little theoretical understanding of how first-order methods with carefully crafted initialization adapt models to new tasks. In this work, we take the first step towards bridging this gap by theoretically analyzing the learning dynamics of LoRA for matrix factorization (MF) under gradient flow (GF), emphasizing the crucial role of initialization. For small initialization, we theoretically show that GF converges to a neighborhood of the optimal solution, with smaller initialization leading to lower final error. Our analysis shows that the final error is affected by the misalignment between the singular spaces of the pre-trained model and the target matrix, and reducing the initialization scale improves alignment. To address this misalignment, we propose a spectral initialization for LoRA in MF and theoretically prove that GF with small spectral initialization converges to the fine-tuning task with arbitrary precision. Numerical experiments from MF and image classification validate our findings.
LGNov 8, 2024
Variance-Aware Linear UCB with Deep Representation for Neural Contextual BanditsHa Manh Bui, Enrique Mallada, Anqi Liu
By leveraging the representation power of deep neural networks, neural upper confidence bound (UCB) algorithms have shown success in contextual bandits. To further balance the exploration and exploitation, we propose Neural-$σ^2$-LinearUCB, a variance-aware algorithm that utilizes $σ^2_t$, i.e., an upper bound of the reward noise variance at round $t$, to enhance the uncertainty quantification quality of the UCB, resulting in a regret performance improvement. We provide an oracle version for our algorithm characterized by an oracle variance upper bound $σ^2_t$ and a practical version with a novel estimation for this variance bound. Theoretically, we provide rigorous regret analysis for both versions and prove that our oracle algorithm achieves a better regret guarantee than other neural-UCB algorithms in the neural contextual bandits setting. Empirically, our practical method enjoys a similar computational efficiency, while outperforming state-of-the-art techniques by having a better calibration and lower regret across multiple standard settings, including on the synthetic, UCI, MNIST, and CIFAR-10 datasets.
SYNov 17, 2025
Data-driven Acceleration of MPC with GuaranteesAgustin Castellano, Shijie Pan, Enrique Mallada
Model Predictive Control (MPC) is a powerful framework for optimal control but can be too slow for low-latency applications. We present a data-driven framework to accelerate MPC by replacing online optimization with a nonparametric policy constructed from offline MPC solutions. Our policy is greedy with respect to a constructed upper bound on the optimal cost-to-go, and can be implemented as a nonparametric lookup rule that is orders of magnitude faster than solving MPC online. Our analysis shows that under sufficient coverage condition of the offline data, the policy is recursively feasible and admits provable, bounded optimality gap. These conditions establish an explicit trade-off between the amount of data collected and the tightness of the bounds. Our experiments show that this policy is between 100 and 1000 times faster than standard MPC, with only a modest hit to optimality, showing potential for real-time control tasks.
OCMar 14, 2024
Dissipative Gradient Descent Ascent Method: A Control Theory Inspired Algorithm for Min-max OptimizationTianqi Zheng, Nicolas Loizou, Pengcheng You et al.
Gradient Descent Ascent (GDA) methods for min-max optimization problems typically produce oscillatory behavior that can lead to instability, e.g., in bilinear settings. To address this problem, we introduce a dissipation term into the GDA updates to dampen these oscillations. The proposed Dissipative GDA (DGDA) method can be seen as performing standard GDA on a state-augmented and regularized saddle function that does not strictly introduce additional convexity/concavity. We theoretically show the linear convergence of DGDA in the bilinear and strongly convex-strongly concave settings and assess its performance by comparing DGDA with other methods such as GDA, Extra-Gradient (EG), and Optimistic GDA. Our findings demonstrate that DGDA surpasses these methods, achieving superior convergence rates. We support our claims with two numerical examples that showcase DGDA's effectiveness in solving saddle point problems.
LGDec 9, 2021
Reinforcement Learning with Almost Sure ConstraintsAgustin Castellano, Hancheng Min, Juan Bazerque et al.
In this work we address the problem of finding feasible policies for Constrained Markov Decision Processes under probability one constraints. We argue that stationary policies are not sufficient for solving this problem, and that a rich class of policies can be found by endowing the controller with a scalar quantity, so called budget, that tracks how close the agent is to violating the constraint. We show that the minimal budget required to act safely can be obtained as the smallest fixed point of a Bellman-like operator, for which we analyze its convergence properties. We also show how to learn this quantity when the true kernel of the Markov decision process is not known, while providing sample-complexity bounds. The utility of knowing this minimal budget relies in that it can aid in the search of optimal or near-optimal policies by shrinking down the region of the state space the agent must navigate. Simulations illustrate the different nature of probability one constraints against the typically used constraints in expectation.
SYMay 18, 2021
Learning to Act Safely with Limited Exposure and Almost Sure CertaintyAgustin Castellano, Hancheng Min, Juan Bazerque et al.
This paper puts forward the concept that learning to take safe actions in unknown environments, even with probability one guarantees, can be achieved without the need for an unbounded number of exploratory trials. This is indeed possible, provided that one is willing to navigate trade-offs between optimality, level of exposure to unsafe events, and the maximum detection time of unsafe actions. We illustrate this concept in two complementary settings. We first focus on the canonical multi-armed bandit problem and study the intrinsic trade-offs of learning safety in the presence of uncertainty. Under mild assumptions on sufficient exploration, we provide an algorithm that provably detects all unsafe machines in an (expected) finite number of rounds. The analysis also unveils a trade-off between the number of rounds needed to secure the environment and the probability of discarding safe machines. We then consider the problem of finding optimal policies for a Markov Decision Process (MDP) with almost sure constraints. We show that the action-value function satisfies a barrier-based decomposition which allows for the identification of feasible policies independently of the reward process. Using this decomposition, we develop a Barrier-learning algorithm, that identifies such unsafe state-action pairs in a finite expected number of steps. Our analysis further highlights a trade-off between the time lag for the underlying MDP necessary to detect unsafe actions, and the level of exposure to unsafe events. Simulations corroborate our theoretical findings, further illustrating the aforementioned trade-offs, and suggesting that safety constraints can speed up the learning process.
LGMay 13, 2021
Convergence and Implicit Bias of Gradient Flow on Overparametrized Linear NetworksHancheng Min, Salma Tarmoun, Rene Vidal et al.
Neural networks trained via gradient descent with random initialization and without any regularization enjoy good generalization performance in practice despite being highly overparametrized. A promising direction to explain this phenomenon is to study how initialization and overparametrization affect convergence and implicit bias of training algorithms. In this paper, we present a novel analysis of single-hidden-layer linear networks trained under gradient flow, which connects initialization, optimization, and overparametrization. Firstly, we show that the squared loss converges exponentially to its optimum at a rate that depends on the level of imbalance and the margin of the initialization. Secondly, we show that proper initialization constrains the dynamics of the network parameters to lie within an invariant set. In turn, minimizing the loss over this set leads to the min-norm solution. Finally, we show that large hidden layer width, together with (properly scaled) random initialization, ensures proximity to such an invariant set during training, allowing us to derive a novel non-asymptotic upper-bound on the distance between the trained network and the min-norm solution.
LGDec 24, 2020
Assured RL: Reinforcement Learning with Almost Sure ConstraintsAgustin Castellano, Juan Bazerque, Enrique Mallada
We consider the problem of finding optimal policies for a Markov Decision Process with almost sure constraints on state transitions and action triplets. We define value and action-value functions that satisfy a barrier-based decomposition which allows for the identification of feasible policies independently of the reward process. We prove that, given a policy π, certifying whether certain state-action pairs lead to feasible trajectories under π is equivalent to solving an auxiliary problem aimed at finding the probability of performing an unfeasible transition. Using this interpretation,we develop a Barrier-learning algorithm, based on Q-Learning, that identifies such unsafe state-action pairs. Our analysis motivates the need to enhance the Reinforcement Learning (RL) framework with an additional signal, besides rewards, called here damage function that provides feasibility information and enables the solution of RL problems with model-free constraints. Moreover, our Barrier-learning algorithm wraps around existing RL algorithms, such as Q-Learning and SARSA, giving them the ability to solve almost-surely constrained problems.
LGOct 1, 2020
Learning to be safe, in finite timeAgustin Castellano, Juan Bazerque, Enrique Mallada
This paper aims to put forward the concept that learning to take safe actions in unknown environments, even with probability one guarantees, can be achieved without the need for an unbounded number of exploratory trials, provided that one is willing to relax its optimality requirements mildly. We focus on the canonical multi-armed bandit problem and seek to study the exploration-preservation trade-off intrinsic within safe learning. More precisely, by defining a handicap metric that counts the number of unsafe actions, we provide an algorithm for discarding unsafe machines (or actions), with probability one, that achieves constant handicap. Our algorithm is rooted in the classical sequential probability ratio test, redefined here for continuing tasks. Under standard assumptions on sufficient exploration, our rule provably detects all unsafe machines in an (expected) finite number of rounds. The analysis also unveils a trade-off between the number of rounds needed to secure the environment and the probability of discarding safe machines. Our decision rule can wrap around any other algorithm to optimize a specific auxiliary goal since it provides a safe environment to search for (approximately) optimal policies. Simulations corroborate our theoretical findings and further illustrate the aforementioned trade-offs.
SYMay 16, 2019
Global analysis of synchronization performance for power systems: bridging the theory-practice gapFernando Paganini, Enrique Mallada
The issue of synchronization in the power grid is receiving renewed attention, as new energy sources with different dynamics enter the picture. Global metrics have been proposed to evaluate performance and analyzed under highly simplified assumptions. In this paper, we extend this approach to more realistic network scenarios and more closely connect it with metrics used in power engineering practice. In particular, our analysis covers networks with generators of heterogeneous ratings and richer dynamic models of machines. Under a suitable proportionality assumption in the parameters, we show that the step response of bus frequencies can be decomposed in two components. The first component is a {system-wide frequency} that captures the aggregate grid behavior, and the residual component represents the individual bus frequency deviations from the aggregate. Using this decomposition, we define --and compute in closed form-- several metrics that capture dynamic behaviors that are of relevance for power engineers. In particular, using the \emph{system frequency}, we define industry-style metrics (Nadir, RoCoF) that are evaluated through a representative machine. We further use the norm of the residual component to define a \emph{synchronization cost} that can appropriately quantify inter-area oscillations. Finally, we employ robustness analysis tools to evaluate deviations from our proportionality assumption. We show that the system frequency still captures the grid steady-state deviation, and becomes an accurate reduced-order model of the grid as the network connectivity grows. Simulation studies with practically relevant data are included to validate the theory and further illustrate the impact of network structure and parameters on synchronization. Our analysis gives conclusions of practical interest, sometimes challenging the conventional wisdom in the field.
OCMay 14, 2019
Global performance metrics for synchronization of heterogeneously rated power systems: The role of machine models and inertiaFernando Paganini, Enrique Mallada
A recent trend in control of power systems has sought to quantify the synchronization dynamics in terms of a global performance metric, compute it under very simplified assumptions, and use it to gain insight on the role of system parameters, in particular, inertia. In this paper, we wish to extend this approach to more realistic scenarios, by incorporating the heterogeneity of machine ratings, more complete machine models, and also to more closely map it to classical power engineering notions such as Nadir, Rate of Change of Frequency (RoCoF), and inter-area oscillations. We consider the system response to a step change in power excitation, and define the system frequency as a weighted average of generator frequencies (with weights proportional to each machine's rating); we characterize Nadir and RoCoF by the $L_\infty$ norm of the system frequency and its derivative, respectively, and inter-areas oscillations by the $L_2$ norm of the error of the vector of bus frequencies w.r.t. the system frequency. For machine models where the dynamic parameters (inertia, damping, etc.) are proportional to rating, we analytically compute these norms and use them to show that the role of inertia is more nuanced than in the conventional wisdom. With the classical swing dynamics, inertia constant plays a secondary role in performance. It is only when the turbine dynamics are introduced that the benefits of inertia become more prominent.
SYJun 2, 2017
Understanding the Inefficiency of Security-Constrained Economic DispatchMohammad H. Hajiesmaili, Desmond Cai, Enrique Mallada
The security-constrained economic dispatch (SCED) problem tries to maintain the reliability of a power network by ensuring that a single failure does not lead to a global outage. The previous research has mainly investigated SCED by formulating the problem in different modalities, e.g. preventive or corrective, and devising efficient solutions for SCED. In this paper, we tackle a novel and important direction, and analyze the economic cost of incorporating security constraints in economic dispatch. Inspired by existing inefficiency metrics in game theory and computer science, we introduce notion of price of security as a metric that formally characterizes the economic inefficiency of security-constrained economic dispatch as compared to the original problem without security constraints. Then, we focus on the preventive approach in a simple topology comprising two buses and two lines, and investigate the impact of generation availability and demand distribution on the price of security. Moreover, we explicitly derive the worst-case input instance that leads to the maximum price of security. By extensive experimental study on two test-cases, we verify the analytical results and provide insights for characterizing the price of security in general networks.