OC | Aug 4, 2012
Provably Safe and Robust Learning-Based Model Predictive Control
Anil Aswani, Humberto Gonzalez, S. Shankar Sastry et al.
Controller design faces a trade-off between robustness and performance, and the reliability of linear controllers has caused many practitioners to focus on the former. However, there is renewed interest in improving system performance to deal with growing energy constraints. This paper describes a learning-based model predictive control (LBMPC) scheme that provides deterministic guarantees on robustness, while statistical identification tools are used to identify richer models of the system in order to improve performance; the benefits of this framework are that it handles state and input constraints, optimizes system performance with respect to a cost function, and can be designed to use a wide variety of parametric or nonparametric statistical tools. The main insight of LBMPC is that safety and performance can be decoupled under reasonable conditions in an optimization framework by maintaining two models of the system. The first is an approximate model with bounds on its uncertainty, and the second model is updated by statistical methods. LBMPC improves performance by choosing inputs that minimize a cost subject to the learned dynamics, and it ensures safety and robustness by checking whether these same inputs keep the approximate model stable when it is subject to uncertainty. Furthermore, we show that if the system is sufficiently excited, then the LBMPC control action probabilistically converges to that of an MPC computed using the true dynamics.
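The two-model idea in this abstract can be written schematically as a single optimization problem. The block below is only a sketch under assumed notation, not the formulation from the paper: the symbols $\ell$, $\ell_N$, $\mathcal{O}$, $\mathcal{R}_i$, $\mathcal{X}$, $\mathcal{U}$, and $\Omega$ are introduced here for illustration. The cost is evaluated along the learned model $\tilde{x}$, while constraints are enforced on the nominal model $\bar{x}$ after shrinking the state set by a set $\mathcal{R}_i$ that bounds the accumulated model uncertainty.

```latex
\begin{aligned}
\min_{u_0,\dots,u_{N-1}} \quad
  & \sum_{i=0}^{N-1} \ell(\tilde{x}_i, u_i) + \ell_N(\tilde{x}_N) \\
\text{s.t.} \quad
  & \tilde{x}_{i+1} = A\tilde{x}_i + B u_i + \mathcal{O}(\tilde{x}_i, u_i)
    && \text{(learned model, drives the cost)} \\
  & \bar{x}_{i+1} = A\bar{x}_i + B u_i
    && \text{(nominal model, drives the safety check)} \\
  & \tilde{x}_0 = \bar{x}_0 = x
    && \text{(current measured state)} \\
  & \bar{x}_i \in \mathcal{X} \ominus \mathcal{R}_i, \quad u_i \in \mathcal{U}
    && i = 0,\dots,N-1 \\
  & \bar{x}_N \in \Omega \ominus \mathcal{R}_N
    && \text{(terminal safe set)}
\end{aligned}
```

Because the learned term $\mathcal{O}$ appears only in the cost-side dynamics, changing the learning method changes performance but not the constraint set, which is the decoupling the abstract describes.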
SY | Apr 25, 2012
Quantitative Methods for Comparing Different HVAC Control Schemes
Anil Aswani, Neal Master, Jay Taneja et al.
Experimentally comparing the energy usage and comfort characteristics of different controllers in heating, ventilation, and air-conditioning (HVAC) systems is difficult because variations in weather and occupancy conditions preclude the possibility of establishing equivalent experimental conditions across the order of hours, days, and weeks. This paper is concerned with defining quantitative metrics of energy usage and occupant comfort, which can be computed and compared in a rigorous manner that is capable of determining whether differences between controllers are statistically significant in the presence of such environmental fluctuations. Experimental case studies are presented that compare two alternative controllers (a schedule controller and a hybrid system learning-based model predictive controller) to the default controller in a building-wide HVAC system. Lastly, we discuss how our proposed methodology may also be able to quantify the efficiency of other building automation systems.
OC | Jul 11, 2012
Incentive Design for Efficient Building Quality of Service
Anil Aswani, Claire Tomlin
Buildings are a large consumer of energy, and reducing their energy usage may provide financial and societal benefits. One challenge in achieving efficient building operation is the fact that few financial motivations exist for encouraging low energy configuration and operation of buildings. As a result, incentive schemes for managers of large buildings are being proposed for the purpose of saving energy. This paper focuses on incentive design for the configuration and operation of building-wide heating, ventilation, and air-conditioning (HVAC) systems, because these systems constitute the largest portion of energy usage in most buildings. We begin with an empirical model of a building-wide HVAC system, which describes the tradeoffs between energy consumption, quality of service (as defined by occupant satisfaction), and the amount of work required for maintenance and configuration. The model has significant non-convexities, and so we derive some results regarding qualitative properties of non-convex optimization problems with certain partial-ordering features. These results are used to show that "baselining" incentive schemes suffer from moral hazard problems, and they also encourage energy reductions at the expense of also decreasing occupant satisfaction. We propose an alternative incentive scheme that has the interpretation of a performance-based bonus. A theoretical analysis shows that this encourages energy and monetary savings and modest gains in occupant satisfaction and quality of service, which is confirmed by our numerical simulations.
OC | Sep 25, 2017
Dynamic Watermarking for General LTI Systems
Pedro Hespanhol, Matthew Porter, Ram Vasudevan et al.
Detecting attacks in control systems is an important aspect of designing secure and resilient control systems. Recently, a dynamic watermarking approach was proposed for detecting malicious sensor attacks for SISO LTI systems with partial state observations and MIMO LTI systems with a full rank input matrix and full state observations; however, these previous approaches cannot be applied to general LTI systems that are MIMO and have partial state observations. This paper designs a dynamic watermarking approach for detecting malicious sensor attacks for general LTI systems, and we provide a new set of asymptotic and statistical tests. We prove these tests can detect attacks that follow a specified attack model (more general than replay attacks), and we also show that these tests simplify to existing tests when the system is SISO or has full rank input matrix and full state observations. The benefit of our approach is demonstrated with a simulation analysis of detecting sensor attacks in autonomous vehicles. Our approach can distinguish between sensor attacks and wind disturbance (through an internal model principle framework), whereas improperly designed tests cannot distinguish between sensor attacks and wind disturbance.
OC | Sep 25, 2017
Statistical Watermarking for Networked Control Systems
Pedro Hespanhol, Matthew Porter, Ram Vasudevan et al.
Watermarking can detect sensor attacks in control systems by injecting a private signal into the control, whereby attacks are identified by checking the statistics of the sensor measurements and private signal. However, past approaches assume full state measurements or a centralized controller, which is not found in networked LTI systems with subcontrollers. Since generally the entire system is neither controllable nor observable by a single subcontroller, communication of sensor measurements is required to ensure closed-loop stability. The possibility of attacking the communication channel has not been explicitly considered by previous watermarking schemes, and requires a new design. In this paper, we derive a statistical watermarking test that can detect both sensor and communication attacks. A unique (compared to the non-networked case) aspect of implementing this test is that the state-feedback controller must be designed so that the closed-loop system is controllable by each subcontroller, and we provide two approaches to design such a controller using Heymann's lemma and a multi-input generalization of Heymann's lemma. The usefulness of our approach is demonstrated with a simulation of detecting attacks in a platoon of autonomous vehicles. Our test allows each vehicle to independently detect attacks on both the communication channel between vehicles and on the sensor measurements.
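The basic watermarking mechanism (inject a private signal, then check its correlation with the measurements) can be illustrated on a toy problem. The sketch below is not the networked test from the paper: it assumes a scalar plant with a single controller, and every name and parameter value (`watermark_statistic`, `a`, `b`, `gain`, the noise levels) is hypothetical. The attacker replaces the true measurement with the output of its own simulated plant, which cannot contain the private signal.

```python
import random


def watermark_statistic(attack, n_steps=20000, a=0.9, b=1.0, gain=0.5,
                        sigma_e=1.0, sigma_noise=0.1, seed=0):
    """Average of (watermark * residual) for a toy scalar plant.

    Plant: x[k+1] = a*x[k] + b*(u[k] + e[k]) + w[k], measured as y = x + v.
    e[k] is the private watermark known only to the controller.
    If attack is True, the reported measurement comes from a simulated
    plant that the attacker drives with u[k] but without e[k].
    """
    rng = random.Random(seed)
    x = 0.0        # true plant state
    x_fake = 0.0   # state of the attacker's simulated plant
    y = 0.0        # measurement reported to the controller
    total = 0.0
    for _ in range(n_steps):
        e = rng.gauss(0.0, sigma_e)          # private watermark sample
        u = -gain * y                        # feedback on reported measurement
        x = a * x + b * (u + e) + rng.gauss(0.0, sigma_noise)
        x_fake = a * x_fake + b * u + rng.gauss(0.0, sigma_noise)
        source = x_fake if attack else x
        y_next = source + rng.gauss(0.0, sigma_noise)
        # Residual after removing everything the controller can predict,
        # including the watermark's own contribution b*e.
        residual = y_next - a * y - b * u - b * e
        total += e * residual
        y = y_next
    return total / n_steps
```

With honest measurements the residual contains only noise, so the statistic stays near zero. Under the simulated attack the watermark never reaches the reported measurement, so the statistic settles near `-b * sigma_e**2`; a detector would threshold on this gap.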
OC | Sep 25, 2017
Local Water Storage Control for the Developing World
Yonatan Mintz, Zuo-Jun Max Shen, Anil Aswani
Most cities in India do not have water distribution networks that provide water throughout the entire day. As a result, it is common for homes and apartment buildings to utilize water storage systems that are filled during a small window of time in the day when the water distribution network is active. However, these water storage systems do not have disinfection capabilities, and so long durations of storage (i.e., as few as four days) of the same water lead to substantial increases in the amount of bacteria and viruses in that water. This paper considers the stochastic control problem of deciding how much water to store each day in the system, as well as deciding when to completely empty the water system, in order to trade off the financial costs of the water, the health costs implicit in long durations of storing the same water, the potential for a shortfall in the quantity of stored versus demanded water, and water wastage from emptying the system. To solve this problem, we develop a new Binary Dynamic Search (BiDS) algorithm that is able to use binary search in one dimension to compute the value function of stochastic optimal control problems with controlled resets to a single state and with constraints on the maximum time span in between resets of the system.
LG | Apr 14, 2023
Repeated Principal-Agent Games with Unobserved Agent Rewards and Perfect-Knowledge Agents
Ilgin Dogan, Zuo-Jun Max Shen, Anil Aswani
Motivated by a number of real-world applications from domains like healthcare and sustainable transportation, in this paper we study a scenario of repeated principal-agent games within a multi-armed bandit (MAB) framework, where: the principal gives a different incentive for each bandit arm, the agent picks a bandit arm to maximize its own expected reward plus incentive, and the principal observes which arm is chosen and receives a reward (different than that of the agent) for the chosen arm. Designing policies for the principal is challenging because the principal cannot directly observe the reward that the agent receives for their chosen actions, and so the principal cannot directly learn the expected reward using existing estimation techniques. As a result, the problem of designing policies for this scenario, as well as similar ones, remains mostly unexplored. In this paper, we construct a policy that achieves a low regret (i.e., square-root regret up to a log factor) in this scenario for the case where the agent has perfect knowledge about its own expected rewards for each bandit arm. We design our policy by first constructing an estimator for the agent's expected reward for each bandit arm. Since our estimator uses as data the sequence of incentives offered and subsequently chosen arms, the principal's estimation can be regarded as an analogy of online inverse optimization in MABs. Next we construct a policy that we prove achieves a low regret by deriving finite-sample concentration bounds for our estimator. We conclude with numerical simulations demonstrating the applicability of our policy to a real-life setting from collaborative transportation planning.
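To make the information structure concrete, here is a small sketch of how a principal who sees only the chosen arm could still recover how far an arm's expected reward lies below the agent's best arm. It is an illustration under simplifying assumptions (a perfect-knowledge agent, deterministic choices, bisection on a single incentive), not the estimator or policy from the paper; `agent_choice`, `estimate_gap`, and the reward values are hypothetical.

```python
def agent_choice(theta, incentives):
    """Agent picks the arm maximizing expected reward plus incentive.

    theta holds the agent's expected rewards; it is hidden from the
    principal and appears here only to simulate the agent.
    """
    totals = [t + c for t, c in zip(theta, incentives)]
    return max(range(len(theta)), key=lambda i: totals[i])


def estimate_gap(theta, arm, upper=1.0, tol=1e-6):
    """Bisect on the incentive for `arm` using only observed choices.

    Returns (approximately) the smallest incentive that makes the agent
    pick `arm`, i.e. the gap between the agent's best reward and
    theta[arm]. Assumes that gap is at most `upper`.
    """
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        incentives = [0.0] * len(theta)
        incentives[arm] = mid
        if agent_choice(theta, incentives) == arm:
            hi = mid   # incentive was enough; try a smaller one
        else:
            lo = mid   # incentive was too small; try a larger one
    return hi


hidden_theta = [0.2, 0.7, 0.5]   # known to the agent, not the principal
gaps = [estimate_gap(hidden_theta, arm) for arm in range(3)]
```

Each bisection step uses one round of interaction, so the number of rounds grows with the logarithm of the desired precision. The observed choices reveal only reward differences, which is why the problem resembles inverse optimization.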
LG | Aug 13, 2023
Estimating and Incentivizing Imperfect-Knowledge Agents with Hidden Rewards
Ilgin Dogan, Zuo-Jun Max Shen, Anil Aswani
In practice, incentive providers (i.e., principals) often cannot observe the reward realizations of incentivized agents, which is in contrast to many principal-agent models that have been previously studied. This information asymmetry challenges the principal to consistently estimate the agent's unknown rewards by solely watching the agent's decisions, which becomes even more challenging when the agent has to learn its own rewards. This complex setting is observed in various real-life scenarios ranging from renewable energy storage contracts to personalized healthcare incentives. Hence, it offers not only interesting theoretical questions but also wide practical relevance. This paper explores a repeated adverse selection game between a self-interested learning agent and a learning principal. The agent tackles a multi-armed bandit (MAB) problem to maximize their expected reward plus incentive. On top of the agent's learning, the principal trains a parallel algorithm and faces a trade-off between consistently estimating the agent's unknown rewards and maximizing their own utility by offering adaptive incentives to lead the agent. For a non-parametric model, we introduce an estimator whose only input is the history of principal's incentives and agent's choices. We unite this estimator with a proposed data-driven incentive policy within a MAB framework. Without restricting the type of the agent's algorithm, we prove finite-sample consistency of the estimator and a rigorous regret bound for the principal by considering the sequential externality imposed by the agent. Lastly, our theoretical results are reinforced by simulations justifying applicability of our framework to green energy aggregator contracts.
OC | Mar 18, 2019
Surrogate Optimal Control for Strategic Multi-Agent Systems
Pedro Hespanhol, Anil Aswani
This paper studies how to design a platform to optimally control constrained multi-agent systems with a single coordinator and multiple strategic agents. In our setting, the agents cannot apply control inputs and only the coordinator applies control inputs; however, the coordinator does not know the objective functions of the agents, and so must choose control actions based on information provided by the agents. One major challenge is that if the platform is not correctly designed then the agents may provide false information to the coordinator in order to achieve improved outcomes for themselves at the expense of the overall system efficiency. Here, we design an interaction mechanism between the agents and the coordinator such that the mechanism: ensures agents truthfully report their information, has low communication requirements, and leads to a control action that achieves efficiency by achieving a Nash equilibrium. In particular, we design a mechanism in which each agent does not need to possess full knowledge of the system dynamics nor the objective functions of other agents. We illustrate our proposed mechanism in a model predictive control (MPC) application involving heating, ventilation, and air-conditioning (HVAC) control by a building manager of an apartment building. Our results showcase how such a mechanism can be potentially used in the context of distributed MPC.
OC | Sep 29, 2017
Designing Real-Time Prices to Reduce Load Variability with HVAC
John Audie Cabrera, Yonatan Mintz, Jhoanna Rhodette Pedrasa et al.
Utilities use demand response to shift or reduce electricity usage of flexible loads, to better match electricity demand to power generation. A common mechanism is peak pricing (PP), where consumers pay reduced (increased) prices for electricity during periods of low (high) demand, and its simplicity allows consumers to understand how their consumption affects costs. However, new consumer technologies like internet-connected smart thermostats simplify real-time pricing (RP), because such devices can automate the tradeoff between costs and consumption. These devices enable consumer choice under RP by abstracting this tradeoff into a question of quality of service (e.g., comfort) versus price. This paper uses a principal-agent framework to design PP and RP rates for heating, ventilation, and air-conditioning (HVAC) to address adverse selection due to variations in consumer comfort preferences. We formulate the pricing problem as a stochastic bilevel program, and numerically solve it by reformulation as a mixed integer program (MIP). Last, we compare the effectiveness of different pricing schemes on reductions of peak load or load variability. We find that PP pricing induces HVAC consumption to spike high (before), spike low (during), and spike high (after) the PP event, whereas RP achieves reductions in peak loads and load variability while preventing large spikes in electricity usage.
LG | Nov 28, 2022
Accelerated Nonnegative Tensor Completion via Integer Programming
Wenhao Pan, Anil Aswani, Chen Chen
The problem of tensor completion has applications in healthcare, computer vision, and other domains. However, past approaches to tensor completion have faced a tension in that they either have polynomial-time computation but require exponentially more samples than the information-theoretic rate, or they use fewer samples but require solving NP-hard problems for which there are no known practical algorithms. A recent approach, based on integer programming, resolves this tension for nonnegative tensor completion. It achieves the information-theoretic sample complexity rate and deploys the Blended Conditional Gradients algorithm, which requires a linear (in numerical tolerance) number of oracle steps to converge to the global optimum. The tradeoff in this approach is that, in the worst case, the oracle step requires solving an integer linear program. Despite this theoretical limitation, numerical experiments show that this algorithm can, on certain instances, scale up to 100 million entries while running on a personal computer. The goal of this paper is to further enhance this algorithm, with the intention to expand both the breadth and scale of instances that can be solved. We explore several variants that can maintain the same theoretical guarantees as the algorithm, but offer potentially faster computation. We consider different data structures, acceleration of gradient descent steps, and the use of the Blended Pairwise Conditional Gradients algorithm. We describe the original approach and these variants, and conduct numerical experiments in order to explore various tradeoffs in these algorithmic design choices.
OC | Apr 10, 2014 | Code
Practical Comparison of Optimization Algorithms for Learning-Based MPC with Linear Models
Anil Aswani, Patrick Bouffard, Xiaojing Zhang et al.
Learning-based control methods are an attractive approach for addressing performance and efficiency challenges in robotics and automation systems. One such technique that has found application in these domains is learning-based model predictive control (LBMPC). An important novelty of LBMPC lies in the fact that its robustness and stability properties are independent of the type of online learning used. This allows the use of advanced statistical or machine learning methods to provide the adaptation for the controller. This paper is concerned with providing practical comparisons of different optimization algorithms for implementing the LBMPC method, for the special case where the dynamic model of the system is linear and the online learning provides linear updates to the dynamic model. For comparison purposes, we have implemented a primal-dual infeasible start interior point method that exploits the sparsity structure of LBMPC. Our open source implementation (called LBmpcIPM) is available through a BSD license and is provided freely to enable the rapid implementation of LBMPC on other platforms. This solver is compared to the dense active set solvers LSSOL and qpOASES using a quadrotor helicopter platform. Two scenarios are considered: The first is a simulation comparing hovering control for the quadrotor, and the second is on-board control experiments of dynamic quadrotor flight. Though the LBmpcIPM method has better asymptotic computational complexity than LSSOL and qpOASES, we find that for certain integrated systems (like our quadrotor testbed) these methods can outperform LBmpcIPM. This suggests that actual benchmarks should be used when choosing which algorithm is used to implement LBMPC on practical systems.
GT | Apr 5
Collusion-proof Auction Design using Side Information
Sukanya Kudva, Edward Dowling, Anil Aswani
We consider a multi-unit auction of identical items with single-minded bidders, where a subset of bidders may collude by coordinating bids and transferring payments and items among themselves. Classical collusion-proof mechanisms are largely restricted to posted-price formats, which fail to guarantee even approximate efficiency. We therefore adopt a learning-augmented approach to leverage side information about which bidders are colluding and obtain improved welfare and revenue guarantees. In our setting, colluding bidders optimally shade their bids to suppress prices. Using this characterization, we establish a Bulow-Klemperer type result showing that recruiting more honest bidders is better than the best collusion-proof auction mechanism. We then consider a setting in which a black-box collusion detection algorithm labels bidders as colluding or non-colluding, and propose a VCG Posted Price (V-PoP) mechanism that applies VCG to non-colluding bidders and posted prices to colluding bidders. We show that V-PoP is ex-post dominant-strategy incentive compatible (DSIC) even when it uses select bidder information to calculate an optimal split of items between the subgroups. Additionally, we derive probabilistic guarantees on expected welfare and revenue under both known and unknown valuation distributions, and analyze the robustness of V-PoP to bidder misclassification errors. Numerical experiments across several distributions demonstrate that V-PoP consistently outperforms VCG restricted to non-colluding bidders and approaches the performance of the ideal VCG mechanism assuming universal truthfulness. Our results provide a principled framework for incorporating collusion detection into mechanism design, advancing the theory of auctions under collusion.
OC | Apr 30
Moral Hazard in LTI Dynamics: A Hypothesis Testing Approach
Jaewon Jeong, Pan-Yang Su, S. Shankar Sastry et al.
Many incentive design problems must contend with information asymmetries due to non-observation of efficiency (adverse selection) or non-observation of effort (moral hazard). And although a growing body of literature considers incentive design in control systems, the problem of designing incentives for control systems under information asymmetries has been less well-studied. This paper considers a model of moral hazard within control systems. In our model, the control system is described by an (affine) linear time-invariant (LTI) system with process noise. There is an agent who gets to choose (from between two choices) a linear state-feedback controller to apply to the LTI system, with one of the state-feedback controllers having a higher quadratic cost on the control inputs than the other. Our goal is to design a payment scheme that incentivizes the agent to choose the state-feedback controller that minimizes a quadratic cost on system states plus the time-discounted payment amount, subject to the understanding that the agent bears the control cost while being risk-averse with respect to their time-discounted payment. We formulate the problem as a constrained optimization, and prove that for a payment given after a fixed (but optimizable) time horizon the optimal payment scheme chooses the payment amount using a likelihood ratio hypothesis test. We numerically demonstrate our results by applying the derived optimal payment scheme to two examples: load frequency control (LFC) in power systems and wellness interventions for body weight loss.
LG | Apr 3, 2024
Methodology for Interpretable Reinforcement Learning for Optimizing Mechanical Ventilation
Joo Seung Lee, Malini Mahendra, Anil Aswani
Mechanical ventilation is a critical life support intervention that delivers controlled air and oxygen to a patient's lungs, assisting or replacing spontaneous breathing. While several data-driven approaches have been proposed to optimize ventilator control strategies, they often lack interpretability and alignment with domain knowledge, hindering clinical adoption. This paper presents a methodology for interpretable reinforcement learning (RL) aimed at improving mechanical ventilation control as part of connected health systems. Using a causal, nonparametric model-based off-policy evaluation, we assess RL policies for their ability to enhance patient-specific outcomes, specifically increasing blood oxygen levels (SpO2) while avoiding aggressive ventilator settings that may cause ventilator-induced lung injuries and other complications. Through numerical experiments on real-world ICU data from the MIMIC-III database, we demonstrate that our interpretable decision tree policy achieves performance comparable to state-of-the-art deep RL methods while outperforming standard behavior cloning approaches. The results highlight the potential of interpretable, data-driven decision support systems to improve safety and efficiency in personalized ventilation strategies, paving the way for seamless integration into connected healthcare environments.
OC | Feb 6, 2024
Tensor Completion via Integer Optimization
Xin Chen, Sukanya Kudva, Yongzheng Dai et al.
The main challenge with the tensor completion problem is a fundamental tension between computation power and the information-theoretic sample complexity rate. Past approaches either achieve the information-theoretic rate but lack practical algorithms to compute the corresponding solution, or have polynomial-time algorithms that require an exponentially larger number of samples for low estimation error. This paper develops a novel tensor completion algorithm that resolves this tension by achieving both provable convergence (in numerical tolerance) in a linear number of oracle steps and the information-theoretic rate. Our approach formulates tensor completion as a convex optimization problem constrained using a gauge-based tensor norm, which is defined in a way that allows the use of integer linear optimization to solve linear separation problems over the unit-ball in this new norm. Adaptations based on this insight are incorporated into a Frank-Wolfe variant to build our algorithm. We show our algorithm scales well using numerical experiments on tensors with up to ten million entries.
LG | Nov 27, 2021
Learning from learning machines: a new generation of AI technology to meet the needs of science
Luca Pion-Tonachini, Kristofer Bouchard, Hector Garcia Martin et al.
We outline emerging opportunities and challenges to enhance the utility of AI for scientific discovery. The distinct goals of AI for industry versus the goals of AI for science create tension between identifying patterns in data versus discovering patterns in the world from data. If we address the fundamental challenges associated with "bridging the gap" between domain-driven scientific models and data-driven AI learning machines, then we expect that these AI models can transform hypothesis generation, scientific discovery, and the scientific process itself.
LG | Nov 8, 2021
Nonnegative Tensor Completion via Integer Optimization
Caleb Bugg, Chen Chen, Anil Aswani
Unlike matrix completion, tensor completion does not have an algorithm that is known to achieve the information-theoretic sample complexity rate. This paper develops a new algorithm for the special case of completion for nonnegative tensors. We prove that our algorithm converges in a linear (in numerical tolerance) number of oracle steps, while achieving the information-theoretic rate. Our approach is to define a new norm for nonnegative tensors using the gauge of a particular 0-1 polytope; integer linear programming can, in turn, be used to solve linear separation problems over this polytope. We combine this insight with a variant of the Frank-Wolfe algorithm to construct our numerical algorithm, and we demonstrate its effectiveness and scalability through computational experiments using a laptop on tensors with up to one-hundred million entries.
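The Frank-Wolfe template referred to here only ever touches the feasible set through a linear minimization oracle. The sketch below shows that template in its textbook form on a deliberately easy problem: projecting a point onto the probability simplex, where the oracle just picks a vertex. It is not the variant or the integer-programming oracle from the paper; `lmo_simplex`, `frank_wolfe`, and the test vector are hypothetical.

```python
def lmo_simplex(grad):
    """Linear minimization oracle over the probability simplex.

    Returns the vertex (unit vector) minimizing the inner product with
    grad. In the tensor setting this step would instead be an integer
    linear program over a 0-1 polytope.
    """
    best = min(range(len(grad)), key=lambda j: grad[j])
    vertex = [0.0] * len(grad)
    vertex[best] = 1.0
    return vertex


def frank_wolfe(target, n_iters=2000):
    """Minimize ||x - target||^2 over the simplex with vanilla Frank-Wolfe."""
    n = len(target)
    x = [1.0 / n] * n                       # feasible starting point
    for k in range(n_iters):
        grad = [2.0 * (xi - ti) for xi, ti in zip(x, target)]
        vertex = lmo_simplex(grad)          # one oracle call per iteration
        step = 2.0 / (k + 2)                # standard diminishing step size
        # Convex combination keeps the iterate inside the simplex.
        x = [(1.0 - step) * xi + step * vi for xi, vi in zip(x, vertex)]
    return x


approx = frank_wolfe([0.1, 0.6, 0.3])
```

Every iterate is a convex combination of oracle outputs, so no projection is needed. This is what makes the method usable when the feasible set is only accessible through a (possibly expensive) linear optimization routine.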
LG | Oct 18, 2021
Protecting Anonymous Speech: A Generative Adversarial Network Methodology for Removing Stylistic Indicators in Text
Rishi Balakrishnan, Stephen Sloan, Anil Aswani
With Internet users constantly leaving a trail of text, whether through blogs, emails, or social media posts, the ability to write and protest anonymously is being eroded because artificial intelligence, when given a sample of previous work, can match text with its author out of hundreds of possible candidates. Existing approaches to authorship anonymization, also known as authorship obfuscation, often focus on protecting binary demographic attributes rather than identity as a whole. Even those that do focus on obfuscating identity require manual feedback, lose the coherence of the original sentence, or only perform well given a limited subset of authors. In this paper, we develop a new approach to authorship anonymization by constructing a generative adversarial network that protects identity and optimizes for three different losses corresponding to anonymity, fluency, and content preservation. Our fully automatic method achieves comparable results to other methods in terms of content preservation and fluency, but greatly outperforms baselines in regards to anonymization. Moreover, our approach is able to generalize well to an open-set context and anonymize sentences from authors it has not encountered before.
OC | Aug 4, 2021
Regret Analysis of Learning-Based MPC with Partially-Unknown Cost Function
Ilgin Dogan, Zuo-Jun Max Shen, Anil Aswani
The exploration/exploitation trade-off is an inherent challenge in data-driven adaptive control. Though this trade-off has been studied for multi-armed bandits (MABs) and reinforcement learning for linear systems, it is less well-studied for learning-based control of nonlinear systems. A significant theoretical challenge in the nonlinear setting is that there is no explicit characterization of an optimal controller for a given set of cost and system parameters. We propose the use of a finite-horizon oracle controller with full knowledge of parameters as a reasonable surrogate for the optimal controller. This allows us to develop policies in the context of learning-based MPC and MABs and conduct a control-theoretic analysis using techniques from MPC and optimization theory to show these policies achieve low regret with respect to this finite-horizon oracle. Our simulations exhibit the low regret of our policy on a heating, ventilation, and air-conditioning model with partially-unknown cost function.
SY | Mar 31, 2020
Covariance-Robust Dynamic Watermarking
Matt Olfat, Stephen Sloan, Pedro Hespanhol et al.
Attack detection and mitigation strategies for cyberphysical systems (CPS) are an active area of research, and researchers have developed a variety of attack-detection tools such as dynamic watermarking. However, such methods often make assumptions that are difficult to guarantee, such as exact knowledge of the distribution of measurement noise. Here, we develop a new dynamic watermarking method that we call covariance-robust dynamic watermarking, which is able to handle uncertainties in the covariance of measurement noise. Specifically, we consider two cases. In the first this covariance is fixed but unknown, and in the second this covariance is slowly-varying. For our tests, we only require knowledge of a set within which the covariance lies. Furthermore, we connect this problem to that of algorithmic fairness and the nascent field of fair hypothesis testing, and we show that our tests satisfy some notions of fairness. Finally, we exhibit the efficacy of our tests on empirical examples chosen to reflect values observed in a standard simulation model of autonomous vehicles.
RO | Nov 6, 2018
Dynamic Regret Convergence Analysis and an Adaptive Regularization Algorithm for On-Policy Robot Imitation Learning
Jonathan N. Lee, Michael Laskey, Ajay Kumar Tanwani et al.
On-policy imitation learning algorithms such as DAgger evolve a robot control policy by executing it, measuring performance (loss), obtaining corrective feedback from a supervisor, and generating the next policy. As the loss between iterations can vary unpredictably, a fundamental question is under what conditions this process will eventually achieve a converged policy. If one assumes the underlying trajectory distribution is static (stationary), it is possible to prove convergence for DAgger. However, in more realistic models for robotics, the underlying trajectory distribution is dynamic because it is a function of the policy. Recent results show it is possible to prove convergence of DAgger when a regularity condition on the rate of change of the trajectory distributions is satisfied. In this article, we reframe this result using dynamic regret theory from the field of online optimization and show that dynamic regret can be applied to any on-policy algorithm to analyze its convergence and optimality. These results inspire a new algorithm, Adaptive On-Policy Regularization (AOR), that ensures the conditions for convergence. We present simulation results with cart-pole balancing and locomotion benchmarks that suggest AOR can significantly decrease dynamic regret and chattering as the robot learns. To our knowledge, this is the first application of dynamic regret theory to imitation learning.
LG Oct 9, 2018
Average Margin Regularization for Classifiers
Matt Olfat, Anil Aswani
Adversarial robustness has become an important research topic given empirical demonstrations of the lack of robustness of deep neural networks. Unfortunately, recent theoretical results suggest that adversarial training induces a strict tradeoff between classification accuracy and adversarial robustness. In this paper, we propose and then study a new regularization for any margin classifier or deep neural network. We motivate this regularization by a novel generalization bound that shows a tradeoff in classifier accuracy between maximizing its margin and average margin. We thus call our approach an average margin (AM) regularization, and it consists of a linear term added to the objective. We theoretically show that for certain distributions AM regularization can both improve classifier accuracy and robustness to adversarial attacks. We conclude by using both synthetic and real data to empirically show that AM regularization can strictly improve both accuracy and robustness for support vector machines (SVMs), relative to unregularized classifiers and adversarially trained classifiers.
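One plausible reading of "a linear term added to the objective" is sketched below for a linear SVM: the usual hinge loss and ridge penalty, minus a multiple of the average signed margin, which is linear in the weights. The exact form, sign, and weighting used in the paper are not given in the abstract, so the function `am_regularized_objective` and the parameters `lam` and `mu` are assumptions.

```python
def am_regularized_objective(w, b, X, y, lam=1.0, mu=0.1):
    """Hinge loss + ridge penalty - mu * (average signed margin).

    The last term is linear in (w, b); its form is an assumption, not the paper's.
    """
    margins = [
        yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        for xi, yi in zip(X, y)
    ]
    n = len(margins)
    hinge = sum(max(0.0, 1.0 - m) for m in margins) / n
    ridge = 0.5 * lam * sum(wj * wj for wj in w)
    average_margin = sum(margins) / n
    return hinge + ridge - mu * average_margin


# Two separable points with labels +1 and -1.
X = [[2.0, 0.0], [-2.0, 0.0]]
y = [1, -1]
w, b = [1.0, 0.0], 0.0

plain = am_regularized_objective(w, b, X, y, mu=0.0)
with_am = am_regularized_objective(w, b, X, y, mu=0.1)
print(plain, with_am)
```

Setting `mu=0.0` recovers an ordinary soft-margin SVM objective, which makes the effect of the extra term easy to isolate.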
SY Oct 17, 2018
Simulation and Real-World Evaluation of Attack Detection Schemes
Matthew Porter, Arnav Joshi, Pedro Hespanhol et al.
A variety of anomaly detection schemes have been proposed to detect malicious attacks on Cyber-Physical Systems. Among these schemes, Dynamic Watermarking methods have been proven highly effective at detecting a wide range of attacks. Unfortunately, in contrast to other anomaly detectors, no method has been presented to design a Dynamic Watermarking detector that achieves a user-specified false alarm rate, or to subsequently evaluate the capabilities of an attacker under such a selection. This paper describes methods to measure the capability of an attacker, to numerically approximate this metric, and to design a Dynamic Watermarking detector that can achieve a user-specified rate of false alarms. The performance of the Dynamic Watermarking detector is compared to three classical anomaly detectors in simulation and on a real-world platform. These experiments illustrate that the attack capability under the Dynamic Watermarking detector is comparable to that under classic anomaly detectors. Importantly, these experiments also make clear that the Dynamic Watermarking detector is consistently able to detect attacks that the other detectors are unable to identify.
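To make the idea of a user-specified false alarm rate concrete, the sketch below uses a generic empirical-quantile calibration: pick the detection threshold from attack-free test statistics so that at most the requested fraction of them exceed it. This is a common generic approach and is not claimed to be the design method of the paper; the squared Gaussian residuals and the function names are assumptions.

```python
import math
import random


def calibrate_threshold(attack_free_stats, false_alarm_rate):
    """Return an observed value that at most a false_alarm_rate fraction of samples exceed."""
    ordered = sorted(attack_free_stats)
    n = len(ordered)
    # Permit at most floor(rate * n) samples strictly above the threshold.
    allowed = math.floor(false_alarm_rate * n)
    index = n - allowed - 1
    return ordered[max(index, 0)]


def empirical_alarm_rate(stats, threshold):
    """Fraction of test statistics that would raise an alarm."""
    return sum(s > threshold for s in stats) / len(stats)


rng = random.Random(0)
# Hypothetical attack-free test statistics: squared zero-mean Gaussian residuals.
calibration = [rng.gauss(0.0, 1.0) ** 2 for _ in range(2000)]

threshold = calibrate_threshold(calibration, false_alarm_rate=0.05)
rate = empirical_alarm_rate(calibration, threshold)
print(threshold, rate)
```

The guarantee here holds only on the calibration sample; on fresh data the realized rate would fluctuate around the target.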
LG Feb 11, 2018
Convex Formulations for Fair Principal Component Analysis
Matt Olfat, Anil Aswani
Though there is a growing body of literature on fairness for supervised learning, the problem of incorporating fairness into unsupervised learning has been less well-studied. This paper studies fairness in the context of principal component analysis (PCA). We first present a definition of fairness for dimensionality reduction; under this definition, a reduction is fair if information about a protected class (e.g., race or gender) cannot be inferred from the dimensionality-reduced data points. Next, we develop convex optimization formulations that can improve the fairness (with respect to our definition) of PCA and kernel PCA. These formulations are semidefinite programs (SDPs), and we demonstrate their effectiveness using several datasets. We conclude by showing how our approach can be used to perform a fair (with respect to age) clustering of health data that may be used to set health insurance rates.
LG Oct 16, 2017
Spectral Algorithms for Computing Fair Support Vector Machines
Matt Olfat, Anil Aswani
Classifiers and rating scores are prone to implicitly codifying biases, which may be present in the training data, against protected classes (e.g., age, gender, or race). It is therefore important to understand how to design classifiers and scores that prevent discrimination in predictions. This paper develops computationally tractable algorithms for designing accurate but fair support vector machines (SVMs). Our approach imposes a constraint on the covariance matrices conditioned on each protected class, which leads to a nonconvex quadratic constraint in the SVM formulation. We develop iterative algorithms to compute fair linear and kernel SVMs, which solve a sequence of relaxations constructed using a spectral decomposition of the nonconvex constraint. The effectiveness of our approach in achieving high prediction accuracy while ensuring fairness is shown through numerical experiments on several data sets.
OC Jul 26, 2017
Non-Stationary Bandits with Habituation and Recovery Dynamics
Yonatan Mintz, Anil Aswani, Philip Kaminsky et al.
Many settings involve sequential decision-making where a set of actions can be chosen at each time step, each action provides a stochastic reward, and the distribution for the reward of each action is initially unknown. However, frequent selection of a specific action may reduce its expected reward, while abstaining from choosing an action may cause its expected reward to increase. Such non-stationary phenomena are observed in many real-world settings, such as personalized healthcare-adherence-improving interventions and targeted online advertising. Though finding an optimal policy for general models with non-stationarity is PSPACE-complete, we propose and analyze a new class of models called ROGUE (Reducing or Gaining Unknown Efficacy) bandits, which we show in this paper can capture these phenomena and are amenable to the design of effective policies. We first present a consistent maximum likelihood estimator for the parameters of these models. Next, we construct finite sample concentration bounds that lead to an upper confidence bound policy called the ROGUE Upper Confidence Bound (ROGUE-UCB) algorithm. We prove that under proper conditions the ROGUE-UCB algorithm achieves logarithmic-in-time regret, unlike existing algorithms, which result in linear regret. We conclude with a numerical experiment using real data from a personalized healthcare-adherence-improving intervention to increase physical activity. In this intervention, the goal is to optimize the selection of messages (e.g., confidence increasing vs. knowledge increasing) to send to each individual each day to increase adherence and physical activity. Our results show that ROGUE-UCB performs better in terms of regret and average reward than state-of-the-art algorithms, and the use of ROGUE-UCB increases daily step counts by roughly 1,000 steps a day (about a half-mile more of walking) as compared to other algorithms.
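The habituation and recovery effect described in this abstract can be illustrated with a toy deterministic simulation: each play lowers an action's expected reward, and each round of rest restores it up to a baseline. This is not the ROGUE model or the ROGUE-UCB algorithm; the additive dynamics, the parameters `drop` and `recover`, and the two fixed schedules are assumptions chosen only to show why a policy that ignores these dynamics can do poorly.

```python
def simulate(schedule, base=(1.0, 0.8), drop=0.3, recover=0.15):
    """Total expected reward when each play lowers an arm's mean and each rest restores it."""
    means = list(base)
    total = 0.0
    for arm in schedule:
        total += means[arm]
        for k in range(len(means)):
            if k == arm:
                # Habituation: the chosen arm becomes less effective.
                means[k] = max(0.0, means[k] - drop)
            else:
                # Recovery: a rested arm regains efficacy, capped at its baseline.
                means[k] = min(base[k], means[k] + recover)
    return total


horizon = 10
# Always play the arm that looked best at the start.
always_best_initial = simulate([0] * horizon)
# Alternate between the two arms so each one gets time to recover.
alternating = simulate([t % 2 for t in range(horizon)])
print(always_best_initial, alternating)
```

In this toy setting the alternating schedule collects more total reward than repeatedly playing the initially best arm, which is the qualitative behavior a stationary bandit algorithm would fail to exploit.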
ST Dec 1, 2014
Low-Rank Approximation and Completion of Positive Tensors
Anil Aswani
Unlike the matrix case, computing low-rank approximations of tensors is NP-hard and numerically ill-posed in general. Even computing the best rank-1 approximation of a tensor is NP-hard. In this paper, we use convex optimization to develop polynomial-time algorithms for low-rank approximation and completion of positive tensors. Our approach is to use algebraic topology to define a new (numerically well-posed) decomposition for positive tensors, which we show is equivalent to the standard tensor decomposition in important cases. Though computing this decomposition is a nonconvex optimization problem, we prove it can be exactly reformulated as a convex optimization problem. This allows us to construct polynomial-time randomized algorithms for computing this decomposition and for solving low-rank tensor approximation problems. Among the consequences is that best rank-1 approximations of positive tensors can be computed in polynomial time. Our framework is next extended to the tensor completion problem, where noisy entries of a tensor are observed and then used to estimate missing entries. We provide a polynomial-time algorithm that for specific cases requires a polynomial (in tensor order) number of measurements, in contrast to existing approaches that require an exponential number of measurements. These algorithms are extended to exploit sparsity in the tensor to reduce the number of measurements needed. We conclude by providing a novel interpretation of statistical regression problems with categorical variables as tensor completion problems, and numerical examples with synthetic data and data from a bioengineered metabolic network show the improved performance of our approach on this problem.
OC Aug 3, 2012
Statistical Results on Filtering and Epi-convergence for Learning-Based Model Predictive Control
Anil Aswani, Humberto Gonzalez, S. Shankar Sastry et al.
Learning-based model predictive control (LBMPC) is a technique that provides deterministic guarantees on robustness, while statistical identification tools are used to identify richer models of the system in order to improve performance. This technical note provides proofs that elucidate the reasons for our choice of measurement model, as well as proofs concerning the stochastic convergence of LBMPC. The first part of this note discusses simultaneous state estimation and statistical identification (or learning) of unmodeled dynamics, for dynamical systems that can be described by ordinary differential equations (ODEs). The second part provides proofs concerning the epi-convergence of different statistical estimators that can be used with LBMPC. In particular, we prove results on the statistical properties of a nonparametric estimator that we have designed to have the correct deterministic and stochastic properties for numerical implementation when used in conjunction with LBMPC.
OC Apr 20, 2012
Energy-Efficient Building HVAC Control Using Hybrid System LBMPC
Anil Aswani, Neal Master, Jay Taneja et al.
Improving the energy efficiency of heating, ventilation, and air-conditioning (HVAC) systems has the potential to realize large economic and societal benefits. This paper concerns the system identification of a hybrid system model of a building-wide HVAC system and its subsequent control using a hybrid system formulation of learning-based model predictive control (LBMPC). Here, the learning refers to model updates to the hybrid system model that incorporate the heating effects due to occupancy, solar effects, outside air temperature (OAT), and equipment, in addition to integrator dynamics inherently present in low-level control. Though we make significant modeling simplifications, the controller that uses this model is able to experimentally achieve a large reduction in energy usage without any degradation in occupant comfort. It is in this way that we justify the modeling simplifications that we have made. We conclude by presenting results from experiments on our building HVAC testbed, which show an average of 1.5 MWh of energy savings per day (p = 0.002) with a 95% confidence interval of 1.0 MWh to 2.1 MWh of energy savings.