LGMar 11, 2022Code
Enhancing Adversarial Training with Second-Order Statistics of WeightsGaojie Jin, Xinping Yi, Wei Huang et al.
Adversarial training has been shown to be one of the most effective approaches to improve the robustness of deep neural networks. It is formalized as a min-max optimization over model weights and adversarial perturbations, where the weights can be optimized through gradient descent methods like SGD. In this paper, we show that treating model weights as random variables allows for enhancing adversarial training through \textbf{S}econd-Order \textbf{S}tatistics \textbf{O}ptimization (S$^2$O) with respect to the weights. By relaxing a common (but unrealistic) assumption of previous PAC-Bayesian frameworks that all weights are statistically independent, we derive an improved PAC-Bayesian adversarial generalization bound, which suggests that optimizing second-order statistics of weights can effectively tighten the bound. In addition to this theoretical insight, we conduct an extensive set of experiments, which show that S$^2$O not only improves the robustness and generalization of the trained neural networks when used in isolation, but also integrates easily in state-of-the-art adversarial training techniques like TRADES, AWP, MART, and AVMixup, leading to a measurable improvement of these techniques. The code is available at \url{https://github.com/Alexkael/S2O}.
LGMar 1Code
S2O: Enhancing Adversarial Training with Second-Order Statistics of WeightsGaojie Jin, Xinping Yi, Wei Huang et al.
Adversarial training has emerged as a highly effective way to improve the robustness of deep neural networks (DNNs). It is typically conceptualized as a min-max optimization problem over model weights and adversarial perturbations, where the weights are optimized using gradient descent methods, such as SGD. In this paper, we propose a novel approach by treating model weights as random variables, which paves the way for enhancing adversarial training through \textbf{S}econd-Order \textbf{S}tatistics \textbf{O}ptimization (S$^2$O) over model weights. We challenge and relax a prevalent, yet often unrealistic, assumption in prior PAC-Bayesian frameworks: the statistical independence of weights. From this relaxation, we derive an improved PAC-Bayesian robust generalization bound. Our theoretical developments suggest that optimizing the second-order statistics of weights can substantially tighten this bound. We complement this theoretical insight by conducting an extensive set of experiments that demonstrate that S$^2$O not only enhances the robustness and generalization of neural networks when used in isolation, but also seamlessly augments other state-of-the-art adversarial training techniques. The code is available at https://github.com/Alexkael/S2O.
LGAug 14, 2023
Omega-Regular Reward MachinesErnst Moritz Hahn, Mateo Perez, Sven Schewe et al.
Reinforcement learning (RL) is a powerful approach for training agents to perform tasks, but designing an appropriate reward mechanism is critical to its success. However, in many cases, the complexity of the learning objectives goes beyond the capabilities of the Markovian assumption, necessitating a more sophisticated reward mechanism. Reward machines and omega-regular languages are two formalisms used to express non-Markovian rewards for quantitative and qualitative objectives, respectively. This paper introduces omega-regular reward machines, which integrate reward machines with omega-regular languages to enable an expressive and effective reward mechanism for RL. We present a model-free RL algorithm to compute epsilon-optimal strategies against omega-egular reward machines and evaluate the effectiveness of the proposed algorithm through experiments.
GTJul 8, 2011
Efficient Approximation of Optimal Control for Markov GamesJohn Fearnley, Markus Rabe, Sven Schewe et al.
We study the time-bounded reachability problem for continuous-time Markov decision processes (CTMDPs) and games (CTMGs). Existing techniques for this problem use discretisation techniques to break time into discrete intervals, and optimal control is approximated for each interval separately. Current techniques provide an accuracy of O(ε^2) on each interval, which leads to an infeasibly large number of intervals. We propose a sequence of approximations that achieve accuracies of O(ε^3), O(ε^4), and O(ε^5), that allow us to drastically reduce the number of intervals that are considered. For CTMDPs, the performance of the resulting algorithms is comparable to the heuristic approach given by Buckholz and Schulz, while also being theoretically justified. All of our results generalise to CTMGs, where our results yield the first practically implementable algorithms for this problem. We also provide positional strategies for both players that achieve similar error bounds.
LONov 14, 2011
Practical Distributed Control SynthesisDoron Peled, Sven Schewe
Classic distributed control problems have an interesting dichotomy: they are either trivial or undecidable. If we allow the controllers to fully synchronize, then synthesis is trivial. In this case, controllers can effectively act as a single controller with complete information, resulting in a trivial control problem. But when we eliminate communication and restrict the supervisors to locally available information, the problem becomes undecidable. In this paper we argue in favor of a middle way. Communication is, in most applications, expensive, and should hence be minimized. We therefore study a solution that tries to communicate only scarcely and, while allowing communication in order to make joint decision, favors local decisions over joint decisions that require communication.
SYOct 9, 2012
Rapid Recovery for Systems with Scarce FaultsChung-Hao Huang, Doron Peled, Sven Schewe et al.
Our goal is to achieve a high degree of fault tolerance through the control of a safety critical systems. This reduces to solving a game between a malicious environment that injects failures and a controller who tries to establish a correct behavior. We suggest a new control objective for such systems that offers a better balance between complexity and precision: we seek systems that are k-resilient. In order to be k-resilient, a system needs to be able to rapidly recover from a small number, up to k, of local faults infinitely many times, provided that blocks of up to k faults are separated by short recovery periods in which no fault occurs. k-resilience is a simple but powerful abstraction from the precise distribution of local faults, but much more refined than the traditional objective to maximize the number of local faults. We argue why we believe this to be the right level of abstraction for safety critical systems when local faults are few and far between. We show that the computational complexity of constructing optimal control with respect to resilience is low and demonstrate the feasibility through an implementation and experimental results.
LGJun 23, 2022
Recursive Reinforcement LearningErnst Moritz Hahn, Mateo Perez, Sven Schewe et al.
Recursion is the fundamental paradigm to finitely describe potentially infinite objects. As state-of-the-art reinforcement learning (RL) algorithms cannot directly reason about recursion, they must rely on the practitioner's ingenuity in designing a suitable "flat" representation of the environment. The resulting manual feature constructions and approximations are cumbersome and error-prone; their lack of transparency hampers scalability. To overcome these challenges, we develop RL algorithms capable of computing optimal policies in environments described as a collection of Markov decision processes (MDPs) that can recursively invoke one another. Each constituent MDP is characterized by several entry and exit points that correspond to input and output values of these invocations. These recursive MDPs (or RMDPs) are expressively equivalent to probabilistic pushdown systems (with call-stack playing the role of the pushdown stack), and can model probabilistic programs with recursive procedural calls. We introduce Recursive Q-learning -- a model-free RL algorithm for RMDPs -- and prove that it converges for finite, single-exit and deterministic multi-exit RMDPs under mild assumptions.
FLMay 6, 2022
Alternating Good-for-MDP AutomataErnst Moritz Hahn, Mateo Perez, Sven Schewe et al.
When omega-regular objectives were first proposed in model-free reinforcement learning (RL) for controlling MDPs, deterministic Rabin automata were used in an attempt to provide a direct translation from their transitions to scalar values. While these translations failed, it has turned out that it is possible to repair them by using good-for-MDPs (GFM) Büchi automata instead. These are nondeterministic Büchi automata with a restricted type of nondeterminism, albeit not as restricted as in good-for-games automata. Indeed, deterministic Rabin automata have a pretty straightforward translation to such GFM automata, which is bi-linear in the number of states and pairs. Interestingly, the same cannot be said for deterministic Streett automata: a translation to nondeterministic Rabin or Büchi automata comes at an exponential cost, even without requiring the target automaton to be good-for-MDPs. Do we have to pay more than that to obtain a good-for-MDP automaton? The surprising answer is that we have to pay significantly less when we instead expand the good-for-MDP property to alternating automata: like the nondeterministic GFM automata obtained from deterministic Rabin automata, the alternating good-for-MDP automata we produce from deterministic Streett automata are bi-linear in the the size of the deterministic automaton and its index, and can therefore be exponentially more succinct than minimal nondeterministic Büchi automata.
FLMar 31
Resolving Nondeterminism by ChanceSoumyajit Paul, David Purser, Sven Schewe et al.
History-deterministic automata are those in which nondeterministic choices can be correctly resolved stepwise: there is a strategy to select a continuation of a run given the next input letter so that if the overall input word admits some accepting run, then the constructed run is also accepting. Motivated by checking qualitative properties in probabilistic verification, we consider the setting where the resolver strategy can randomize and only needs to succeed with lower-bounded probability. We study the expressiveness of such stochastically-resolvable automata as well as consider the decision questions of whether a given automaton has this property. In particular, we show that it is undecidable to check if a given NFA is $λ$-stochastically resolvable. This problem is decidable for finitely-ambiguous automata. We also present complexity upper and lower bounds for several well-studied classes of automata for which this problem remains decidable.
FLNov 12, 2025
Good-for-MDP State Reduction for Stochastic LTL PlanningChristoph Weinhuber, Giuseppe De Giacomo, Yong Li et al.
We study stochastic planning problems in Markov Decision Processes (MDPs) with goals specified in Linear Temporal Logic (LTL). The state-of-the-art approach transforms LTL formulas into good-for-MDP (GFM) automata, which feature a restricted form of nondeterminism. These automata are then composed with the MDP, allowing the agent to resolve the nondeterminism during policy synthesis. A major factor affecting the scalability of this approach is the size of the generated automata. In this paper, we propose a novel GFM state-space reduction technique that significantly reduces the number of automata states. Our method employs a sophisticated chain of transformations, leveraging recent advances in good-for-games minimisation developed for adversarial settings. In addition to our theoretical contributions, we present empirical results demonstrating the practical effectiveness of our state-reduction technique. Furthermore, we introduce a direct construction method for formulas of the form $\mathsf{G}\mathsf{F}\varphi$, where $\varphi$ is a co-safety formula. This construction is provably single-exponential in the worst case, in contrast to the general doubly-exponential complexity. Our experiments confirm the scalability advantages of this specialised construction.
AIMay 12
Optimal LTLf SynthesisYujian Cao, Sven Schewe, Qiyi Tang et al.
Strategy synthesis typically follows an all-or-nothing paradigm, returning unrealisable whenever a specification cannot be guaranteed in an uncertain environment. In this paper, we introduce optimal LTLf synthesis, where the goal is to realise as many objectives as possible from a given specification consisting of multiple objectives, especially for the case that they are not all jointly realisable. We first consider max-guarantee synthesis, which commits to a maximal set of objectives that we can a priori guarantee to realise. We then introduce max-observation synthesis, which maximises a posteriori realised objectives that may be incomparable on different executions. Finally, we present incremental max-observation synthesis, which further improves strategies by exploiting opportunities for stronger guarantees when they arise during an execution. Experimental results show that different variations of optimal synthesis scale broadly equally well, solving a large fraction of the benchmark instances within the given timeout, demonstrating the practical feasibility of the approach.
GTMay 12
Social Welfare under Heterogeneous Time PreferencesSarvin Bahmani, Soumyajit Paul, Sven Schewe et al.
In several socioeconomic-critical decision-making settings, such as fair resource allocation, climate policy, or AI alignment, multiple principals interact within a common arena. While it is well established that these principals may have differing preferences, decision-making under heterogeneous time preferences remains relatively unexplored. In particular, principals may weigh future outcomes differently and may derive distinct utilities from the same decisions. Motivated by such scenarios, we introduce the notion of heterogeneous time preferences in MDPs, where multiple principals possess distinct reward functions and apply different discount factors to future rewards. To compute meaningful decisions in such settings, an AI agent must rely on a notion of optimality that accounts for the preferences of all principals. We adopt a utilitarian notion of social welfare, defined as the sum of utilities accrued to all principals, and study the synthesis of agent strategies that maximise this welfare. Under heterogeneous time preferences, we show that optimal strategies are no longer positional, even when all principals receive identical rewards. Nevertheless, optimal strategies remain structurally simple: they can be realized as pure finite-memory counting strategies, require only polynomial memory in the system size, and can be synthesized in polynomial time. On the other hand, we show that deciding threshold questions for optimal positional strategies is NP-hard, exposing a poor trade-off: insisting on positional simplicity neither makes synthesis tractable nor preserves social welfare.
LGMar 5, 2021Code
Abstraction and Symbolic Execution of Deep Neural Networks with Bayesian Approximation of Hidden FeaturesNicolas Berthier, Amany Alshareef, James Sharp et al.
Intensive research has been conducted on the verification and validation of deep neural networks (DNNs), aiming to understand if, and how, DNNs can be applied to safety critical applications. However, existing verification and validation techniques are limited by their scalability, over both the size of the DNN and the size of the dataset. In this paper, we propose a novel abstraction method which abstracts a DNN and a dataset into a Bayesian network (BN). We make use of dimensionality reduction techniques to identify hidden features that have been learned by hidden layers of the DNN, and associate each hidden feature with a node of the BN. On this BN, we can conduct probabilistic inference to understand the behaviours of the DNN processing data. More importantly, we can derive a runtime monitoring approach to detect in operational time rare inputs and covariate shift of the input data. We can also adapt existing structural coverage-guided testing techniques (i.e., based on low-level elements of the DNN such as neurons), in order to generate test cases that better exercise hidden features. We implement and evaluate the BN abstraction technique using our DeepConcolic tool available at https://github.com/TrustAI/DeepConcolic.
LODec 14, 2023
Omega-Regular Decision ProcessesErnst Moritz Hahn, Mateo Perez, Sven Schewe et al.
Regular decision processes (RDPs) are a subclass of non-Markovian decision processes where the transition and reward functions are guarded by some regular property of the past (a lookback). While RDPs enable intuitive and succinct representation of non-Markovian decision processes, their expressive power coincides with finite-state Markov decision processes (MDPs). We introduce omega-regular decision processes (ODPs) where the non-Markovian aspect of the transition and reward functions are extended to an omega-regular lookahead over the system evolution. Semantically, these lookaheads can be considered as promises made by the decision maker or the learning agent about her future behavior. In particular, we assume that, if the promised lookaheads are not met, then the payoff to the decision maker is $\bot$ (least desirable payoff), overriding any rewards collected by the decision maker. We enable optimization and learning for ODPs under the discounted-reward objective by reducing them to lexicographic optimization and learning over finite MDPs. We present experimental results demonstrating the effectiveness of the proposed reduction.
LGJan 23, 2022
Weight Expansion: A New Perspective on Dropout and GeneralizationGaojie Jin, Xinping Yi, Pengfei Yang et al.
While dropout is known to be a successful regularization technique, insights into the mechanisms that lead to this success are still lacking. We introduce the concept of \emph{weight expansion}, an increase in the signed volume of a parallelotope spanned by the column or row vectors of the weight covariance matrix, and show that weight expansion is an effective means of increasing the generalization in a PAC-Bayesian setting. We provide a theoretical argument that dropout leads to weight expansion and extensive empirical support for the correlation between dropout and weight expansion. To support our hypothesis that weight expansion can be regarded as an \emph{indicator} of the enhanced generalization capability endowed by dropout, and not just as a mere by-product, we have studied other methods that achieve weight expansion (resp.\ contraction), and found that they generally lead to an increased (resp.\ decreased) generalization ability. This suggests that dropout is an attractive regularizer, because it is a computationally cheap method for obtaining weight expansion. This insight justifies the role of dropout as a regularizer, while paving the way for identifying regularizers that promise improved generalization through weight expansion.
SENov 30, 2021
Reliability Assessment and Safety Arguments for Machine Learning Components in System AssuranceYi Dong, Wei Huang, Vibhav Bharti et al.
The increasing use of Machine Learning (ML) components embedded in autonomous systems -- so-called Learning-Enabled Systems (LESs) -- has resulted in the pressing need to assure their functional safety. As for traditional functional safety, the emerging consensus within both, industry and academia, is to use assurance cases for this purpose. Typically assurance cases support claims of reliability in support of safety, and can be viewed as a structured way of organising arguments and evidence generated from safety analysis and reliability modelling activities. While such assurance activities are traditionally guided by consensus-based standards developed from vast engineering experience, LESs pose new challenges in safety-critical application due to the characteristics and design of ML models. In this article, we first present an overall assurance framework for LESs with an emphasis on quantitative aspects, e.g., breaking down system-level safety targets to component-level requirements and supporting claims stated in reliability metrics. We then introduce a novel model-agnostic Reliability Assessment Model (RAM) for ML classifiers that utilises the operational profile and robustness verification evidence. We discuss the model assumptions and the inherent challenges of assessing ML reliability uncovered by our RAM and propose solutions to practical use. Probabilistic safety argument templates at the lower ML component-level are also developed based on the RAM. Finally, to evaluate and demonstrate our methods, we not only conduct experiments on synthetic/benchmark datasets but also scope our methods with case studies on simulated Autonomous Underwater Vehicles and physical Unmanned Ground Vehicles.
LGJun 16, 2021
Mungojerrie: Reinforcement Learning of Linear-Time ObjectivesErnst Moritz Hahn, Mateo Perez, Sven Schewe et al.
Reinforcement learning synthesizes controllers without prior knowledge of the system. At each timestep, a reward is given. The controllers optimize the discounted sum of these rewards. Applying this class of algorithms requires designing a reward scheme, which is typically done manually. The designer must ensure that their intent is accurately captured. This may not be trivial, and is prone to error. An alternative to this manual programming, akin to programming directly in assembly, is to specify the objective in a formal language and have it "compiled" to a reward scheme. Mungojerrie (https://plv.colorado.edu/mungojerrie/) is a tool for testing reward schemes for $ω$-regular objectives on finite models. The tool contains reinforcement learning algorithms and a probabilistic model checker. Mungojerrie supports models specified in PRISM and $ω$-automata specified in HOA.
LGJun 12, 2021
Model-free Reinforcement Learning for Branching Markov Decision ProcessesErnst Moritz Hahn, Mateo Perez, Sven Schewe et al.
We study reinforcement learning for the optimal control of Branching Markov Decision Processes (BMDPs), a natural extension of (multitype) Branching Markov Chains (BMCs). The state of a (discrete-time) BMCs is a collection of entities of various types that, while spawning other entities, generate a payoff. In comparison with BMCs, where the evolution of a each entity of the same type follows the same probabilistic pattern, BMDPs allow an external controller to pick from a range of options. This permits us to study the best/worst behaviour of the system. We generalise model-free reinforcement learning techniques to compute an optimal control strategy of an unknown BMDP in the limit. We present results of an implementation that demonstrate the practicality of the approach.
LGJun 2, 2021
Assessing the Reliability of Deep Learning Classifiers Through Robustness Evaluation and Operational ProfilesXingyu Zhao, Wei Huang, Alec Banks et al.
The utilisation of Deep Learning (DL) is advancing into increasingly more sophisticated applications. While it shows great potential to provide transformational capabilities, DL also raises new challenges regarding its reliability in critical functions. In this paper, we present a model-agnostic reliability assessment method for DL classifiers, based on evidence from robustness evaluation and the operational profile (OP) of a given application. We partition the input space into small cells and then "assemble" their robustness (to the ground truth) according to the OP, where estimators on the cells' robustness and OPs are provided. Reliability estimates in terms of the probability of misclassification per input (pmi) can be derived together with confidence levels. A prototype tool is demonstrated with simplified case studies. Model assumptions and extension to real-world applications are also discussed. While our model easily uncovers the inherent difficulties of assessing the DL dependability (e.g. lack of data with ground truth and scalability issues), we provide preliminary/compromised solutions to advance in this research direction.
SEApr 13, 2021
Detecting Operational Adversarial Examples for Reliable Deep LearningXingyu Zhao, Wei Huang, Sven Schewe et al.
The utilisation of Deep Learning (DL) raises new challenges regarding its dependability in critical applications. Sound verification and validation methods are needed to assure the safe and reliable use of DL. However, state-of-the-art debug testing methods on DL that aim at detecting adversarial examples (AEs) ignore the operational profile, which statistically depicts the software's future operational use. This may lead to very modest effectiveness on improving the software's delivered reliability, as the testing budget is likely to be wasted on detecting AEs that are unrealistic or encountered very rarely in real-life operation. In this paper, we first present the novel notion of "operational AEs" which are AEs that have relatively high chance to be seen in future operation. Then an initial design of a new DL testing method to efficiently detect "operational AEs" is provided, as well as some insights on our prospective research plan.
LGOct 12, 2020
How does Weight Correlation Affect the Generalisation Ability of Deep Neural NetworksGaojie Jin, Xinping Yi, Liang Zhang et al.
This paper studies the novel concept of weight correlation in deep neural networks and discusses its impact on the networks' generalisation ability. For fully-connected layers, the weight correlation is defined as the average cosine similarity between weight vectors of neurons, and for convolutional layers, the weight correlation is defined as the cosine similarity between filter matrices. Theoretically, we show that, weight correlation can, and should, be incorporated into the PAC Bayesian framework for the generalisation of neural networks, and the resulting generalisation bound is monotonic with respect to the weight correlation. We formulate a new complexity measure, which lifts the PAC Bayes measure with weight correlation, and experimentally confirm that it is able to rank the generalisation errors of a set of networks more precisely than existing measures. More importantly, we develop a new regulariser for training, and provide extensive experiments that show that the generalisation error can be greatly reduced with our novel approach.
LOSep 26, 2018
Omega-Regular Objectives in Model-Free Reinforcement LearningErnst Moritz Hahn, Mateo Perez, Sven Schewe et al.
We provide the first solution for model-free reinforcement learning of ω-regular objectives for Markov decision processes (MDPs). We present a constructive reduction from the almost-sure satisfaction of ω-regular objectives to an almost- sure reachability problem and extend this technique to learning how to control an unknown model so that the chance of satisfying the objective is maximized. A key feature of our technique is the compilation of ω-regular properties into limit- deterministic Buechi automata instead of the traditional Rabin automata; this choice sidesteps difficulties that have marred previous proposals. Our approach allows us to apply model-free, off-the-shelf reinforcement learning algorithms to compute optimal strategies from the observations of the MDP. We present an experimental evaluation of our technique on benchmark learning problems.
LOJun 29, 2017
Optimal Control for Multi-Mode Systems with Discrete CostsMahmoud A. A. Mousa, Sven Schewe, Dominik Wojtczak
This paper studies optimal time-bounded control in multi-mode systems with discrete costs. Multi-mode systems are an important subclass of linear hybrid systems, in which there are no guards on transitions and all invariants are global. Each state has a continuous cost attached to it, which is linear in the sojourn time, while a discrete cost is attached to each transition taken. We show that an optimal control for this model can be computed in NEXPTIME and approximated in PSPACE. We also show that the one-dimensional case is simpler: although the problem is NP-complete (and in LOGSPACE for an infinite time horizon), we develop an FPTAS for finding an approximate solution.
SEDec 21, 2013
Coverage Games for Testing Nondeterministic SystemsFarn Wang, Jung-Hsuan Wu, Sven Schewe et al.
Modern software systems may exhibit a nondeterministic behavior due to many unpredictable factors. In this work, we propose the node coverage game, a two player turn-based game played on a finite game graph, as a formalization of the problem to test such systems. Each node in the graph represents a {\em functional equivalence class} of the software under test (SUT). One player, the tester, wants to maximize the node coverage, measured by the number of nodes visited when exploring the game graphs, while his opponent, the SUT, wants to minimize it. An optimal test would maximize the cover, and it is an interesting problem to find the maximal number of nodes that the tester can guarantee to visit, irrespective of the responses of the SUT. We show that the decision problem of whether the guarantee is less than a given number is NP-complete. Then we present techniques for testing nondeterministic SUTs with existing test suites for deterministic models. Finally, we report our implementation and experiments.
LOJul 3, 2012
Proceedings First Workshop on SynthesisDoron Peled, Sven Schewe
This volume contains the proceedings of the First Workshop on Synthesis (SYNT 2012). The workshop is held is held in Berkeley, California, on June 6th and 7th, as a satellite event to the 24th International Conference on Computer Aided Verification (CAV 2012). SYNT aims at bringing together and providing an open platform for researchers interested in synthesis.