SYJul 8, 2016
Sapo: Reachability Computation and Parameter Synthesis of Polynomial Dynamical SystemsTommaso Dreossi
Sapo is a C++ tool for the formal analysis of polynomial dynamical systems. Its main features are: 1) Reachability computation, i.e., the calculation of the set of states reachable from a set of initial conditions, and 2) Parameter synthesis, i.e., the refinement of a set of parameters so that the system satisfies a given specification. Sapo can represent reachable sets as unions of boxes, parallelotopes, or parallelotope bundles (symbolic representation of polytopes). Sets of parameters are represented with polytopes while specifications are formalized as Signal Temporal Logic (STL) formulas.
MLOct 14, 2021Code
IB-GAN: A Unified Approach for Multivariate Time Series Classification under Class ImbalanceGrace Deng, Cuize Han, Tommaso Dreossi et al.
Classification of large multivariate time series with strong class imbalance is an important task in real-world applications. Standard methods of class weights, oversampling, or parametric data augmentation do not always yield significant improvements for predicting minority classes of interest. Non-parametric data augmentation with Generative Adversarial Networks (GANs) offers a promising solution. We propose Imputation Balanced GAN (IB-GAN), a novel method that joins data augmentation and classification in a one-step process via an imputation-balancing approach. IB-GAN uses imputation and resampling techniques to generate higher quality samples from randomly masked vectors than from white noise, and augments classification through a class-balanced set of real and synthetic samples. Imputation hyperparameter $p_{miss}$ allows for regularization of classifier variability by tuning innovations introduced via generator imputation. IB-GAN is simple to train and model-agnostic, pairing any deep learning classifier with a generator-discriminator duo and resulting in higher accuracy for under-observed classes. Empirical experiments on open-source UCR data and proprietary 90K product dataset show significant performance gains against state-of-the-art parametric and GAN baselines.
68.7SPMay 6
Inverse Design of Multi-Layer Sub-Pixel-Resolution RF Passives Through Grayscale Diffusion with Flexible S-Parameter ConditioningTommaso Dreossi, Christopher M. Bryant, Hao Liu et al.
Inverse design of RF passive components from S-parameters is a high-dimensional, ill-posed problem, and prior generative approaches are limited to single-layer binary-metallization structures. This paper presents an inverse design approach that generates passive components from partial S-parameter inputs on an $8\times8$ mm board discretized at $64\times64$ pixels with sub-pixel grayscale metallization across 1-20 GHz. The framework generates two-layer copper layouts with vias, with hard physical constraints on feed locations enforced through annealed Langevin projection, flexible multi-modal conditioning on partial S-parameter specifications, port locations, dielectric properties, reference topology, and variable port placement. Candidate designs are generated in seconds, with surrogate-predicted S-parameters matching targets to within $0.77 \pm 1.28$ dB weighted mean absolute error. We validate the approach with two fabricated designs on RO4003C: a manufacturable alternative to a hairpin filter whose coupling gaps violate fabrication rules, and a combline bandpass filter designed from scratch given only target S-parameters.
LOJan 14, 2021
Analysis of E-commerce Ranking Signals via Signal Temporal LogicTommaso Dreossi, Giorgio Ballardin, Parth Gupta et al.
The timed position of documents retrieved by learning to rank models can be seen as signals. Signals carry useful information such as drop or rise of documents over time or user behaviors. In this work, we propose to use the logic formalism called Signal Temporal Logic (STL) to characterize document behaviors in ranking accordingly to the specified formulas. Our analysis shows that interesting document behaviors can be easily formalized and detected thanks to STL formulas. We validate our idea on a dataset of 100K product signals. Through the presented framework, we uncover interesting patterns, such as cold start, warm start, spikes, and inspect how they affect our learning to ranks models.
PLOct 13, 2020
Scenic: A Language for Scenario Specification and Data GenerationDaniel J. Fremont, Edward Kim, Tommaso Dreossi et al.
We propose a new probabilistic programming language for the design and analysis of cyber-physical systems, especially those based on machine learning. Specifically, we consider the problems of training a system to be robust to rare events, testing its performance under different conditions, and debugging failures. We show how a probabilistic programming language can help address these problems by specifying distributions encoding interesting types of inputs, then sampling these to generate specialized training and test data. More generally, such languages can be used to write environment models, an essential prerequisite to any formal analysis. In this paper, we focus on systems like autonomous cars and robots, whose environment at any point in time is a 'scene', a configuration of physical objects and agents. We design a domain-specific language, Scenic, for describing scenarios that are distributions over scenes and the behaviors of their agents over time. As a probabilistic programming language, Scenic allows assigning distributions to features of the scene, as well as declaratively imposing hard and soft constraints over the scene. We develop specialized techniques for sampling from the resulting distribution, taking advantage of the structure provided by Scenic's domain-specific syntax. Finally, we apply Scenic in a case study on a convolutional neural network designed to detect cars in road images, improving its performance beyond that achieved by state-of-the-art synthetic data generation methods.
LGMar 24, 2019
A Formalization of Robustness for Deep Neural NetworksTommaso Dreossi, Shromona Ghosh, Alberto Sangiovanni-Vincentelli et al.
Deep neural networks have been shown to lack robustness to small input perturbations. The process of generating the perturbations that expose the lack of robustness of neural networks is known as adversarial input generation. This process depends on the goals and capabilities of the adversary, In this paper, we propose a unifying formalization of the adversarial input generation process from a formal methods perspective. We provide a definition of robustness that is general enough to capture different formulations. The expressiveness of our formalization is shown by modeling and comparing a variety of adversarial attack techniques.
AIFeb 12, 2019
VERIFAI: A Toolkit for the Design and Analysis of Artificial Intelligence-Based SystemsTommaso Dreossi, Daniel J. Fremont, Shromona Ghosh et al.
We present VERIFAI, a software toolkit for the formal design and analysis of systems that include artificial intelligence (AI) and machine learning (ML) components. VERIFAI particularly seeks to address challenges with applying formal methods to perception and ML components, including those based on neural networks, and to model and analyze system behavior in the presence of environment uncertainty. We describe the initial version of VERIFAI which centers on simulation guided by formal models and specifications. Several use cases are illustrated with examples, including temporal-logic falsification, model-based systematic fuzz testing, parameter synthesis, counterexample analysis, and data set augmentation.
PLSep 25, 2018
Scenic: A Language for Scenario Specification and Scene GenerationDaniel J. Fremont, Tommaso Dreossi, Shromona Ghosh et al.
We propose a new probabilistic programming language for the design and analysis of perception systems, especially those based on machine learning. Specifically, we consider the problems of training a perception system to handle rare events, testing its performance under different conditions, and debugging failures. We show how a probabilistic programming language can help address these problems by specifying distributions encoding interesting types of inputs and sampling these to generate specialized training and test sets. More generally, such languages can be used for cyber-physical systems and robotics to write environment models, an essential prerequisite to any formal analysis. In this paper, we focus on systems like autonomous cars and robots, whose environment is a "scene", a configuration of physical objects and agents. We design a domain-specific language, Scenic, for describing "scenarios" that are distributions over scenes. As a probabilistic programming language, Scenic allows assigning distributions to features of the scene, as well as declaratively imposing hard and soft constraints over the scene. We develop specialized techniques for sampling from the resulting distribution, taking advantage of the structure provided by Scenic's domain-specific syntax. Finally, we apply Scenic in a case study on a convolutional neural network designed to detect cars in road images, improving its performance beyond that achieved by state-of-the-art synthetic data generation methods.
LGMay 17, 2018
Counterexample-Guided Data AugmentationTommaso Dreossi, Shromona Ghosh, Xiangyu Yue et al.
We present a novel framework for augmenting data sets for machine learning based on counterexamples. Counterexamples are misclassified examples that have important properties for retraining and improving the model. Key components of our framework include a counterexample generator, which produces data items that are misclassified by the model and error tables, a novel data structure that stores information pertaining to misclassifications. Error tables can be used to explain the model's vulnerabilities and are used to efficiently generate counterexamples for augmentation. We show the efficacy of the proposed framework by comparing it to classical augmentation techniques on a case study of object detection in autonomous driving based on deep neural networks.
LGApr 19, 2018
Semantic Adversarial Deep LearningTommaso Dreossi, Somesh Jha, Sanjit A. Seshia
Fueled by massive amounts of data, models produced by machine-learning (ML) algorithms, especially deep neural networks, are being used in diverse domains where trustworthiness is a concern, including automotive systems, finance, health care, natural language processing, and malware detection. Of particular concern is the use of ML algorithms in cyber-physical systems (CPS), such as self-driving cars and aviation, where an adversary can cause serious consequences. However, existing approaches to generating adversarial examples and devising robust ML algorithms mostly ignore the semantics and context of the overall system containing the ML component. For example, in an autonomous vehicle using deep learning for perception, not every adversarial example for the neural network might lead to a harmful consequence. Moreover, one may want to prioritize the search for adversarial examples towards those that significantly modify the desired semantics of the overall system. Along the same lines, existing algorithms for constructing robust ML algorithms ignore the specification of the overall system. In this paper, we argue that the semantics and specification of the overall system has a crucial role to play in this line of research. We present preliminary research results that support this claim.
CVAug 10, 2017
Systematic Testing of Convolutional Neural Networks for Autonomous DrivingTommaso Dreossi, Shromona Ghosh, Alberto Sangiovanni-Vincentelli et al.
We present a framework to systematically analyze convolutional neural networks (CNNs) used in classification of cars in autonomous vehicles. Our analysis procedure comprises an image generator that produces synthetic pictures by sampling in a lower dimension image modification subspace and a suite of visualization tools. The image generator produces images which can be used to test the CNN and hence expose its vulnerabilities. The presented framework can be used to extract insights of the CNN classifier, compare across classification models, or generate training and validation datasets.
SYMar 2, 2017
Compositional Falsification of Cyber-Physical Systems with Machine Learning ComponentsTommaso Dreossi, Alexandre Donzé, Sanjit A. Seshia
Cyber-physical systems (CPS), such as automotive systems, are starting to include sophisticated machine learning (ML) components. Their correctness, therefore, depends on properties of the inner ML modules. While learning algorithms aim to generalize from examples, they are only as good as the examples provided, and recent efforts have shown that they can produce inconsistent output under small adversarial perturbations. This raises the question: can the output from learning components can lead to a failure of the entire CPS? In this work, we address this question by formulating it as a problem of falsifying signal temporal logic (STL) specifications for CPS with ML components. We propose a compositional falsification framework where a temporal logic falsifier and a machine learning analyzer cooperate with the aim of finding falsifying executions of the considered model. The efficacy of the proposed technique is shown on an automatic emergency braking system model with a perception component based on deep neural networks.