LGNov 16, 2021
Inverse-Weighted Survival GamesXintian Han, Mark Goldstein, Aahlad Puli et al.
Deep models trained through maximum likelihood have achieved state-of-the-art results for survival analysis. Despite this training scheme, practitioners evaluate models under other criteria, such as binary classification losses at a chosen set of time horizons, e.g. Brier score (BS) and Bernoulli log likelihood (BLL). Models trained with maximum likelihood may have poor BS or BLL since maximum likelihood does not directly optimize these criteria. Directly optimizing criteria like BS requires inverse-weighting by the censoring distribution. However, estimating the censoring model under these metrics requires inverse-weighting by the failure distribution. The objective for each model requires the other, but neither are known. To resolve this dilemma, we introduce Inverse-Weighted Survival Games. In these games, objectives for each model are built from re-weighted estimates featuring the other model, where the latter is held fixed during training. When the loss is proper, we show that the games always have the true failure and censoring distributions as a stationary point. This means models in the game do not leave the correct distributions once reached. We construct one case where this stationary point is unique. We show that these games optimize BS on simulations and then apply these principles on real world cancer and critically-ill patient data.
CLOct 1, 2020
Beyond The Text: Analysis of Privacy Statements through Syntactic and Semantic Role LabelingYan Shvartzshnaider, Ananth Balashankar, Vikas Patidar et al.
This paper formulates a new task of extracting privacy parameters from a privacy policy, through the lens of Contextual Integrity, an established social theory framework for reasoning about privacy norms. Privacy policies, written by lawyers, are lengthy and often comprise incomplete and vague statements. In this paper, we show that traditional NLP tasks, including the recently proposed Question-Answering based solutions, are insufficient to address the privacy parameter extraction problem and provide poor precision and recall. We describe 4 different types of conventional methods that can be partially adapted to address the parameter extraction task with varying degrees of success: Hidden Markov Models, BERT fine-tuned models, Dependency Type Parsing (DP) and Semantic Role Labeling (SRL). Based on a detailed evaluation across 36 real-world privacy policies of major enterprises, we demonstrate that a solution combining syntactic DP coupled with type-specific SRL tasks provides the highest accuracy for retrieving contextual privacy parameters from privacy statements. We also observe that incorporating domain-specific knowledge is critical to achieving high precision and recall, thus inspiring new NLP research to address this important problem in the privacy domain.
SEJan 29, 2020
TarTar: A Timed Automata Repair ToolMartin Koelbl, Stefan Leue, Thomas Wies
We present TarTar, an automatic repair analysis tool that, given a timed diagnostic trace (TDT) obtained during the model checking of a timed automaton model, suggests possible syntactic repairs of the analyzed model. The suggested repairs include modified values for clock bounds in location invariants and transition guards, adding or removing clock resets, etc. The proposed repairs are guaranteed to eliminate executability of the given TDT, while preserving the overall functional behavior of the system. We give insights into the design and architecture of TarTar, and show that it can successfully repair 69% of the seeded errors in system models taken from a diverse suite of case studies.
CRNov 7, 2017
The VACCINE Framework for Building DLP SystemsYan Shvartzshnaider, Zvonimir Pavlinovic, Thomas Wies et al.
Conventional Data Leakage Prevention (DLP) systems suffer from the following major drawback: Privacy policies that define what constitutes data leakage cannot be seamlessly defined and enforced across heterogeneous forms of communication. Administrators have the dual burden of: (1) manually self-interpreting policies from handbooks to specify rules (which is error-prone); (2) extracting relevant information flows from heterogeneous communication protocols and enforcing policies to determine which flows should be admissible. To address these issues, we present the Verifiable and ACtionable Contextual Integrity Norms Engine (VACCINE), a framework for building adaptable and modular DLP systems. VACCINE relies on (1) the theory of contextual integrity to provide an abstraction layer suitable for specifying reusable protocol-agnostic leakage prevention rules and (2) programming language techniques to check these rules against correctness properties and to enforce them faithfully within a DLP system implementation. We applied VACCINE to the Family Educational Rights and Privacy Act and Enron Corporation privacy regulations. We show that by using contextual integrity in conjunction with verification techniques, we can effectively create reusable privacy rules with specific correctness guarantees, and check the integrity of information flows against these rules. Our experiments in emulated enterprise settings indicate that VACCINE improves over current DLP system design approaches and can be deployed in enterprises involving tens of thousands of actors.
SEAug 30, 2016
Error Invariants for Concurrent TracesAndreas Holzer, Daniel Schwartz-Narbonne, Mitra Tabaei Befrouei et al.
Error invariants are assertions that over-approximate the reachable program states at a given position in an error trace while only capturing states that will still lead to failure if execution of the trace is continued from that position. Such assertions reflect the effect of statements that are involved in the root cause of an error and its propagation, enabling slicing of statements that do not contribute to the error. Previous work on error invariants focused on sequential programs. We generalize error invariants to concurrent traces by augmenting them with additional information about hazards such as write-after-write events, which are often involved in race conditions and atomicity violations. By providing the option to include varying levels of details in error invariants-such as hazards and branching information-our approach allows the programmer to systematically analyze individual aspects of an error trace.We have implemented a hazard-sensitive slicing tool for concurrent traces based on error invariants and evaluated it on benchmarks covering a broad range of real-world concurrency bugs. Hazard-sensitive slicing significantly reduced the length of the considered traces and still maintained the root causes of the concurrency bugs.
PLJan 20, 2015
Learning Invariants using Decision TreesSiddharth Krishna, Christian Puhrsch, Thomas Wies
The problem of inferring an inductive invariant for verifying program safety can be formulated in terms of binary classification. This is a standard problem in machine learning: given a sample of good and bad points, one is asked to find a classifier that generalizes from the sample and separates the two sets. Here, the good points are the reachable states of the program, and the bad points are those that reach a safety property violation. Thus, a learned classifier is a candidate invariant. In this paper, we propose a new algorithm that uses decision trees to learn candidate invariants in the form of arbitrary Boolean combinations of numerical inequalities. We have used our algorithm to verify C programs taken from the literature. The algorithm is able to infer safe invariants for a range of challenging benchmarks and compares favorably to other ML-based invariant inference techniques. In particular, it scales well to large sample sets.
SENov 20, 2013
Dynamic Package Interfaces - Extended VersionShahram Esmaeilsabzali, Rupak Majumdar, Thomas Wies et al.
A hallmark of object-oriented programming is the ability to perform computation through a set of interacting objects. A common manifestation of this style is the notion of a package, which groups a set of commonly used classes together. A challenge in using a package is to ensure that a client follows the implicit protocol of the package when calling its methods. Violations of the protocol can cause a runtime error or latent invariant violations. These protocols can extend across different, potentially unboundedly many, objects, and are specified informally in the documentation. As a result, ensuring that a client does not violate the protocol is hard. We introduce dynamic package interfaces (DPI), a formalism to explicitly capture the protocol of a package. The DPI of a package is a finite set of rules that together specify how any set of interacting objects of the package can evolve through method calls and under what conditions an error can happen. We have developed a dynamic tool that automatically computes an approximation of the DPI of a package, given a set of abstraction predicates. A key property of DPI is that the unbounded number of configurations of objects of a package are summarized finitely in an abstract domain. This uses the observation that many packages behave monotonically: the semantics of a method call over a configuration does not essentially change if more objects are added to the configuration. We have exploited monotonicity and have devised heuristics to obtain succinct yet general DPIs. We have used our tool to compute DPIs for several commonly used Java packages with complex protocols, such as JDBC, HashSet, and ArrayList.
SENov 19, 2013
A Notion of Dynamic Interface for Depth-Bounded Object-Oriented PackagesShahram Esmaeilsabzali, Rupak Majumdar, Thomas Wies et al.
Programmers using software components have to follow protocols that specify when it is legal to call particular methods with particular arguments. For example, one cannot use an iterator over a set once the set has been changed directly or through another iterator. We formalize the notion of dynamic package interfaces (DPI), which generalize state-machine interfaces for single objects, and give an algorithm to statically compute a sound abstraction of a DPI. States of a DPI represent (unbounded) sets of heap configurations and edges represent the effects of method calls on the heap. We introduce a novel heap abstract domain based on depth-bounded systems to deal with potentially unboundedly many objects and the references among them. We have implemented our algorithm and show that it is effective in computing representations of common patterns of package usage, such as relationships between viewer and label, container and iterator, and JDBC statements and cursors.