Alexander Pretschner

h-index42

20papers

339citations

Novelty37%

AI Score25

Ranked #167,239 of 194,257 authors (top 86%)#2,012 in SE (top 66%)

20 Papers

6.9SEMar 7, 2019Code

Compositional Fuzzing Aided by Targeted Symbolic Execution

Saahil Ognawala, Fabian Kilger, Alexander Pretschner

Guided fuzzing has, in recent years, been able to uncover many new vulnerabilities in real-world software due to its fast input mutation strategies guided by path-coverage. However, most fuzzers are unable to achieve high coverage in deeper parts of programs. Moreover, fuzzers heavily rely on the diversity of the seed inputs, often manually provided, to be able to produce meaningful results. In this paper, we present Wildfire, a novel open-source compositional fuzzing framework. Wildfire finds vulnerabilities by fuzzing isolated functions in a C-program and, then, using targeted symbolic execution it determines the feasibility of exploitation for these vulnerabilities. Based on our evaluation of 23 open-source programs (nearly 1 million LOC), we show that Wildfire, as a result of the increased coverage, finds more true-positives than baseline symbolic execution and fuzzing tools, as well as state-of-the-art coverage-guided tools, in only 10% of the analysis time taken by them. Additionally, Wildfire finds many other potential vulnerabilities whose feasibility can be determined compositionally to confirm if they are false-positives. Wildfire could also reproduce all of the known vulnerabilities and found several previously-unknown vulnerabilities in three open-source libraries.

10.8SEJul 24, 2018Code

Automatically Assessing Vulnerabilities Discovered by Compositional Analysis

Saahil Ognawala, Ricardo Nales Amato, Alexander Pretschner et al.

Testing is the most widely employed method to find vulnerabilities in real-world software programs. Compositional analysis, based on symbolic execution, is an automated testing method to find vulnerabilities in medium- to large-scale programs consisting of many interacting components. However, existing compositional analysis frameworks do not assess the severity of reported vulnerabilities. In this paper, we present a framework to analyze vulnerabilities discovered by an existing compositional analysis tool and assign CVSS3 (Common Vulnerability Scoring System v3.0) scores to them, based on various heuristics such as interaction with related components, ease of reachability, complexity of design and likelihood of accepting unsanitized input. By analyzing vulnerabilities reported with CVSS3 scores in the past, we train simple machine learning models. By presenting our interactive framework to developers of popular open-source software and other security experts, we gather feedback on our trained models and further improve the features to increase the accuracy of our predictions. By providing qualitative (based on community feedback) and quantitative (based on prediction accuracy) evidence from 21 open-source programs, we show that our severity prediction framework can effectively assist developers with assessing vulnerabilities.

15.3SENov 26, 2017

Improving Function Coverage with Munch: A Hybrid Fuzzing and Directed Symbolic Execution Approach

Saahil Ognawala, Thomas Hutzelmann, Eirini Psallida et al.

Fuzzing and symbolic execution are popular techniques for finding vulnerabilities and generating test-cases for programs. Fuzzing, a blackbox method that mutates seed input values, is generally incapable of generating diverse inputs that exercise all paths in the program. Due to the path-explosion problem and dependence on SMT solvers, symbolic execution may also not achieve high path coverage. A hybrid technique involving fuzzing and symbolic execution may achieve better function coverage than fuzzing or symbolic execution alone. In this paper, we present Munch, an open source framework implementing two hybrid techniques based on fuzzing and symbolic execution. We empirically show using nine large open-source programs that overall, Munch achieves higher (in-depth) function coverage than symbolic execution or fuzzing alone. Using metrics based on total analyses time and number of queries issued to the SMT solver, we also show that Munch is more efficient at achieving better function coverage.

3.6SEJul 15, 2021

Empowered and Embedded: Ethics and Agile Processes

Niina Zuber, Severin Kacianka, Jan Gogoll et al.

In this article we focus on the structural aspects of the development of ethical software, and argue that ethical considerations need to be embedded into the (agile) software development process. In fact, we claim that agile processes of software development lend themselves specifically well for this endeavour. First, we contend that ethical evaluations need to go beyond the use of software products and include an evaluation of the software itself. This implies that software engineers influence peoples' lives through the features of their designed products. Embedded values are thus approached best by software engineers themselves. Therefore, we put emphasis on the possibility to implement ethical deliberations in already existing and well established agile software development processes. Our approach relies on software engineers making their own judgments throughout the entire development process to ensure that technical features and ethical evaluation can be addressed adequately to transport and foster desirable values and norms. We argue that agile software development processes may help the implementation of ethical deliberation for five reasons: 1) agile methods are widely spread, 2) their emphasis on flat hierarchies promotes independent thinking, 3) their reliance on existing team structures serve as an incubator for deliberation, 4) agile development enhances object-focused techno-ethical realism, and, finally, 5) agile structures provide a salient endpoint to deliberation.

8.6SEMar 19, 2021Code

Trustworthy Transparency by Design

Valentin Zieglmeier, Alexander Pretschner

Individuals lack oversight over systems that process their data. This can lead to discrimination and hidden biases that are hard to uncover. Recent data protection legislation tries to tackle these issues, but it is inadequate. It does not prevent data misusage while stifling sensible use cases for data. We think the conflict between data protection and increasingly data-based systems should be solved differently. When access to data is given, all usages should be made transparent to the data subjects. This enables their data sovereignty, allowing individuals to benefit from sensible data usage while addressing potentially harmful consequences of data misusage. We contribute to this with a technical concept and an empirical evaluation. First, we conceptualize a transparency framework for software design, incorporating research on user trust and experience. Second, we instantiate and empirically evaluate the framework in a focus group study over three months, centering on the user perspective. Our transparency framework enables developing software that incorporates transparency in its design. The evaluation shows that it satisfies usability and trustworthiness requirements. The provided transparency is experienced as beneficial and participants feel empowered by it. This shows that our framework enables Trustworthy Transparency by Design.

15.7SEJan 20, 2021

Designing Accountable Systems

Severin Kacianka, Alexander Pretschner

Accountability is an often called for property of technical systems. It is a requirement for algorithmic decision systems, autonomous cyber-physical systems, and for software systems in general. As a concept, accountability goes back to the early history of Liberalism and is suggested as a tool to limit the use of power. This long history has also given us many, often slightly differing, definitions of accountability. The problem that software developers now face is to understand what accountability means for their systems and how to reflect it in a system's design. To enable the rigorous study of accountability in a system, we need models that are suitable for capturing such a varied concept. In this paper, we present a method to express and compare different definitions of accountability using Structural Causal Models. We show how these models can be used to evaluate a system's design and present a small use case based on an autonomous car.

10.2CRJul 1, 2020Code

Maat: Automatically Analyzing VirusTotal for Accurate Labeling and Effective Malware Detection

Aleieldin Salem, Sebastian Banescu, Alexander Pretschner

The malware analysis and detection research community relies on the online platform VirusTotal to label Android apps based on the scan results of around 60 antiviral scanners. Unfortunately, there are no standards on how to best interpret the scan results acquired from VirusTotal, which leads to the utilization of different threshold-based labeling strategies (e.g., if ten or more scanners deem an app malicious, it is considered malicious). While some of the utilized thresholds may be able to accurately approximate the ground truths of apps, the fact that VirusTotal changes the set and versions of the scanners it uses makes such thresholds unsustainable over time. We implemented a method, Maat, that tackles these issues of standardization and sustainability by automatically generating a Machine Learning (ML)-based labeling scheme, which outperforms threshold-based labeling strategies. Using the VirusTotal scan reports of 53K Android apps that span one year, we evaluated the applicability of Maat's ML-based labeling strategies by comparing their performance against threshold-based strategies. We found that such ML-based strategies (a) can accurately and consistently label apps based on their VirusTotal scan reports, and (b) contribute to training ML-based detection methods that are more effective at classifying out-of-sample apps than their threshold-based counterparts.

8.4AIJun 5, 2020

From Checking to Inference: Actual Causality Computations as Optimization Problems

Amjad Ibrahim, Alexander Pretschner

Actual causality is increasingly well understood. Recent formal approaches, proposed by Halpern and Pearl, have made this concept mature enough to be amenable to automated reasoning. Actual causality is especially vital for building accountable, explainable systems. Among other reasons, causality reasoning is computationally hard due to the requirements of counterfactuality and the minimality of causes. Previous approaches presented either inefficient or restricted, and domain-specific, solutions to the problem of automating causality reasoning. In this paper, we present a novel approach to formulate different notions of causal reasoning, over binary acyclic models, as optimization problems, based on quantifiable notions within counterfactual computations. We contribute and compare two compact, non-trivial, and sound integer linear programming (ILP) and Maximum Satisfiability (MaxSAT) encodings to check causality. Given a candidate cause, both approaches identify what a minimal cause is. Also, we present an ILP encoding to infer causality without requiring a candidate cause. We show that both notions are efficiently automated. Using models with more than $8000$ variables, checking is computed in a matter of seconds, with MaxSAT outperforming ILP in many cases. In contrast, inference is computed in a matter of minutes.

5.3SEMay 7, 2020

Expressing Accountability Patterns using Structural Causal Models

Severin Kacianka, Amjad Ibrahim, Alexander Pretschner

While the exact definition and implementation of accountability depend on the specific context, at its core accountability describes a mechanism that will make decisions transparent and often provides means to sanction "bad" decisions. As such, accountability is specifically relevant for Cyber-Physical Systems, such as robots or drones, that embed themselves into a human society, take decisions and might cause lasting harm. Without a notion of accountability, such systems could behave with impunity and would not fit into society. Despite its relevance, there is currently no agreement on its meaning and, more importantly, no way to express accountability properties for these systems. As a solution we propose to express the accountability properties of systems using Structural Causal Models. They can be represented as human-readable graphical models while also offering mathematical tools to analyze and reason over them. Our central contribution is to show how Structural Causal Models can be used to express and analyze the accountability properties of systems and that this approach allows us to identify accountability patterns. These accountability patterns can be catalogued and used to improve systems and their architectures.

2.7CRSep 25, 2019

VirtSC: Combining Virtualization Obfuscation with Self-Checksumming

Mohsen Ahmadvand, Daniel Below, Sebastian Banescu et al.

Self-checksumming (SC) is a tamper-proofing technique that ensures certain program segments (code) in memory hash to known values at runtime. SC has few restrictions on application and hence can protect a vast majority of programs. The code verification in SC requires computation of the expected hashes after compilation, as the machine-code is not known before. This means the expected hash values need to be adjusted in the binary executable, hence combining SC with other protections is limited due to this adjustment step. However, obfuscation protections are often necessary, as SC protections can be otherwise easily detected and disabled via pattern matching. In this paper, we present a layered protection using virtualization obfuscation, yielding an architecture-agnostic SC protection that requires no post-compilation adjustment. We evaluate the performance of our scheme using a dataset of 25 real-world programs (MiBench and 3 CLI games). Our results show that the SC scheme induces an average overhead of 43% for a complete protection (100% coverage). The overhead is tolerable for less CPU-intensive programs (e.g. games) and when only parts of programs (e.g. license checking) are protected. However, large overheads stemming from the virtualization obfuscation were encountered.

3.6AIApr 30, 2019Code

Efficiently Checking Actual Causality with SAT Solving

Amjad Ibrahim, Simon Rehwald, Alexander Pretschner

Recent formal approaches towards causality have made the concept ready for incorporation into the technical world. However, causality reasoning is computationally hard; and no general algorithmic approach exists that efficiently infers the causes for effects. Thus, checking causality in the context of complex, multi-agent, and distributed socio-technical systems is a significant challenge. Therefore, we conceptualize an intelligent and novel algorithmic approach towards checking causality in acyclic causal models with binary variables, utilizing the optimization power in the solvers of the Boolean Satisfiability Problem (SAT). We present two SAT encodings, and an empirical evaluation of their efficiency and scalability. We show that causality is computed efficiently in less than 5 seconds for models that consist of more than 4000 variables.

10.9CRMar 25, 2019

Don't Pick the Cherry: An Evaluation Methodology for Android Malware Detection Methods

Aleieldin Salem, Sebastian Banescu, Alexander Pretschner

In evaluating detection methods, the malware research community relies on scan results obtained from online platforms such as VirusTotal. Nevertheless, given the lack of standards on how to interpret the obtained data to label apps, researchers hinge on their intuitions and adopt different labeling schemes. The dynamicity of VirusTotal's results along with adoption of different labeling schemes significantly affect the accuracies achieved by any given detection method even on the same dataset, which gives subjective views on the method's performance and hinders the comparison of different malware detection techniques. In this paper, we demonstrate the effect of varying (1) time, (2) labeling schemes, and (3) attack scenarios on the performance of an ensemble of Android repackaged malware detection methods, called dejavu, using over 30,000 real-world Android apps. Our results vividly show the impact of varying the aforementioned 3 dimensions on dejavu's performance. With such results, we encourage the adoption of a standard methodology that takes into account those 3 dimensions in evaluating newly-devised methods to detect Android (repackaged) malware.

8.2SEOct 23, 2018

Understanding and Formalizing Accountability for Cyber-Physical Systems

Severin Kacianka, Alexander Pretschner

Accountability is the property of a system that enables the uncovering of causes for events and helps understand who or what is responsible for these events. Definitions and interpretations of accountability differ; however, they are typically expressed in natural language that obscures design decisions and the impact on the overall system. This paper presents a formal model to express the accountability properties of cyber-physical systems. To illustrate the usefulness of our approach, we demonstrate how three different interpretations of accountability can be expressed using the proposed model and describe the implementation implications through a case study. This formal model can be used to highlight context specific-elements of accountability mechanisms, define their capabilities, and express different notions of accountability. In addition, it makes design decisions explicit and facilitates discussion, analysis and comparison of different approaches.

4.3LOOct 11, 2018

Model-Based Safety and Security Engineering

Vivek Nigam, Alexander Pretschner, Harald Ruess

By exploiting the increasing surface attack of systems, cyber-attacks can cause catastrophic events, such as, remotely disable safety mechanisms. This means that in order to avoid hazards, safety and security need to be integrated, exchanging information, such as, key hazards/threats, risk evaluations, mechanisms used. This white paper describes some steps towards this integration by using models. We start by identifying some key technical challenges. Then we demonstrate how models, such as Goal Structured Notation (GSN) for safety and Attack Defense Trees (ADT) for security, can address these challenges. In particular, (1) we demonstrate how to extract in an automated fashion security relevant information from safety assessments by translating GSN-Models into ADTs; (2) We show how security results can impact the confidence of safety assessments; (3) We propose a collaborative development process where safety and security assessments are built by incrementally taking into account safety and security analysis; (4) We describe how to carry out trade-off analysis in an automated fashion, such as identifying when safety and security arguments contradict each other and how to solve such contradictions. We conclude pointing out that these are the first steps towards a wide range of techniques to support Safety and Security Engineering. As a white paper, we avoid being too technical, preferring to illustrate features by using examples and thus being more accessible.

4.9SEMar 13, 2018

Reviewing KLEE's Sonar-Search Strategy in Context of Greybox Fuzzing

Saahil Ognawala, Alexander Pretschner, Thomas Hutzelmann et al.

Automatic test-case generation techniques of symbolic execution and fuzzing are the most widely used methods to discover vulnerabilities in, both, academia and industry. However, both these methods suffer from fundamental drawbacks that stop them from achieving high path coverage that may, consequently, lead to discovering vulnerabilities at the numerical scale of static analysis. In this presentation, we examine systems-under-test (SUTs) at the granularity level of functions and postulate that achieving higher function coverage (execution of functions in a program at least once) than, both, symbolic execution and fuzzing may be a necessary condition for discovering more vulnerabilities than both. We will start this presentation with the design of a targeted search strategy for KLEE, sonar-search, that prioritizes paths leading to a target function, rather than maximizing overall path coverage in the program. Then, we will show that examining SUTs at the level of functions (compositional analysis) leads to discovering more vulnerabilities than symbolic execution from a single entry point. Using this finding, we will, then, demonstrate a greybox fuzzing method that can achieve higher function coverage than symbolic execution. Finally, we will present a framework to effectively manage vulnerabilities and assess their severities.

10.1SEJan 24, 2017

One evaluation of model-based testing and its automation

Alexander Pretschner, Wolfgang Prenninger, Stefan Wagner et al.

Model-based testing relies on behavior models for the generation of model traces: input and expected output---test cases---for an implementation. We use the case study of an automotive network controller to assess different test suites in terms of error detection, model coverage, and implementation coverage. Some of these suites were generated automatically with and without models, purely at random, and with dedicated functional test selection criteria. Other suites were derived manually, with and without the model at hand. Both automatically and manually derived model-based test suites detected significantly more requirements errors than hand-crafted test suites that were directly derived from the requirements. The number of detected programming errors did not depend on the use of models. Automatically generated model-based test suites detected as many errors as hand-crafted model-based suites with the same number of tests. A sixfold increase in the number of model-based tests led to an 11% increase in detected errors.

3.3SENov 30, 2016

A Field Study on the Elicitation and Classification of Defects for Defect Models

D. Holling, D. Méndez Fernández, A. Pretschner

Defect models capture faults and methods to provoke failures. To integrate such defect models into existing quality assurance processes, we developed a defect model lifecycle framework, in which the elicitation and classification of context-specific defects forms a crucial step. Although we could gather first insights from its practical application, we still have little knowledge about its benefits and limitations. We aim at qualitatively analyzing the context-specific elicitation and classification of defects to explore the suitability of our approach for practical application. We apply case study research in multiple contexts and analyze (1) what kind of defects we can elicit and the degree to which the defects matter to a context only, (2) the extent to which it leads to results useful enough for describing and operationalizing defect models, and (3) if there is a perceived additional immediate benefit from a practitioner's perspective. Our results strengthen our confidence on the suitability of our approach to elicit defects that are context-specific as well as context-independent. We conclude so far that our approach is suitable to provide a blueprint on how to elicit and classify defects for specific contexts to be used for the improvement of quality assurance techniques.

4.3CYAug 29, 2016

Towards a Unified Model of Accountability Infrastructures

Severin Kacianka, Florian Kelbert, Alexander Pretschner

Accountability aims to provide explanations for why unwanted situations occurred, thus providing means to assign responsibility and liability. As such, accountability has slightly different meanings across the sciences. In computer science, our focus is on providing explanations for technical systems, in particular if they interact with their physical environment using sensors and actuators and may do serious harm. Accountability is relevant when considering safety, security and privacy properties and we realize that all these incarnations are facets of the same core idea. Hence, in this paper we motivate and propose a model for accountability infrastructures that is expressive enough to capture all of these domains. At its core, this model leverages formal causality models from the literature in order to provide a solid reasoning framework. We show how this model can be instantiated for several real-world use cases.

5.7CRFeb 11, 2015

FEEBO: An Empirical Evaluation Framework for Malware Behavior Obfuscation

Sebastian Banescu, Tobias Wüchner, Marius Guggenmos et al.

Program obfuscation is increasingly popular among malware creators. Objectively comparing different malware detection approaches with respect to their resilience against obfuscation is challenging. To the best of our knowledge, there is no common empirical framework for evaluating the resilience of malware detection approaches w.r.t. behavior obfuscation. We propose and implement such a framework that obfuscates the observable behavior of malware binaries. To assess the framework's utility, we use it to obfuscate known malware binaries and then investigate the impact on detection effectiveness of different $n$-gram based detection approaches. We find that the obfuscation transformations employed by our framework significantly affect the precision of such detection approaches. Several $n$-gram-based approaches can hence be concluded not to be resilient against this simple kind of obfuscation.

10.9CRFeb 5, 2015

Robust and Effective Malware Detection through Quantitative Data Flow Graph Metrics

Tobias Wüchner, Martín Ochoa, Alexander Pretschner

We present a novel malware detection approach based on metrics over quantitative data flow graphs. Quantitative data flow graphs (QDFGs) model process behavior by interpreting issued system calls as aggregations of quantifiable data flows.Due to the high abstraction level we consider QDFG metric based detection more robust against typical behavior obfuscation like bogus call injection or call reordering than other common behavioral models that base on raw system calls. We support this claim with experiments on obfuscated malware logs and demonstrate the superior obfuscation robustness in comparison to detection using n-grams. Our evaluations on a large and diverse data set consisting of about 7000 malware and 500 goodware samples show an average detection rate of 98.01% and a false positive rate of 0.48%. Moreover, we show that our approach is able to detect new malware (i.e. samples from malware families not included in the training set) and that the consideration of quantities in itself significantly improves detection precision.