David Clark

h-index 45

6 papers

157 citations

Novelty 53%

AI Score 25

Ranked #163,871 of 194,257 authors (top 84%); #1,940 in SE (top 64%)

6 Papers

3.0 · SE · Nov 21, 2020

An Empirical Study on Failed Error Propagation in Java Programs with Real Faults

Gunel Jahangirova, David Clark, Mark Harman et al.

During testing, developers can place oracles externally or internally with respect to a method. Given a faulty execution state, i.e., one that differs from the expected one, an oracle might be unable to expose the fault if it is placed at a program point with no access to the incorrect program state or where the program state is no longer corrupted. In such a case, the oracle is subject to failed error propagation. We conducted an empirical study to measure failed error propagation on Defects4J, the reference benchmark for Java programs with real faults, considering all 6 projects available (386 real bugs and 459 fixed methods). Our results indicate that the prevalence of failed error propagation is negligible when testing is performed at the unit level. However, when system-level inputs are provided, the prevalence of failed error propagation increases substantially. This indicates that it is enough for method postconditions to predicate only on the externally observable state/data and that intermediate steps should be checked when testing at system level.
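The masking effect described in this abstract can be illustrated with a toy Python sketch. This is not code from Defects4J or from the study: the function `scale_and_clamp`, the two factors, and the clamp limit are all hypothetical. A fault corrupts an intermediate value, and a later clamp can squash the corruption before it reaches the output, so an oracle on the return value misses what an oracle on the internal state would catch.

```python
# Toy illustration of failed error propagation (all names are hypothetical).
CORRECT_FACTOR = 2   # intended behaviour
FAULTY_FACTOR = 3    # seeded fault: wrong constant

def scale_and_clamp(x, factor, limit=10):
    """Return (intermediate state, externally observable output)."""
    scaled = x * factor          # the fault infects this intermediate value
    return scaled, min(scaled, limit)  # clamping may mask the infection

def oracles(x):
    """Compare the faulty run with the correct run at two program points."""
    expected_state, expected_out = scale_and_clamp(x, CORRECT_FACTOR)
    actual_state, actual_out = scale_and_clamp(x, FAULTY_FACTOR)
    return {
        "internal_detects": expected_state != actual_state,
        "external_detects": expected_out != actual_out,
    }

print(oracles(1))  # corruption reaches the output: both oracles detect it
print(oracles(5))  # clamp hides the corruption: only the internal oracle detects it
```

For `x = 5` the correct intermediate value is 10 and the faulty one is 15, but both runs return 10, which is the failed-propagation case.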

3.0 · SE · Jun 29, 2020

A Generative Neural Network Framework for Automated Software Testing

Leonid Joffe, David J. Clark

Search Based Software Testing (SBST) is a popular automated testing technique which uses a feedback mechanism to search for faults in software. Despite its popularity, it has fundamental challenges related to the design, construction and interpretation of the feedback. Neural Networks (NN) have been hugely popular in recent years for a wide range of tasks. We believe that they can address many of the issues inherent to common SBST approaches. Unfortunately, NNs require large and representative training datasets. In this work we present an SBST framework based on a deconvolutional generative neural network. Not only does it retain the beneficial qualities that make NNs appropriate for SBST tasks, it also produces its own training data which circumvents the problem of acquiring a training dataset that limits the use of NNs. We demonstrate through a series of experiments that this architecture is possible and practical. It generates diverse, sensible program inputs, while exploring the space of program behaviours. It also creates a meaningful ordering over program behaviours and is able to find crashing executions. This is all done without any prior knowledge of the program. We believe this proof of concept opens new directions for future work at the intersection of SBST and neural networks.

2.7 · SE · Jun 26, 2018

Indexing Operators to Extend the Reach of Symbolic Execution

Earl T. Barr, David Clark, Mark Harman et al.

Traditional program analysis analyses a program language, that is, all programs that can be written in the language. There is a difference, however, between all possible programs that can be written and the corpus of actual programs written in a language. We seek to exploit this difference: for a given program, we apply a bespoke program transformation Indexify to convert expressions that current SMT solvers do not, in general, handle, such as constraints on strings, into equisatisfiable expressions that they do handle. To this end, Indexify replaces operators in hard-to-handle expressions with homomorphic versions that behave the same on a finite subset of the domain of the original operator, and return bottom denoting unknown outside of that subset. By focusing on what literals and expressions are most useful for analysing a given program, Indexify constructs a small, finite theory that extends the power of a solver on the expressions a target program builds. Indexify's bespoke nature necessarily means that its evaluation must be experimental, resting on a demonstration of its effectiveness in practice. We have developed Indexify, a tool for Indexify. We demonstrate its utility and effectiveness by applying it to two real-world benchmarks: string expressions in coreutils and floats in fdlibm53. Indexify reduces time-to-completion on coreutils from Klee's 49.5m on average to 6.0m. It increases branch coverage on coreutils from 30.10% for Klee and 14.79% for Zesti to 66.83%. When indexifying floats in fdlibm53, Indexify increases branch coverage from 34.45% to 71.56% over Klee. For a restricted class of inputs, Indexify permits the symbolic execution of program paths unreachable with previous techniques: it covers more than twice as many branches in coreutils as Klee.
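The idea of an operator that agrees with the original on a finite subset and returns bottom elsewhere can be sketched in a few lines of Python. This is only an illustration of the concept in the abstract, not the tool's actual transformation: `build_index`, `indexed_concat`, the literal set, and the use of `None` for bottom are all assumptions. Strings are replaced by integer indices, which a solver handles natively, and concatenation becomes a lookup over those indices.

```python
# Hypothetical sketch of an "indexed" string operator; None stands in for bottom.
BOTTOM = None

def build_index(values):
    """Assign each chosen literal a small integer index."""
    return {value: i for i, value in enumerate(values)}

def indexed_concat(index):
    """Build a concat over indices that mirrors str concat inside the subset."""
    inverse = {i: value for value, i in index.items()}

    def op(i, j):
        if i not in inverse or j not in inverse:
            return BOTTOM                      # argument outside the finite subset
        # Result is defined only if the concatenation is itself an indexed literal.
        return index.get(inverse[i] + inverse[j], BOTTOM)

    return op

index = build_index(["", "ab", "cd", "abcd"])
cat = indexed_concat(index)

print(cat(index["ab"], index["cd"]) == index["abcd"])  # agrees with "ab" + "cd"
print(cat(index["cd"], index["ab"]) is BOTTOM)         # "cdab" is not indexed
```

Results are compared with `is BOTTOM` rather than truthiness, because index `0` is a valid (non-bottom) result.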

7.5 · CR · Sep 8, 2016

ITect: Scalable Information Theoretic Similarity for Malware Detection

Sukriti Bhattacharya, Hector D. Menendez, Earl Barr et al.

Malware creators have been getting their way for too long now. String-based similarity measures can leverage ground truth in a scalable way and can operate at a level of abstraction that is difficult to combat from the code level. We introduce ITect, a scalable approach to malware similarity detection based on information theory. ITect targets file entropy patterns in different ways to achieve 100% precision with 90% accuracy but it could target 100% recall instead. It outperforms VirusTotal for precision and accuracy on combined Kaggle and VirusShare malware.
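The abstract does not say how the entropy patterns are computed, so the following Python sketch shows only one plausible reading: the Shannon entropy of each fixed-size chunk of a file, giving an entropy profile that could then be compared across files. The function names and the 256-byte window are assumptions, not details from the paper.

```python
import math
from collections import Counter

def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    n = len(chunk)
    # Sum p * log2(1/p) over the observed byte values.
    return sum((count / n) * math.log2(n / count) for count in Counter(chunk).values())

def entropy_profile(data: bytes, window: int = 256) -> list:
    """Entropy of each non-overlapping window (window size is an assumption)."""
    return [shannon_entropy(data[i:i + window]) for i in range(0, len(data), window)]

# A constant region followed by a region containing every byte value once.
sample = bytes(256) + bytes(range(256))
print(entropy_profile(sample))
```

Low-entropy windows typically correspond to padding or repetitive data, and windows near 8 bits per byte to compressed or encrypted content.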

24.0 · SE · Jun 10, 2015

Test Set Diameter: Quantifying the Diversity of Sets of Test Cases

Robert Feldt, Simon Poulding, David Clark et al.

A common and natural intuition among software testers is that test cases need to differ if a software system is to be tested properly and its quality ensured. Consequently, much research has gone into formulating distance measures for how test cases, their inputs and/or their outputs differ. However, common to these proposals is that they are data type specific and/or calculate the diversity only between pairs of test inputs, traces or outputs. We propose a new metric to measure the diversity of sets of tests: the test set diameter (TSDm). It extends our earlier, pairwise test diversity metrics based on recent advances in information theory regarding the calculation of the normalized compression distance (NCD) for multisets. An advantage is that TSDm can be applied regardless of data type and on any test-related information, not only the test inputs. A downside is the increased computational time compared to competing approaches. Our experiments on four different systems show that the test set diameter can help select test sets with higher structural and fault coverage than random selection even when only applied to test inputs. This can enable early test design and selection, prior to even having a software system to test, and complement other types of test automation and analysis. We argue that this quantification of test set diversity creates a number of opportunities to better understand software quality and provides practical ways to increase it.

12.2 · CR · Feb 26, 2015

Detecting Malware with Information Complexity

Nadia Alshahwan, Earl T. Barr, David Clark et al.

This work focuses on a specific front of the malware detection arms-race, namely the detection of persistent, disk-resident malware. We exploit normalised compression distance (NCD), an information theoretic measure, applied directly to binaries. Given a zoo of labelled malware and benign-ware, we ask whether a suspect program is more similar to our malware or to our benign-ware. Our approach classifies malware with 97.1% accuracy and a false positive rate of 3%. We achieve our results with off-the-shelf compressors and a standard machine learning classifier and without any specialised knowledge. An end-user need only collect a zoo of malware and benign-ware and then can immediately apply our techniques. We apply statistical rigour to our experiments and our selection of data. We demonstrate that accuracy can be optimised by combining NCD with the compressibility rates of the executables. We demonstrate that malware reported within a more narrow time frame of a few days is more homogenous than malware reported over a longer one of two years but that our method still classifies the latter with 95.2% accuracy and a 5% false positive rate. Due to the use of compression, the time and computation cost of our method is non-trivial. We show that simple approximation techniques can improve the time complexity of our approach by up to 63%. We compare our results to the results of applying the 59 anti-malware programs used on the VirusTotal web site to our malware. Our approach does better than any single one of them as well as the 59 used collectively.
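The core measure can be sketched in Python using the standard pairwise definition NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), where C is compressed length. The paper feeds NCD into a standard machine learning classifier. The nearest-neighbour rule below is a simpler stand-in chosen for brevity, and the `zlib` compressor, function names, and synthetic "zoo" are all assumptions.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalised compression distance: near 0 for similar data, near 1 for unrelated."""
    cx, cy = compressed_size(x), compressed_size(y)
    return (compressed_size(x + y) - min(cx, cy)) / max(cx, cy)

def classify(suspect: bytes, zoo) -> str:
    """Label the suspect with the label of its NCD-nearest zoo member (stand-in rule)."""
    label, _ = min(zoo, key=lambda item: ncd(suspect, item[1]))
    return label

# Synthetic stand-ins for binaries; real use would read executable files from disk.
rng = random.Random(0)

def random_bytes(n: int) -> bytes:
    return bytes(rng.randrange(256) for _ in range(n))

malware_sample = random_bytes(500)
benign_sample = random_bytes(500)
suspect = malware_sample[:450] + random_bytes(50)   # variant of the malware sample

zoo = [("malware", malware_sample), ("benign", benign_sample)]
print(classify(suspect, zoo))
```

Each classification compresses the suspect together with every zoo member, which is where the non-trivial time cost noted in the abstract comes from.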