30.5SEApr 26Code
Ranking Plausible Patches by Historic Feature FrequenciesShifat Sahariar Bhuiyan, Abhishek Tiwari, Yu Pei et al.
Automated program repair (APR) techniques have achieved conspicuous progress, and are now capable of producing genuinely correct fixes in scenarios that were well beyond their capabilities only a few years ago. Nevertheless, even when an APR technique can find a correct fix for a bug, it still runs the risk of ranking the fix lower than other patches that are plausible (they pass all available tests) but incorrect. This can seriously hurt the technique's practical effectiveness, as the user will have to peruse a larger number of patches before finding the correct one. This paper presents PrevaRank, a technique that ranks plausible patches produced by any APR technique according to their feature similarity with historic programmer-written fixes for similar bugs. PrevaRank implements simple heuristics, which help make it scalable and applicable to any APR tool that produces plausible patches. In our experimental evaluation, after training PrevaRank on the fix history of 81 open-source Java projects, we used it to rank patches produced by 8 Java APR tools on 168 Defects4J bugs. PrevaRank consistently improved the ranking of correct fixes: for example, it ranked a correct fix within the top-3 positions in 27% more cases than the original tools did. Other experimental results indicate that PrevaRank works robustly with a variety of APR tools and bugs, with negligible overhead.
18.6SEMay 7
Empirical Derivations from an Evolving Test SuiteJukka Ruohonen, Abhishek Tiwari
The paper presents a longitudinal empirical analysis of the automated, continuous, and virtualization-based software test suite of the NetBSD operating system. The longitudinal period observed spans from the initial roll out of the test suite in the early 2010s to late 2025. According to the results, the test suite has grown continuously, currently covering over ten thousand individual test cases. Failed test cases exhibit overall stability, although there have been shorter periods marked with more frequent failures. A similar observation applies to build failures, failures of the test suite to complete, and installation failures, all of which are also captured by the NetBSD's testing framework. Finally, code churn and kernel modifications do not provide longitudinally consistent statistical explanations for the failures. Although some periods exhibit larger effects, including particularly with respect to the kernel modifications, the effects are small on average. Even though only in an exploratory manner, these empirical observations contribute to efforts to draw conclusions from large-scale and evolving software test suites.
SEAug 5, 2021Code
HIPPODROME: Data Race Repair using Static Analysis SummariesAndreea Costea, Abhishek Tiwari, Sigmund Chianasta et al.
Implementing bug-free concurrent programs is a challenging task in modern software development. State-of-the-art static analyses find hundreds of concurrency bugs in production code, scaling to large codebases. Yet, fixing these bugs in constantly changing codebases represents a daunting effort for programmers, particularly because a fix in the concurrent code can introduce other bugs in a subtle way. In this work, we show how to harness compositional static analysis for concurrency bug detection, to enable a new Automated Program Repair (APR) technique for data races in large concurrent Java codebases. The key innovation of our work is an algorithm that translates procedure summaries inferred by the analysis tool for the purpose of bug reporting, into small local patches that fix concurrency bugs (without introducing new ones). This synergy makes it possible to extend the virtues of compositional static concurrency analysis to APR, making our approach effective (it can detect and fix many more bugs than existing tools for data race repair), scalable (it takes seconds to analyse and suggest fixes for sizeable codebases), and usable (generally, it does not require annotations from the users and can perform continuous automated repair). Our study conducted on popular open-source projects has confirmed that our tool automatically produces concurrency fixes similar to those proposed by the developers in the past.
QUANT-PHMar 30, 2025
Quantum-Assisted Machine Learning Models for Enhanced Weather PredictionSaiyam Sakhuja, Shivanshu Siyanwal, Abhishek Tiwari et al.
Quantum Machine Learning (QML) presents as a revolutionary approach to weather forecasting by using quantum computing to improve predictive modeling capabilities. In this study, we apply QML models, including Quantum Gated Recurrent Units (QGRUs), Quantum Neural Networks (QNNs), Quantum Long Short-Term Memory(QLSTM), Variational Quantum Circuits(VQCs), and Quantum Support Vector Machines(QSVMs), to analyze meteorological time-series data from the ERA5 dataset. Our methodology includes preprocessing meteorological features, implementing QML architectures for both classification and regression tasks. The results demonstrate that QML models can achieve reasonable accuracy in both prediction and classification tasks, particularly in binary classification. However, challenges such as quantum hardware limitations and noise affect scalability and generalization. This research provides insights into the feasibility of QML for weather prediction, paving the way for further exploration of hybrid quantum-classical frameworks to enhance meteorological forecasting.
CRAug 4, 2020
A Large Scale Analysis of Android-Web HybridizationAbhishek Tiwari, Jyoti Prakash, Sascha Gross et al.
Many Android applications embed webpages via WebView components and execute JavaScript code within Android. Hybrid applications leverage dedicated APIs to load a resource and render it in a WebView. Furthermore, Android objects can be shared with the JavaScript world. However, bridging the interfaces of the Android and JavaScript world might also incur severe security threats: Potentially untrusted webpages and their JavaScript might interfere with the Android environment and its access to native features. No general analysis is currently available to assess the implications of such hybrid apps bridging the two worlds. To understand the semantics and effects of hybrid apps, we perform a large-scale study on the usage of the hybridization APIs in the wild. We analyze and categorize the parameters to hybridization APIs for 7,500 randomly selected and the 196 most popular applications from the Google Playstore as well as 1000 malware samples. Our results advance the general understanding of hybrid applications, as well as implications for potential program analyses, and the current security situation: We discovered thousands of flows of sensitive data from Android to JavaScript, the vast majority of which could flow to potentially untrustworthy code. Our analysis identified numerous web pages embedding vulnerabilities, which we exemplarily exploited. Additionally, we discovered a multitude of applications in which potentially untrusted JavaScript code may interfere with (trusted) Android objects, both in benign and malign applications.
SEMay 21, 2020
Concurrency-related Flaky Test Detection in Android appsZhen Dong, Abhishek Tiwari, Xiao Liang Yu et al.
Validation of Android apps via testing is difficult owing to the presence of flaky tests. Due to non-deterministic execution environments, a sequence of events (a test) may lead to success or failure in unpredictable ways. In this work, we present an approach and tool FlakeShovel for detecting flaky tests through systematic exploration of event orders. Our key observation is that for a test in a mobile app, there is a testing framework thread which creates the test events, a main User-Interface (UI) thread processing these events, and there may be several other background threads running asynchronously. For any event e whose execution involves potential non-determinism, we localize the earliest (latest) event after (before) which e must happen.We then efficiently explore the schedules between the upper/lower bound events while grouping events within a single statement, to find whether the test outcome is flaky. We also create a suite of subject programs called DroidFlaker to study flaky tests in Android apps. Our experiments on subject-suite DroidFlaker demonstrate the efficacy of our flaky test detection. Our work is complementary to existing flaky test detection tools like Deflaker which check only failing tests. FlakeShovel can detect flaky tests among passing tests, as shown by our approach and experiments.
SEDec 1, 2019
PointEval: On the Impact of Pointer Analysis FrameworksJyoti Prakash, Abhishek Tiwari, Christian Hammer
Pointer analysis is a foundational analysis leveraged by various static analyses. Therefore, it gathered wide attention in research for decades. Some pointer analysis frameworks are based on succinct declarative specifications. However, these tools are heterogeneous in terms of the underlying intermediate representation (IR), heap abstraction, and programming methodology. This situation complicates a fair comparison of these frameworks and thus hinders further research. Consequently, the literature lacks an evaluation of the strengths and weaknesses of these tools. In this work, we evaluate two major frameworks for pointer analysis, WALA and Doop, on the DaCapo set of benchmarks. We compare the pointer analyses available in Wala and Doop, and conclude that---even though based on a declarative specification---Doop provides a better pointer analysis than Wala in terms of precision and scalability. We also compare the two IRs used in Doop, i.e., Jimple from the Soot framework and IR from the Wala framework. Our evaluation shows that in the majority of the benchmarks Soot's IR gives a more precise and scalable pointer analysis. Finally, we propose a micro-benchmark \emph{PointerBench}, for which we manually validate the points-to statistics to evaluate the results of these tools.
LGJun 20, 2019
Cross-Subject Statistical Shift Estimation for Generalized Electroencephalography-based Mental Workload AssessmentIsabela Albuquerque, João Monteiro, Olivier Rosanne et al.
Assessment of mental workload in real-world conditions is key to ensure the performance of workers executing tasks that demand sustained attention. Previous literature has employed electroencephalography (EEG) to this end despite having observed that EEG correlates of mental workload vary across subjects and physical strain, thus making it difficult to devise models capable of simultaneously presenting reliable performance across users. Domain adaptation consists of a set of strategies that aim at allowing for improving machine learning systems performance on unseen data at training time. Such methods, however, might rely on assumptions over the considered data distributions, which typically do not hold for applications of EEG data. Motivated by this observation, in this work we propose a strategy to estimate two types of discrepancies between multiple data distributions, namely marginal and conditional shifts, observed on data collected from different subjects. Besides shedding light on the assumptions that hold for a particular dataset, the estimates of statistical shifts obtained with the proposed approach can be used for investigating other aspects of a machine learning pipeline, such as quantitatively assessing the effectiveness of domain adaptation strategies. In particular, we consider EEG data collected from individuals performing mental tasks while running on a treadmill and pedaling on a stationary bike and explore the effects of different normalization strategies commonly used to mitigate cross-subject variability. We show the effects that different normalization schemes have on statistical shifts and their relationship with the accuracy of mental workload prediction as assessed on unseen participants at training time.
SEDec 13, 2018
IIFA: Modular Inter-app Intent Information Flow Analysis of Android ApplicationsAbhishek Tiwari, Sascha Groß, Christian Hammer
Android apps cooperate through message passing via intents. However, when apps do not have identical sets of privileges inter-app communication (IAC) can accidentally or maliciously be misused, e.g., to leak sensitive information contrary to users expectations. Recent research considered static program analysis to detect dangerous data leaks due to inter-component communication (ICC) or IAC, but suffers from shortcomings with respect to precision, soundness, and scalability. To solve these issues we propose a novel approach for static ICC/IAC analysis. We perform a fixed-point iteration of ICC/IAC summary information to precisely resolve intent communication with more than two apps involved. We integrate these results with information flows generated by a baseline (i.e. not considering intents) information flow analysis, and resolve if sensitive data is flowing (transitively) through components/apps in order to be ultimately leaked. Our main contribution is the first fully automatic sound and precise ICC/IAC information flow analysis that is scalable for realistic apps due to modularity, avoiding combinatorial explosion: Our approach determines communicating apps using short summaries rather than inlining intent calls, which often requires simultaneously analyzing all tuples of apps. We evaluated our tool IIFA in terms of scalability, precision, and recall. Using benchmarks we establish that precision and recall of our algorithm are considerably better than prominent state-of-the-art analyses for IAC. But foremost, applied to the 90 most popular applications from the Google Playstore, IIFA demonstrated its scalability to a large corpus of real-world apps. IIFA reports 62 problematic ICC-/IAC-related information flows via two or more apps/components.
LGMay 22, 2018
Echo: Compiler-based GPU Memory Footprint Reduction for LSTM RNN TrainingBojian Zheng, Abhishek Tiwari, Nandita Vijaykumar et al.
The Long-Short-Term-Memory Recurrent Neural Networks (LSTM RNNs) are a popular class of machine learning models for analyzing sequential data. Their training on modern GPUs, however, is limited by the GPU memory capacity. Our profiling results of the LSTM RNN-based Neural Machine Translation (NMT) model reveal that feature maps of the attention and RNN layers form the memory bottleneck and runtime is unevenly distributed across different layers when training on GPUs. Based on these two observations, we propose to recompute the feature maps rather than stashing them persistently in the GPU memory. While the idea of feature map recomputation has been considered before, existing solutions fail to deliver satisfactory footprint reduction, as they do not address two key challenges. For each feature map recomputation to be effective and efficient, its effect on (1) the total memory footprint, and (2) the total execution time has to be carefully estimated. To this end, we propose *Echo*, a new compiler-based optimization scheme that addresses the first challenge with a practical mechanism that estimates the memory benefits of recomputation over the entire computation graph, and the second challenge by non-conservatively estimating the recomputation overhead leveraging layer specifics. *Echo* reduces the GPU memory footprint automatically and transparently without any changes required to the training source code, and is effective for models beyond LSTM RNNs. We evaluate *Echo* on numerous state-of-the-art machine learning workloads on real systems with modern GPUs and observe footprint reduction ratios of 1.89X on average and 3.13X maximum. Such reduction can be converted into faster training with a larger batch size, savings in GPU energy consumption (e.g., training with one GPU as fast as with four), and/or an increase in the maximum number of layers under the same GPU memory budget.