Suresh Kothari

SE
3 papers
19 citations
Novelty 28%
AI Score 31

3 Papers

SE · Dec 19, 2025
Holistic Evaluation of State-of-the-Art LLMs for Code Generation

Le Zhang, Suresh Kothari

This study presents a comprehensive empirical evaluation of six state-of-the-art large language models (LLMs) for code generation, including both general-purpose and code-specialized models. Using a dataset of 944 real-world LeetCode problems across five programming languages, we assess model performance using rigorous metrics: compile-time errors, runtime errors, functional failures, and algorithmic suboptimalities. The results reveal significant performance variations, with DeepSeek-R1 and GPT-4.1 consistently outperforming the others in terms of correctness, efficiency, and robustness. Through detailed case studies, we identify common failure scenarios such as syntax errors, logical flaws, and suboptimal algorithms, highlighting the critical role of prompt engineering and human oversight in improving results. Based on these findings, we provide actionable recommendations for developers and practitioners, emphasizing that successful LLM deployment depends on careful model selection, effective prompt design, and context-aware usage to ensure reliable code generation in real-world software development tasks.
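The four failure categories named in the abstract can be read as an ordered triage of each generated solution. The sketch below is only an illustration under stated assumptions: the `Submission` fields, the `classify` function, and the rule that a correct solution exceeding a time limit counts as an algorithmic suboptimality are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    COMPILE_ERROR = "compile-time error"
    RUNTIME_ERROR = "runtime error"
    FUNCTIONAL_FAILURE = "functional failure"
    SUBOPTIMAL = "algorithmic suboptimality"
    ACCEPTED = "accepted"


@dataclass
class Submission:
    """Hypothetical record of one judged solution (field names are assumptions)."""
    compiled: bool
    crashed: bool
    tests_passed: int
    tests_total: int
    within_time_limit: bool


def classify(s: Submission) -> Outcome:
    """Assign the earliest failure stage that applies, in pipeline order."""
    if not s.compiled:
        return Outcome.COMPILE_ERROR
    if s.crashed:
        return Outcome.RUNTIME_ERROR
    if s.tests_passed < s.tests_total:
        return Outcome.FUNCTIONAL_FAILURE
    if not s.within_time_limit:
        # Assumption: correct but too slow is treated as suboptimal.
        return Outcome.SUBOPTIMAL
    return Outcome.ACCEPTED


def tally(submissions):
    """Count outcomes across a batch of submissions."""
    return Counter(classify(s) for s in submissions)
```

A per-model tally of these outcomes across problems and languages would give the kind of breakdown the abstract describes.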

CR · Apr 7, 2015
Security Toolbox for Detecting Novel and Sophisticated Android Malware

Benjamin Holland, Tom Deering, Suresh Kothari et al.

This paper presents a demo of our Security Toolbox to detect novel malware in Android apps. This Toolbox is developed through our recent research project funded by the DARPA Automated Program Analysis for Cybersecurity (APAC) program. The adversarial challenge ("Red") teams in the DARPA APAC program are tasked with designing sophisticated malware to test the bounds of malware detection technology being developed by the research and development ("Blue") teams. Our research group, a Blue team in the DARPA APAC program, proposed a "human-in-the-loop program analysis" approach to detect malware given the source or Java bytecode for an Android app. Our malware detection apparatus consists of two components: a general-purpose program analysis platform called Atlas, and a Security Toolbox built on the Atlas platform. This paper describes the major design goals, the Toolbox components to achieve the goals, and the workflow for auditing Android apps. The accompanying video (http://youtu.be/WhcoAX3HiNU) illustrates features of the Toolbox through a live audit.

SE · Apr 4, 2014
Event-Flow Graphs for Efficient Path-Sensitive Analyses

Ahmed Tamrawi, Suresh Kothari

Efficient and accurate path-sensitive analyses pose two challenges: (a) analyzing an exponentially increasing number of paths in a control-flow graph (CFG), and (b) checking feasibility of paths in a CFG. We address these challenges by introducing an equivalence relation on the CFG paths to partition them into equivalence classes. It is then sufficient to perform analysis on these equivalence classes rather than on the individual paths in a CFG. This technique has two major advantages: (a) although the number of paths in a CFG can be exponentially large, the essential information to be analyzed is captured by a small number of equivalence classes, and (b) checking path feasibility becomes simpler. The key challenge is how to efficiently compute equivalence classes of paths in a CFG without examining each path in the CFG. In this paper, we present a linear-time algorithm to form equivalence classes without the need to examine each path in a CFG. The key to this algorithm is construction of an event-flow graph (EFG), a compact derivative of the CFG, in which each path represents an equivalence class of paths in the corresponding CFG. EFGs are defined with respect to the set of events that are in turn defined by the analyzed property. The equivalence classes are thus guaranteed to preserve all the event traces in the original CFG. We present an empirical evaluation on the Linux kernel (v3.12). The EFGs in our evaluation are defined with respect to events of the spin safe-synchronization property. Evaluation results show that there are far fewer EFG-based equivalence classes than paths in the corresponding CFG. This reduction is close to 99% for CFGs with a large number of paths. Moreover, our controlled experiment results show that EFGs are human comprehensible and compact compared to their corresponding CFGs.
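To make the gap between path count and equivalence-class count concrete, here is a small brute-force illustration. It is not the paper's linear-time EFG construction: it enumerates every path, which is exactly what that algorithm avoids. It assumes, for illustration only, that two paths are equivalent when they produce the same sequence of events; the toy CFG shape, node names, and the `lock`/`unlock` event labels are hypothetical.

```python
from collections import defaultdict


def build_cfg(n_diamonds):
    """Toy CFG: entry -> lock -> n if/else diamonds -> unlock -> exit.

    The 'then' branch of the first diamond also unlocks, so it carries an event.
    """
    succ = defaultdict(list)
    events = {"lock": "lock", "unlock": "unlock", "t0": "unlock"}
    succ["entry"].append("lock")
    prev = "lock"
    for i in range(n_diamonds):
        cond, then, els, join = f"c{i}", f"t{i}", f"e{i}", f"j{i}"
        succ[prev].append(cond)
        succ[cond] += [then, els]
        succ[then].append(join)
        succ[els].append(join)
        prev = join
    succ[prev].append("unlock")
    succ["unlock"].append("exit")
    return succ, events


def all_paths(succ, node="entry", goal="exit"):
    """Enumerate every entry-to-exit path (exponential; for illustration only)."""
    if node == goal:
        yield [node]
        return
    for nxt in succ[node]:
        for rest in all_paths(succ, nxt, goal):
            yield [node] + rest


def classes_by_event_trace(succ, events):
    """Group paths by the sequence of events they visit."""
    groups = defaultdict(list)
    for path in all_paths(succ):
        trace = tuple(events[n] for n in path if n in events)
        groups[trace].append(path)
    return groups


succ, events = build_cfg(10)
groups = classes_by_event_trace(succ, events)
n_paths = sum(len(paths) for paths in groups.values())
print(f"{n_paths} paths, {len(groups)} event-trace classes")
```

With ten diamonds, only the first branch affects the event trace, so the many paths collapse into a handful of classes. An EFG aims to expose those classes directly as its paths, without the enumeration step used here.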