PL · Apr 6
Search-Based Multi-Trajectory Refinement for Safe C-to-Rust Translation with Large Language Models
HoHyun Sim, Hyeonjoong Cho, Yeonghyeon Go et al.
The C programming language has been foundational in building system-level software. However, its manual memory management model frequently leads to memory safety issues. In response, Rust has emerged as a memory-safe alternative. Moreover, driven by rapid advances in the generative capabilities of large language models (LLMs), automated C-to-Rust translation is attracting growing interest as a way to migrate large volumes of legacy C code. Leveraging LLMs for C-to-Rust translation introduces challenges distinct from those of the math and commonsense QA domains, where LLMs have predominantly been applied. First, the scarcity of parallel C-to-Rust datasets hinders the retrieval of suitable translation exemplars for in-context learning. Second, unlike math or commonsense QA problems, the intermediate steps required for C-to-Rust translation are not well defined. Third, it remains unclear how to organize and cascade these intermediate steps to construct a correct translation trajectory. While existing LLM-based approaches have achieved some success, they rely on iterative code refinement along a single search trajectory through the C-to-Rust problem space and do not use systematic search mechanisms to navigate the space of possible refinement trajectories. To address these challenges, we propose LAC2R, a Monte Carlo Tree Search (MCTS)-guided LLM refinement technique for automated C-to-safe-Rust translation. LAC2R uses MCTS to systematically explore multiple refinement trajectories and to organize the LLM-induced intermediate steps into a correct translation. Our experiments demonstrate that LAC2R effectively performs C-to-Rust translation on large-scale, real-world benchmarks. On small-scale benchmarks, LAC2R is the only method among those compared that simultaneously attains the highest safety ratio, perfect project-level correctness, and the fewest linter warnings.
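To make the contrast between refining along a single trajectory and searching over many trajectories concrete, here is a minimal Monte Carlo Tree Search (UCT) sketch. It is not LAC2R's implementation: the three refinement actions in `Refine`, the `evaluate` reward, and its assumed "ideal" refinement order `IDEAL` are hypothetical stand-ins for prompting an LLM and then compiling, testing, and safety-scoring each candidate translation.

```rust
// Minimal MCTS (UCT) over sequences of refinement steps.
// Hypothetical: `Refine` actions and `evaluate` stand in for LLM prompts and
// for compiling/testing/safety-scoring a candidate translation.

#[derive(Clone, Copy, Debug, PartialEq)]
enum Refine {
    FixCompileErrors,
    RemoveUnsafe,
    FixLints,
}

const ACTIONS: [Refine; 3] = [Refine::FixCompileErrors, Refine::RemoveUnsafe, Refine::FixLints];
/// Assumed ideal order, used only by the toy reward below.
const IDEAL: [Refine; 3] = [Refine::FixCompileErrors, Refine::RemoveUnsafe, Refine::FixLints];

/// Hypothetical reward in [0, 1]: fraction of `IDEAL` followed as a prefix.
fn evaluate(trajectory: &[Refine]) -> f64 {
    let matched = trajectory.iter().zip(IDEAL.iter()).take_while(|(a, b)| a == b).count();
    matched as f64 / IDEAL.len() as f64
}

/// A tree node: one candidate reached by a sequence of refinement steps.
struct Node {
    trajectory: Vec<Refine>,
    children: Vec<usize>,
    untried: Vec<Refine>,
    visits: u32,
    total_reward: f64,
}

impl Node {
    fn new(trajectory: Vec<Refine>, max_depth: usize) -> Self {
        let untried = if trajectory.len() < max_depth { ACTIONS.to_vec() } else { Vec::new() };
        Node { trajectory, children: Vec::new(), untried, visits: 0, total_reward: 0.0 }
    }
}

struct Search {
    nodes: Vec<Node>,
    max_depth: usize,
    exploration: f64,
    best: (f64, Vec<Refine>),
}

impl Search {
    fn new(max_depth: usize, exploration: f64) -> Self {
        let root = Node::new(Vec::new(), max_depth);
        Search { nodes: vec![root], max_depth, exploration, best: (f64::NEG_INFINITY, Vec::new()) }
    }

    /// UCT score: mean reward plus an exploration bonus for rarely visited children.
    fn uct(&self, child: usize, parent_visits: u32) -> f64 {
        let n = &self.nodes[child];
        let mean = n.total_reward / n.visits as f64;
        mean + self.exploration * ((parent_visits as f64).ln() / n.visits as f64).sqrt()
    }

    fn iterate(&mut self) {
        // 1. Selection: descend through fully expanded nodes by UCT.
        let mut path = vec![0];
        let mut cur = 0;
        while self.nodes[cur].untried.is_empty() && !self.nodes[cur].children.is_empty() {
            let parent_visits = self.nodes[cur].visits;
            cur = *self.nodes[cur]
                .children
                .iter()
                .max_by(|&&a, &&b| self.uct(a, parent_visits).total_cmp(&self.uct(b, parent_visits)))
                .unwrap();
            path.push(cur);
        }
        // 2. Expansion: add one untried refinement step as a new child.
        let next = self.nodes[cur].untried.pop();
        if let Some(action) = next {
            let mut trajectory = self.nodes[cur].trajectory.clone();
            trajectory.push(action);
            self.nodes.push(Node::new(trajectory, self.max_depth));
            let child = self.nodes.len() - 1;
            self.nodes[cur].children.push(child);
            path.push(child);
            cur = child;
        }
        // 3. Evaluation: score the candidate reached by this trajectory.
        let reward = evaluate(&self.nodes[cur].trajectory);
        if reward > self.best.0 {
            self.best = (reward, self.nodes[cur].trajectory.clone());
        }
        // 4. Backpropagation: update statistics along the selected path.
        for &i in &path {
            self.nodes[i].visits += 1;
            self.nodes[i].total_reward += reward;
        }
    }
}

/// Runs `iterations` rounds of MCTS and returns the best (reward, trajectory) seen.
fn search(iterations: usize, max_depth: usize) -> (f64, Vec<Refine>) {
    let mut s = Search::new(max_depth, std::f64::consts::SQRT_2);
    for _ in 0..iterations {
        s.iterate();
    }
    s.best
}

fn main() {
    let (reward, trajectory) = search(500, 3);
    println!("best reward {reward:.2} via {trajectory:?}");
}
```

Because each node stores a whole trajectory, siblings represent alternative refinement orders, and the UCT score trades off revisiting high-reward branches against trying less-visited ones. A single-trajectory refiner corresponds to always following one path without backtracking.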
SE · Aug 2, 2021
DepRes: A Tool for Resolving Fully Qualified Names and Their Dependencies
Ali Shokri, Mehdi Mirakhorli
Reusing code snippets shared by other programmers on Q&A forums (e.g., StackOverflow) is a common practice among software developers. However, a lack of information about the fully qualified names (FQNs) of identifiers in borrowed code snippets results in serious compile errors. Programmers must either search manually for the correct FQN of each identifier, which is a tedious and error-prone process, or use tools developed to identify correct FQNs automatically. Despite researchers' efforts to identify FQNs in code snippets automatically, current approaches suffer from low accuracy in practice. Moreover, while these tools focus on resolving the FQN of an identifier in a code snippet, they leave unresolved the challenge of finding the correct third-party library (i.e., dependency) that implements that FQN. Using an incorrect dependency, or an incorrect version of a dependency, can lead to semantic errors that compilers cannot detect and that can therefore cause serious damage at run time. In this paper, we introduce DepRes, a tool that leverages a sketch-based approach to resolve FQNs in Java code snippets and recommend the correct dependency for each FQN. The source code, documentation, and a demo video of the DepRes tool are available from its code repository at https://github.com/SoftwareDesignLab/DepRes-Tool.
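A toy illustration of resolving an ambiguous simple name from how a snippet uses it, returning both an FQN and a dependency. This is not DepRes's sketch-based algorithm: the hand-written `index`, the `Candidate` type, and the "most covered member calls wins" rule are assumptions for illustration only. The two `Document` candidates are real Java types (`org.w3c.dom.Document` ships with the JDK; `org.jsoup.nodes.Document` comes from the `org.jsoup:jsoup` artifact), but their member lists here are deliberately incomplete, and version selection is not modeled.

```rust
/// One candidate binding for a simple type name (toy, hand-written data).
struct Candidate {
    fqn: &'static str,
    dependency: &'static str, // Maven-style groupId:artifactId, or "JDK" for built-in types
    members: &'static [&'static str],
}

/// Hypothetical index from simple names to candidate FQNs. A real tool would
/// mine this from library artifacts; these member lists are incomplete.
fn index(simple_name: &str) -> Vec<Candidate> {
    match simple_name {
        "Document" => vec![
            Candidate {
                fqn: "org.w3c.dom.Document",
                dependency: "JDK",
                members: &["getElementsByTagName", "getDocumentElement"],
            },
            Candidate {
                fqn: "org.jsoup.nodes.Document",
                dependency: "org.jsoup:jsoup",
                members: &["select", "title"],
            },
        ],
        _ => Vec::new(),
    }
}

/// Resolves `simple_name` to (FQN, dependency) by choosing the candidate whose
/// known members cover the most method calls observed on it in the snippet.
fn resolve(simple_name: &str, used_calls: &[&str]) -> Option<(&'static str, &'static str)> {
    index(simple_name)
        .into_iter()
        .map(|c| {
            let hits = c.members.iter().filter(|&&m| used_calls.contains(&m)).count();
            (hits, c)
        })
        .filter(|(hits, _)| *hits > 0)
        .max_by_key(|(hits, _)| *hits)
        .map(|(_, c)| (c.fqn, c.dependency))
}

fn main() {
    // Usage sketch of a snippet such as: doc.select("a"); doc.title();
    println!("{:?}", resolve("Document", &["select", "title"]));
}
```

Returning the dependency alongside the FQN reflects the abstract's point that a compilable FQN alone does not tell the developer which library to add.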
SE · Apr 6
ENCRUST: Encapsulated Substitution and Agentic Refinement on a Live Scaffold for Safe C-to-Rust Translation
Hohyun Sim, Hyeonjoong Cho, Ali Shokri et al.
We present Encapsulated Substitution and Agentic Refinement on a Live Scaffold for Safe C-to-Rust Translation (Encrust), a two-phase pipeline for translating real-world C projects to safe Rust. Existing approaches either produce unsafe output without memory-safety guarantees or translate functions in isolation, failing to detect cross-unit type mismatches or to handle unsafe constructs that require whole-program reasoning. Furthermore, function-level LLM pipelines require coordinated caller updates when type signatures change, while project-scale systems often fail to produce compilable output under real-world dependency complexity. Encrust addresses these limitations by decoupling boundary adaptation from function logic via an Application Binary Interface (ABI)-preserving wrapper pattern and by validating each intermediate state against the integrated codebase. Phase 1 (Encapsulated Substitution) translates each function using an ABI-preserving wrapper that splits it into two components: a caller-transparent shim that retains the original raw-pointer signature, and a safe inner function that the LLM targets with a clean, scope-limited prompt. This enables independent per-function type changes, with automatic rollback on failure, without coordinated caller updates. A deterministic, type-directed wrapper elimination pass then removes the wrappers after successful translation. Phase 2 (Agentic Refinement) resolves unsafe constructs beyond per-function scope, including static mut globals, skipped wrapper pairs, and failed translations, using an LLM agent that operates on the whole codebase under a baseline-aware verification gate. We evaluate Encrust on 7 GNU Coreutils programs and 8 libraries from the Laertes benchmark, showing substantial unsafe-construct reduction across all 15 programs while maintaining full test-vector correctness.
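The shim-plus-safe-inner-function split described above can be sketched as follows, assuming a hypothetical C function `int sum(const int *arr, size_t len);` with `size_t` mapped to `usize`. This is a general illustration of an ABI-preserving wrapper, not Encrust's generated code; the names `sum` and `sum_inner` are hypothetical.

```rust
use std::os::raw::c_int;

// Hypothetical original C function:  int sum(const int *arr, size_t len);

/// Safe inner function: the part an LLM would be asked to write,
/// using a slice instead of a raw pointer and a length.
fn sum_inner(values: &[c_int]) -> c_int {
    values.iter().sum()
}

/// Caller-transparent shim that keeps the original raw-pointer C signature,
/// so untranslated callers do not need to change. In a real build it would
/// also be exported under the C symbol name (e.g. with a `no_mangle` attribute).
///
/// # Safety
/// `arr` may be null only if `len == 0`; otherwise it must point to `len`
/// readable, initialized `c_int` values.
pub unsafe extern "C" fn sum(arr: *const c_int, len: usize) -> c_int {
    let values: &[c_int] = if arr.is_null() || len == 0 {
        &[]
    } else {
        // SAFETY: guaranteed by the caller contract documented above.
        unsafe { std::slice::from_raw_parts(arr, len) }
    };
    sum_inner(values)
}

fn main() {
    let data: [c_int; 4] = [1, 2, 3, 4];
    // A not-yet-translated caller still goes through the raw-pointer shim.
    let via_shim = unsafe { sum(data.as_ptr(), data.len()) };
    // After wrapper elimination, a translated caller calls the safe function directly.
    let direct = sum_inner(&data);
    println!("{via_shim} {direct}");
}
```

The only `unsafe` code is confined to the boundary conversion in the shim, so the inner function's signature can change to safe types without touching callers; once all callers use `sum_inner` directly, the shim can be deleted.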
SE · Aug 16, 2021
A Program Synthesis Approach for Adding Architectural Tactics to an Existing Code Base
Ali Shokri
Automatically constructing a program from given specifications has been studied for decades. Despite advances in program synthesis, current approaches still synthesize an isolated code snippet and leave the task of integrating it into an existing code base to developers. Because of its program-wide effects, synthesizing an architectural tactic and integrating it into a program is even more challenging. An architectural tactic must be synthesized according to the context of different locations in the program, broken down into smaller pieces, and added to the corresponding locations in the code. Moreover, each piece must establish correct data and control dependencies with its surrounding environment as well as with the other synthesized pieces. This is an error-prone and challenging task, especially for novice developers. In this paper, we introduce a novel program synthesis approach that synthesizes architectural tactics and adds them to an existing code base.
SE · Mar 11, 2021
ArCode: A Tool for Supporting Comprehension and Implementation of Architectural Concerns
Ali Shokri, Mehdi Mirakhorli
Integrated development environments (IDEs) play an important role in supporting developers during program comprehension and completion. Many of their supporting features focus on low-level programming and debugging activities. Unfortunately, there is less support for understanding and implementing architectural concerns in the form of patterns, tactics, and other concerns. In this paper, we present ArCode, a tool designed as a plugin for a popular IDE, IntelliJ IDEA. ArCode learns correct ways of using frameworks' APIs to implement architectural concerns such as authentication and authorization. By analyzing the programmer's code, the tool finds deviations from correct implementations and provides fix recommendations, alongside graphical demonstrations that communicate the recommendations more clearly to developers. We showcase how programmers can benefit from ArCode through API misuse detection and API recommendation scenarios for a widely used Java security framework, the Java Authentication and Authorization Service (JAAS).
SE · Feb 16, 2021
ArCode: Facilitating the Use of Application Frameworks to Implement Tactics and Patterns
Ali Shokri, Joanna C. S. Santos, Mehdi Mirakhorli
Software designers and developers increasingly rely on application frameworks as first-class design concepts. They instantiate the services that frameworks provide to implement various architectural tactics and patterns. One challenge in using frameworks for such tasks is the difficulty of learning and correctly using their APIs. This paper introduces a learning-based approach called ArCode to help novice programmers correctly use frameworks' APIs to implement architectural tactics and patterns. ArCode has several novel components: a graph-based approach for learning the specification of a framework from a limited number of training programs, a program analysis algorithm to eliminate erroneous training data, and a recommender module that helps programmers use APIs correctly and identifies API misuses in their programs. We evaluated our technique on two popular frameworks: the JAAS security framework, used for the authentication and authorization tactic, and the Java RMI framework, used to enable remote method invocation between client and server and other object-oriented patterns. Our evaluation results show that (i) it is feasible to use ArCode to learn the specification of a framework; (ii) ArCode generates accurate recommendations for the next API call needed to implement an architectural tactic/pattern, based on the context of the programmer's code; and (iii) it accurately detects API misuses in code that implements a tactic/pattern and provides fix recommendations. A comparison of ArCode with two prior techniques (MAPO and GrouMiner) on API recommendation and misuse detection shows that ArCode outperforms these approaches.
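A minimal sketch of the general idea of a graph learned from API call sequences, used both to recommend the next call and to flag unseen transitions as potential misuses. It is a deliberate simplification, not ArCode's graph-based specification learner: `ApiGraph`, its frequency-counted edges over adjacent calls, and the toy training traces are assumptions. The call names correspond to real `javax.security.auth.login.LoginContext` operations in JAAS (constructor, `login`, `getSubject`, `logout`).

```rust
use std::collections::BTreeMap;

/// Directed graph of observed API call transitions with occurrence counts
/// (a simplified stand-in for a learned framework specification).
#[derive(Default)]
struct ApiGraph {
    edges: BTreeMap<&'static str, BTreeMap<&'static str, u32>>,
}

impl ApiGraph {
    /// Records every adjacent pair of calls in a training trace.
    fn learn(&mut self, trace: &[&'static str]) {
        for pair in trace.windows(2) {
            *self.edges.entry(pair[0]).or_default().entry(pair[1]).or_insert(0) += 1;
        }
    }

    /// Most frequently observed successor of `call`, if any.
    fn recommend_next(&self, call: &str) -> Option<&'static str> {
        self.edges
            .get(call)?
            .iter()
            .max_by_key(|(_, count)| **count)
            .map(|(next, _)| *next)
    }

    /// Transitions in `trace` never seen during learning (possible misuses).
    fn misuses(&self, trace: &[&'static str]) -> Vec<(&'static str, &'static str)> {
        trace
            .windows(2)
            .filter(|p| !self.edges.get(p[0]).map_or(false, |next| next.contains_key(p[1])))
            .map(|p| (p[0], p[1]))
            .collect()
    }
}

fn main() {
    let mut graph = ApiGraph::default();
    // Toy training traces of JAAS LoginContext usage.
    graph.learn(&["new LoginContext", "login", "getSubject", "logout"]);
    graph.learn(&["new LoginContext", "login", "getSubject"]);
    println!("{:?}", graph.recommend_next("login"));
    // Skipping login() produces a transition the graph has never seen.
    println!("{:?}", graph.misuses(&["new LoginContext", "getSubject", "logout"]));
}
```

Counting only adjacent pairs is the simplest design that shows both uses of one learned graph; a realistic specification learner would also have to model data dependencies and calls that may legitimately occur in different orders.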
NA · Nov 1, 2014
High phase-lag order trigonometrically fitted two-step Obrechkoff methods for the numerical solution of periodic initial value problems
Ali Shokri, Hosein Saadat
In this paper, we present two-step trigonometrically fitted symmetric Obrechkoff methods of algebraic order twelve. The method is based on the symmetric two-step Obrechkoff method of algebraic order 12, has a high phase-lag order, and is constructed to solve initial value problems (IVPs) with periodic solutions, such as orbital problems. We compare the new method with several recently constructed optimized methods from the literature. The numerical results obtained by the new method on several test problems show its superiority in efficiency, accuracy, and stability.
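For reference, the general symmetric two-step Obrechkoff scheme found in the literature has the form below, with step size h and even derivatives up to order 2m; taking m = 3 provides enough free coefficients to reach algebraic order 12. Trigonometric fitting then constrains the coefficients so that functions such as sin(ωx) and cos(ωx) are integrated exactly, and the phase-lag is measured on the test equation y'' = -ω²y with v = ωh. This is a sketch of the standard framework only; the paper's specific coefficients are not reproduced here.

```latex
% Symmetric two-step Obrechkoff method
y_{n+1} - 2y_n + y_{n-1} = \sum_{i=1}^{m} h^{2i}\left[\beta_{i0}\left(y^{(2i)}_{n+1} + y^{(2i)}_{n-1}\right) + \beta_{i1}\, y^{(2i)}_{n}\right]

% Applied to y'' = -\omega^2 y, so that h^{2i} y^{(2i)} = (-1)^i v^{2i} y with v = \omega h:
y_{n+1} - 2\,\frac{B(v)}{A(v)}\, y_n + y_{n-1} = 0,
\qquad
A(v) = 1 - \sum_{i=1}^{m} (-1)^i \beta_{i0}\, v^{2i},
\qquad
B(v) = 1 + \tfrac{1}{2}\sum_{i=1}^{m} (-1)^i \beta_{i1}\, v^{2i}

% The exact solution satisfies y_{n+1} - 2\cos(v)\, y_n + y_{n-1} = 0, so the
% phase-lag, and the phase-lag order q, are
\phi(v) = v - \arccos\!\left(\frac{B(v)}{A(v)}\right) = c\, v^{q+1} + O\!\left(v^{q+3}\right)
```

As a sanity check, m = 1 with \beta_{10} = 1/12 and \beta_{11} = 10/12 recovers Numerov's method, for which B(v)/A(v) = 1 - v^2/2 + O(v^4), matching cos(v) to that order.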