CR AI PL | Apr 21, 2025

C2RUST-BENCH: A Minimized, Representative Dataset for C-to-Rust Transpilation Evaluation

Melih Sirlanci, Carter Yagemann, Zhiqiang Lin

arXiv:2504.15144v1 | 2 citations | h-index: 8

Originality: Incremental advance

AI Analysis

C2RUST-BENCH is a minimized, representative dataset for evaluating C-to-Rust transpilation frameworks, addressing a bottleneck in memory-safety migration efforts.

The authors address the lack of a comprehensive evaluation dataset for C-to-Rust transpilation with a method that selects functions from a large set. The result is C2RUST-BENCH, which contains 2,905 representative functions selected from 15,503 functions of real-world programs.

Despite two decades of effort in vulnerability detection, memory safety vulnerabilities remain a critical problem. Recent reports suggest that the key solution is to migrate to memory-safe languages. To this end, C-to-Rust transpilation has become a popular way to resolve memory-safety issues in C programs. Recent works propose C-to-Rust transpilation frameworks; however, a comprehensive evaluation dataset is missing. Although one solution is to assemble a sufficiently large dataset, doing so increases the analysis time of automated frameworks and, in some cases, of manual efforts. In this work, we build a method to select functions from a large set to construct a minimized yet representative dataset for evaluating C-to-Rust transpilation. We propose C2RUST-BENCH, which contains 2,905 functions that are representative of C-to-Rust transpilation, selected from 15,503 functions of real-world programs.

