SE, CL, Mar 9, 2025

DependEval: Benchmarking LLMs for Repository Dependency Understanding

Junjia Du, Yadi Liu, Hongcheng Guo, Jiawei Wang, Haojian Huang, Yunyi Ni, Zhoujun Li

arXiv:2503.06689v1, 14 citations, h-index: 16, ACL

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for better benchmarks of LLMs' repository-level code understanding in software development. It is incremental in that it contributes an evaluation rather than new methods.

The authors tackled the problem of evaluating large language models' ability to understand complex code repositories by introducing DependEval, a hierarchical benchmark based on 15,576 real-world repositories across 8 programming languages, which revealed substantial performance gaps among over 25 LLMs.

While large language models (LLMs) have shown considerable promise in code generation, real-world software development demands advanced repository-level reasoning. This includes understanding dependencies and project structures and managing multi-file changes. However, the ability of LLMs to effectively comprehend and handle complex code repositories has yet to be fully explored. To address these challenges, we introduce DependEval, a hierarchical benchmark designed to evaluate repository dependency understanding. The benchmark is based on 15,576 repositories collected from real-world websites. It evaluates models on three core tasks: Dependency Recognition, Repository Construction, and Multi-file Editing, across 8 programming languages from actual code repositories. Our evaluation of over 25 LLMs reveals substantial performance gaps and provides valuable insights into repository-level code understanding.

