SE | AI | Feb 5, 2025

Large Language Model Guided Self-Debugging Code Generation

Muntasir Adnan, Zhiwei Xu, Carlos C. N. Kuhn

arXiv:2502.02928v2 | 16.87 citations | h-index: 2

Originality: Highly original

AI Analysis

This work addresses computational efficiency and error correction in code generation for AI systems, and it represents a strong, specific gain.

The paper tackles automated code generation by proposing PyCapsule, a Python framework with a two-agent pipeline and self-debugging modules. It reports success-rate improvements of up to 5.7% on HumanEval, 10.3% on HumanEval-ET, and 24.4% on BigCodeBench over state-of-the-art methods.

Automated code generation is gaining significant importance in intelligent computer programming and system deployment. However, current approaches often face challenges in computational efficiency and lack robust mechanisms for code parsing and error correction. In this work, we propose a novel framework, PyCapsule, with a simple yet effective two-agent pipeline and efficient self-debugging modules for Python code generation. PyCapsule features sophisticated prompt inference, iterative error handling, and case testing, ensuring high generation stability, safety, and correctness. Empirically, PyCapsule achieves up to 5.7% improvement in success rate on HumanEval, 10.3% on HumanEval-ET, and 24.4% on BigCodeBench compared to state-of-the-art methods. We also observe a decrease in normalized success rate given more self-debugging attempts, potentially affected by limited and noisy error feedback in retention. PyCapsule demonstrates broader impacts on advancing lightweight and efficient code generation for artificial intelligence systems.
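The abstract describes a generate, test, and repair loop. A minimal sketch of that general idea follows; it is not the authors' implementation. The names `run_candidate`, `self_debug`, and `toy_generator`, the `Generator` signature, and the limit of five attempts are all assumptions made for illustration, and the stand-in generator replaces what would be an LLM call in practice.

```python
import os
import subprocess
import sys
import tempfile
from typing import Callable, Optional, Tuple

# Hypothetical interface: (task prompt, last error message or None) -> source code.
Generator = Callable[[str, Optional[str]], str]


def run_candidate(code: str, tests: str, timeout: float = 10.0) -> Tuple[bool, str]:
    """Run the candidate code plus its test cases in a separate interpreter."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
        handle.write(code + "\n\n" + tests + "\n")
        path = handle.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return proc.returncode == 0, proc.stderr
    except subprocess.TimeoutExpired:
        return False, "TimeoutExpired: candidate exceeded the time limit"
    finally:
        os.unlink(path)


def self_debug(
    prompt: str,
    tests: str,
    generate: Generator,
    max_attempts: int = 5,  # assumed limit, chosen only for this sketch
) -> Tuple[Optional[str], int]:
    """Regenerate code until the tests pass or the attempt budget runs out."""
    error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        code = generate(prompt, error)
        passed, stderr = run_candidate(code, tests)
        if passed:
            return code, attempt
        # Feed back only the final traceback line to keep the error message short.
        lines = stderr.strip().splitlines()
        error = lines[-1] if lines else "unknown error"
    return None, max_attempts


def toy_generator(prompt: str, error: Optional[str]) -> str:
    """Stand-in for an LLM: wrong on the first call, fixed once it sees an error."""
    if error is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


code, attempts = self_debug("Write add(a, b).", "assert add(2, 3) == 5", toy_generator)
```

Here the first candidate fails its test case, the error is passed back to the generator, and the second candidate passes. Running each candidate in a subprocess with a timeout is one simple way to isolate faulty or non-terminating code from the controlling process.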
