SE HC LG | Mar 19, 2025

Enhancing Code LLM Training with Programmer Attention

Yifan Zhang, Chen Huang, Zachary Karas, Dung Thuy Nguyen, Kevin Leach, Yu Huang

arXiv:2503.14936v28.04 citations | h-index: 9 | SIGSOFT FSE Companion

Originality: Incremental advance

AI Analysis

This work aims to improve code intelligence for software engineering by integrating human attention data. The advance is incremental, since it builds on an existing model, CodeT5.

The paper tackles the underuse of human attention signals in training code LLMs by proposing a pipeline of augmentation, pattern abstraction, and reward-based fine-tuning, which yields a gain of +7.16 CodeBLEU on the CodeXGLUE benchmark for code summarization.

Human attention provides valuable yet underexploited signals for code LLM training, offering a perspective beyond purely machine-driven attention. Collecting eye-tracking data is complex and costly, and there has been limited progress in systematically using these signals for code LLM training. To address both issues, we propose a cohesive pipeline spanning augmentation and reward-based fine-tuning. Specifically, we introduce (1) an eye-tracking path augmentation method to expand programmer attention datasets, (2) a pattern abstraction step that refines raw fixations into learnable attention motifs, and (3) a reward-guided strategy for integrating these insights directly into a CodeT5 supervised fine-tuning process. Our experiments yield +7.16 in CodeBLEU on the CodeXGLUE benchmark for code summarization, underscoring how uniting human and machine attention can boost code intelligence. We hope this work encourages broader exploration of human-centric methods in next-generation AI4SE.
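The three stages named in the abstract can be pictured with a toy sketch. It is not the authors' implementation: the abstract does not specify how paths are augmented, how motifs are defined, or how the reward is computed. The sketch assumes a scanpath is a list of token-category labels, that augmentation means dropping and swapping fixations, that a motif is a frequent n-gram of labels, and that the reward is the share of a model-derived path covered by those motifs. All function names and parameters are hypothetical.

```python
import random
from collections import Counter


def augment_scanpath(path, rng, drop_prob=0.1):
    """Hypothetical augmentation: drop fixations at random, then swap one adjacent pair."""
    kept = [label for label in path if rng.random() >= drop_prob]
    if len(kept) < 2:
        # Too short to perturb further; fall back to an unmodified copy.
        return list(path)
    i = rng.randrange(len(kept) - 1)
    kept[i], kept[i + 1] = kept[i + 1], kept[i]
    return kept


def ngrams(seq, n):
    """All contiguous length-n windows of seq, as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def extract_motifs(paths, n=2, top_k=3):
    """Hypothetical pattern abstraction: the top_k most frequent label n-grams."""
    counts = Counter(g for p in paths for g in ngrams(p, n))
    return {g for g, _ in counts.most_common(top_k)}


def motif_reward(model_path, motifs, n=2):
    """Hypothetical reward: fraction of the model path's n-grams that are known motifs."""
    grams = ngrams(model_path, n)
    if not grams:
        return 0.0
    return sum(g in motifs for g in grams) / len(grams)


# Toy data: one made-up scanpath over made-up token categories.
rng = random.Random(0)
human_path = ["signature", "loop", "call", "loop", "call", "return"]
augmented = [augment_scanpath(human_path, rng) for _ in range(5)]
motifs = extract_motifs([human_path] + augmented, n=2, top_k=3)
reward = motif_reward(["signature", "loop", "call", "return"], motifs)
print(motifs, reward)
```

How such a score would enter CodeT5 fine-tuning (for example as a weight on the training loss) is left open here, since the abstract only states that the strategy is reward-guided.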
