cs.SE · Computer Science

Software Engineering

Software development, testing, maintenance

34.6 · SE · Mar 11

Understanding by Reconstruction: Reversing the Software Development Process for LLM Pretraining

Zhiyuan Zeng, Yichi Zhang, Yong Shan et al.

This addresses LLMs' shallow reasoning in software engineering, a problem for both developers and AI researchers, and represents a new paradigm rather than incremental work.

32.3 · SE · Mar 17 · Code

InCoder-32B: Code Foundation Model for Industrial Scenarios

Jian Yang, Wei Zhang, Jiajun Wu et al.

This addresses performance gaps in industrial code intelligence for domains like chip design and embedded systems, though it appears incremental as it builds on existing foundation model approaches.

29.2 · SE · Mar 13

EvoClaw: Evaluating AI Agents on Continuous Software Evolution

Gangda Deng, Zhaoling Chen, Zhongming Yu et al.

This addresses the need for benchmarks that assess AI agents in dynamic, real-world software environments, though it is incremental as it builds on existing evaluation methods.

30.5 · SE · Mar 26

Composer 2 Technical Report

Cursor Research, Aaron Chan, Ahmed Shalaby et al. · berkeley, microsoft-research

This addresses the need for efficient coding models in software engineering, though it appears incremental as it builds on previous Composer models.

27.8 · AI · Mar 10

DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use

Aili Chen, Chi Zhang, Junteng Liu et al.

This addresses the challenge of robust generalization for tool-using LLMs in agentic tasks. It offers a scalable way to improve performance on out-of-distribution benchmarks, though it is incremental as it builds on existing synthesis methods.

31.6 · SE · Apr 16

Scaling Test-Time Compute for Agentic Coding

Joongwon Kim, Wannan Yang, Kelvin Niu et al.

For developers of coding agents, this work addresses the bottleneck of scaling test-time compute for long-horizon tasks by focusing on representation and reuse of prior experience.

27.3 · SE · Mar 16 · Code · 87

Immersion in the GitHub Universe: Scaling Coding Agents to Mastery

Jiale Zhao, Guoxin Chen, Fanzhe Meng et al.

This addresses the data bottleneck for training LLM-based software engineering agents, though automating data construction is an incremental advance rather than a fundamental breakthrough.

25.8 · SE · Mar 18

CodeScout: An Effective Recipe for Reinforcement Learning of Code Search Agents

Lintang Sutawika, Aditya Bharat Soni, Bharath Sriraam R R et al. · cmu

This addresses the need for efficient code search in software development, offering a simpler, agent-based approach that is incremental over prior methods using specialized tools.

25.5 · SE · Mar 16 · Code

daVinci-Env: Open SWE Environment Synthesis at Scale

Dayuan Fu, Shenyu Wu, Yunze Wu et al.

This provides a scalable, open-source solution for academic researchers to train software engineering agents, addressing a barrier in the field.

23.7 · SE · Apr 20 · Code

WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models

Xinping Lei, Xinyu Che, Junqi Xiong et al.

This provides a comprehensive evaluation framework for the web coding capabilities of LLMs, addressing gaps in existing benchmarks.

32.7 · SE · May 5 · Code · 869

ProgramBench: Can Language Models Rebuild Programs From Scratch?

John Yang, Kilian Lieret, Jeffrey Ma et al.

For AI software engineering, this work reveals that current models fail at holistic codebase construction and produce monolithic implementations, unlike human-written code.

20.9 · SE · Apr 2 · Code

StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs

Jialin Yang, Dongfu Jiang, Lipeng He et al. · amazon-science, utoronto

This work addresses the need for better evaluation of LLMs in software development workflows, where generating structured outputs is critical, though it is incremental as it builds on prior benchmarking efforts.

21.2 · RO · Mar 17

ManiTwin: Scaling Data-Generation-Ready Digital Object Dataset to 100K

Kaixuan Wang, Tianxing Chen, Jiawei Liu et al.

This addresses the problem of limited simulation data for robotic manipulation researchers, though it is incremental as it builds on existing simulation-based learning paradigms.

25.7 · SE · Apr 9

Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

Chenyu Zhou, Huacan Chai, Wenteng Chen et al.

This provides a unified review framework for researchers and practitioners building LLM agents, though it is primarily conceptual rather than presenting new experimental results.

22.5 · AI · May 11 · Code

Shepherd: A Runtime Substrate Empowering Meta-Agents with a Formalized Execution Trace

Simon Yu, Derek Chong, Ananjan Nandi et al.

This provides an efficient infrastructure for programming meta-agents, enabling runtime intervention, counterfactual optimization, and tree-RL training.

19.2 · AI · Mar 17

IQuest-Coder-V1 Technical Report

Jian Yang, Wei Zhang, Shawn Guo et al.

This work addresses the need for more dynamic and efficient code generation models for developers and researchers, though it appears incremental with architectural enhancements.

26.5 · AI · May 11

PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents

Bihui Yu, Xinglong Xu, Junjie Jiang et al.

This work addresses the problem of automating visual typesetting optimization for scientific document preparation, a critical but overlooked stage in document automation.

19.6 · CR · Mar 16 · Code

ClawWorm: Self-Propagating Attacks Across LLM Agent Ecosystems

Yihao Zhang, Zeming Wei, Xiaokun Luan et al.

This addresses critical security risks for users of interconnected multi-agent systems, exposing vulnerabilities that could lead to autonomous attacks without attacker intervention.

12.3 · CL · Apr 23 · Code · 35

VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation

Qijun Han, Haoqin Tu, Zijun Wang et al.

For developers of autonomous GUI agents, this work provides a practical modular solution to common failure modes, though it is an incremental engineering contribution combining existing ideas.

20.3 · SE · Mar 16

VIBEPASS: Can Vibe Coders Really Pass the Vibe Check?

Srijan Bansal, Jiao Fangkai, Yilun Zhou et al.

This addresses a critical gap for autonomous software engineering by systematically evaluating LLMs' debugging capabilities, revealing foundational limitations in fault reasoning that hinder agentic coding tools.