FailureMem: A Failure-Aware Multimodal Framework for Autonomous Software Repair
This work aims to improve the accuracy of automated software repair by adding multimodal reasoning. The contribution appears incremental, since it builds on existing methods with targeted enhancements.
The paper tackles multimodal automated program repair, targeting limitations of existing LLM-based systems such as rigid workflows and a lack of localized visual grounding. It proposes FailureMem, which improves the resolved rate by 3.7% over GUIRepair on SWE-bench Multimodal.
Multimodal Automated Program Repair (MAPR) extends traditional program repair by requiring models to jointly reason over source code, textual issue descriptions, and visual artifacts such as GUI screenshots. While recent LLM-based repair systems have shown promising results, existing approaches face several limitations: rigid workflow pipelines restrict exploration during debugging, visual reasoning is often performed over full-page screenshots without localized grounding, and failed repair attempts are rarely transformed into reusable knowledge. To address these challenges, we propose FailureMem, a multimodal repair framework that integrates three key mechanisms: a hybrid workflow-agent architecture that balances structured localization with flexible reasoning, active perception tools that enable region-level visual grounding, and a Failure Memory Bank that converts past repair attempts into reusable guidance. Experiments on SWE-bench Multimodal demonstrate that FailureMem improves the resolved rate over GUIRepair by 3.7%.
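To make the idea of a Failure Memory Bank more concrete, the following is a minimal illustrative sketch, not the implementation described in the paper. The class names (`FailureRecord`, `FailureMemoryBank`), the stored fields, and the retrieval rule (Jaccard overlap between whitespace-separated tokens) are all assumptions chosen for brevity. The abstract does not specify how past attempts are stored or matched.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FailureRecord:
    """One past failed repair attempt (hypothetical schema)."""
    issue_text: str  # textual issue description of the past attempt
    lesson: str      # guidance distilled from why the attempt failed


def _tokens(text: str) -> set[str]:
    # Naive lowercase whitespace tokenization; an assumption for this sketch.
    return set(text.lower().split())


@dataclass
class FailureMemoryBank:
    """Stores failed attempts and returns lessons for similar new issues."""
    records: list[FailureRecord] = field(default_factory=list)

    def add(self, issue_text: str, lesson: str) -> None:
        self.records.append(FailureRecord(issue_text, lesson))

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        """Return up to k lessons whose issue text overlaps the query."""
        query_tokens = _tokens(query)

        def score(record: FailureRecord) -> float:
            # Jaccard similarity between query and stored issue tokens.
            record_tokens = _tokens(record.issue_text)
            union = query_tokens | record_tokens
            return len(query_tokens & record_tokens) / len(union) if union else 0.0

        ranked = sorted(self.records, key=score, reverse=True)
        # Drop records with no token overlap at all.
        return [r.lesson for r in ranked[:k] if score(r) > 0]


bank = FailureMemoryBank()
bank.add("dropdown menu overlaps header in screenshot",
         "check z-index before editing layout code")
bank.add("date parser crashes on empty string",
         "guard against empty input")
lessons = bank.retrieve("menu overlaps the header", k=2)
```

In this sketch, `lessons` would contain only the guidance from the first record, since the second record shares no tokens with the query. A real system would presumably use a stronger similarity measure, possibly over both text and visual context, but that choice is outside what the abstract states.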