Hongxiang Zhang

AI
h-index 44
6 papers
54 citations
Novelty 55%
AI Score 52

6 Papers

SE Jun 2
FLARE: Fine-Grained Diagnostic Feedback for LLM Code Refinement

Yinsheng Yao, Hongxiang Zhang, Weixi Tong et al.

Large language models often generate code with bugs. Existing methods rely on feedback signals such as test failures and self-critiques to iteratively refine the generated code. Such signals are either too coarse-grained or too high-level, and thus insufficient to tell the model where to fix the bug. In this work, we present FLARE, an iterative framework with a lightweight diagnostic model that predicts line-level suspiciousness signals for bug localization and code refinement. Given the inherent uncertainty of diagnostic predictions, FLARE searches over the top-k suspicious regions and selects the best candidate according to execution outcomes. Experiments on LiveCodeBench and BigCodeBench with five base LLMs show that, even without candidate search (k=1), FLARE outperforms the strongest baseline with absolute improvements ranging from 1.72% to 7.42%. Furthermore, searching over 10 candidates yields an average improvement of 8.50% compared with no candidate search. When evaluated in isolation, our lightweight diagnostic model achieves the best performance compared with recent fault localization methods, demonstrating that it can provide reliable fine-grained guidance for code refinement.
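The top-k search described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`top_k_lines`, `refine_with_search`, `toy_fix`, `toy_count_passed`) and the hand-written suspiciousness scores are hypothetical, and the "fix" step is a trivial string replacement standing in for an LLM call. The sketch only shows the control flow: rank lines by suspiciousness, try a repair at each of the top-k locations, and keep the candidate with the best execution outcome.

```python
from typing import Callable, List, Sequence, Tuple


def top_k_lines(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring lines, most suspicious first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def refine_with_search(
    code_lines: List[str],
    scores: Sequence[float],
    k: int,
    propose_fix: Callable[[List[str], int], List[str]],
    count_passed: Callable[[List[str]], int],
) -> Tuple[List[str], int]:
    """Try one repair per suspicious line and keep the best by tests passed."""
    best_lines = list(code_lines)
    best_passed = count_passed(best_lines)
    for idx in top_k_lines(scores, k):
        candidate = propose_fix(code_lines, idx)
        passed = count_passed(candidate)
        if passed > best_passed:
            best_lines, best_passed = candidate, passed
    return best_lines, best_passed


# --- Toy stand-ins (assumptions, not part of the paper) ---
def toy_fix(lines: List[str], idx: int) -> List[str]:
    """Hypothetical repair step: swap '-' for '+' on the chosen line."""
    patched = list(lines)
    patched[idx] = patched[idx].replace("-", "+")
    return patched


def toy_count_passed(lines: List[str]) -> int:
    """Execute the candidate and count how many toy test cases pass."""
    namespace: dict = {}
    try:
        exec("\n".join(lines), namespace)
        add = namespace["add"]
        return sum(add(a, b) == a + b for a, b in [(1, 2), (3, 4), (0, 0)])
    except Exception:
        return 0


buggy = ["def add(a, b):", "    result = a - b", "    return result"]
scores = [0.1, 0.7, 0.2]  # made-up line-level suspiciousness values
fixed, passed = refine_with_search(buggy, scores, 2, toy_fix, toy_count_passed)
print(passed, fixed[1])
```

With k=1 the loop reduces to a single repair attempt at the most suspicious line, which corresponds to the "no candidate search" setting mentioned in the abstract.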

AI Aug 24, 2023 Code
Job Shop Scheduling Benchmark: Environments and Instances for Learning and Non-learning Methods

Robbert Reijnen, Igor G. Smit, Hongxiang Zhang et al.

Job shop scheduling problems address the routing and sequencing of tasks in a job shop setting. Despite significant interest from operations research and machine learning communities over the years, a comprehensive platform for testing and comparing solution methods has been notably lacking. To fill this gap, we introduce a unified implementation of job shop scheduling problems and their solution methods, addressing the long-standing need for a standardized benchmarking platform in this domain. Our platform supports classic Job Shop (JSP), Flow Shop (FSP), Flexible Job Shop (FJSP), and Assembly Job Shop (AJSP), as well as variants featuring Sequence-Dependent Setup Times (SDST), variants with online arrivals of jobs, and combinations of these problems (e.g., FJSP-SDST and FAJSP). The platform provides a wide range of scheduling solution methods, from heuristics, metaheuristics, and exact optimization to deep reinforcement learning. The implementation is available as an open-source GitHub repository, serving as a collaborative hub for researchers, practitioners, and those new to the field. Beyond enabling direct comparisons with existing methods on widely studied benchmark problems, this resource serves as a robust starting point for addressing constrained and complex problem variants. By establishing a comprehensive and unified foundation, this platform is designed to consolidate existing knowledge and to inspire the development of next-generation algorithms in job shop scheduling research.
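For readers new to the problem, the classic JSP and a simple dispatching heuristic can be sketched as follows. This is not the benchmark's API: the `Job` type, the `spt_schedule` function, and the two-job instance are hypothetical, chosen only to show what "routing and sequencing" means. Each job is an ordered list of (machine, processing time) operations, and the heuristic repeatedly schedules the pending operation with the shortest processing time (SPT).

```python
from typing import List, Tuple

# A job is its operations in routing order: (machine id, processing time).
Job = List[Tuple[int, int]]


def spt_schedule(jobs: List[Job]) -> Tuple[int, List[Tuple[int, int, int, int]]]:
    """Greedy SPT dispatching for a classic job shop.

    Returns the makespan and a list of (job, machine, start, end) entries.
    """
    n_machines = 1 + max(machine for job in jobs for machine, _ in job)
    next_op = [0] * len(jobs)          # index of each job's next operation
    job_ready = [0] * len(jobs)        # time each job's previous op finishes
    machine_ready = [0] * n_machines   # time each machine becomes free
    schedule = []
    remaining = sum(len(job) for job in jobs)

    while remaining:
        # Only the next unscheduled operation of each job is eligible.
        candidates = [j for j in range(len(jobs)) if next_op[j] < len(jobs[j])]
        # Shortest processing time first; ties broken by job index.
        j = min(candidates, key=lambda c: (jobs[c][next_op[c]][1], c))
        machine, duration = jobs[j][next_op[j]]
        start = max(job_ready[j], machine_ready[machine])
        end = start + duration
        schedule.append((j, machine, start, end))
        job_ready[j] = machine_ready[machine] = end
        next_op[j] += 1
        remaining -= 1

    return max(job_ready), schedule


# Hypothetical two-job, two-machine instance.
example_jobs = [
    [(0, 3), (1, 2)],  # job 0: machine 0 for 3, then machine 1 for 2
    [(1, 2), (0, 4)],  # job 1: machine 1 for 2, then machine 0 for 4
]
makespan, schedule = spt_schedule(example_jobs)
print(makespan)
for entry in schedule:
    print(entry)
```

Dispatching rules like this are the simplest of the heuristic methods the abstract lists; metaheuristics, exact solvers, and learned policies address the same instance format with more elaborate search.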

AI May 28
Enhancing Multi-Agent Communication through Attention Steering with Context Relevance

Hongxiang Zhang, Yuan Tian, Tianyi Zhang

LLM-based multi-agent systems have demonstrated remarkable performance on complex tasks through collaborative reasoning. However, these systems tend to rapidly accumulate extremely long conversation histories during interaction. As conversations lengthen, relevant information is increasingly diluted by irrelevant context, leading to degraded performance. In this work, we present Agent-Radar, a training-free context management method that dynamically steers each agent's attention toward relevant context with a novel temporal and spatial decay mechanism. Our experiments demonstrate that Agent-Radar outperforms state-of-the-art methods across five different benchmarks, yielding gains of up to 7.64 absolute points. Furthermore, our analysis shows that Agent-Radar remains effective and robust as the number of agents and interaction rounds increases. Finally, the ablation study shows that core components in Agent-Radar are crucial to performance and generalizable in different settings.

CL May 29, 2025
Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation

Hongxiang Zhang, Hao Chen, Muhao Chen et al.

Recent decoding methods improve the factuality of large language models (LLMs) by refining how the next token is selected during generation. These methods typically operate at the token level, leveraging internal representations to suppress superficial patterns. Nevertheless, LLMs remain prone to hallucinations, especially over longer contexts. In this paper, we propose Active Layer-Contrastive Decoding (ActLCD), a novel decoding strategy that actively decides when to apply contrasting layers during generation. By casting decoding as a sequential decision-making problem, ActLCD employs a reinforcement learning policy guided by a reward-aware classifier to optimize factuality beyond the token level. Our experiments demonstrate that ActLCD surpasses state-of-the-art methods across five benchmarks, showcasing its effectiveness in mitigating hallucinations in diverse generation scenarios.
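The idea of applying layer contrast only at selected steps can be illustrated with a small sketch. It makes several assumptions that the abstract does not state: the contrast is the difference of log-probabilities between a final and an earlier layer, restricted to tokens the final layer finds plausible (the `alpha` threshold), and the decision of when to contrast is a plain boolean flag rather than the reinforcement learning policy the paper describes. The function `next_token` and all logit values are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token(
    final_logits: List[float],
    early_logits: List[float],
    apply_contrast: bool,
    alpha: float = 0.1,
) -> int:
    """Pick the next token id, optionally contrasting final vs. early layer.

    apply_contrast stands in for a learned gate (an assumption of this sketch).
    """
    p_final = softmax(final_logits)
    if not apply_contrast:
        # Plain greedy decoding from the final layer.
        return max(range(len(p_final)), key=lambda i: p_final[i])

    p_early = softmax(early_logits)
    threshold = alpha * max(p_final)  # plausibility cutoff
    best_id, best_score = -1, -math.inf
    for i, (pf, pe) in enumerate(zip(p_final, p_early)):
        if pf < threshold:
            continue  # skip tokens the final layer considers implausible
        score = math.log(pf) - math.log(pe)
        if score > best_score:
            best_id, best_score = i, score
    return best_id


# Made-up logits over a three-token vocabulary.
final_logits = [2.0, 1.8, -3.0]
early_logits = [2.5, 0.5, 0.0]
greedy_choice = next_token(final_logits, early_logits, apply_contrast=False)
contrast_choice = next_token(final_logits, early_logits, apply_contrast=True)
print(greedy_choice, contrast_choice)
```

In this toy case the two modes disagree: greedy decoding picks token 0, while the contrast favors token 1, whose probability grows most between the early and final layers. A gate that chooses between the two modes per step is what turns decoding into a sequential decision problem.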

CL Oct 3, 2025
Self-Anchor: Large Language Model Reasoning via Step-by-step Attention Alignment

Hongxiang Zhang, Yuan Tian, Tianyi Zhang

For solving complex reasoning tasks with Large Language Models (LLMs), prompting-based methods offer a lightweight alternative to fine-tuning and reinforcement learning. However, as reasoning chains extend, critical intermediate steps and the original prompt become buried in the context, receiving insufficient attention and leading to errors. In this paper, we propose Self-Anchor, a novel pipeline that leverages the inherent structure of reasoning to steer LLM attention. Self-Anchor decomposes reasoning trajectories into structured plans and automatically aligns the model's attention to the most relevant inference steps, allowing the model to maintain focus throughout generation. Our experiments show that Self-Anchor outperforms SOTA prompting methods across six benchmarks. Notably, Self-Anchor significantly reduces the performance gap between "non-reasoning" models and specialized reasoning models, with the potential to enable most LLMs to tackle complex reasoning tasks without retraining.

CR Jun 11, 2024
LLAMAFUZZ: Large Language Model Enhanced Greybox Fuzzing

Hongxiang Zhang, Yuyang Rong, Yifeng He et al.

Greybox fuzzing has achieved success in revealing bugs and vulnerabilities in programs. However, randomized mutation strategies have limited the fuzzer's performance on structured data. Specialized fuzzers can handle complex structured data, but require additional effort to specify grammars and suffer from low throughput. In this paper, we explore the potential of using Large Language Models (LLMs) to enhance greybox fuzzing for structured data. We use the LLM's pre-trained knowledge of data conversion and formats to generate new valid inputs. We further fine-tune it with paired mutation seeds to learn structured formats and mutation strategies effectively. Our LLM-based fuzzer, LLAMAFUZZ, integrates the LLM's ability to understand and mutate structured data into fuzzing. We conduct experiments on the standard bug-based benchmark Magma and a wide variety of real-world programs. LLAMAFUZZ outperforms our top competitor by 41 bugs on average. We also identified 47 unique bugs across all trials. Moreover, LLAMAFUZZ demonstrated consistent performance on both bugs triggered and bugs reached. Compared to AFL++, LLAMAFUZZ covered 27.19% more branches in real-world program sets on average. We also present a case study to explain how LLMs enhance the fuzzing process in terms of code coverage.