Dazhi Zhang

LG
h-index: 19
Papers: 9
Citations: 21
Novelty: 63%
AI Score: 58

9 Papers

AI | Apr 16 | Code
MARS$^2$: Scaling Multi-Agent Tree Search via Reinforcement Learning for Code Generation

Pengfei Li, Shijie Wang, Fangyuan Li et al.

Reinforcement learning (RL) paradigms have demonstrated strong performance on reasoning-intensive tasks such as code generation. However, limited trajectory diversity often leads to diminishing returns, which constrains the achievable performance ceiling. Search-enhanced RL alleviates this issue by introducing structured exploration, but it remains constrained by single-agent policy priors. Meanwhile, leveraging multiple interacting policies can yield more diverse exploratory signals, but existing approaches are typically decoupled from structured search. We propose MARS$^2$ (Multi-Agent Reinforced Tree-Search Scaling), a unified RL framework in which multiple independently optimized agents collaborate within a shared tree-structured search environment. MARS$^2$ models the search tree as a learnable multi-agent interaction environment, enabling heterogeneous agents to collaboratively generate and refine candidate solutions within a shared search topology. To support effective learning, we introduce a path-level group advantage formulation based on tree-consistent reward shaping, which facilitates credit assignment across complex search trajectories. Experiments on code generation benchmarks show that MARS$^2$ consistently improves performance across diverse model combinations and training settings, demonstrating the effectiveness of coupling multi-agent collaboration with tree search for enhancing reinforcement learning. Our code is publicly available at https://github.com/TsinghuaC3I/MARTI.

CV | Jun 29, 2023
Boosting the Generalization Ability for Hyperspectral Image Classification using Spectral-spatial Axial Aggregation Transformer

Enzhe Zhao, Zhichang Guo, Shengzhu Shi et al.

In the hyperspectral image classification (HSIC) task, the most commonly used model validation paradigm partitions the data into training and test sets through pixel-wise random sampling. By training on a small amount of data, a deep learning model can achieve almost perfect accuracy. However, in our experiments, we found that this high accuracy was reached because the training and test datasets share a lot of information. On non-overlapping dataset partitions, well-performing models suffer significant performance degradation. To this end, we propose a spectral-spatial axial aggregation transformer model, namely SaaFormer, that preserves generalization across dataset partitions. SaaFormer applies a multi-level spectral extraction structure to segment the spectrum into multiple spectrum clips, such that the wavelength continuity of the spectrum across the channels is preserved. For each spectrum clip, the axial aggregation attention mechanism, which integrates spatial features along multiple spectral axes, is applied to mine the spectral characteristics. The multi-level spectral extraction and the axial aggregation attention emphasize spectral characteristics to improve model generalization. The experimental results on five publicly available datasets demonstrate that our model exhibits comparable performance on the random partition, while significantly outperforming other methods on non-overlapping partitions. Moreover, SaaFormer shows excellent performance on background classification.

CV | May 23
SILSM: A Sustainable Interactive Level Set Method for Progressive Refinement

Jiachen Song, Dazhi Zhang, Fanghui Song et al.

Interactive segmentation aims to precisely isolate target objects using sparse user guidance. However, traditional methods often suffer from heavy interaction burdens and parameter sensitivity, while deep learning approaches struggle with data dependency and iterative instability. Motivated by these limitations, we propose the Sustainable Interactive Level Set Method (SILSM). The proposed level set evolution equation incorporates interaction, regularization, and segmentation terms. Specifically, high-order regularization is employed to maintain numerical stability, and unlike traditional methods, we decouple user guidance into an independent interaction term to enable direct manual control over the zero-level set evolution. Furthermore, we develop a numerical algorithm tailored for multiple interactions, which facilitates dynamic refinement by effectively updating the segmentation results based on sequential user inputs. We theoretically demonstrate that the high-order term provides stronger regularization constraints than the conventional length term, while the interaction term ensures segmentation strictly within the user-selected region. Experimental results further demonstrate that the proposed method is robust to interactive inputs, achieves competitive performance at the first interaction, and supports stable multi-round interactions with progressively improved segmentation quality.

LG | Feb 19, 2023
Stationary Point Losses for Robust Model

Weiwei Gao, Dazhi Zhang, Yao Li et al.

The inability to guarantee robustness is one of the major obstacles to the application of deep learning models in security-demanding domains. We identify that the most commonly used cross-entropy (CE) loss does not guarantee a robust boundary for neural networks. CE loss sharpens the neural network at the decision boundary to achieve a lower loss, rather than pushing the boundary to a more robust position. A robust boundary should lie in the middle between samples from different classes, thus maximizing the margins from the boundary to the samples. We attribute this to the fact that CE loss has no stationary point. In this paper, we propose a family of new losses, called stationary point (SP) losses, each of which has at least one stationary point on the correct classification side. We prove that a robust boundary can be guaranteed by SP loss without losing much accuracy. With SP loss, larger perturbations are required to generate adversarial examples. We demonstrate that robustness is improved under a variety of adversarial attacks by applying SP loss. Moreover, the robust boundary learned by SP loss also performs well on imbalanced datasets.
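
As a small worked illustration of the "no stationary point" claim, consider the binary logistic form of CE written in terms of a margin $z$. The notation here ($z$, $y$, $f$, $z^\star$) is assumed for illustration and is not taken from the paper; the second line only restates the stationary-point requirement from the abstract in that notation.

```latex
% Binary CE as a function of the margin z = y f(x), with y in {-1, +1}
% (assumed notation; z > 0 means the sample is correctly classified).
\[
  \ell_{\mathrm{CE}}(z) = \log\!\left(1 + e^{-z}\right),
  \qquad
  \frac{d\,\ell_{\mathrm{CE}}}{dz} = -\frac{1}{1 + e^{z}} < 0
  \quad \text{for all finite } z .
\]
% The derivative never vanishes, so the loss keeps decreasing as z grows.
% A stationary point (SP) loss instead satisfies, for at least one z^\star:
\[
  \exists\, z^\star > 0 : \quad
  \frac{d\,\ell_{\mathrm{SP}}}{dz}\bigg|_{z = z^\star} = 0 .
\]
```

Because the CE derivative is strictly negative, gradient descent can always lower the loss further by inflating the margin of already-correct samples, which is consistent with the sharpening behavior the abstract describes.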

CV | Oct 13, 2023
Re-initialization-free Level Set Method via Molecular Beam Epitaxy Equation Regularization for Image Segmentation

Fanghui Song, Jiebao Sun, Shengzhu Shi et al.

The variational level set method has become a powerful tool in image segmentation due to its ability to handle complex topological changes and maintain continuity and smoothness during evolution. However, its evolution process can be unstable, which results in over-flattened or over-sharpened contours and segmentation failure. To improve the accuracy and stability of evolution, we propose a high-order level set variational segmentation method integrated with molecular beam epitaxy (MBE) equation regularization. This method uses the crystal growth in the MBE process to limit the evolution of the level set function, and thus can avoid re-initialization in the evolution process and regulate the smoothness of the segmented curve. It also works for noisy images with intensity inhomogeneity, which is a challenge in image segmentation. To solve the variational model, we derive the gradient flow and design a scalar auxiliary variable (SAV) scheme coupled with the fast Fourier transform (FFT), which can significantly improve the computational efficiency compared with the traditional semi-implicit and semi-explicit schemes. Numerical experiments show that the proposed method can generate smooth segmentation curves, retain fine segmentation targets and obtain robust segmentation results for small objects. Compared to existing level set methods, this model is state-of-the-art in both accuracy and efficiency.

LG | Nov 12, 2025
PDAC: Efficient Coreset Selection for Continual Learning via Probability Density Awareness

Junqi Gao, Zhichang Guo, Dazhi Zhang et al.

Rehearsal-based Continual Learning (CL) maintains a limited memory buffer to store replay samples for knowledge retention, making these approaches heavily reliant on the quality of the stored samples. Current rehearsal-based CL methods typically construct the memory buffer by selecting a representative subset (referred to as a coreset), aiming to approximate the training efficacy of the full dataset with minimal storage overhead. However, mainstream Coreset Selection (CS) methods generally formulate the CS problem as a bi-level optimization problem that relies on numerous inner and outer iterations to solve, leading to substantial computational cost and thus limiting their practical efficiency. In this paper, we aim to provide a more efficient selection logic and scheme for coreset construction. To this end, we first analyze the Mean Squared Error (MSE) between the buffer-trained model and the Bayes-optimal model through the perspective of localized error decomposition to investigate the contribution of samples from different regions to MSE suppression. Further theoretical and experimental analyses demonstrate that samples with high probability density play a dominant role in error suppression. Inspired by this, we propose the Probability Density-Aware Coreset (PDAC) method. PDAC leverages the Projected Gaussian Mixture (PGM) model to estimate each sample's joint density, enabling efficient density-prioritized buffer selection. Finally, we introduce the streaming Expectation Maximization (EM) algorithm to enhance the adaptability of PGM parameters to streaming data, yielding Streaming PDAC (SPDAC) for streaming scenarios. Extensive comparative experiments show that our methods outperform other baselines across various CL settings while ensuring favorable efficiency.
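
The idea of density-prioritized buffer selection can be sketched in a few lines. This is a minimal illustration only: it swaps in a single diagonal Gaussian as the density estimator in place of the Projected Gaussian Mixture model, and the function names `diag_gaussian_log_density` and `select_buffer` are hypothetical, not from the paper or its code.

```python
import math
from typing import List, Sequence


def diag_gaussian_log_density(points: Sequence[Sequence[float]]) -> List[float]:
    """Log-density of each point under one diagonal Gaussian fitted to all points.

    Assumption: a single diagonal Gaussian stands in for the paper's PGM model.
    """
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    # Small constant keeps the variance strictly positive.
    variances = [
        sum((p[j] - means[j]) ** 2 for p in points) / n + 1e-6 for j in range(d)
    ]
    scores = []
    for p in points:
        log_p = 0.0
        for j in range(d):
            log_p += -0.5 * math.log(2 * math.pi * variances[j])
            log_p -= (p[j] - means[j]) ** 2 / (2 * variances[j])
        scores.append(log_p)
    return scores


def select_buffer(points: Sequence[Sequence[float]], buffer_size: int) -> List[int]:
    """Return the (sorted) indices of the buffer_size highest-density points."""
    scores = diag_gaussian_log_density(points)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:buffer_size])


if __name__ == "__main__":
    features = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1], [5.0, 5.0]]
    # The isolated point at (5, 5) has the lowest density and is left out.
    print(select_buffer(features, buffer_size=3))
```

A one-shot ranking like this avoids the inner and outer optimization loops of bi-level coreset selection, which is the efficiency argument the abstract makes; the streaming EM update is not covered by this sketch.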

LG | Jun 4, 2025 | Code
Bohdi: Heterogeneous LLM Fusion with Automatic Data Exploration

Junqi Gao, Zhichang Guo, Dazhi Zhang et al.

Heterogeneous Large Language Model (LLM) fusion integrates the strengths of multiple source LLMs with different architectures into a target LLM with low computational overhead. While promising, existing methods suffer from two major limitations: 1) reliance on real data from limited domains for knowledge fusion, preventing the target LLM from fully acquiring knowledge across diverse domains, and 2) fixed data allocation proportions across domains, failing to dynamically adjust according to the target LLM's varying capabilities across domains, leading to a capability imbalance. To overcome these limitations, we propose Bohdi, a synthetic-data-only heterogeneous LLM fusion framework. By organizing knowledge domains into a hierarchical tree structure, Bohdi enables automatic domain exploration and multi-domain data generation through multi-model collaboration, thereby comprehensively extracting knowledge from source LLMs. By formalizing domain expansion and data sampling proportion allocation on the knowledge tree as a Hierarchical Multi-Armed Bandit problem, Bohdi leverages the designed DynaBranches mechanism to adaptively adjust sampling proportions based on the target LLM's performance feedback across domains. Integrated with our proposed Introspection-Rebirth (IR) mechanism, DynaBranches dynamically tracks capability shifts during the target LLM's updates via Sliding Window Binomial Likelihood Ratio Testing (SWBLRT), further enhancing its online adaptation capability. Comparative experimental results on a comprehensive suite of benchmarks demonstrate that Bohdi significantly outperforms existing baselines on multiple target LLMs, exhibits higher data efficiency, and virtually eliminates the imbalance in the target LLM's capabilities. Our code is available at https://github.com/gjq100/Bohdi.git.

LG | Jun 8, 2024 | Code
Perturbation Towards Easy Samples Improves Targeted Adversarial Transferability

Junqi Gao, Biqing Qi, Yao Li et al.

The transferability of adversarial perturbations provides an effective shortcut for black-box attacks. Targeted perturbations have greater practicality but are more difficult to transfer between models. In this paper, we experimentally and theoretically demonstrate that neural networks trained on the same dataset have more consistent performance in High-Sample-Density-Regions (HSDR) of each class than in low sample density regions. Therefore, in the targeted setting, adding perturbations towards the HSDR of the target class is more effective in improving transferability. However, density estimation is challenging in high-dimensional scenarios. Further theoretical and experimental verification demonstrates that easy samples with low loss are more likely to be located in HSDR. Perturbations towards such easy samples in the target class can avoid density estimation for HSDR location. Based on the above facts, we verify that adding perturbations towards easy samples in the target class improves the targeted adversarial transferability of existing attack methods. A generative targeted attack strategy named Easy Sample Matching Attack (ESMA) is proposed, which has a higher success rate for targeted attacks and outperforms the SOTA generative method. Moreover, ESMA requires only 5% of the storage space and much less computation time compared to the current SOTA, as ESMA attacks all classes with only one model instead of separate models for each class. Our code is available at https://github.com/gjq100/ESMA.

LG | Apr 30
Auto-FlexSwitch: Efficient Dynamic Model Merging via Learnable Task Vector Compression

Junqi Gao, Dazhi Zhang, Zhichang Guo et al.

Model merging has attracted attention as an effective path toward multi-task adaptation by integrating knowledge from multiple task-specific models. Among existing approaches, dynamic merging mitigates performance degradation caused by conflicting parameter updates across tasks by flexibly combining task-specific parameters at inference time, thereby maintaining high performance. However, these methods require storing independent parameters for each task, resulting in prohibitive storage overhead. To address this issue, we first experimentally demonstrate that the fine-tuned weight increments (referred to as task vectors) exhibit an impulse-like activation pattern and high robustness to low-bit representations. Driven by this insight, we propose T-Switch, which decomposes task vectors into three compact components: a binary sparse mask, a sign vector, and a scalar scaling factor, achieving high-fidelity approximation at high compression ratios. Building on this, we develop Auto-Switch, a training-free merging scheme that automatically assembles task vectors through feature similarity retrieval. Furthermore, to transform task vector sparsification and quantization from static rules to adaptive learning, we propose FlexSwitch, a learnable framework that jointly optimizes the compression strategy for each model unit via Learnable Gating Sparsification (LGS) and Bit-width Adaptive Selection (BAS), while employing the Sparsity-Aware Storage Strategy (SASS) to select the optimal storage encoding structure. Finally, by incorporating a K-Nearest Neighbor (KNN) inference scheme with a learnable low-rank metric, we present Auto-FlexSwitch, a dynamic model merging approach that supports highly efficient task vector compression.
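
The three-component decomposition (binary mask, sign vector, scalar scale) can be sketched as follows. The abstract does not state how the mask or the scale are chosen, so the top-magnitude mask and mean-magnitude scale used here are assumptions for illustration, and `compress_task_vector`, `reconstruct`, and `keep_ratio` are hypothetical names.

```python
from typing import List, Tuple


def compress_task_vector(
    task_vector: List[float], keep_ratio: float = 0.25
) -> Tuple[List[int], List[int], float]:
    """Split a task vector into (binary mask, sign vector, scalar scale).

    Assumptions: the mask keeps the largest-magnitude entries and the scale
    is the mean magnitude of the kept entries.
    """
    if not task_vector:
        return [], [], 0.0
    magnitudes = [abs(v) for v in task_vector]
    k = max(1, round(keep_ratio * len(task_vector)))
    # Threshold at the k-th largest magnitude; ties may keep more than k entries.
    threshold = sorted(magnitudes, reverse=True)[k - 1]
    mask = [1 if m >= threshold and m > 0 else 0 for m in magnitudes]
    signs = [1 if v >= 0 else -1 for v in task_vector]
    kept = [m for m, keep in zip(magnitudes, mask) if keep]
    scale = sum(kept) / len(kept) if kept else 0.0
    return mask, signs, scale


def reconstruct(mask: List[int], signs: List[int], scale: float) -> List[float]:
    """Approximate the task vector as scale * mask * sign, element-wise."""
    return [scale * m * s for m, s in zip(mask, signs)]


if __name__ == "__main__":
    tv = [0.9, -0.05, 0.02, -1.1]
    mask, signs, scale = compress_task_vector(tv, keep_ratio=0.5)
    print(mask, signs, scale)
    print(reconstruct(mask, signs, scale))
```

Under this sketch each parameter costs two bits (mask and sign) plus one shared float per vector, which gives a sense of why such a decomposition reduces per-task storage.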