LGApr 3, 2023
SparDL: Distributed Deep Learning Training with Efficient Sparse CommunicationMinjun Zhao, Yichen Yin, Yuren Mao et al.
Top-k sparsification has recently been widely used to reduce the communication volume in distributed deep learning. However, due to the Sparse Gradient Accumulation (SGA) dilemma, the performance of top-k sparsification still has limitations. Recently, a few methods have been put forward to handle the SGA dilemma. Regrettably, even the state-of-the-art method suffers from several drawbacks, e.g., it relies on an inefficient communication algorithm and requires extra transmission steps. Motivated by the limitations of existing methods, we propose a novel efficient sparse communication framework, called SparDL. Specifically, SparDL uses the Spar-Reduce-Scatter algorithm, which is based on an efficient Reduce-Scatter model, to handle the SGA dilemma without additional communication operations. Besides, to further reduce the latency cost and improve the efficiency of SparDL, we propose the Spar-All-Gather algorithm. Moreover, we propose the global residual collection algorithm to ensure fast convergence of model training. Finally, extensive experiments are conducted to validate the superiority of SparDL.
CLDec 31, 2025
Adaptive Dependency-aware Prompt Optimization Framework for Multi-Step LLM PipelineMinjun Zhao, Xinyu Zhang, Shuai Zhang et al.
Multi-step LLM pipelines invoke large language models multiple times in a structured sequence and can effectively solve complex tasks, but their performance heavily depends on the prompts used at each step. Jointly optimizing these prompts is difficult due to missing step-level supervision and inter-step dependencies. Existing end-to-end prompt optimization methods struggle under these conditions and often yield suboptimal or unstable updates. We propose ADOPT, an Adaptive Dependency-aware Prompt Optimization framework for multi-step LLM pipelines. ADOPT explicitly models the dependency between each LLM step and the final task outcome, enabling precise text-gradient estimation analogous to computing analytical derivatives. It decouples textual gradient estimation from gradient updates, reducing multi-prompt optimization to flexible single-prompt optimization steps, and employs a Shapley-based mechanism to adaptively allocate optimization resources. Experiments on real-world datasets and diverse pipeline structures show that ADOPT is effective and robust, consistently outperforming state-of-the-art prompt optimization baselines.
CLJun 17, 2025
Revisiting Chain-of-Thought Prompting: Zero-shot Can Be Stronger than Few-shotXiang Cheng, Chengyan Pan, Minjun Zhao et al.
In-Context Learning (ICL) is an essential emergent ability of Large Language Models (LLMs), and recent studies introduce Chain-of-Thought (CoT) to exemplars of ICL to enhance the reasoning capability, especially in mathematics tasks. However, given the continuous advancement of model capabilities, it remains unclear whether CoT exemplars still benefit recent, stronger models in such tasks. Through systematic experiments, we find that for recent strong models such as the Qwen2.5 series, adding traditional CoT exemplars does not improve reasoning performance compared to Zero-Shot CoT. Instead, their primary function is to align the output format with human expectations. We further investigate the effectiveness of enhanced CoT exemplars, constructed using answers from advanced models such as \texttt{Qwen2.5-Max} and \texttt{DeepSeek-R1}. Experimental results indicate that these enhanced exemplars still fail to improve the model's reasoning performance. Further analysis reveals that models tend to ignore the exemplars and focus primarily on the instructions, leading to no observable gain in reasoning ability. Overall, our findings highlight the limitations of the current ICL+CoT framework in mathematical reasoning, calling for a re-examination of the ICL paradigm and the definition of exemplars.
LGOct 13, 2021
Finding Materialized Models for Model ReuseMinjun Zhao, Lu Chen, Keyu Yang et al.
Materialized model query aims to find the most appropriate materialized model as the initial model for model reuse. It is the precondition of model reuse, and has recently attracted much attention. {Nonetheless, the existing methods suffer from the need to provide source data, limited range of applications, and inefficiency since they do not construct a suitable metric to measure the target-related knowledge of materialized models. To address this, we present \textsf{MMQ}, a source-data free, general, efficient, and effective materialized model query framework.} It uses a Gaussian mixture-based metric called separation degree to rank materialized models. For each materialized model, \textsf{MMQ} first vectorizes the samples in the target dataset into probability vectors by directly applying this model, then utilizes Gaussian distribution to fit for each class of probability vectors, and finally uses separation degree on the Gaussian distributions to measure the target-related knowledge of the materialized model. Moreover, we propose an improved \textsf{MMQ} (\textsf{I-MMQ}), which significantly reduces the query time while retaining the query performance of \textsf{MMQ}. Extensive experiments on a range of practical model reuse workloads demonstrate the effectiveness and efficiency of \textsf{MMQ}.