99.8AIMar 23Code
Counterfactual Credit Policy Optimization for Multi-Agent CollaborationZhongyi Li, Wan Tian, Yikun Ban et al.
Collaborative multi-agent large language models (LLMs) can solve complex reasoning tasks by decomposing roles and aggregating diverse hypotheses. Yet, reinforcement learning (RL) for such systems is often undermined by credit assignment: a shared global reward obscures individual contributions, inflating update variance and encouraging free-riding. We introduce Counterfactual Credit Policy Optimization (CCPO), a framework that assigns agent-specific learning signals by estimating each agent's marginal contribution through counterfactual trajectories. CCPO builds dynamic counterfactual baselines that simulate outcomes with an agent's contribution removed, yielding role-sensitive advantages for policy optimization. To further improve stability under heterogeneous tasks and data distributions, we propose a global-history-aware normalization scheme that calibrates advantages using global rollout statistics. We evaluate CCPO on two collaboration topologies: a sequential Think--Reason dyad and multi-agent voting. Across mathematical and logical reasoning benchmarks, CCPO mitigates free-riding and outperforms strong multi-agent RL baselines, yielding finer-grained and more effective credit assignment for collaborative LLM training. Our code is available at https://github.com/bhai114/ccpo.
MLMar 13, 2023
Tight Non-asymptotic Inference via Sub-Gaussian Intrinsic Moment NormHuiming Zhang, Haoyu Wei, Guang Cheng
In non-asymptotic learning, variance-type parameters of sub-Gaussian distributions are of paramount importance. However, directly estimating these parameters using the empirical moment generating function (MGF) is infeasible. To address this, we suggest using the sub-Gaussian intrinsic moment norm [Buldygin and Kozachenko (2000), Theorem 1.3] achieved by maximizing a sequence of normalized moments. Significantly, the suggested norm can not only reconstruct the exponential moment bounds of MGFs but also provide tighter sub-Gaussian concentration inequalities. In practice, we provide an intuitive method for assessing whether data with a finite sample size is sub-Gaussian, utilizing the sub-Gaussian plot. The intrinsic moment norm can be robustly estimated via a simple plug-in approach. Our theoretical findings are also applicable to reinforcement learning, including the multi-armed bandit scenario.
68.7GRMay 22
DrawVideo: Generating Long Video from Storyboard Keyframe SketchesChuanzhi Xu, Huiqi Liang, Bang Shi et al.
Long video generation requires high-fidelity synthesis, coherent narrative structure, and user control over extended time spans. Existing text-to-video methods often rely on a single long prompt, limiting control over pose, composition, layout, and motion. We propose DrawVideo, a sketch-guided, storyboard-driven framework for controllable long-video generation. DrawVideo decomposes long videos into independently controllable shots, each defined by a black-and-white sketch, an appearance prompt, and a motion prompt. The sketch controls pose and layout, the appearance prompt defines identity, scene, and style, and the motion prompt guides temporal dynamics. DrawVideo follows a hierarchical 'global multi-shot, local single-sketch' strategy: it first generates a structure-aligned reference keyframe, then expands the motion prompt into derivative keyframes representing action states, and finally synthesizes clips between adjacent keyframes to build each shot. We also introduce SketchLongVideo, the first dataset for sketch-guided text-to-long-video generation, constructed from animation videos via shot detection, keyframe extraction, vision-language recognition, prompt decomposition, and sketch conversion. Experiments show that DrawVideo achieves strong structural controllability, appearance consistency, visual stability, and coherent long-video generation.
MLDec 28, 2022
Distribution Estimation of Contaminated Data via DNN-based MoM-GANsFang Xie, Lihu Xu, Qiuran Yao et al.
This paper studies the distribution estimation of contaminated data by the MoM-GAN method, which combines generative adversarial net (GAN) and median-of-mean (MoM) estimation. We use a deep neural network (DNN) with a ReLU activation function to model the generator and discriminator of the GAN. Theoretically, we derive a non-asymptotic error bound for the DNN-based MoM-GAN estimator measured by integral probability metrics with the $b$-smoothness Hölder class. The error bound decreases essentially as $n^{-b/p}\vee n^{-1/2}$, where $n$ and $p$ are the sample size and the dimension of input data. We give an algorithm for the MoM-GAN method and implement it through two real applications. The numerical results show that the MoM-GAN outperforms other competitive methods when dealing with contaminated data.
78.7AIMar 23
Adaptive Robust Estimator for Multi-Agent Reinforcement LearningZhongyi Li, Wan Tian, Jingyu Chen et al.
Multi-agent collaboration has emerged as a powerful paradigm for enhancing the reasoning capabilities of large language models, yet it suffers from interaction-level ambiguity that blurs generation, critique, and revision, making credit assignment across agents difficult. Moreover, policy optimization in this setting is vulnerable to heavy-tailed and noisy rewards, which can bias advantage estimation and trigger unstable or even divergent training. To address both issues, we propose a robust multi-agent reinforcement learning framework for collaborative reasoning, consisting of two components: Dual-Agent Answer-Critique-Rewrite (DACR) and an Adaptive Robust Estimator (ARE). DACR decomposes reasoning into a structured three-stage pipeline: answer, critique, and rewrite, while enabling explicit attribution of each agent's marginal contribution to its partner's performance. ARE provides robust estimation of batch experience means during multi-agent policy optimization. Across mathematical reasoning and embodied intelligence benchmarks, even under noisy rewards, our method consistently outperforms the baseline in both homogeneous and heterogeneous settings. These results indicate stronger robustness to reward noise and more stable training dynamics, effectively preventing optimization failures caused by noisy reward signals.
73.5MLApr 12
Tail-Aware Information-Theoretic Generalization for RLHF and SGLDHuiming Zhang, Binghan Li, Wan Tian et al.
Classical information-theoretic generalization bounds typically control the generalization gap through KL-based mutual information and therefore rely on boundedness or sub-Gaussian tails via the moment generating function (MGF). In many modern pipelines, such as robust learning, RLHF, and stochastic optimization, losses and rewards can be heavy-tailed, and MGFs may not exist, rendering KL-based tools ineffective. We develop a tail-dependent information-theoretic framework for sub-Weibull data, where the tail parameter $θ$ controls the tail heaviness: $θ=2$ corresponds to sub-Gaussian, $θ=1$ to sub-exponential, and $0<θ<1$ to genuinely heavy tails. Our key technical ingredient is a decorrelation lemma that bounds change-of-measure expectations using a shifted-log $f_θ$-divergence, which admits explicit comparisons to Rényi divergence without MGF arguments. On the empirical-process side, we establish sharp maximal inequalities and a Dudley-type chaining bound for sub-Weibull processes with tail index $θ$, with complexity scaling as $\log^{1/θ}$ and entropy$^{1/θ}$. These tools yield expected and high-probability PAC-Bayes generalization bounds, as well as an information-theoretic chaining inequality based on multiscale Rényi mutual information. We illustrate the consequences in Rényi-regularized RLHF under heavy-tailed rewards and in stochastic gradient Langevin dynamics with heavy-tailed gradient noise.
48.2LGMar 23
Sharper Generalization Bounds for TransformerYawen Li, Tao Hu, Zhouhui Lian et al.
This paper studies generalization error bounds for Transformer models. Based on the offset Rademacher complexity, we derive sharper generalization bounds for different Transformer architectures, including single-layer single-head, single-layer multi-head, and multi-layer Transformers. We first express the excess risk of Transformers in terms of the offset Rademacher complexity. By exploiting its connection with the empirical covering numbers of the corresponding hypothesis spaces, we obtain excess risk bounds that achieve optimal convergence rates up to constant factors. We then derive refined excess risk bounds by upper bounding the covering numbers of Transformer hypothesis spaces using matrix ranks and matrix norms, leading to precise, architecture-dependent generalization bounds. Finally, we relax the boundedness assumption on feature mappings and extend our theoretical results to settings with unbounded (sub-Gaussian) features and heavy-tailed distributions.
MLDec 3, 2024
Selective Reviews of Bandit Problems in AI via a Statistical ViewPengjie Zhou, Haoyu Wei, Huiming Zhang
Reinforcement Learning (RL) is a widely researched area in artificial intelligence that focuses on teaching agents decision-making through interactions with their environment. A key subset includes stochastic multi-armed bandit (MAB) and continuum-armed bandit (SCAB) problems, which model sequential decision-making under uncertainty. This review outlines the foundational models and assumptions of bandit problems, explores non-asymptotic theoretical tools like concentration inequalities and minimax regret bounds, and compares frequentist and Bayesian algorithms for managing exploration-exploitation trade-offs. Additionally, we explore K-armed contextual bandits and SCAB, focusing on their methodologies and regret analyses. We also examine the connections between SCAB problems and functional data analysis. Finally, we highlight recent advances and ongoing challenges in the field.
MLFeb 11, 2022
High-dimensional Inference and FDR Control for Simulated Markov Random FieldsHaoyu Wei, Xiaoyu Lei, Yixin Han et al.
Identifying important features linked to a response variable is a fundamental task in various scientific domains. This article explores statistical inference for simulated Markov random fields in high-dimensional settings. We introduce a methodology based on Markov Chain Monte Carlo Maximum Likelihood Estimation (MCMC-MLE) with Elastic-net regularization. Under mild conditions on the MCMC method, our penalized MCMC-MLE method achieves $\ell_{1}$-consistency. We propose a decorrelated score test, establishing both its asymptotic normality and that of a one-step estimator, along with the associated confidence interval. Furthermore, we construct two false discovery rate control procedures via the asymptotic behaviors for both p-values and e-values. Comprehensive numerical simulations confirm the theoretical validity of the proposed methods.
MLJan 10, 2022
Non-Asymptotic Guarantees for Robust Statistical Learning under Infinite Variance AssumptionLihu Xu, Fang Yao, Qiuran Yao et al.
There has been a surge of interest in developing robust estimators for models with heavy-tailed and bounded variance data in statistics and machine learning, while few works impose unbounded variance. This paper proposes two type of robust estimators, the ridge log-truncated M-estimator and the elastic net log-truncated M-estimator. The first estimator is applied to convex regressions such as quantile regression and generalized linear models, while the other one is applied to high dimensional non-convex learning problems such as regressions via deep neural networks. Simulations and real data analysis demonstrate the {robustness} of log-truncated estimations over standard estimations.
CVOct 22, 2021
Adaptive Fusion Affinity Graph with Noise-free Online Low-rank Representation for Natural Image SegmentationYang Zhang, Moyun Liu, Huiming Zhang et al.
Affinity graph-based segmentation methods have become a major trend in computer vision. The performance of these methods relies on the constructed affinity graph, with particular emphasis on the neighborhood topology and pairwise affinities among superpixels. Due to the advantages of assimilating different graphs, a multi-scale fusion graph has a better performance than a single graph with single-scale. However, these methods ignore the noise from images which influences the accuracy of pairwise similarities. Multi-scale combinatorial grouping and graph fusion also generate a higher computational complexity. In this paper, we propose an adaptive fusion affinity graph (AFA-graph) with noise-free low-rank representation in an online manner for natural image segmentation. An input image is first over-segmented into superpixels at different scales and then filtered by the proposed improved kernel density estimation method. Moreover, we select global nodes of these superpixels on the basis of their subspace-preserving presentation, which reveals the feature distribution of superpixels exactly. To reduce time complexity while improving performance, a sparse representation of global nodes based on noise-free online low-rank representation is used to obtain a global graph at each scale. The global graph is finally used to update a local graph which is built upon all superpixels at each scale. Experimental results on the BSD300, BSD500, MSRC, SBD, and PASCAL VOC show the effectiveness of AFA-graph in comparison with state-of-the-art approaches.
STMay 2, 2021
Directional FDR Control for Sub-Gaussian Sparse GLMsChang Cui, Jinzhu Jia, Yijun Xiao et al.
High-dimensional sparse generalized linear models (GLMs) have emerged in the setting that the number of samples and the dimension of variables are large, and even the dimension of variables grows faster than the number of samples. False discovery rate (FDR) control aims to identify some small number of statistically significantly nonzero results after getting the sparse penalized estimation of GLMs. Using the CLIME method for precision matrix estimations, we construct the debiased-Lasso estimator and prove the asymptotical normality by minimax-rate oracle inequalities for sparse GLMs. In practice, it is often needed to accurately judge each regression coefficient's positivity and negativity, which determines whether the predictor variable is positively or negatively related to the response variable conditionally on the rest variables. Using the debiased estimator, we establish multiple testing procedures. Under mild conditions, we show that the proposed debiased statistics can asymptotically control the directional (sign) FDR and directional false discovery variables at a pre-specified significance level. Moreover, it can be shown that our multiple testing procedure can approximately achieve a statistical power of 1. We also extend our methods to the two-sample problems and propose the two-sample test statistics. Under suitable conditions, we can asymptotically achieve directional FDR control and directional FDV control at the specified significance level for two-sample problems. Some numerical simulations have successfully verified the FDR control effects of our proposed testing procedures, which sometimes outperforms the classical knockoff method.
STFeb 4, 2021
Sharper Sub-Weibull ConcentrationsHuiming Zhang, Haoyu Wei
Constant-specified and exponential concentration inequalities play an essential role in the finite-sample theory of machine learning and high-dimensional statistics area. We obtain sharper and constants-specified concentration inequalities for the sum of independent sub-Weibull random variables, which leads to a mixture of two tails: sub-Gaussian for small deviations and sub-Weibull for large deviations from the mean. These bounds are new and improve existing bounds with sharper constants. In addition, a new sub-Weibull parameter if the italic should be retained. Please check the whole text. is also proposed, which enables recovering the tight concentration inequality for a random variable (vector). For statistical applications, we give an $\ell_2$-error of estimated coefficients in negative binomial regressions when the heavy-tailed covariates are sub-Weibull distributed with sparse structures, which is a new result for negative binomial regressions. In applying random matrices, we derive non-asymptotic versions of Bai-Yin's theorem for sub-Weibull entries with exponential tail bounds. Finally, by demonstrating a sub-Weibull confidence region for a log-truncated Z-estimator without the second-moment condition, we discuss and define the sub-Weibull type robust estimator for independent observations $\{X_i\}_{i=1}^{n}$ without exponential-moment conditions.
CVJan 31, 2021
A Unified Light Framework for Real-time Fault Detection of Freight Train ImagesYang Zhang, Moyun Liu, Yang Yang et al.
Real-time fault detection for freight trains plays a vital role in guaranteeing the security and optimal operation of railway transportation under stringent resource requirements. Despite the promising results for deep learning based approaches, the performance of these fault detectors on freight train images, are far from satisfactory in both accuracy and efficiency. This paper proposes a unified light framework to improve detection accuracy while supporting a real-time operation with a low resource requirement. We firstly design a novel lightweight backbone (RFDNet) to improve the accuracy and reduce computational cost. Then, we propose a multi region proposal network using multi-scale feature maps generated from RFDNet to improve the detection performance. Finally, we present multi level position-sensitive score maps and region of interest pooling to further improve accuracy with few redundant computations. Extensive experimental results on public benchmark datasets suggest that our RFDNet can significantly improve the performance of baseline network with higher accuracy and efficiency. Experiments on six fault datasets show that our method is capable of real-time detection at over 38 frames per second and achieves competitive accuracy and lower computation than the state-of-the-art detectors.
STNov 4, 2020
Concentration Inequalities for Statistical InferenceHuiming Zhang, Song Xi Chen
This paper gives a review of concentration inequalities which are widely employed in non-asymptotical analyses of mathematical statistics in a wide range of settings, from distribution-free to distribution-dependent, from sub-Gaussian to sub-exponential, sub-Gamma, and sub-Weibull random variables, and from the mean to the maximum concentration. This review provides results in these settings with some fresh new results. Given the increasing popularity of high-dimensional data and inference, results in the context of high-dimensional linear and Poisson regressions are also provided. We aim to illustrate the concentration inequalities with known constants and to improve existing bounds with sharper constants.
STSep 10, 2020
Non-asymptotic Optimal Prediction Error for Growing-dimensional Partially Functional Linear ModelsHuiming Zhang, Xiaoyu Lei
Under the reproducing kernel Hilbert spaces (RKHS), we consider the penalized least-squares of the partially functional linear models (PFLM), whose predictor contains both functional and traditional multivariate parts, and the multivariate part allows a divergent number of parameters. From the non-asymptotic point of view, we focus on the rate-optimal upper and lower bounds of the prediction error. An exact upper bound for the excess prediction risk is shown in a non-asymptotic form under a more general assumption known as the effective dimension to the model, by which we also show the prediction consistency when the number of multivariate covariates $p$ slightly increases with the sample size $n$. Our new finding implies a trade-off between the number of non-functional predictors and the effective dimension of the kernel principal components to ensure prediction consistency in the increasing-dimensional setting. The analysis in our proof hinges on the spectral condition of the sandwich operator of the covariance operator and the reproducing kernel, and on sub-Gaussian and Berstein concentration inequalities for the random elements in Hilbert space. Finally, we derive the non-asymptotic minimax lower bound under the regularity assumption of the Kullback-Leibler divergence of the models.
MLJun 11, 2020
Weighted Lasso Estimates for Sparse Logistic Regression: Non-asymptotic Properties with Measurement ErrorHuamei Huang, Yujing Gao, Huiming Zhang et al.
When we are interested in high-dimensional system and focus on classification performance, the $\ell_{1}$-penalized logistic regression is becoming important and popular. However, the Lasso estimates could be problematic when penalties of different coefficients are all the same and not related to the data. We proposed two types of weighted Lasso estimates depending on covariates by the McDiarmid inequality. Given sample size $n$ and dimension of covariates $p$, the finite sample behavior of our proposed methods with a diverging number of predictors is illustrated by non-asymptotic oracle inequalities such as $\ell_{1}$-estimation error and squared prediction error of the unknown parameters. We compare the performance of our methods with former weighted estimates on simulated data, then apply these methods to do real data analysis.
MEMay 21, 2020
Optimal Distributed Subsampling for Maximum Quasi-Likelihood Estimators with Massive DataJun Yu, HaiYing Wang, Mingyao Ai et al.
Nonuniform subsampling methods are effective to reduce computational burden and maintain estimation efficiency for massive data. Existing methods mostly focus on subsampling with replacement due to its high computational efficiency. If the data volume is so large that nonuniform subsampling probabilities cannot be calculated all at once, then subsampling with replacement is infeasible to implement. This paper solves this problem using Poisson subsampling. We first derive optimal Poisson subsampling probabilities in the context of quasi-likelihood estimation under the A- and L-optimality criteria. For a practically implementable algorithm with approximated optimal subsampling probabilities, we establish the consistency and asymptotic normality of the resultant estimators. To deal with the situation that the full data are stored in different blocks or at multiple locations, we develop a distributed subsampling framework, in which statistics are computed simultaneously on smaller partitions of the full data. Asymptotic properties of the resultant aggregated estimator are investigated. We illustrate and evaluate the proposed strategies through numerical experiments on simulated and real data sets.
STNov 14, 2019
Sparse Density Estimation with Measurement ErrorsXiaowei Yang, Huiming Zhang, Haoyu Wei et al.
This paper aims to build an estimate of an unknown density of the data with measurement error as a linear combination of functions from a dictionary. Inspired by the penalization approach, we propose the weighted Elastic-net penalized minimal $\ell_2$-distance method for sparse coefficients estimation, where the adaptive weights come from sharp concentration inequalities. The optimal weighted tuning parameters are obtained by the first-order conditions holding with a high probability. Under local coherence or minimal eigenvalue assumptions, non-asymptotical oracle inequalities are derived. These theoretical results are transposed to obtain the support recovery with a high probability. Then, some numerical experiments for discrete and continuous distributions confirm the significant improvement obtained by our procedure when compared with other conventional approaches. Finally, the application is performed in a meteorology data set. It shows that our method has potency and superiority of detecting the shape of multi-mode density compared with other conventional approaches.
MLDec 9, 2017
Elastic-net Regularized High-dimensional Negative Binomial Regression: Consistency and Weak Signals DetectionHuiming Zhang, Jinzhu Jia
We study a sparse negative binomial regression (NBR) for count data by showing the non-asymptotic advantages of using the elastic-net estimator. Two types of oracle inequalities are derived for the NBR's elastic-net estimates by using the Compatibility Factor Condition and the Stabil Condition. The second type of oracle inequality is for the random design and can be extended to many $\ell_1 + \ell_2$ regularized M-estimations, with the corresponding empirical process having stochastic Lipschitz properties. We derive the concentration inequality for the suprema empirical processes for the weighted sum of negative binomial variables to show some high--probability events. We apply the method by showing the sign consistency, provided that the nonzero components in the true sparse vector are larger than a proper choice of the weakest signal detection threshold. In the second application, we show the grouping effect inequality with high probability. Third, under some assumptions for a design matrix, we can recover the true variable set with a high probability if the weakest signal detection threshold is large than the turning parameter up to a known constant. Lastly, we briefly discuss the de-biased elastic-net estimator, and numerical studies are given to support the proposal.