AIJul 24, 2024Code
LAMBDA: A Large Model Based Data AgentMaojun Sun, Ruijian Han, Binyan Jiang et al.
We introduce LArge Model Based Data Agent (LAMBDA), a novel open-source, code-free multi-agent data analysis system that leverages the power of large language models. LAMBDA is designed to address data analysis challenges in data-driven applications through innovatively designed data agents using natural language. At the core of LAMBDA are two key agent roles: the programmer and the inspector, which are engineered to work together seamlessly. Specifically, the programmer generates code based on the user's instructions and domain-specific knowledge, while the inspector debugs the code when necessary. To ensure robustness and handle adverse scenarios, LAMBDA features a user interface that allows direct user intervention. Moreover, LAMBDA can flexibly integrate external models and algorithms through our proposed Knowledge Integration Mechanism, catering to the needs of customized data analysis. LAMBDA has demonstrated strong performance on various data analysis tasks. It has the potential to enhance data analysis paradigms by seamlessly integrating human and artificial intelligence, making it more accessible, effective, and efficient for users from diverse backgrounds. The strong performance of LAMBDA in solving data analysis problems is demonstrated using real-world data examples. The code for LAMBDA is available at https://github.com/AMA-CMFAI/LAMBDA and videos of three case studies can be viewed at https://www.polyu.edu.hk/ama/cmfai/lambda.html.
LGMar 29, 2023
Randomly Projected Convex Clustering Model: Motivation, Realization, and Cluster Recovery GuaranteesZiwen Wang, Yancheng Yuan, Jiaming Ma et al.
In this paper, we propose a randomly projected convex clustering model for clustering a collection of $n$ high dimensional data points in $\mathbb{R}^d$ with $K$ hidden clusters. Compared to the convex clustering model for clustering original data with dimension $d$, we prove that, under some mild conditions, the perfect recovery of the cluster membership assignments of the convex clustering model, if exists, can be preserved by the randomly projected convex clustering model with embedding dimension $m = O(ε^{-2}\log(n))$, where $0 < ε< 1$ is some given parameter. We further prove that the embedding dimension can be improved to be $O(ε^{-2}\log(K))$, which is independent of the number of data points. Extensive numerical experiment results will be presented in this paper to demonstrate the robustness and superior performance of the randomly projected convex clustering model. The numerical results presented in this paper also demonstrate that the randomly projected convex clustering model can outperform the randomly projected K-means model in practice.
LGDec 3, 2022
Beyond ADMM: A Unified Client-variance-reduced Adaptive Federated Learning FrameworkShuai Wang, Yanqing Xu, Zhiguo Wang et al.
As a novel distributed learning paradigm, federated learning (FL) faces serious challenges in dealing with massive clients with heterogeneous data distribution and computation and communication resources. Various client-variance-reduction schemes and client sampling strategies have been respectively introduced to improve the robustness of FL. Among others, primal-dual algorithms such as the alternating direction of method multipliers (ADMM) have been found being resilient to data distribution and outperform most of the primal-only FL algorithms. However, the reason behind remains a mystery still. In this paper, we firstly reveal the fact that the federated ADMM is essentially a client-variance-reduced algorithm. While this explains the inherent robustness of federated ADMM, the vanilla version of it lacks the ability to be adaptive to the degree of client heterogeneity. Besides, the global model at the server under client sampling is biased which slows down the practical convergence. To go beyond ADMM, we propose a novel primal-dual FL algorithm, termed FedVRA, that allows one to adaptively control the variance-reduction level and biasness of the global model. In addition, FedVRA unifies several representative FL algorithms in the sense that they are either special instances of FedVRA or are close to it. Extensions of FedVRA to semi/un-supervised learning are also presented. Experiments based on (semi-)supervised image classification tasks demonstrate superiority of FedVRA over the existing schemes in learning scenarios with massive heterogeneous clients and client sampling.
AIJan 20
DSAEval: Evaluating Data Science Agents on a Wide Range of Real-World Data Science ProblemsMaojun Sun, Yifei Xie, Yue Wu et al.
Recent LLM-based data agents aim to automate data science tasks ranging from data analysis to deep learning. However, the open-ended nature of real-world data science problems, which often span multiple taxonomies and lack standard answers, poses a significant challenge for evaluation. To address this, we introduce DSAEval, a benchmark comprising 641 real-world data science problems grounded in 285 diverse datasets, covering both structured and unstructured data (e.g., vision and text). DSAEval incorporates three distinctive features: (1) Multimodal Environment Perception, which enables agents to interpret observations from multiple modalities including text and vision; (2) Multi-Query Interactions, which mirror the iterative and cumulative nature of real-world data science projects; and (3) Multi-Dimensional Evaluation, which provides a holistic assessment across reasoning, code, and results. We systematically evaluate 11 advanced agentic LLMs using DSAEval. Our results show that Claude-Sonnet-4.5 achieves the strongest overall performance, GPT-5.2 is the most efficient, and MiMo-V2-Flash is the most cost-effective. We further demonstrate that multimodal perception consistently improves performance on vision-related tasks, with gains ranging from 2.04% to 11.30%. Overall, while current data science agents perform well on structured data and routine data anlysis workflows, substantial challenges remain in unstructured domains. Finally, we offer critical insights and outline future research directions to advance the development of data science agents.
IRMar 5Code
DARE: Aligning LLM Agents with the R Statistical Ecosystem via Distribution-Aware RetrievalMaojun Sun, Yue Wu, Yifei Xie et al.
Large Language Model (LLM) agents can automate data-science workflows, but many rigorous statistical methods implemented in R remain underused because LLMs struggle with statistical knowledge and tool retrieval. Existing retrieval-augmented approaches focus on function-level semantics and ignore data distribution, producing suboptimal matches. We propose DARE (Distribution-Aware Retrieval Embedding), a lightweight, plug-and-play retrieval model that incorporates data distribution information into function representations for R package retrieval. Our main contributions are: (i) RPKB, a curated R Package Knowledge Base derived from 8,191 high-quality CRAN packages; (ii) DARE, an embedding model that fuses distributional features with function metadata to improve retrieval relevance; and (iii) RCodingAgent, an R-oriented LLM agent for reliable R code generation and a suite of statistical analysis tasks for systematically evaluating LLM agents in realistic analytical scenarios. Empirically, DARE achieves an NDCG at 10 of 93.47%, outperforming state-of-the-art open-source embedding models by up to 17% on package retrieval while using substantially fewer parameters. Integrating DARE into RCodingAgent yields significant gains on downstream analysis tasks. This work helps narrow the gap between LLM automation and the mature R statistical ecosystem.
24.2CLMar 12
Beyond the Prompt in Large Language Models: Comprehension, In-Context Learning, and Chain-of-ThoughtYuling Jiao, Yanming Lai, Huazhen Lin et al.
Large Language Models (LLMs) have demonstrated remarkable proficiency across diverse tasks, exhibiting emergent properties such as semantic prompt comprehension, In-Context Learning (ICL), and Chain-of-Thought (CoT) reasoning. Despite their empirical success, the theoretical mechanisms driving these phenomena remain poorly understood. This study dives into the foundations of these observations by addressing three critical questions: (1) How do LLMs accurately decode prompt semantics despite being trained solely on a next-token prediction objective? (2) Through what mechanism does ICL facilitate performance gains without explicit parameter updates? and (3) Why do intermediate reasoning steps in CoT prompting effectively unlock capabilities for complex, multi-step problems? Our results demonstrate that, through the autoregressive process, LLMs are capable of exactly inferring the transition probabilities between tokens across distinct tasks using provided prompts. We show that ICL enhances performance by reducing prompt ambiguity and facilitating posterior concentration on the intended task. Furthermore, we find that CoT prompting activates the model's capacity for task decomposition, breaking complex problems into a sequence of simpler sub-tasks that the model has mastered during the pretraining phase. By comparing their individual error bounds, we provide novel theoretical insights into the statistical superiority of advanced prompt engineering techniques.
MLFeb 24
Standard Transformers Achieve the Minimax Rate in Nonparametric Regression with $C^{s,λ}$ TargetsYanming Lai, Defeng Sun
The tremendous success of Transformer models in fields such as large language models and computer vision necessitates a rigorous theoretical investigation. To the best of our knowledge, this paper is the first work proving that standard Transformers can approximate Hölder functions $ C^{s,λ}\left([0,1]^{d\times n}\right) $$ (s\in\mathbb{N}_{\geq0},0<λ\leq1) $ under the $L^t$ distance ($t \in [1, \infty]$) with arbitrary precision. Building upon this approximation result, we demonstrate that standard Transformers achieve the minimax optimal rate in nonparametric regression for Hölder target functions. It is worth mentioning that, by introducing two metrics: the size tuple and the dimension vector, we provide a fine-grained characterization of Transformer structures, which facilitates future research on the generalization and optimization errors of Transformers with different structures. As intermediate results, we also derive the upper bounds for the Lipschitz constant of standard Transformers and their memorization capacity, which may be of independent interest. These findings provide theoretical justification for the powerful capabilities of Transformer models.
AIDec 18, 2024
A Survey on Large Language Model-based Agents for Statistics and Data ScienceMaojun Sun, Ruijian Han, Binyan Jiang et al.
In recent years, data science agents powered by Large Language Models (LLMs), known as "data agents," have shown significant potential to transform the traditional data analysis paradigm. This survey provides an overview of the evolution, capabilities, and applications of LLM-based data agents, highlighting their role in simplifying complex data tasks and lowering the entry barrier for users without related expertise. We explore current trends in the design of LLM-based frameworks, detailing essential features such as planning, reasoning, reflection, multi-agent collaboration, user interface, knowledge integration, and system design, which enable agents to address data-centric problems with minimal human intervention. Furthermore, we analyze several case studies to demonstrate the practical applications of various data agents in real-world scenarios. Finally, we identify key challenges and propose future research directions to advance the development of data agents into intelligent statistical analysis software.
AIJan 11, 2024
Machine Learning Insides OptVerse AI Solver: Design Principles and ApplicationsXijun Li, Fangzhou Zhu, Hui-Ling Zhen et al.
In an era of digital ubiquity, efficient resource management and decision-making are paramount across numerous industries. To this end, we present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI Solver, which aims to mitigate the scarcity of real-world mathematical programming instances, and to surpass the capabilities of traditional optimization techniques. We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror multifaceted structures of real-world problem. Furthermore, we introduce a training framework leveraging augmentation policies to maintain solvers' utility in dynamic environments. Besides the data generation and augmentation, our proposed approaches also include novel ML-driven policies for personalized solver strategies, with an emphasis on applications like graph convolutional networks for initial basis selection and reinforcement learning for advanced presolving and cut selection. Additionally, we detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance. Compared with traditional solvers such as Cplex and SCIP, our ML-augmented OptVerse AI Solver demonstrates superior speed and precision across both established benchmarks and real-world scenarios, reinforcing the practical imperative and effectiveness of machine learning techniques in mathematical programming solvers.
OCJun 12, 2025
Complexity of normalized stochastic first-order methods with momentum under heavy-tailed noiseChuan He, Zhaosong Lu, Defeng Sun et al.
In this paper, we propose practical normalized stochastic first-order methods with Polyak momentum, multi-extrapolated momentum, and recursive momentum for solving unconstrained optimization problems. These methods employ dynamically updated algorithmic parameters and do not require explicit knowledge of problem-dependent quantities such as the Lipschitz constant or noise bound. We establish first-order oracle complexity results for finding approximate stochastic stationary points under heavy-tailed noise and weakly average smoothness conditions -- both of which are weaker than the commonly used bounded variance and mean-squared smoothness assumptions. Our complexity bounds either improve upon or match the best-known results in the literature. Numerical experiments are presented to demonstrate the practical effectiveness of the proposed methods.
MLApr 16, 2025
Approximation Bounds for Transformer Networks with Application to RegressionYuling Jiao, Yanming Lai, Defeng Sun et al.
We explore the approximation capabilities of Transformer networks for Hölder and Sobolev functions, and apply these results to address nonparametric regression estimation with dependent observations. First, we establish novel upper bounds for standard Transformer networks approximating sequence-to-sequence mappings whose component functions are Hölder continuous with smoothness index $γ\in (0,1]$. To achieve an approximation error $\varepsilon$ under the $L^p$-norm for $p \in [1, \infty]$, it suffices to use a fixed-depth Transformer network whose total number of parameters scales as $\varepsilon^{-d_x n / γ}$. This result not only extends existing findings to include the case $p = \infty$, but also matches the best known upper bounds on number of parameters previously obtained for fixed-depth FNNs and RNNs. Similar bounds are also derived for Sobolev functions. Second, we derive explicit convergence rates for the nonparametric regression problem under various $β$-mixing data assumptions, which allow the dependence between observations to weaken over time. Our bounds on the sample complexity impose no constraints on weight magnitudes. Lastly, we propose a novel proof strategy to establish approximation bounds, inspired by the Kolmogorov-Arnold representation theorem. We show that if the self-attention layer in a Transformer can perform column averaging, the network can approximate sequence-to-sequence Hölder functions, offering new insights into the interpretability of self-attention mechanisms.
MLFeb 20, 2025
Distribution Matching for Self-Supervised Transfer LearningYuling Jiao, Wensen Ma, Defeng Sun et al.
In this paper, we propose a novel self-supervised transfer learning method called \underline{\textbf{D}}istribution \underline{\textbf{M}}atching (DM), which drives the representation distribution toward a predefined reference distribution while preserving augmentation invariance. DM results in a learned representation space that is intuitively structured and therefore easy to interpret. Experimental results across multiple real-world datasets and evaluation metrics demonstrate that DM performs competitively on target classification tasks compared to existing self-supervised transfer learning methods. Additionally, we provide robust theoretical guarantees for DM, including a population theorem and an end-to-end sample theorem. The population theorem bridges the gap between the self-supervised learning task and target classification accuracy, while the sample theorem shows that, even with a limited number of samples from the target domain, DM can deliver exceptional classification performance, provided the unlabeled sample size is sufficiently large.
LGOct 22, 2020
Learning Graph Laplacian with MCPYangjing Zhang, Kim-Chuan Toh, Defeng Sun
We consider the problem of learning a graph under the Laplacian constraint with a non-convex penalty: minimax concave penalty (MCP). For solving the MCP penalized graphical model, we design an inexact proximal difference-of-convex algorithm (DCA) and prove its convergence to critical points. We note that each subproblem of the proximal DCA enjoys the nice property that the objective function in its dual problem is continuously differentiable with a semismooth gradient. Therefore, we apply an efficient semismooth Newton method to subproblems of the proximal DCA. Numerical experiments on various synthetic and real data sets demonstrate the effectiveness of the non-convex penalty MCP in promoting sparsity. Compared with the existing state-of-the-art method, our method is demonstrated to be more efficient and reliable for learning graph Laplacian with MCP.
OCApr 17, 2020
Estimation of sparse Gaussian graphical models with hidden clustering structureMeixia Lin, Defeng Sun, Kim-Chuan Toh et al.
Estimation of Gaussian graphical models is important in natural science when modeling the statistical relationships between variables in the form of a graph. The sparsity and clustering structure of the concentration matrix is enforced to reduce model complexity and describe inherent regularities. We propose a model to estimate the sparse Gaussian graphical models with hidden clustering structure, which also allows additional linear constraints to be imposed on the concentration matrix. We design an efficient two-phase algorithm for solving the proposed model. We develop a symmetric Gauss-Seidel based alternating direction method of the multipliers (sGS-ADMM) to generate an initial point to warm-start the second phase algorithm, which is a proximal augmented Lagrangian method (pALM), to get a solution with high accuracy. Numerical experiments on both synthetic data and real data demonstrate the good performance of our model, as well as the efficiency and robustness of our proposed algorithm.
OCFeb 26, 2020
Efficient algorithms for multivariate shape-constrained convex regression problemsMeixia Lin, Defeng Sun, Kim-Chuan Toh
Shape-constrained convex regression problem deals with fitting a convex function to the observed data, where additional constraints are imposed, such as component-wise monotonicity and uniform Lipschitz continuity. This paper provides a comprehensive mechanism for computing the least squares estimator of a multivariate shape-constrained convex regression function in $\mathbb{R}^d$. We prove that the least squares estimator is computable via solving a constrained convex quadratic programming (QP) problem with $(n+1)d$ variables and at least $n(n-1)$ linear inequality constraints, where $n$ is the number of data points. For solving the generally very large-scale convex QP, we design two efficient algorithms, one is the symmetric Gauss-Seidel based alternating direction method of multipliers ({\tt sGS-ADMM}), and the other is the proximal augmented Lagrangian method ({\tt pALM}) with the subproblems solved by the semismooth Newton method ({\tt SSN}). Comprehensive numerical experiments, including those in the pricing of basket options and estimation of production functions in economics, demonstrate that both of our proposed algorithms outperform the state-of-the-art algorithm. The {\tt pALM} is more efficient than the {\tt sGS-ADMM} but the latter has the advantage of being simpler to implement.
OCMar 27, 2019
A sparse semismooth Newton based proximal majorization-minimization algorithm for nonconvex square-root-loss regression problemsPeipei Tang, Chengjing Wang, Defeng Sun et al.
In this paper, we consider high-dimensional nonconvex square-root-loss regression problems and introduce a proximal majorization-minimization (PMM) algorithm for these problems. Our key idea for making the proposed PMM to be efficient is to develop a sparse semismooth Newton method to solve the corresponding subproblems. By using the Kurdyka-Łojasiewicz property exhibited in the underlining problems, we prove that the PMM algorithm converges to a d-stationary point. We also analyze the oracle property of the initial subproblem used in our algorithm. Extensive numerical experiments are presented to demonstrate the high efficiency of the proposed PMM algorithm.
OCFeb 1, 2019
A dual Newton based preconditioned proximal point algorithm for exclusive lasso modelsMeixia Lin, Defeng Sun, Kim-Chuan Toh et al.
The exclusive lasso (also known as elitist lasso) regularization has become popular recently due to its superior performance on group sparsity. Compared to the group lasso regularization which enforces the competition on variables among different groups, the exclusive lasso regularization also enforces the competition within each group. In this paper, we propose a highly efficient dual Newton based preconditioned proximal point algorithm (PPDNA) to solve machine learning models involving the exclusive lasso regularizer. As an important ingredient, we provide a rigorous proof for deriving the closed-form solution to the proximal mapping of the weighted exclusive lasso regularizer. In addition, we derive the corresponding HS-Jacobian to the proximal mapping and analyze its structure --- which plays an essential role in the efficient computation of the PPA subproblem via applying a semismooth Newton method on its dual. Various numerical experiments in this paper demonstrate the superior performance of the proposed PPDNA against other state-of-the-art numerical algorithms.
LGOct 4, 2018
Convex Clustering: Model, Theoretical Guarantee and Efficient AlgorithmDefeng Sun, Kim-Chuan Toh, Yancheng Yuan
Clustering is a fundamental problem in unsupervised learning. Popular methods like K-means, may suffer from poor performance as they are prone to get stuck in its local minima. Recently, the sum-of-norms (SON) model (also known as the clustering path) has been proposed in Pelckmans et al. (2005), Lindsten et al. (2011) and Hocking et al. (2011). The perfect recovery properties of the convex clustering model with uniformly weighted all pairwise-differences regularization have been proved by Zhu et al. (2014) and Panahi et al. (2017). However, no theoretical guarantee has been established for the general weighted convex clustering model, where better empirical results have been observed. In the numerical optimization aspect, although algorithms like the alternating direction method of multipliers (ADMM) and the alternating minimization algorithm (AMA) have been proposed to solve the convex clustering model (Chi and Lange, 2015), it still remains very challenging to solve large-scale problems. In this paper, we establish sufficient conditions for the perfect recovery guarantee of the general weighted convex clustering model, which include and improve existing theoretical results as special cases. In addition, we develop a semismooth Newton based augmented Lagrangian method for solving large-scale convex clustering problems. Extensive numerical experiments on both simulated and real data demonstrate that our algorithm is highly efficient and robust for solving large-scale problems. Moreover, the numerical results also show the superior performance and scalability of our algorithm comparing to the existing first-order methods. In particular, our algorithm is able to solve a convex clustering problem with 200,000 points in $\mathbb{R}^3$ in about 6 minutes.
OCSep 12, 2018
A Fast Globally Linearly Convergent Algorithm for the Computation of Wasserstein BarycentersLei Yang, Jia Li, Defeng Sun et al.
We consider the problem of computing a Wasserstein barycenter for a set of discrete probability distributions with finite supports, which finds many applications in areas such as statistics, machine learning and image processing. When the support points of the barycenter are pre-specified, this problem can be modeled as a linear programming (LP) problem whose size can be extremely large. To handle this large-scale LP, we analyse the structure of its dual problem, which is conceivably more tractable and can be reformulated as a well-structured convex problem with 3 kinds of block variables and a coupling linear equality constraint. We then adapt a symmetric Gauss-Seidel based alternating direction method of multipliers (sGS-ADMM) to solve the resulting dual problem and establish its global convergence and global linear convergence rate. As a critical component for efficient computation, we also show how all the subproblems involved can be solved exactly and efficiently. This makes our method suitable for computing a Wasserstein barycenter on a large-scale data set, without introducing an entropy regularization term as is commonly practiced. In addition, our sGS-ADMM can be used as a subroutine in an alternating minimization method to compute a barycenter when its support points are not pre-specified. Numerical results on synthetic data sets and image data sets demonstrate that our method is highly competitive for solving large-scale Wasserstein barycenter problems, in comparison to two existing representative methods and the commercial software Gurobi.
OCAug 22, 2018
Efficient sparse semismooth Newton methods for the clustered lasso problemMeixia Lin, Yong-Jin Liu, Defeng Sun et al.
We focus on solving the clustered lasso problem, which is a least squares problem with the $\ell_1$-type penalties imposed on both the coefficients and their pairwise differences to learn the group structure of the regression parameters. Here we first reformulate the clustered lasso regularizer as a weighted ordered-lasso regularizer, which is essential in reducing the computational cost from $O(n^2)$ to $O(n\log (n))$. We then propose an inexact semismooth Newton augmented Lagrangian ({\sc Ssnal}) algorithm to solve the clustered lasso problem or its dual via this equivalent formulation, depending on whether the sample size is larger than the dimension of the features. An essential component of the {\sc Ssnal} algorithm is the computation of the generalized Jacobian of the proximal mapping of the clustered lasso regularizer. Based on the new formulation, we derive an efficient procedure for its computation. Comprehensive results on the global convergence and local linear convergence of the {\sc Ssnal} algorithm are established. For the purpose of exposition and comparison, we also summarize/design several first-order methods that can be used to solve the problem under consideration, but with the key improvement from the new formulation of the clustered lasso regularizer. As a demonstration of the applicability of our algorithms, numerical experiments on the clustered lasso problem are performed. The experiments show that the {\sc Ssnal} algorithm substantially outperforms the best alternative algorithm for the clustered lasso problem.
OCFeb 20, 2018
An Efficient Semismooth Newton Based Algorithm for Convex ClusteringYancheng Yuan, Defeng Sun, Kim-Chuan Toh
Clustering may be the most fundamental problem in unsupervised learning which is still active in machine learning research because its importance in many applications. Popular methods like K-means, may suffer from instability as they are prone to get stuck in its local minima. Recently, the sum-of-norms (SON) model (also known as clustering path), which is a convex relaxation of hierarchical clustering model, has been proposed in [7] and [5] Although numerical algorithms like ADMM and AMA are proposed to solve convex clustering model [2], it is known to be very challenging to solve large-scale problems. In this paper, we propose a semi-smooth Newton based augmented Lagrangian method for large-scale convex clustering problems. Extensive numerical experiments on both simulated and real data demonstrate that our algorithm is highly efficient and robust for solving large-scale problems. Moreover, the numerical results also show the superior performance and scalability of our algorithm compared to existing first-order methods.
NAMay 23, 2017
A block symmetric Gauss-Seidel decomposition theorem for convex composite quadratic programming and its applicationsXudong Li, Defeng Sun, Kim-Chuan Toh
For a symmetric positive semidefinite linear system of equations $\mathcal{Q} {\bf x} = {\bf b}$, where ${\bf x} = (x_1,\ldots,x_s)$ is partitioned into $s$ blocks, with $s \geq 2$, we show that each cycle of the classical block symmetric Gauss-Seidel (block sGS) method exactly solves the associated quadratic programming (QP) problem but added with an extra proximal term of the form $\frac{1}{2} \| {\bf x}-{\bf x}^k \|_{\mathcal T}^2$, where ${\mathcal T}$ is a symmetric positive semidefinite matrix related to the sGS decomposition and ${\bf x}^k$ is the previous iterate. By leveraging on such a connection to optimization, we are able to extend the result (which we name as the block sGS decomposition theorem) for solving a convex composite QP (CCQP) with an additional possibly nonsmooth term in $x_1$, i.e., $\min\{ p(x_1) + \frac{1}{2}\langle {\bf x},\, \mathcal{Q} {\bf x} \rangle -\langle {\bf b},\, {\bf x}\rangle\}$, where $p(\cdot)$ is a proper closed convex function. Based on the block sGS decomposition theorem, we are able to extend the classical block sGS method to solve a CCQP. In addition, our extended block sGS method has the flexibility of allowing for inexact computation in each step of the block sGS cycle. At the same time, we can also accelerate the inexact block sGS method to achieve an iteration complexity of $O(1/k^2)$ after performing $k$ block sGS cycles. As a {fundamental} building block, the block sGS decomposition theorem has played a key role in various recently developed algorithms such as the inexact semiproximal {ALM/ADMM} for linearly constrained multi-block convex composite conic programming (CCCP), and the accelerated block coordinate descent method for multi-block CCCP.
OCOct 13, 2012
A Rank-Corrected Procedure for Matrix Completion with Fixed Basis CoefficientsWeimin Miao, Shaohua Pan, Defeng Sun
For the problems of low-rank matrix completion, the efficiency of the widely-used nuclear norm technique may be challenged under many circumstances, especially when certain basis coefficients are fixed, for example, the low-rank correlation matrix completion in various fields such as the financial market and the low-rank density matrix completion from the quantum state tomography. To seek a solution of high recovery quality beyond the reach of the nuclear norm, in this paper, we propose a rank-corrected procedure using a nuclear semi-norm to generate a new estimator. For this new estimator, we establish a non-asymptotic recovery error bound. More importantly, we quantify the reduction of the recovery error bound for this rank-corrected procedure. Compared with the one obtained for the nuclear norm penalized least squares estimator, this reduction can be substantial (around 50%). We also provide necessary and sufficient conditions for rank consistency in the sense of Bach (2008). Very interestingly, these conditions are highly related to the concept of constraint nondegeneracy in matrix optimization. As a byproduct, our results provide a theoretical foundation for the majorized penalty method of Gao and Sun (2010) and Gao (2010) for structured low-rank matrix optimization problems. Extensive numerical experiments demonstrate that our proposed rank-corrected procedure can simultaneously achieve a high recovery accuracy and capture the low-rank structure.