Xiliang Lu

NA
h-index36
31papers
490citations
Novelty51%
AI Score53

31 Papers

NASep 23, 2017
Preasymptotic Convergence of Randomized Kaczmarz Method

Yuling Jiao, Bangti Jin, Xiliang Lu

Kaczmarz method is one popular iterative method for solving inverse problems, especially in computed tomography. Recently, it was established that a randomized version of the method enjoys an exponential convergence for well-posed problems, and the convergence rate is determined by a variant of the condition number. In this work, we analyze the preasymptotic convergence behavior of the randomized Kaczmarz method, and show that the low-frequency error (with respect to the right singular vectors) decays faster during first iterations than the high-frequency error. Under the assumption that the inverse solution is smooth (e.g., sourcewise representation), the result explains the fast empirical convergence behavior, thereby shedding new insights into the excellent performance of the randomized Kaczmarz method in practice. Further, we propose a simple strategy to stabilize the asymptotic convergence of the iteration by means of variance reduction. We provide extensive numerical experiments to confirm the analysis and to elucidate the behavior of the algorithms.

NAFeb 27, 2015
A simple finite element method for the boundary value problem with a Riemann-Liouville derivative

Bangti Jin, Raytcho Lazarov, Xiliang Lu et al.

We consider a boundary value problem involving a Riemann-Liouville fractional derivative of order $α\in (3/2,2)$ on the unit interval $(0,1)$. The standard Galerkin finite element approximation converges slowly due to the presence of singularity term $x^{α-1}$ in the solution representation. In this work, we develop a simple technique, by transforming it into a second-order two-point boundary value problem with nonlocal low order terms, whose solution can reconstruct directly the solution to the original problem. The stability of the variational formulation, and the optimal regularity pickup of the solution are analyzed. A novel Galerkin finite element method with piecewise linear or quadratic finite elements is developed, and $L^2(D)$ error estimates are provided. The approach is then applied to the corresponding fractional Sturm-Liouville problem, and error estimates of the eigenvalue approximations are given. Extensive numerical results fully confirm our theoretical study.

10.3LGApr 24Code
Robust Fuzzy local k-plane clustering with mixture distance of hinge loss and L1 norm

Junjun Huang, Xiliang Lu, Xuelin Xie et al.

K-plane clustering (KPC), hyperplane clustering, and mixture regression all essentially fall within the same class of problems. This problem can be conceptualized as clustering in relatively high-dimensional K subspaces or K linear manifolds. Traditional KPC or fuzzy KPC models demonstrate a pronounced susceptibility to outliers, as they presuppose that the projection distance between data points and the plane normal vector adheres to the L2 distance. Meanwhile, the assumption of infinitely extending clusters adversely affects clustering performance. To solve these problems, this paper proposed a new robust fuzzy local k-plane clustering (RFLkPC) method that combines the mixture distance of hinge loss and L1 norm. The RFLkPC model assumes that each plane cluster is bounded to a finite area, which can flexibly and robustly handle plane clustering tasks with outliers or not. The corresponding model and optimization algorithms of RFLkPC were provided. Compared to other related models on this topic, a large number of experiments verify the efficiency of RFLkPC on simulated data and real data. The source code for the proposed RFLkPC method is publicly available at https://github.com/xuelin-xie/RFLkPC.

NAApr 11, 2017
Iterative Soft/Hard Thresholding with Homotopy Continuation for Sparse Recovery

Yuling Jiao, Bangti Jin, Xiliang Lu

In this note, we analyze an iterative soft / hard thresholding algorithm with homotopy continuation for recovering a sparse signal $x^†$ from noisy data of a noise level $ε$. Under suitable regularity and sparsity conditions, we design a path along which the algorithm can find a solution $x^*$ which admits a sharp reconstruction error $\|x^* - x^†\|_{\ell^\infty} = O(ε)$ with an iteration complexity $O(\frac{\ln ε}{\ln γ} np)$, where $n$ and $p$ are problem dimensionality and $γ\in (0,1)$ controls the length of the path. Numerical examples are given to illustrate its performance.

NAApr 5, 2022
Imaging Conductivity from Current Density Magnitude using Neural Networks

Bangti Jin, Xiyao Li, Xiliang Lu

Conductivity imaging represents one of the most important tasks in medical imaging. In this work we develop a neural network based reconstruction technique for imaging the conductivity from the magnitude of the internal current density. It is achieved by formulating the problem as a relaxed weighted least-gradient problem, and then approximating its minimizer by standard fully connected feedforward neural networks. We derive bounds on two components of the generalization error, i.e., approximation error and statistical error, explicitly in terms of properties of the neural networks (e.g., depth, total number of parameters, and the bound of the network parameters). We illustrate the performance and distinct features of the approach on several numerical experiments. Numerically, it is observed that the approach enjoys remarkable robustness with respect to the presence of data noise.

ITNov 22, 2016
Group Sparse Recovery via the $\ell^0(\ell^2)$ Penalty: Theory and Algorithm

Yuling Jiao, Bangti Jin, Xiliang Lu

In this work we propose and analyze a novel approach for group sparse recovery. It is based on regularized least squares with an $\ell^0(\ell^2)$ penalty, which penalizes the number of nonzero groups. One distinct feature of the approach is that it has the built-in decorrelation mechanism within each group, and thus can handle challenging strong inner-group correlation. We provide a complete analysis of the regularized model, e.g., existence of a global minimizer, invariance property, support recovery, and properties of block coordinatewise minimizers. Further, the regularized problem admits an efficient primal dual active set algorithm with a provable finite-step global convergence. At each iteration, it involves solving a least-squares problem on the active set only, and exhibits a fast local convergence, which makes the method extremely efficient for recovering group sparse signals. Extensive numerical experiments are presented to illustrate salient features of the model and the efficiency and accuracy of the algorithm. A comparative study indicates its competitiveness with existing approaches.

NAJan 12, 2016
Alternating Direction Method of Multipliers for Linear Inverse Problems

Yuling Jiao, Qinian Jin, Xiliang Lu et al.

In this paper we propose an iterative method using alternating direction method of multipliers (ADMM) strategy to solve linear inverse problems in Hilbert spaces with general convex penalty term. When the data is given exactly, we give a convergence analysis of our ADMM algorithm without assuming the existence of Lagrange multiplier. In case the data contains noise, we show that our method is a regularization method as long as it is terminated by a suitable stopping rule. Various numerical simulations are performed to test the efficiency of the method.

LGMar 28, 2023
GAS: A Gaussian Mixture Distribution-Based Adaptive Sampling Method for PINNs

Yuling Jiao, Di Li, Xiliang Lu et al.

With the recent study of deep learning in scientific computation, the Physics-Informed Neural Networks (PINNs) method has drawn widespread attention for solving Partial Differential Equations (PDEs). Compared to traditional methods, PINNs can efficiently handle high-dimensional problems, but the accuracy is relatively low, especially for highly irregular problems. Inspired by the idea of adaptive finite element methods and incremental learning, we propose GAS, a Gaussian mixture distribution-based adaptive sampling method for PINNs. During the training procedure, GAS uses the current residual information to generate a Gaussian mixture distribution for the sampling of additional points, which are then trained together with historical data to speed up the convergence of the loss and achieve higher accuracy. Several numerical simulations on 2D and 10D problems show that GAS is a promising method that achieves state-of-the-art accuracy among deep solvers, while being comparable with traditional numerical solvers.

QUANT-PHApr 14, 2022
Efficient and practical quantum compiler towards multi-qubit systems with deep reinforcement learning

Qiuhao Chen, Yuxuan Du, Qi Zhao et al.

Efficient quantum compiling tactics greatly enhance the capability of quantum computers to execute complicated quantum algorithms. Due to its fundamental importance, a plethora of quantum compilers has been designed in past years. However, there are several caveats to current protocols, which are low optimality, high inference time, limited scalability, and lack of universality. To compensate for these defects, here we devise an efficient and practical quantum compiler assisted by advanced deep reinforcement learning (RL) techniques, i.e., data generation, deep Q-learning, and AQ* search. In this way, our protocol is compatible with various quantum machines and can be used to compile multi-qubit operators. We systematically evaluate the performance of our proposal in compiling quantum operators with both inverse-closed and inverse-free universal basis sets. In the task of single-qubit operator compiling, our proposal outperforms other RL-based quantum compilers in the measure of compiling sequence length and inference time. Meanwhile, the output solution is near-optimal, guaranteed by the Solovay-Kitaev theorem. Notably, for the inverse-free universal basis set, the achieved sequence length complexity is comparable with the inverse-based setting and dramatically advances previous methods. These empirical results contribute to improving the inverse-free Solovay-Kitaev theorem. In addition, for the first time, we demonstrate how to leverage RL-based quantum compilers to accomplish two-qubit operator compiling. The achieved results open an avenue for integrating RL with quantum compiling to unify efficiency and practicality and thus facilitate the exploration of quantum advantages.

NANov 3, 2017
Robust Decoding from 1-Bit Compressive Sampling with Least Squares

Jian Huang, Yuling Jiao, Xiliang Lu et al.

In 1-bit compressive sensing (1-bit CS) where target signal is coded into a binary measurement, one goal is to recover the signal from noisy and quantized samples. Mathematically, the 1-bit CS model reads: $y = η\odot\textrm{sign} (Ψx^* + ε)$, where $x^{*}\in \mathcal{R}^{n}, y\in \mathcal{R}^{m}$, $Ψ\in \mathcal{R}^{m\times n}$, and $ε$ is the random error before quantization and $η\in \mathcal{R}^{n}$ is a random vector modeling the sign flips. Due to the presence of nonlinearity, noise and sign flips, it is quite challenging to decode from the 1-bit CS. In this paper, we consider least squares approach under the over-determined and under-determined settings. For $m>n$, we show that, up to a constant $c$, with high probability, the least squares solution $x_{\textrm{ls}}$ approximates $ x^*$ with precision $δ$ as long as $m \geq\widetilde{\mathcal{O}}(\frac{n}{δ^2})$. For $m< n$, we prove that, up to a constant $c$, with high probability, the $\ell_1$-regularized least-squares solution $x_{\ell_1}$ lies in the ball with center $x^*$ and radius $δ$ provided that $m \geq \mathcal{O}( \frac{s\log n}{δ^2})$ and $\|x^*\|_0 := s < m$. We introduce a Newton type method, the so-called primal and dual active set (PDAS) algorithm, to solve the nonsmooth optimization problem. The PDAS possesses the property of one-step convergence. It only requires to solve a small least squares problem on the active set. Therefore, the PDAS is extremely efficient for recovering sparse signals through continuation. We propose a novel regularization parameter selection rule which does not introduce any extra computational overhead. Extensive numerical experiments are presented to illustrate the robustness of our proposed model and the efficiency of our algorithm.

NAJun 24, 2023
Current density impedance imaging with PINNs

Chenguang Duan, Yuling Jiao, Xiliang Lu et al.

In this paper, we introduce CDII-PINNs, a computationally efficient method for solving CDII using PINNs in the framework of Tikhonov regularization. This method constructs a physics-informed loss function by merging the regularized least-squares output functional with an underlying differential equation, which describes the relationship between the conductivity and voltage. A pair of neural networks representing the conductivity and voltage, respectively, are coupled by this loss function. Then, minimizing the loss function provides a reconstruction. A rigorous theoretical guarantee is provided. We give an error analysis for CDII-PINNs and establish a convergence rate, based on prior selected neural network parameters in terms of the number of samples. The numerical simulations demonstrate that CDII-PINNs are efficient, accurate and robust to noise levels ranging from $1\%$ to $20\%$.

QUANT-PHOct 11, 2023
Non-asymptotic Approximation Error Bounds of Parameterized Quantum Circuits

Zhan Yu, Qiuhao Chen, Yuling Jiao et al.

Parameterized quantum circuits (PQCs) have emerged as a promising approach for quantum neural networks. However, understanding their expressive power in accomplishing machine learning tasks remains a crucial question. This paper investigates the expressivity of PQCs for approximating general multivariate function classes. Unlike previous Universal Approximation Theorems for PQCs, which are either nonconstructive or rely on parameterized classical data processing, we explicitly construct data re-uploading PQCs for approximating multivariate polynomials and smooth functions. We establish the first non-asymptotic approximation error bounds for these functions in terms of the number of qubits, quantum circuit depth, and number of trainable parameters. Notably, we demonstrate that for approximating functions that satisfy specific smoothness criteria, the quantum circuit size and number of trainable parameters of our proposed PQCs can be smaller than those of deep ReLU neural networks. We further validate the approximation capability of PQCs through numerical experiments. Our results provide a theoretical foundation for designing practical PQCs and quantum neural networks for machine learning tasks that can be implemented on near-term quantum devices, paving the way for the advancement of quantum machine learning.

20.5CVApr 14
Spatial-Spectral Adaptive Fidelity and Noise Prior Reduction Guided Hyperspectral Image Denoising

Xuelin Xie, Xiliang Lu, Zhengshan Wang et al.

The core challenge of hyperspectral image denoising is striking the right balance between data fidelity and noise prior modeling. Most existing methods place too much emphasis on the intrinsic priors of the image while overlooking diverse noise assumptions and the dynamic trade-off between fidelity and priors. To address these issues, we propose a denoising framework that integrates noise prior reduction and a spatial-spectral adaptive fidelity term. This framework considers comprehensive noise priors with fewer parameters and introduces an adaptive weight tensor to dynamically balance the fidelity and prior regularization terms. Within this framework, we further develop a fast and robust pixel-wise model combined with the representative coefficient total variation regularizer to accurately remove mixed noise in HSIs. The proposed method not only efficiently handles various types of noise but also accurately captures the spectral low-rank structure and local smoothness of HSIs. An efficient optimization algorithm based on the alternating direction method of multipliers is designed to ensure stable and fast convergence. Extensive experiments on simulated and real-world datasets demonstrate that the proposed model achieves superior denoising performance while maintaining competitive computational efficiency.

CLMar 15, 2024Code
Take Care of Your Prompt Bias! Investigating and Mitigating Prompt Bias in Factual Knowledge Extraction

Ziyang Xu, Keqin Peng, Liang Ding et al.

Recent research shows that pre-trained language models (PLMs) suffer from "prompt bias" in factual knowledge extraction, i.e., prompts tend to introduce biases toward specific labels. Prompt bias presents a significant challenge in assessing the factual knowledge within PLMs. Therefore, this paper aims to improve the reliability of existing benchmarks by thoroughly investigating and mitigating prompt bias. We show that: 1) all prompts in the experiments exhibit non-negligible bias, with gradient-based prompts like AutoPrompt and OptiPrompt displaying significantly higher levels of bias; 2) prompt bias can amplify benchmark accuracy unreasonably by overfitting the test datasets, especially on imbalanced datasets like LAMA. Based on these findings, we propose a representation-based approach to mitigate the prompt bias during inference time. Specifically, we first estimate the biased representation using prompt-only querying, and then remove it from the model's internal representations to generate the debiased representations, which are used to produce the final debiased outputs. Experiments across various prompts, PLMs, and benchmarks show that our approach can not only correct the overfitted performance caused by prompt bias, but also significantly improve the prompt retrieval capability (up to 10% absolute performance gain). These results indicate that our approach effectively alleviates prompt bias in knowledge evaluation, thereby enhancing the reliability of benchmark assessments. Hopefully, our plug-and-play approach can be a golden standard to strengthen PLMs toward reliable knowledge bases. Code and data are released in https://github.com/FelliYang/PromptBias.

19.6COApr 16
Theta-regularized Kriging: Modelling and Algorithms

Xuelin Xie, Xiliang Lu

To obtain more accurate model parameters and improve prediction accuracy, we proposed a regularized Kriging model that penalizes the hyperparameter theta in the Gaussian stochastic process, termed the Theta-regularized Kriging. We derived the optimization problem for this model from a maximum likelihood perspective. Additionally, we presented specific implementation details for the iterative process, including the regularized optimization algorithm and the geometric search cross-validation tuning algorithm. Three distinct penalty methods, Lasso, Ridge, and Elastic-net regularization, were meticulously considered. Meanwhile, the proposed Theta-regularized Kriging models were tested on nine common numerical functions and two practical engineering examples. The results demonstrate that, compared with other penalized Kriging models, the proposed model performs better in terms of accuracy and stability.

33.7LGMay 8
Approximation Error Upper and Lower Bounds for Hölder Class with Transformers

Xin He, Yuling Jiao, Xiliang Lu et al.

We explore the expressive power of Transformers by establishing precise approximation error upper and lower bounds for Hölder class. Specifically, a new approximation upper bound is derived for the standard Transformer architecture equipped with Softmax operators, ReLU activation functions, and residual connections. We prove that a Transformer network composed of at most $\mathcal{O}(\varepsilon^{-{d_{0}}/α})$ blocks can approximate any bounded Hölder function with $d_{0}$-dimensional input and smoothness $α\in(0,1]$ under any accuracy $\varepsilon>0$. In the case of approximation lower bounds, leveraging the VC-dimension upper bound, we are the first to rigorously prove that Transformers demand for at least $Ω(\varepsilon^{-{d_{0}}/({4α})})$ blocks to achieve the $\varepsilon$ approximation accuracy. As a final step, we extend the derived results for standard Transformers to a general regression task and establish the corresponding excess risk rates demonstrating Transformers' empirical effectiveness in real-world settings.

LGDec 19, 2023
Neural Network Approximation for Pessimistic Offline Reinforcement Learning

Di Wu, Yuling Jiao, Li Shen et al.

Deep reinforcement learning (RL) has shown remarkable success in specific offline decision-making scenarios, yet its theoretical guarantees are still under development. Existing works on offline RL theory primarily emphasize a few trivial settings, such as linear MDP or general function approximation with strong assumptions and independent data, which lack guidance for practical use. The coupling of deep learning and Bellman residuals makes this problem challenging, in addition to the difficulty of data dependence. In this paper, we establish a non-asymptotic estimation error of pessimistic offline RL using general neural network approximation with $\mathcal{C}$-mixing data regarding the structure of networks, the dimension of datasets, and the concentrability of data coverage, under mild assumptions. Our result shows that the estimation error consists of two parts: the first converges to zero at a desired rate on the sample size with partially controllable concentrability, and the second becomes negligible if the residual constraint is tight. This result demonstrates the explicit efficiency of deep adversarial offline RL frameworks. We utilize the empirical process tool for $\mathcal{C}$-mixing sequences and the neural network approximation theory for the Hölder class to achieve this. We also develop methods to bound the Bellman estimation error caused by function approximation with empirical Bellman constraint perturbations. Additionally, we present a result that lessens the curse of dimensionality using data with low intrinsic dimensionality and function classes with low complexity. Our estimation provides valuable insights into the development of deep offline RL and guidance for algorithm model design.

QUANT-PHJun 18, 2024
Quantum Compiling with Reinforcement Learning on a Superconducting Processor

Z. T. Wang, Qiuhao Chen, Yuxuan Du et al.

To effectively implement quantum algorithms on noisy intermediate-scale quantum (NISQ) processors is a central task in modern quantum technology. NISQ processors feature tens to a few hundreds of noisy qubits with limited coherence times and gate operations with errors, so NISQ algorithms naturally require employing circuits of short lengths via quantum compilation. Here, we develop a reinforcement learning (RL)-based quantum compiler for a superconducting processor and demonstrate its capability of discovering novel and hardware-amenable circuits with short lengths. We show that for the three-qubit quantum Fourier transformation, a compiled circuit using only seven CZ gates with unity circuit fidelity can be achieved. The compiler is also able to find optimal circuits under device topological constraints, with lengths considerably shorter than those by the conventional method. Our study exemplifies the codesign of the software with hardware for efficient quantum compilation, offering valuable insights for the advancement of RL-based compilers.

MLJun 11, 2024
Accelerating Ill-conditioned Hankel Matrix Recovery via Structured Newton-like Descent

HanQin Cai, Longxiu Huang, Xiliang Lu et al.

This paper studies the robust Hankel recovery problem, which simultaneously removes the sparse outliers and fulfills missing entries from the partial observation. We propose a novel non-convex algorithm, coined Hankel Structured Newton-Like Descent (HSNLD), to tackle the robust Hankel recovery problem. HSNLD is highly efficient with linear convergence, and its convergence rate is independent of the condition number of the underlying Hankel matrix. The recovery guarantee has been established under some mild conditions. Numerical experiments on both synthetic and real datasets show the superior performance of HSNLD against state-of-the-art algorithms.

MLNov 21, 2021
A Data-Driven Line Search Rule for Support Recovery in High-dimensional Data Analysis

Peili Li, Yuling Jiao, Xiliang Lu et al.

In this work, we consider the algorithm to the (nonlinear) regression problems with $\ell_0$ penalty. The existing algorithms for $\ell_0$ based optimization problem are often carried out with a fixed step size, and the selection of an appropriate step size depends on the restricted strong convexity and smoothness for the loss function, hence it is difficult to compute in practical calculation. In sprite of the ideas of support detection and root finding \cite{HJK2020}, we proposes a novel and efficient data-driven line search rule to adaptively determine the appropriate step size. We prove the $\ell_2$ error bound to the proposed algorithm without much restrictions for the cost functional. A large number of numerical comparisons with state-of-the-art algorithms in linear and logistic regression problems show the stability, effectiveness and superiority of the proposed algorithms.

MLSep 18, 2021
Coordinate Descent for MCP/SCAD Penalized Least Squares Converges Linearly

Yuling Jiao, Dingwei Li, Min Liu et al.

Recovering sparse signals from observed data is an important topic in signal/imaging processing, statistics and machine learning. Nonconvex penalized least squares have been attracted a lot of attentions since they enjoy nice statistical properties. Computationally, coordinate descent (CD) is a workhorse for minimizing the nonconvex penalized least squares criterion due to its simplicity and scalability. In this work, we prove the linear convergence rate to CD for solving MCP/SCAD penalized least squares problems.

LGFeb 28, 2021
Deep Neural Networks with ReLU-Sine-Exponential Activations Break Curse of Dimensionality in Approximation on Hölder Class

Yuling Jiao, Yanming Lai, Xiliang Lu et al.

In this paper, we construct neural networks with ReLU, sine and $2^x$ as activation functions. For general continuous $f$ defined on $[0,1]^d$ with continuity modulus $ω_f(\cdot)$, we construct ReLU-sine-$2^x$ networks that enjoy an approximation rate $\mathcal{O}(ω_f(\sqrt{d})\cdot2^{-M}+ω_{f}\left(\frac{\sqrt{d}}{N}\right))$, where $M,N\in \mathbb{N}^{+}$ denote the hyperparameters related to widths of the networks. As a consequence, we can construct ReLU-sine-$2^x$ network with the depth $5$ and width $\max\left\{\left\lceil2d^{3/2}\left(\frac{3μ}ε\right)^{1/α}\right\rceil,2\left\lceil\log_2\frac{3μd^{α/2}}{2ε}\right\rceil+2\right\}$ that approximates $f\in \mathcal{H}_μ^α([0,1]^d)$ within a given tolerance $ε>0$ measured in $L^p$ norm $p\in[1,\infty)$, where $\mathcal{H}_μ^α([0,1]^d)$ denotes the Hölder continuous function class defined on $[0,1]^d$ with order $α\in (0,1]$ and constant $μ> 0$. Therefore, the ReLU-sine-$2^x$ networks overcome the curse of dimensionality on $\mathcal{H}_μ^α([0,1]^d)$. In addition to its supper expressive power, functions implemented by ReLU-sine-$2^x$ networks are (generalized) differentiable, enabling us to apply SGD to train.

LGDec 11, 2020
Generative Learning With Euler Particle Transport

Yuan Gao, Jian Huang, Yuling Jiao et al.

We propose an Euler particle transport (EPT) approach for generative learning. The proposed approach is motivated by the problem of finding an optimal transport map from a reference distribution to a target distribution characterized by the Monge-Ampere equation. Interpreting the infinitesimal linearization of the Monge-Ampere equation from the perspective of gradient flows in measure spaces leads to a stochastic McKean-Vlasov equation. We use the forward Euler method to solve this equation. The resulting forward Euler map pushes forward a reference distribution to the target. This map is the composition of a sequence of simple residual maps, which are computationally stable and easy to train. The key task in training is the estimation of the density ratios or differences that determine the residual maps. We estimate the density ratios (differences) based on the Bregman divergence with a gradient penalty using deep density-ratio (difference) fitting. We show that the proposed density-ratio (difference) estimators do not suffer from the "curse of dimensionality" if data is supported on a lower-dimensional manifold. Numerical experiments with multi-mode synthetic datasets and comparisons with the existing methods on real benchmark datasets support our theoretical results and demonstrate the effectiveness of the proposed method.

MLJan 27, 2020
On Newton Screening

Jian Huang, Yuling Jiao, Lican Kang et al.

Screening and working set techniques are important approaches to reducing the size of an optimization problem. They have been widely used in accelerating first-order methods for solving large-scale sparse learning problems. In this paper, we develop a new screening method called Newton screening (NS) which is a generalized Newton method with a built-in screening mechanism. We derive an equivalent KKT system for the Lasso and utilize a generalized Newton method to solve the KKT equations. Based on this KKT system, a built-in working set with a relatively small size is first determined using the sum of primal and dual variables generated from the previous iteration, then the primal variable is updated by solving a least-squares problem on the working set and the dual variable updated based on a closed-form expression. Moreover, we consider a sequential version of Newton screening (SNS) with a warm-start strategy. We show that NS possesses an optimal convergence property in the sense that it achieves one-step local convergence. Under certain regularity conditions on the feature matrix, we show that SNS hits a solution with the same signs as the underlying true target and achieves a sharp estimation error bound with high probability. Simulation studies and real data analysis support our theoretical results and demonstrate that SNS is faster and more accurate than several state-of-the-art methods in our comparative studies.

MLJan 16, 2020
A Support Detection and Root Finding Approach for Learning High-dimensional Generalized Linear Models

Jian Huang, Yuling Jiao, Lican Kang et al.

Feature selection is important for modeling high-dimensional data, where the number of variables can be much larger than the sample size. In this paper, we develop a support detection and root finding procedure to learn the high dimensional sparse generalized linear models and denote this method by GSDAR. Based on the KKT condition for $\ell_0$-penalized maximum likelihood estimations, GSDAR generates a sequence of estimators iteratively. Under some restricted invertibility conditions on the maximum likelihood function and sparsity assumption on the target coefficients, the errors of the proposed estimate decays exponentially to the optimal order. Moreover, the oracle estimator can be recovered if the target signal is stronger than the detectable level. We conduct simulations and real data analysis to illustrate the advantages of our proposed method over several existing methods, including Lasso and MCP.

MLJun 14, 2019
A stochastic alternating minimizing method for sparse phase retrieval

Jianfeng Cai, Yuling Jiao, Xiliang Lu et al.

Sparse phase retrieval plays an important role in many fields of applied science and thus attracts lots of attention. In this paper, we propose a \underline{sto}chastic alte\underline{r}nating \underline{m}inimizing method for \underline{sp}arse ph\underline{a}se \underline{r}etrieval (\textit{StormSpar}) algorithm which {emprically} is able to recover $n$-dimensional $s$-sparse signals from only $O(s\,\mathrm{log}\, n)$ number of measurements without a desired initial value required by many existing methods. In \textit{StormSpar}, the hard-thresholding pursuit (HTP) algorithm is employed to solve the sparse constraint least square sub-problems. The main competitive feature of \textit{StormSpar} is that it converges globally requiring optimal order of number of samples with random initialization. Extensive numerical experiments are given to validate the proposed algorithm.

MLOct 9, 2018
SNAP: A semismooth Newton algorithm for pathwise optimization with optimal local convergence rate and oracle properties

Jian Huang, Yuling Jiao, Xiliang Lu et al.

We propose a semismooth Newton algorithm for pathwise optimization (SNAP) for the LASSO and Enet in sparse, high-dimensional linear regression. SNAP is derived from a suitable formulation of the KKT conditions based on Newton derivatives. It solves the semismooth KKT equations efficiently by actively and continuously seeking the support of the regression coefficients along the solution path with warm start. At each knot in the path, SNAP converges locally superlinearly for the Enet criterion and achieves an optimal local convergence rate for the LASSO criterion, i.e., SNAP converges in one step at the cost of two matrix-vector multiplication per iteration. Under certain regularity conditions on the design matrix and the minimum magnitude of the nonzero elements of the target regression coefficients, we show that SNAP hits a solution with the same signs as the regression coefficients and achieves a sharp estimation error bound in finite steps with high probability. The computational complexity of SNAP is shown to be the same as that of LARS and coordinate descent algorithms per iteration. Simulation studies and real data analysis support our theoretical results and demonstrate that SNAP is faster and accurate than LARS and coordinate descent algorithms.

CVSep 6, 2018
Oblique Stripe Removal in Remote Sensing Images via Oriented Variation

Xinxin Liu, Xiliang Lu, Huanfeng Shen et al.

Destriping is a classical problem in remote sensing image processing. Although considerable effort has been made to remove stripes, few of the existing methods can eliminate stripe noise with arbitrary orientations. This situation makes the removal of oblique stripes in the higher-level remote sensing products become an unfinished and urgent issue. To overcome the challenging problem, we propose a novel destriping model which is self-adjusted to different orientations of stripe noise. First of all, the oriented variation model is designed to accomplish the stripe orientation approximation. In this model, the stripe direction is automatically estimated and then imbedded into the constraint term to depict the along-stripe smoothness of the stripe component. Mainly based on the oriented variation model, a whole destriping framework is proposed by jointly employing an L1-norm constraint and a TV regularization to separately capture the global distribution property of stripe component and the piecewise smoothness of the clean image. The qualitative and quantitative experimental results of both orientation and destriping aspects confirm the effectiveness and stability of the proposed method.

NAOct 18, 2018
On the Regularizing Property of Stochastic Gradient Descent

Bangti Jin, Xiliang Lu

Stochastic gradient descent is one of the most successful approaches for solving large-scale problems, especially in machine learning and statistics. At each iteration, it employs an unbiased estimator of the full gradient computed from one single randomly selected data point. Hence, it scales well with problem size and is very attractive for truly massive dataset, and holds significant potentials for solving large-scale inverse problems. In the recent literature of machine learning, it was empirically observed that when equipped with early stopping, it has regularizing property. In this work, we rigorously establish its regularizing property (under \textit{a priori} early stopping rule), and also prove convergence rates under the canonical sourcewise condition, for minimizing the quadratic functional for linear inverse problems. This is achieved by combining tools from classical regularization theory and stochastic analysis. Further, we analyze the preasymptotic weak and strong convergence behavior of the algorithm. The theoretical findings shed insights into the performance of the algorithm, and are complemented with illustrative numerical experiments.

OCMar 3, 2014
A Primal Dual Active Set with Continuation Algorithm for the \ell^0-Regularized Optimization Problem

Yuling Jiao, Bangti Jin, Xiliang Lu

We develop a primal dual active set with continuation algorithm for solving the \ell^0-regularized least-squares problem that frequently arises in compressed sensing. The algorithm couples the the primal dual active set method with a continuation strategy on the regularization parameter. At each inner iteration, it first identifies the active set from both primal and dual variables, and then updates the primal variable by solving a (typically small) least-squares problem defined on the active set, from which the dual variable can be updated explicitly. Under certain conditions on the sensing matrix, i.e., mutual incoherence property or restricted isometry property, and the noise level, the finite step global convergence of the algorithm is established. Extensive numerical examples are presented to illustrate the efficiency and accuracy of the algorithm and the convergence analysis.

OCOct 4, 2013
A Unified Primal Dual Active Set Algorithm for Nonconvex Sparse Recovery

Jian Huang, Yuling Jiao, Bangti Jin et al.

In this paper, we consider the problem of recovering a sparse signal based on penalized least squares formulations. We develop a novel algorithm of primal-dual active set type for a class of nonconvex sparsity-promoting penalties, including $\ell^0$, bridge, smoothly clipped absolute deviation, capped $\ell^1$ and minimax concavity penalty. First we establish the existence of a global minimizer for the related optimization problems. Then we derive a novel necessary optimality condition for the global minimizer using the associated thresholding operator. The solutions to the optimality system are coordinate-wise minimizers, and under minor conditions, they are also local minimizers. Upon introducing the dual variable, the active set can be determined using the primal and dual variables together. Further, this relation lends itself to an iterative algorithm of active set type which at each step involves first updating the primal variable only on the active set and then updating the dual variable explicitly. When combined with a continuation strategy on the regularization parameter, the primal dual active set method is shown to converge globally to the underlying regression target under certain regularity conditions. Extensive numerical experiments with both simulated and real data demonstrate its superior performance in efficiency and accuracy compared with the existing sparse recovery methods.