Weiyu Xu

h-index68

30papers

377citations

Novelty56%

AI Score53

Ranked #32,393 of 201,326 authors (top 16%)#81 in IT (top 9%)

30 Papers

ITJan 5, 2013

Sparse Recovery from Nonlinear Measurements with Applications in Bad Data Detection for Power Networks

Weiyu Xu, Meng Wang, Jianfeng Cai et al.

In this paper, we consider the problem of sparse recovery from nonlinear measurements, which has applications in state estimation and bad data detection for power networks. An iterative mixed $\ell_1$ and $\ell_2$ convex program is used to estimate the true state by locally linearizing the nonlinear measurements. When the measurements are linear, through using the almost Euclidean property for a linear subspace, we derive a new performance bound for the state estimation error under sparse bad data and additive observation noise. As a byproduct, in this paper we provide sharp bounds on the almost Euclidean property of a linear subspace, using the "escape-through-the-mesh" theorem from geometric functional analysis. When the measurements are nonlinear, we give conditions under which the solution of the iterative algorithm converges to the true state even though the locally linearized measurements may not be the actual nonlinear measurements. We numerically evaluate our iterative convex programming approach to perform bad data detections in nonlinear electrical power networks problems. We are able to use semidefinite programming to verify the conditions for convergence of the proposed iterative sparse recovery algorithms from nonlinear measurements.

CVOct 25, 2023

Trust, but Verify: Robust Image Segmentation using Deep Learning

Fahim Ahmed Zaman, Xiaodong Wu, Weiyu Xu et al.

We describe a method for verifying the output of a deep neural network for medical image segmentation that is robust to several classes of random as well as worst-case perturbations i.e. adversarial attacks. This method is based on a general approach recently developed by the authors called "Trust, but Verify" wherein an auxiliary verification network produces predictions about certain masked features in the input image using the segmentation as an input. A well-designed auxiliary network will produce high-quality predictions when the input segmentations are accurate, but will produce low-quality predictions when the segmentations are incorrect. Checking the predictions of such a network with the original image allows us to detect bad segmentations. However, to ensure the verification method is truly robust, we need a method for checking the quality of the predictions that does not itself rely on a black-box neural network. Indeed, we show that previous methods for segmentation evaluation that do use deep neural regression networks are vulnerable to false negatives i.e. can inaccurately label bad segmentations as good. We describe the design of a verification network that avoids such vulnerability and present results to demonstrate its robustness compared to previous methods.

LGAug 28, 2023

Distributed Dual Coordinate Ascent with Imbalanced Data on a General Tree Network

Myung Cho, Lifeng Lai, Weiyu Xu

In this paper, we investigate the impact of imbalanced data on the convergence of distributed dual coordinate ascent in a tree network for solving an empirical loss minimization problem in distributed machine learning. To address this issue, we propose a method called delayed generalized distributed dual coordinate ascent that takes into account the information of the imbalanced data, and provide the analysis of the proposed algorithm. Numerical experiments confirm the effectiveness of our proposed method in improving the convergence speed of distributed dual coordinate ascent in a tree network.

IVOct 8, 2022

A deep learning network with differentiable dynamic programming for retina OCT surface segmentation

Hui Xie, Weiyu Xu, Xiaodong Wu

Multiple-surface segmentation in Optical Coherence Tomography (OCT) images is a challenge problem, further complicated by the frequent presence of weak image boundaries. Recently, many deep learning (DL) based methods have been developed for this task and yield remarkable performance. Unfortunately, due to the scarcity of training data in medical imaging, it is challenging for DL networks to learn the global structure of the target surfaces, including surface smoothness. To bridge this gap, this study proposes to seamlessly unify a U-Net for feature learning with a constrained differentiable dynamic programming module to achieve an end-to-end learning for retina OCT surface segmentation to explicitly enforce surface smoothness. It effectively utilizes the feedback from the downstream model optimization module to guide feature learning, yielding a better enforcement of global structures of the target surfaces. Experiments on Duke AMD (age-related macular degeneration) and JHU MS (multiple sclerosis) OCT datasets for retinal layer segmentation demonstrated very promising segmentation accuracy.

MLOct 16, 2023

Outlier Detection Using Generative Models with Theoretical Performance Guarantees

Jirong Yi, Jingchao Gao, Tianming Wang et al.

This paper considers the problem of recovering signals modeled by generative models from linear measurements contaminated with sparse outliers. We propose an outlier detection approach for reconstructing the ground-truth signals modeled by generative models under sparse outliers. We establish theoretical recovery guarantees for reconstruction of signals using generative models in the presence of outliers, giving lower bounds on the number of correctable outliers. Our results are applicable to both linear generator neural networks and the nonlinear generator neural networks with an arbitrary number of layers. We propose an iterative alternating direction method of multipliers (ADMM) algorithm for solving the outlier detection problem via $\ell_1$ norm minimization, and a gradient descent algorithm for solving the outlier detection problem via squared $\ell_1$ norm minimization. We conduct extensive experiments using variational auto-encoder and deep convolutional generative adversarial networks, and the experimental results show that the signals can be successfully reconstructed under outliers using our approach. Our approach outperforms the traditional Lasso and $\ell_2$ minimization approach.

LGNov 1, 2023

Optimal Cost Constrained Adversarial Attacks For Multiple Agent Systems

Ziqing Lu, Guanlin Liu, Lifeng Lai et al.

Finding optimal adversarial attack strategies is an important topic in reinforcement learning and the Markov decision process. Previous studies usually assume one all-knowing coordinator (attacker) for whom attacking different recipient (victim) agents incurs uniform costs. However, in reality, instead of using one limitless central attacker, the attacks often need to be performed by distributed attack agents. We formulate the problem of performing optimal adversarial agent-to-agent attacks using distributed attack agents, in which we impose distinct cost constraints on each different attacker-victim pair. We propose an optimal method integrating within-step static constrained attack-resource allocation optimization and between-step dynamic programming to achieve the optimal adversarial attack in a multi-agent system. Our numerical results show that the proposed attacks can significantly reduce the rewards received by the attacked agents.

LGMay 4

Efficient Preference Poisoning Attack on Offline RLHF

Chenye Yang, Weiyu Xu, Lifeng Lai

Offline Reinforcement Learning from Human Feedback (RLHF) pipelines such as Direct Preference Optimization (DPO) train on a pre-collected preference dataset, which makes them vulnerable to preference poisoning attack. We study label flip attacks against log-linear DPO. We first illustrate that flipping one preference label induces a parameter-independent shift in the DPO gradient. Using this key property, we can then convert the targeted poisoning problem into a structured binary sparse approximation problem. To solve this problem, we develop two attack methods: Binary-Aware Lattice Attack (BAL-A) and Binary Matching Pursuit Attack (BMP-A). BAL-A embeds the binary flip selection problem into a binary-aware lattice and applies Lenstra-Lenstra-Lovász reduction and Babai's nearest plane algorithm; we provide sufficient conditions that enforce binary coefficients and recover the minimum-flip objective. BMP-A adapts binary matching pursuit to our non-normalized gradient dictionary and yields coherence-based recovery guarantees and robustness (impossibility) certificates for $K$-flip budgets. Experiments on synthetic dictionaries and the Stanford Human Preferences dataset validate the theory and highlight how dictionary geometry governs attack success.

THMar 1

Artificial Superintelligence May be Useless: Equilibria in the Economy of Multiple AI Agents

Huan Cai, Ziqing Lu, Catherine Xu et al.

With recent development of artificial intelligence, it is more common to adopt AI agents in economic activities. This paper explores the economic actions of agents, including human agents and AI agents, in an economic game of trading products/services, and the equilibria in this economy involving multiple agents. We derive a range of equilibrium results and their corresponding conditions using a Markov chain stationary distribution based model. One distinct feature of our model is that we consider the long-term utility generated by economic activities instead of their short-term benefits. For the model consisting of two agents, we fully characterize all the possible economic equilibria and conditions. Interestingly, we show that unless each agent can at least double (not merely increase) its marginal utility by purchasing the other agent's products/services, purchasing the other agent's products/services will not happen in any economic equilibrium. We further extend our results to three and more agents, where we characterize more economic equilibria. We find that in some equilibria, the ``more powerful'' AI agents contribute zero utility to ``less capable'' agents.

CVDec 7, 2023

gcDLSeg: Integrating Graph-cut into Deep Learning for Binary Semantic Segmentation

Hui Xie, Weiyu Xu, Ya Xing Wang et al.

Binary semantic segmentation in computer vision is a fundamental problem. As a model-based segmentation method, the graph-cut approach was one of the most successful binary segmentation methods thanks to its global optimality guarantee of the solutions and its practical polynomial-time complexity. Recently, many deep learning (DL) based methods have been developed for this task and yielded remarkable performance, resulting in a paradigm shift in this field. To combine the strengths of both approaches, we propose in this study to integrate the graph-cut approach into a deep learning network for end-to-end learning. Unfortunately, backward propagation through the graph-cut module in the DL network is challenging due to the combinatorial nature of the graph-cut algorithm. To tackle this challenge, we propose a novel residual graph-cut loss and a quasi-residual connection, enabling the backward propagation of the gradients of the residual graph-cut loss for effective feature learning guided by the graph-cut segmentation model. In the inference phase, globally optimal segmentation is achieved with respect to the graph-cut energy defined on the optimized image features learned from DL networks. Experiments on the public AZH chronic wound data set and the pancreas cancer data set from the medical segmentation decathlon (MSD) demonstrated promising segmentation accuracy, and improved robustness against adversarial attacks.

LGOct 16, 2025

Learn to Change the World: Multi-level Reinforcement Learning with Model-Changing Actions

Ziqing Lu, Babak Hassibi, Lifeng Lai et al.

Reinforcement learning usually assumes a given or sometimes even fixed environment in which an agent seeks an optimal policy to maximize its long-term discounted reward. In contrast, we consider agents that are not limited to passive adaptations: they instead have model-changing actions that actively modify the RL model of world dynamics itself. Reconfiguring the underlying transition processes can potentially increase the agents' rewards. Motivated by this setting, we introduce the multi-layer configurable time-varying Markov decision process (MCTVMDP). In an MCTVMDP, the lower-level MDP has a non-stationary transition function that is configurable through upper-level model-changing actions. The agent's objective consists of two parts: Optimize the configuration policies in the upper-level MDP and optimize the primitive action policies in the lower-level MDP to jointly improve its expected long-term reward.

LGOct 15, 2025

Provably Invincible Adversarial Attacks on Reinforcement Learning Systems: A Rate-Distortion Information-Theoretic Approach

Ziqing Lu, Lifeng Lai, Weiyu Xu

Reinforcement learning (RL) for the Markov Decision Process (MDP) has emerged in many security-related applications, such as autonomous driving, financial decisions, and drone/robot algorithms. In order to improve the robustness/defense of RL systems against adversaries, studying various adversarial attacks on RL systems is very important. Most previous work considered deterministic adversarial attack strategies in MDP, which the recipient (victim) agent can defeat by reversing the deterministic attacks. In this paper, we propose a provably ``invincible'' or ``uncounterable'' type of adversarial attack on RL. The attackers apply a rate-distortion information-theoretic approach to randomly change agents' observations of the transition kernel (or other properties) so that the agent gains zero or very limited information about the ground-truth kernel (or other properties) during the training. We derive an information-theoretic lower bound on the recipient agent's reward regret and show the impact of rate-distortion attacks on state-of-the-art model-based and model-free algorithms. We also extend this notion of an information-theoretic approach to other types of adversarial attack, such as state observation attacks.

LGJun 23, 2024

Feature compression is the root cause of adversarial fragility in neural network classifiers

Jingchao Gao, Ziqing Lu, Raghu Mudumbai et al.

In this paper, we uniquely study the adversarial robustness of deep neural networks (NN) for classification tasks against that of optimal classifiers. We look at the smallest magnitude of possible additive perturbations that can change a classifier's output. We provide a matrix-theoretic explanation of the adversarial fragility of deep neural networks for classification. In particular, our theoretical results show that a neural network's adversarial robustness can degrade as the input dimension $d$ increases. Analytically, we show that neural networks' adversarial robustness can be only $1/\sqrt{d}$ of the best possible adversarial robustness of optimal classifiers. Our theories match remarkably well with numerical experiments of practically trained NN, including NN for ImageNet images. The matrix-theoretic explanation is consistent with an earlier information-theoretic feature-compression-based explanation for the adversarial fragility of neural networks.

THMay 9, 2023

To AI or not to AI, to Buy Local or not to Buy Local: A Mathematical Theory of Real Price

Huan Cai, Catherine Xu, Weiyu Xu

In the past several decades, the world's economy has become increasingly globalized. On the other hand, there are also ideas advocating the practice of ``buy local'', by which people buy locally produced goods and services rather than those produced farther away. In this paper, we establish a mathematical theory of real price that determines the optimal global versus local spending of an agent which achieves the agent's optimal tradeoff between spending and obtained utility. Our theory of real price depends on the asymptotic analysis of a Markov chain transition probability matrix related to the network of producers and consumers. We show that the real price of a product or service can be determined from the involved Markov chain matrix, and can be dramatically different from the product's label price. In particular, we show that the label prices of products and services are often not ``real'' or directly ``useful'': given two products offering the same myopic utility, the one with lower label price may not necessarily offer better asymptotic utility. This theory shows that the globality or locality of the products and services does have different impacts on the spending-utility tradeoff of a customer. The established mathematical theory of real price can be used to determine whether to adopt or not to adopt certain artificial intelligence (AI) technologies from an economic perspective.

LGJul 28, 2020

Derivation of Information-Theoretically Optimal Adversarial Attacks with Applications to Robust Machine Learning

Jirong Yi, Raghu Mudumbai, Weiyu Xu

We consider the theoretical problem of designing an optimal adversarial attack on a decision system that maximally degrades the achievable performance of the system as measured by the mutual information between the degraded signal and the label of interest. This problem is motivated by the existence of adversarial examples for machine learning classifiers. By adopting an information theoretic perspective, we seek to identify conditions under which adversarial vulnerability is unavoidable i.e. even optimally designed classifiers will be vulnerable to small adversarial perturbations. We present derivations of the optimal adversarial attacks for discrete and continuous signals of interest, i.e., finding the optimal perturbation distributions to minimize the mutual information between the degraded signal and a signal following a continuous or discrete distribution. In addition, we show that it is much harder to achieve adversarial attacks for minimizing mutual information when multiple redundant copies of the input signal are available. This provides additional support to the recently proposed ``feature compression" hypothesis as an explanation for the adversarial vulnerability of deep learning classifiers. We also report on results from computational experiments to illustrate our theoretical results.

CVMar 26, 2020

Do Deep Minds Think Alike? Selective Adversarial Attacks for Fine-Grained Manipulation of Multiple Deep Neural Networks

Zain Khan, Jirong Yi, Raghu Mudumbai et al.

Recent works have demonstrated the existence of {\it adversarial examples} targeting a single machine learning system. In this paper we ask a simple but fundamental question of "selective fooling": given {\it multiple} machine learning systems assigned to solve the same classification problem and taking the same input signal, is it possible to construct a perturbation to the input signal that manipulates the outputs of these {\it multiple} machine learning systems {\it simultaneously} in arbitrary pre-defined ways? For example, is it possible to selectively fool a set of "enemy" machine learning systems but does not fool the other "friend" machine learning systems? The answer to this question depends on the extent to which these different machine learning systems "think alike". We formulate the problem of "selective fooling" as a novel optimization problem, and report on a series of experiments on the MNIST dataset. Our preliminary findings from these experiments show that it is in fact very easy to selectively manipulate multiple MNIST classifiers simultaneously, even when the classifiers are identical in their architectures, training algorithms and training datasets except for random initialization during training. This suggests that two nominally equivalent machine learning systems do not in fact "think alike" at all, and opens the possibility for many novel applications and deeper understandings of the working principles of deep neural networks.

CRMay 25, 2019

Trust but Verify: An Information-Theoretic Explanation for the Adversarial Fragility of Machine Learning Systems, and a General Defense against Adversarial Attacks

Jirong Yi, Hui Xie, Leixin Zhou et al.

Deep-learning based classification algorithms have been shown to be susceptible to adversarial attacks: minor changes to the input of classifiers can dramatically change their outputs, while being imperceptible to humans. In this paper, we present a simple hypothesis about a feature compression property of artificial intelligence (AI) classifiers and present theoretical arguments to show that this hypothesis successfully accounts for the observed fragility of AI classifiers to small adversarial perturbations. Drawing on ideas from information and coding theory, we propose a general class of defenses for detecting classifier errors caused by abnormally small input perturbations. We further show theoretical guarantees for the performance of this detection method. We present experimental results with (a) a voice recognition system, and (b) a digit recognition system using the MNIST database, to demonstrate the effectiveness of the proposed defense methods. The ideas in this paper are motivated by a simple analogy between AI classifiers and the standard Shannon model of a communication system.

CVMar 10, 2019

Fast Single Image Reflection Suppression via Convex Optimization

Yang Yang, Wenye Ma, Yin Zheng et al.

Removing undesired reflections from images taken through the glass is of great importance in computer vision. It serves as a means to enhance the image quality for aesthetic purposes as well as to preprocess images in machine learning and pattern recognition applications. We propose a convex model to suppress the reflection from a single input image. Our model implies a partial differential equation with gradient thresholding, which is solved efficiently using Discrete Cosine Transform. Extensive experiments on synthetic and real-world images demonstrate that our approach achieves desirable reflection suppression results and dramatically reduces the execution time.

ITJan 27, 2019

An Information-Theoretic Explanation for the Adversarial Fragility of AI Classifiers

Hui Xie, Jirong Yi, Weiyu Xu et al.

We present a simple hypothesis about a compression property of artificial intelligence (AI) classifiers and present theoretical arguments to show that this hypothesis successfully accounts for the observed fragility of AI classifiers to small adversarial perturbations. We also propose a new method for detecting when small input perturbations cause classifier errors, and show theoretical guarantees for the performance of this detection method. We present experimental results with a voice recognition system to demonstrate this method. The ideas in this paper are motivated by a simple analogy between AI classifiers and the standard Shannon model of a communication system.

ITDec 31, 2018

How did Donald Trump Surprisingly Win the 2016 United States Presidential Election? an Information-Theoretic Perspective (Clean Sensing for Big Data Analytics:Optimal Strategies,Estimation Error Bounds Tighter than the Cramér-Rao Bound)

Weiyu Xu, Lifeng Lai, Amin Khajehnejad

Donald Trump was lagging behind in nearly all opinion polls leading up to the 2016 US presidential election, but he surprisingly won the election. This raises the following important questions: 1) why most opinion polls were not accurate in 2016? and 2) how to improve the accuracies of opinion polls? In this paper, we study the inaccuracies of opinion polls in the 2016 election through the lens of information theory. We first propose a general framework of parameter estimation, called clean sensing (polling), which performs optimal parameter estimation with sensing cost constraints, from heterogeneous and potentially distorted data sources. We then cast the opinion polling as a problem of parameter estimation from potentially distorted heterogeneous data sources, and derive the optimal polling strategy using heterogenous and possibly distorted data under cost constraints. Our results show that a larger number of data samples do not necessarily lead to better polling accuracy, which give a possible explanation of the inaccuracies of opinion polls in 2016. The optimal sensing strategy should instead optimally allocate sensing resources over heterogenous data sources according to several factors including data quality, and, moreover, for a particular data source, it should strike an optimal balance between the quality of data samples, and the quantity of data samples. As a byproduct of this research, in a general setting, we derive a group of new lower bounds on the mean-squared errors of general unbiased and biased parameter estimators. These new lower bounds can be tighter than the classical Cramér-Rao bound (CRB) and Chapman-Robbins bound. Our derivations are via studying the Lagrange dual problems of certain convex programs. The classical Cramér-Rao bound and Chapman-Robbins bound follow naturally from our results for special cases of these convex programs.

ITOct 26, 2018

Outlier Detection using Generative Models with Theoretical Performance Guarantees

Jirong Yi, Anh Duc Le, Tianming Wang et al.

This paper considers the problem of recovering signals from compressed measurements contaminated with sparse outliers, which has arisen in many applications. In this paper, we propose a generative model neural network approach for reconstructing the ground truth signals under sparse outliers. We propose an iterative alternating direction method of multipliers (ADMM) algorithm for solving the outlier detection problem via $\ell_1$ norm minimization, and a gradient descent algorithm for solving the outlier detection problem via squared $\ell_1$ norm minimization. We establish the recovery guarantees for reconstruction of signals using generative models in the presence of outliers, and give an upper bound on the number of outliers allowed for recovery. Our results are applicable to both the linear generator neural network and the nonlinear generator neural network with an arbitrary number of layers. We conduct extensive experiments using variational auto-encoder and deep convolutional generative adversarial networks, and the experimental results show that the signals can be successfully reconstructed under outliers using our approach. Our approach outperforms the traditional Lasso and $\ell_2$ minimization approach.

OCFeb 14, 2018

Necessary and Sufficient Null Space Condition for Nuclear Norm Minimization in Low-Rank Matrix Recovery

Jirong Yi, Weiyu Xu

Low-rank matrix recovery has found many applications in science and engineering such as machine learning, signal processing, collaborative filtering, system identification, and Euclidean embedding. But the low-rank matrix recovery problem is an NP hard problem and thus challenging. A commonly used heuristic approach is the nuclear norm minimization. In [12,14,15], the authors established the necessary and sufficient null space conditions for nuclear norm minimization to recover every possible low-rank matrix with rank at most r (the strong null space condition). In addition, in [12], Oymak et al. established a null space condition for successful recovery of a given low-rank matrix (the weak null space condition) using nuclear norm minimization, and derived the phase transition for the nuclear norm minimization. In this paper, we show that the weak null space condition in [12] is only a sufficient condition for successful matrix recovery using nuclear norm minimization, and is not a necessary condition as claimed in [12]. In this paper, we further give a weak null space condition for low-rank matrix recovery, which is both necessary and sufficient for the success of nuclear norm minimization. At the core of our derivation are an inequality for characterizing the nuclear norms of block matrices, and the conditions for equality to hold in that inequality.

ITNov 4, 2017

Separation-Free Super-Resolution from Compressed Measurements is Possible: an Orthonormal Atomic Norm Minimization Approach

Weiyu Xu, Jirong Yi, Soura Dasgupta et al.

We consider the problem of recovering the superposition of $R$ distinct complex exponential functions from compressed non-uniform time-domain samples. Total Variation (TV) minimization or atomic norm minimization was proposed in the literature to recover the $R$ frequencies or the missing data. However, it is known that in order for TV minimization and atomic norm minimization to recover the missing data or the frequencies, the underlying $R$ frequencies are required to be well-separated, even when the measurements are noiseless. This paper shows that the Hankel matrix recovery approach can super-resolve the $R$ complex exponentials and their frequencies from compressed non-uniform measurements, regardless of how close their frequencies are to each other. We propose a new concept of orthonormal atomic norm minimization (OANM), and demonstrate that the success of Hankel matrix recovery in separation-free super-resolution comes from the fact that the nuclear norm of a Hankel matrix is an orthonormal atomic norm. More specifically, we show that, in traditional atomic norm minimization, the underlying parameter values $\textbf{must}$ be well separated to achieve successful signal recovery, if the atoms are changing continuously with respect to the continuously-valued parameter. In contrast, for the OANM, it is possible the OANM is successful even though the original atoms can be arbitrarily close. As a byproduct of this research, we provide one matrix-theoretic inequality of nuclear norm, and give its proof from the theory of compressed sensing.

DCMar 14, 2017

Distributed Dual Coordinate Ascent in General Tree Networks and Communication Network Effect on Synchronous Machine Learning

Myung Cho, Lifeng Lai, Weiyu Xu

Due to the big size of data and limited data storage volume of a single computer or a single server, data are often stored in a distributed manner. Thus, performing large-scale machine learning operations with the distributed datasets through communication networks is often required. In this paper, we study the convergence rate of the distributed dual coordinate ascent for distributed machine learning problems in a general tree-structured network. Since a tree network model can be understood as the generalization of a star network model, our algorithm can be thought of as the generalization of the distributed dual coordinate ascent in a star network model. We provide the convergence rate of the distributed dual coordinate ascent over a general tree network in a recursive manner and analyze the network effect on the convergence rate. Secondly, by considering network communication delays, we optimize the distributed dual coordinate ascent algorithm to maximize its convergence speed. From our analytical result, we can choose the optimal number of local iterations depending on the communication delay severity to achieve the fastest convergence speed. In numerical experiments, we consider machine learning scenarios over communication networks, where local workers cannot directly reach to a central node due to constraints in communication, and demonstrate that the usability of our distributed dual coordinate ascent algorithm in tree networks. Additionally, we show that adapting number of local and global iterations to network communication delays in the distributed dual coordinated ascent algorithm can improve its convergence speed.

ITSep 15, 2015

Precise Phase Transition of Total Variation Minimization

Bingwen Zhang, Weiyu Xu, Jian-Feng Cai et al.

Characterizing the phase transitions of convex optimizations in recovering structured signals or data is of central importance in compressed sensing, machine learning and statistics. The phase transitions of many convex optimization signal recovery methods such as $\ell_1$ minimization and nuclear norm minimization are well understood through recent years' research. However, rigorously characterizing the phase transition of total variation (TV) minimization in recovering sparse-gradient signal is still open. In this paper, we fully characterize the phase transition curve of the TV minimization. Our proof builds on Donoho, Johnstone and Montanari's conjectured phase transition curve for the TV approximate message passing algorithm (AMP), together with the linkage between the minmax Mean Square Error of a denoising problem and the high-dimensional convex geometry for TV minimization.

ITJul 14, 2015

Projected Wirtinger Gradient Descent for Low-Rank Hankel Matrix Completion in Spectral Compressed Sensing

Jian-Feng Cai, Suhui Liu, Weiyu Xu

This paper considers reconstructing a spectrally sparse signal from a small number of randomly observed time-domain samples. The signal of interest is a linear combination of complex sinusoids at $R$ distinct frequencies. The frequencies can assume any continuous values in the normalized frequency domain $[0,1)$. After converting the spectrally sparse signal recovery into a low rank structured matrix completion problem, we propose an efficient feasible point approach, named projected Wirtinger gradient descent (PWGD) algorithm, to efficiently solve this structured matrix completion problem. We further accelerate our proposed algorithm by a scheme inspired by FISTA. We give the convergence analysis of our proposed algorithms. Extensive numerical experiments are provided to illustrate the efficiency of our proposed algorithm. Different from earlier approaches, our algorithm can solve problems of very large dimensions very efficiently.

ITMar 10, 2015

Robust recovery of complex exponential signals from random Gaussian projections via low rank Hankel matrix reconstruction

Jian-Feng Cai, Xiaobo Qu, Weiyu Xu et al.

This paper explores robust recovery of a superposition of $R$ distinct complex exponential functions from a few random Gaussian projections. We assume that the signal of interest is of $2N-1$ dimensional and $R<<2N-1$. This framework covers a large class of signals arising from real applications in biology, automation, imaging science, etc. To reconstruct such a signal, our algorithm is to seek a low-rank Hankel matrix of the signal by minimizing its nuclear norm subject to the consistency on the sampled data. Our theoretical results show that a robust recovery is possible as long as the number of projections exceeds $O(R\ln^2N)$. No incoherence or separation condition is required in our proof. Our method can be applied to spectral compressed sensing where the signal of interest is a superposition of $R$ complex sinusoids. Compared to existing results, our result here does not need any separation condition on the frequencies, while achieving better or comparable bounds on the number of measurements. Furthermore, our method provides theoretical guidance on how many samples are required in the state-of-the-art non-uniform sampling in NMR spectroscopy. The performance of our algorithm is further demonstrated by numerical experiments.

ITDec 2, 2013

Precise Semidefinite Programming Formulation of Atomic Norm Minimization for Recovering d-Dimensional ($d\geq 2$) Off-the-Grid Frequencies

Weiyu Xu, Jian-Feng Cai, Kumar Vijay Mishra et al.

Recent research in off-the-grid compressed sensing (CS) has demonstrated that, under certain conditions, one can successfully recover a spectrally sparse signal from a few time-domain samples even though the dictionary is continuous. In particular, atomic norm minimization was proposed in \cite{tang2012csotg} to recover $1$-dimensional spectrally sparse signal. However, in spite of existing research efforts \cite{chi2013compressive}, it was still an open problem how to formulate an equivalent positive semidefinite program for atomic norm minimization in recovering signals with $d$-dimensional ($d\geq 2$) off-the-grid frequencies. In this paper, we settle this problem by proposing equivalent semidefinite programming formulations of atomic norm minimization to recover signals with $d$-dimensional ($d\geq 2$) off-the-grid frequencies.

ITJul 17, 2013

Universally Elevating the Phase Transition Performance of Compressed Sensing: Non-Isometric Matrices are Not Necessarily Bad Matrices

Weiyu Xu, Myung Cho

In compressed sensing problems, $\ell_1$ minimization or Basis Pursuit was known to have the best provable phase transition performance of recoverable sparsity among polynomial-time algorithms. It is of great theoretical and practical interest to find alternative polynomial-time algorithms which perform better than $\ell_1$ minimization. \cite{Icassp reweighted l_1}, \cite{Isit reweighted l_1}, \cite{XuScaingLaw} and \cite{iterativereweightedjournal} have shown that a two-stage re-weighted $\ell_1$ minimization algorithm can boost the phase transition performance for signals whose nonzero elements follow an amplitude probability density function (pdf) $f(\cdot)$ whose $t$-th derivative $f^{t}(0) \neq 0$ for some integer $t \geq 0$. However, for signals whose nonzero elements are strictly suspended from zero in distribution (for example, constant-modulus, only taking values `$+d$' or `$-d$' for some nonzero real number $d$), no polynomial-time signal recovery algorithms were known to provide better phase transition performance than plain $\ell_1$ minimization, especially for dense sensing matrices. In this paper, we show that a polynomial-time algorithm can universally elevate the phase-transition performance of compressed sensing, compared with $\ell_1$ minimization, even for signals with constant-modulus nonzero elements. Contrary to conventional wisdoms that compressed sensing matrices are desired to be isometric, we show that non-isometric matrices are not necessarily bad sensing matrices. In this paper, we also provide a framework for recovering sparse signals when sensing matrices are not isometric.

ITJun 11, 2013

Precisely Verifying the Null Space Conditions in Compressed Sensing: A Sandwiching Algorithm

Myung Cho, Weiyu Xu

In this paper, we propose new efficient algorithms to verify the null space condition in compressed sensing (CS). Given an $(n-m) \times n$ ($m>0$) CS matrix $A$ and a positive $k$, we are interested in computing $\displaystyle α_k = \max_{\{z: Az=0,z\neq 0\}}\max_{\{K: |K|\leq k\}}$ ${\|z_K \|_{1}}{\|z\|_{1}}$, where $K$ represents subsets of $\{1,2,...,n\}$, and $|K|$ is the cardinality of $K$. In particular, we are interested in finding the maximum $k$ such that $α_k < {1}{2}$. However, computing $α_k$ is known to be extremely challenging. In this paper, we first propose a series of new polynomial-time algorithms to compute upper bounds on $α_k$. Based on these new polynomial-time algorithms, we further design a new sandwiching algorithm, to compute the \emph{exact} $α_k$ with greatly reduced complexity. When needed, this new sandwiching algorithm also achieves a smooth tradeoff between computational complexity and result accuracy. Empirical results show the performance improvements of our algorithm over existing known methods; and our algorithm outputs precise values of $α_k$, with much lower complexity than exhaustive search.

ITJan 28, 2013

Guarantees of Total Variation Minimization for Signal Recovery

Jian-Feng Cai, Weiyu Xu

In this paper, we consider using total variation minimization to recover signals whose gradients have a sparse support, from a small number of measurements. We establish the proof for the performance guarantee of total variation (TV) minimization in recovering \emph{one-dimensional} signal with sparse gradient support. This partially answers the open problem of proving the fidelity of total variation minimization in such a setting \cite{TVMulti}. In particular, we have shown that the recoverable gradient sparsity can grow linearly with the signal dimension when TV minimization is used. Recoverable sparsity thresholds of TV minimization are explicitly computed for 1-dimensional signal by using the Grassmann angle framework. We also extend our results to TV minimization for multidimensional signals. Stability of recovering signal itself using 1-D TV minimization has also been established through a property called "almost Euclidean property for 1-dimensional TV norm". We further give a lower bound on the number of random Gaussian measurements for recovering 1-dimensional signal vectors with $N$ elements and $K$-sparse gradients. Interestingly, the number of needed measurements is lower bounded by $Ω((NK)^{\frac{1}{2}})$, rather than the $O(K\log(N/K))$ bound frequently appearing in recovering $K$-sparse signal vectors.