QUANT-PHMay 8
Conjugate queries can helpEwin Tang, John Wright, Mark Zhandry
We give a natural problem over input quantum oracles $U$ which cannot be solved with exponentially many black-box queries to $U$ and $U^\dagger$, but which can be solved with constant many queries to $U$ and $U^*$, or $U$ and $U^{\mathrm{T}}$. We also demonstrate a quantum commitment scheme that is secure against adversaries that query only $U$ and $U^\dagger$, but is insecure if the adversary can query $U^*$. These results show that conjugate and transpose queries do give more power to quantum algorithms, lending credence to the idea put forth by Zhandry that cryptographic primitives should prove security against these forms of queries. Our key lemma is that any circuit using $q$ forward and inverse queries to a state preparation unitary for a state $σ$ can be simulated to $\varepsilon$ error with $n = \mathcal{O}(q^2/\varepsilon)$ copies of $σ$. Consequently, for decision tasks, algorithms using (forward and inverse) state preparation queries only ever perform quadratically better than sample access. We also identify a motif, which we call the "acorn trick", where generically strengthening a quantum resource can be possible if the output is allowed to be random, bypassing no-go theorems for deterministic algorithms. We demonstrate this idea for several settings, including controlization and purification.
IMJul 23, 2022
Boosting the Efficiency of Parametric Detection with Hierarchical Neural NetworksJingkai Yan, Robert Colgan, John Wright et al.
Gravitational wave astronomy is a vibrant field that leverages both classic and modern data processing techniques for the understanding of the universe. Various approaches have been proposed for improving the efficiency of the detection scheme, with hierarchical matched filtering being an important strategy. Meanwhile, deep learning methods have recently demonstrated both consistency with matched filtering methods and remarkable statistical performance. In this work, we propose Hierarchical Detection Network (HDN), a novel approach to efficient detection that combines ideas from hierarchical matching and deep learning. The network is trained using a novel loss function, which encodes simultaneously the goals of statistical accuracy and efficiency. We discuss the source of complexity reduction of the proposed model, and describe a general recipe for initialization with each layer specializing in different regions. We demonstrate the performance of HDN with experiments using open LIGO data and synthetic injections, and observe with two-layer models a $79\%$ efficiency gain compared with matched filtering at an equal error rate of $0.2\%$. Furthermore, we show how training a three-layer HDN initialized using two-layer model can further boost both accuracy and efficiency, highlighting the power of multiple simple layers in efficient detection.
LGOct 16, 2023
TpopT: Efficient Trainable Template Optimization on Low-Dimensional ManifoldsJingkai Yan, Shiyu Wang, Xinyu Rain Wei et al.
In scientific and engineering scenarios, a recurring task is the detection of low-dimensional families of signals or patterns. A classic family of approaches, exemplified by template matching, aims to cover the search space with a dense template bank. While simple and highly interpretable, it suffers from poor computational efficiency due to unfavorable scaling in the signal space dimensionality. In this work, we study TpopT (TemPlate OPTimization) as an alternative scalable framework for detecting low-dimensional families of signals which maintains high interpretability. We provide a theoretical analysis of the convergence of Riemannian gradient descent for TpopT, and prove that it has a superior dimension scaling to covering. We also propose a practical TpopT framework for nonparametric signal sets, which incorporates techniques of embedding and kernel interpolation, and is further configurable into a trainable network architecture by unrolled optimization. The proposed trainable TpopT exhibits significantly improved efficiency-accuracy tradeoffs for gravitational wave detection, where matched filtering is currently a method of choice. We further illustrate the general applicability of this approach with experiments on handwritten digit data.
CVMar 9, 2022Code
Resource-Efficient Invariant Networks: Exponential Gains by Unrolled OptimizationSam Buchanan, Jingkai Yan, Ellie Haber et al.
Achieving invariance to nuisance transformations is a fundamental challenge in the construction of robust and reliable vision systems. Existing approaches to invariance scale exponentially with the dimension of the family of transformations, making them unable to cope with natural variabilities in visual data such as changes in pose and perspective. We identify a common limitation of these approaches--they rely on sampling to traverse the high-dimensional space of transformations--and propose a new computational primitive for building invariant networks based instead on optimization, which in many scenarios provides a provably more efficient method for high-dimensional exploration than sampling. We provide empirical and theoretical corroboration of the efficiency gains and soundness of our proposed method, and demonstrate its utility in constructing an efficient invariant network for a simple hierarchical object detection task when combined with unrolled optimization. Code for our networks and experiments is available at https://github.com/sdbuch/refine.
QUANT-PHApr 23
Are controlled unitaries helpful?Ewin Tang, John Wright
Many quantum algorithms, to compute some property of a unitary $U$, require access not just to $U$, but to $cU$, the unitary with a control qubit. We show that having access to $cU$ does not help for a large class of quantum problems. For a quantum circuit which uses $cU$ and $cU^\dagger$ and outputs $|ψ(U)\rangle$, we show how to "decontrol" the circuit into one which uses only $U$ and $U^\dagger$ and outputs $|ψ(φU)\rangle$ for a uniformly random phase $φ$, with a small amount of time and space overhead. When we only care about the output state up to a global phase on $U$, then the decontrolled circuit suffices. Stated differently, $cU$ is only helpful because it contains global phase information about $U$. A version of our procedure is described in an appendix of Sheridan, Maslov, and Mosca (arXiv:0810.3843). Our goal with this work is to popularize this result by generalizing it and investigating its implications, in order to counter negative results in the literature which might lead one to believe that decontrolling is not possible. As an application, we give a simple proof for the existence of unitary ensembles which are pseudorandom under access to $U$, $U^\dagger$, $cU$, and $cU^\dagger$.
QUANT-PHApr 23
Amplitude amplification and estimation require inversesEwin Tang, John Wright
We prove that the generic quantum speedups for brute-force search and counting only hold when the process we apply them to can be efficiently inverted. The algorithms speeding up these problems, amplitude amplification and amplitude estimation, assume the ability to apply a state preparation unitary $U$ and its inverse $U^\dagger$; we give problem instances based on trace estimation where no algorithm which uses only $U$ beats the naive, quadratically slower approach. Our proof of this is simple and goes through the compressed oracle method introduced by Zhandry. Since these two subroutines are responsible for the ubiquity of the quadratic "Grover" speedup in quantum algorithms, our result explains why such speedups are far harder to come by in the settings of quantum learning, metrology, and sensing. In these settings, $U$ models the evolution of an experimental system, so implementing $U^\dagger$ can be much harder -- tantamount to reversing time within the system. Our result suggests a dichotomy: without inverse access, quantum speedups are scarce; with it, quantum speedups abound.
LGSep 27, 2024
Sequencing the Neurome: Towards Scalable Exact Parameter Reconstruction of Black-Box Neural NetworksJudah Goldfeder, Quinten Roets, Gabe Guo et al.
Inferring the exact parameters of a neural network with only query access is an NP-Hard problem, with few practical existing algorithms. Solutions would have major implications for security, verification, interpretability, and understanding biological networks. The key challenges are the massive parameter space, and complex non-linear relationships between neurons. We resolve these challenges using two insights. First, we observe that almost all networks used in practice are produced by random initialization and first order optimization, an inductive bias that drastically reduces the practical parameter space. Second, we present a novel query generation algorithm that produces maximally informative samples, letting us untangle the non-linear relationships efficiently. We demonstrate reconstruction of a hidden network containing over 1.5 million parameters, and of one 7 layers deep, the largest and deepest reconstructions to date, with max parameter difference less than 0.0001, and illustrate robustness and scalability across a variety of architectures, datasets, and training procedures.
DCNov 22, 2019Code
Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack IntegrationHasan Genc, Seah Kim, Alon Amid et al.
DNN accelerators are often developed and evaluated in isolation without considering the cross-stack, system-level effects in real-world environments. This makes it difficult to appreciate the impact of System-on-Chip (SoC) resource contention, OS overheads, and programming-stack inefficiencies on overall performance/energy-efficiency. To address this challenge, we present Gemmini, an open-source*, full-stack DNN accelerator generator. Gemmini generates a wide design-space of efficient ASIC accelerators from a flexible architectural template, together with flexible programming stacks and full SoCs with shared resources that capture system-level effects. Gemmini-generated accelerators have also been fabricated, delivering up to three orders-of-magnitude speedups over high-performance CPUs on various DNN benchmarks. * https://github.com/ucb-bar/gemmini
LGFeb 24, 2025
Fast, Accurate Manifold Denoising by Tunneling Riemannian OptimizationShiyu Wang, Mariam Avagyan, Yihan Shen et al.
Learned denoisers play a fundamental role in various signal generation (e.g., diffusion models) and reconstruction (e.g., compressed sensing) architectures, whose success derives from their ability to leverage low-dimensional structure in data. Existing denoising methods, however, either rely on local approximations that require a linear scan of the entire dataset or treat denoising as generic function approximation problems, often sacrificing efficiency and interpretability. We consider the problem of efficiently denoising a new noisy data point sampled from an unknown $d$-dimensional manifold $M \in \mathbb{R}^D$, using only noisy samples. This work proposes a framework for test-time efficient manifold denoising, by framing the concept of "learning-to-denoise" as "learning-to-optimize". We have two technical innovations: (i) online learning methods which learn to optimize over the manifold of clean signals using only noisy data, effectively "growing" an optimizer one sample at a time. (ii) mixed-order methods which guarantee that the learned optimizers achieve global optimality, ensuring both efficiency and near-optimal denoising performance. We corroborate these claims with theoretical analyses of both the complexity and denoising performance of mixed-order traversal. Our experiments on scientific manifolds demonstrate significantly improved complexity-performance tradeoffs compared to nearest neighbor search, which underpins existing provable denoising approaches based on exhaustive search.
MLJun 23, 2025
Local Averaging Accurately Distills Manifold Structure From Noisy DataYihan Shen, Shiyu Wang, Arnaud Lamy et al.
High-dimensional data are ubiquitous, with examples ranging from natural images to scientific datasets, and often reside near low-dimensional manifolds. Leveraging this geometric structure is vital for downstream tasks, including signal denoising, reconstruction, and generation. However, in practice, the manifold is typically unknown and only noisy samples are available. A fundamental approach to uncovering the manifold structure is local averaging, which is a cornerstone of state-of-the-art provable methods for manifold fitting and denoising. However, to the best of our knowledge, there are no works that rigorously analyze the accuracy of local averaging in a manifold setting in high-noise regimes. In this work, we provide theoretical analyses of a two-round mini-batch local averaging method applied to noisy samples drawn from a $d$-dimensional manifold $\mathcal M \subset \mathbb{R}^D$, under a relatively high-noise regime where the noise size is comparable to the reach $τ$. We show that with high probability, the averaged point $\hat{\mathbf q}$ achieves the bound $d(\hat{\mathbf q}, \mathcal M) \leq σ\sqrt{d\left(1+\frac{κ\mathrm{diam}(\mathcal {M})}{\log(D)}\right)}$, where $σ, \mathrm{diam(\mathcal M)},κ$ denote the standard deviation of the Gaussian noise, manifold's diameter and a bound on its extrinsic curvature, respectively. This is the first analysis of local averaging accuracy over the manifold in the relatively high noise regime where $σ\sqrt{D} \approx τ$. The proposed method can serve as a preprocessing step for a wide range of provable methods designed for lower-noise regimes. Additionally, our framework can provide a theoretical foundation for a broad spectrum of denoising and dimensionality reduction methods that rely on local averaging techniques.
MLJul 29, 2021
Deep Networks Provably Classify Data on CurvesTingran Wang, Sam Buchanan, Dar Gilboa et al.
Data with low-dimensional nonlinear structure are ubiquitous in engineering and scientific problems. We study a model problem with such structure -- a binary classification task that uses a deep fully-connected neural network to classify data drawn from two disjoint smooth curves on the unit sphere. Aside from mild regularity conditions, we place no restrictions on the configuration of the curves. We prove that when (i) the network depth is large relative to certain geometric properties that set the difficulty of the problem and (ii) the network width and number of samples is polynomial in the depth, randomly-initialized gradient descent quickly learns to correctly classify all points on the two curves with high probability. To our knowledge, this is the first generalization guarantee for deep networks with nonlinear data that depends only on intrinsic data properties. Our analysis proceeds by a reduction to dynamics in the neural tangent kernel (NTK) regime, where the network depth plays the role of a fitting resource in solving the classification problem. In particular, via fine-grained control of the decay properties of the NTK, we demonstrate that when the network is sufficiently deep, the NTK can be locally approximated by a translationally invariant operator on the manifolds and stably inverted over smooth functions, which guarantees convergence and generalization.
LGJun 17, 2021
Square Root Principal Component Pursuit: Tuning-Free Noisy Robust Matrix RecoveryJunhui Zhang, Jingkai Yan, John Wright
We propose a new framework -- Square Root Principal Component Pursuit -- for low-rank matrix recovery from observations corrupted with noise and outliers. Inspired by the square root Lasso, this new formulation does not require prior knowledge of the noise level. We show that a single, universal choice of the regularization parameter suffices to achieve reconstruction error proportional to the (a priori unknown) noise level. In comparison, previous formulations such as stable PCP rely on noise-dependent parameters to achieve similar performance, and are therefore challenging to deploy in applications where the noise level is unknown. We validate the effectiveness of our new method through experiments on simulated and real datasets. Our simulations corroborate the claim that a universal choice of the regularization parameter yields near optimal performance across a range of noise levels, indicating that the proposed method outperforms the (somewhat loose) bound proved here.
LGMay 21, 2021
ReduNet: A White-box Deep Network from the Principle of Maximizing Rate ReductionKwan Ho Ryan Chan, Yaodong Yu, Chong You et al.
This work attempts to provide a plausible theoretical framework that aims to interpret modern deep (convolutional) networks from the principles of data compression and discriminative representation. We argue that for high-dimensional multi-class data, the optimal linear discriminative representation maximizes the coding rate difference between the whole dataset and the average of all the subsets. We show that the basic iterative gradient ascent scheme for optimizing the rate reduction objective naturally leads to a multi-layer deep network, named ReduNet, which shares common characteristics of modern deep networks. The deep layered architectures, linear and nonlinear operators, and even parameters of the network are all explicitly constructed layer-by-layer via forward propagation, although they are amenable to fine-tuning via back propagation. All components of so-obtained "white-box" network have precise optimization, statistical, and geometric interpretation. Moreover, all linear operators of the so-derived network naturally become multi-channel convolutions when we enforce classification to be rigorously shift-invariant. The derivation in the invariant setting suggests a trade-off between sparsity and invariance, and also indicates that such a deep convolution network is significantly more efficient to construct and learn in the spectral domain. Our preliminary simulations and experiments clearly verify the effectiveness of both the rate reduction objective and the associated ReduNet. All code and data are available at \url{https://github.com/Ma-Lab-Berkeley}.
IMApr 8, 2021
Generalized Approach to Matched Filtering using Neural NetworksJingkai Yan, Mariam Avagyan, Robert E. Colgan et al.
Gravitational wave science is a pioneering field with rapidly evolving data analysis methodology currently assimilating and inventing deep learning techniques. The bulk of the sophisticated flagship searches of the field rely on the time-tested matched filtering principle within their core. In this paper, we make a key observation on the relationship between the emerging deep learning and the traditional techniques: matched filtering is formally equivalent to a particular neural network. This means that a neural network can be constructed analytically to exactly implement matched filtering, and can be further trained on data or boosted with additional complexity for improved performance. Moreover, we show that the proposed neural network architecture can outperform matched filtering, both with or without knowledge of a prior on the parameter distribution. When a prior is given, the proposed neural network can approach the statistically optimal performance. We also propose and investigate two different neural network architectures MNet-Shallow and MNet-Deep, both of which implement matched filtering at initialization and can be trained on data. MNet-Shallow has simpler structure, while MNet-Deep is more flexible and can deal with a wider range of distributions. Our theoretical findings are corroborated by experiments using real LIGO data and synthetic injections, where our proposed methods significantly outperform matched filtering at false positive rates above $5\times 10^{-3}\%$. The fundamental equivalence between matched filtering and neural networks allows us to define a "complexity standard candle" to characterize the relative complexity of the different approaches to gravitational wave signal searches in a common framework. Finally, our results suggest new perspectives on the role of deep learning in gravitational wave detection.
LGOct 27, 2020
Deep Networks from the Principle of Rate ReductionKwan Ho Ryan Chan, Yaodong Yu, Chong You et al.
This work attempts to interpret modern deep (convolutional) networks from the principles of rate reduction and (shift) invariant classification. We show that the basic iterative gradient ascent scheme for optimizing the rate reduction of learned features naturally leads to a multi-layer deep network, one iteration per layer. The layered architectures, linear and nonlinear operators, and even parameters of the network are all explicitly constructed layer-by-layer in a forward propagation fashion by emulating the gradient scheme. All components of this "white box" network have precise optimization, statistical, and geometric interpretation. This principled framework also reveals and justifies the role of multi-channel lifting and sparse coding in early stage of deep networks. Moreover, all linear operators of the so-derived network naturally become multi-channel convolutions when we enforce classification to be rigorously shift-invariant. The derivation also indicates that such a convolutional network is significantly more efficient to construct and learn in the spectral domain. Our preliminary simulations and experiments indicate that so constructed deep network can already learn a good discriminative representation even without any back propagation training.
MLAug 25, 2020
Deep Networks and the Multiple Manifold ProblemSam Buchanan, Dar Gilboa, John Wright
We study the multiple manifold problem, a binary classification task modeled on applications in machine vision, in which a deep fully-connected neural network is trained to separate two low-dimensional submanifolds of the unit sphere. We provide an analysis of the one-dimensional case, proving for a simple manifold configuration that when the network depth $L$ is large relative to certain geometric and statistical properties of the data, the network width $n$ grows as a sufficiently large polynomial in $L$, and the number of i.i.d. samples from the manifolds is polynomial in $L$, randomly-initialized gradient descent rapidly learns to classify the two manifolds perfectly with high probability. Our analysis demonstrates concrete benefits of depth and width in the context of a practically-motivated model problem: the depth acts as a fitting resource, with larger depths corresponding to smoother networks that can more readily separate the class manifolds, and the width acts as a statistical resource, enabling concentration of the randomly-initialized network and its gradients. The argument centers around the neural tangent kernel and its role in the nonasymptotic analysis of training overparameterized neural networks; to this literature, we contribute essentially optimal rates of concentration for the neural tangent kernel of deep fully-connected networks, requiring width $n \gtrsim L\,\mathrm{poly}(d_0)$ to achieve uniform concentration of the initial kernel over a $d_0$-dimensional submanifold of the unit sphere $\mathbb{S}^{n_0-1}$, and a nonasymptotic framework for establishing generalization of networks trained in the NTK regime with structured data. The proof makes heavy use of martingale concentration to optimally treat statistical dependencies across layers of the initial random network. This approach should be of use in establishing similar results for other network architectures.
LGJul 14, 2020
From Symmetry to Geometry: Tractable Nonconvex ProblemsYuqian Zhang, Qing Qu, John Wright
As science and engineering have become increasingly data-driven, the role of optimization has expanded to touch almost every stage of the data analysis pipeline, from signal and data acquisition to modeling and prediction. The optimization problems encountered in practice are often nonconvex. While challenges vary from problem to problem, one common source of nonconvexity is nonlinearity in the data or measurement model. Nonlinear models often exhibit symmetries, creating complicated, nonconvex objective landscapes, with multiple equivalent solutions. Nevertheless, simple methods (e.g., gradient descent) often perform surprisingly well in practice. The goal of this survey is to highlight a class of tractable nonconvex problems, which can be understood through the lens of symmetries. These problems exhibit a characteristic geometric structure: local minimizers are symmetric copies of a single "ground truth" solution, while other critical points occur at balanced superpositions of symmetric copies of the ground truth, and exhibit negative curvature in directions that break the symmetry. This structure enables efficient methods to obtain global minimizers. We discuss examples of this phenomenon arising from a wide range of problems in imaging, signal processing, and data analysis. We highlight the key role of symmetry in shaping the objective landscape and discuss the different roles of rotational and discrete symmetries. This area is rich with observed phenomena and open problems; we close by highlighting directions for future research.
LGJan 20, 2020
Finding the Sparsest Vectors in a Subspace: Theory, Algorithms, and ApplicationsQing Qu, Zhihui Zhu, Xiao Li et al.
The problem of finding the sparsest vector (direction) in a low dimensional subspace can be considered as a homogeneous variant of the sparse recovery problem, which finds applications in robust subspace recovery, dictionary learning, sparse blind deconvolution, and many other problems in signal processing and machine learning. However, in contrast to the classical sparse recovery problem, the most natural formulation for finding the sparsest vector in a subspace is usually nonconvex. In this paper, we overview recent advances on global nonconvex optimization theory for solving this problem, ranging from geometric analysis of its optimization landscapes, to efficient optimization algorithms for solving the associated nonconvex optimization problem, to applications in machine intelligence, representation learning, and imaging sciences. Finally, we conclude this review by pointing out several interesting open problems for future research.
SPAug 28, 2019
Short-and-Sparse Deconvolution -- A Geometric ApproachYenson Lau, Qing Qu, Han-Wen Kuo et al.
Short-and-sparse deconvolution (SaSD) is the problem of extracting localized, recurring motifs in signals with spatial or temporal structure. Variants of this problem arise in applications such as image deblurring, microscopy, neural spike sorting, and more. The problem is challenging in both theory and practice, as natural optimization formulations are nonconvex. Moreover, practical deconvolution problems involve smooth motifs (kernels) whose spectra decay rapidly, resulting in poor conditioning and numerical challenges. This paper is motivated by recent theoretical advances, which characterize the optimization landscape of a particular nonconvex formulation of SaSD. This is used to derive a $provable$ algorithm which exactly solves certain non-practical instances of the SaSD problem. We leverage the key ideas from this theory (sphere constraints, data-driven initialization) to develop a $practical$ algorithm, which performs well on data arising from a range of application areas. We highlight key additional challenges posed by the ill-conditioning of real SaSD problems, and suggest heuristics (acceleration, continuation, reweighting) to mitigate them. Experiments demonstrate both the performance and generality of the proposed method.
LGJun 6, 2019
Complete Dictionary Learning via $\ell^4$-Norm Maximization over the Orthogonal GroupYuexiang Zhai, Zitong Yang, Zhenyu Liao et al.
This paper considers the fundamental problem of learning a complete (orthogonal) dictionary from samples of sparsely generated signals. Most existing methods solve the dictionary (and sparse representations) based on heuristic algorithms, usually without theoretical guarantees for either optimality or complexity. The recent $\ell^1$-minimization based methods do provide such guarantees but the associated algorithms recover the dictionary one column at a time. In this work, we propose a new formulation that maximizes the $\ell^4$-norm over the orthogonal group, to learn the entire dictionary. We prove that under a random data model, with nearly minimum sample complexity, the global optima of the $\ell^4$ norm are very close to signed permutations of the ground truth. Inspired by this observation, we give a conceptually simple and yet effective algorithm based on "matching, stretching, and projection" (MSP). The algorithm provably converges locally at a superlinear (cubic) rate and cost per iteration is merely an SVD. In addition to strong theoretical guarantees, experiments show that the new algorithm is significantly more efficient and effective than existing methods, including KSVD and $\ell^1$-based methods. Preliminary experimental results on mixed real imagery data clearly demonstrate advantages of so learned dictionary over classic PCA bases.
CVJan 7, 2019
On the Global Geometry of Sphere-Constrained Sparse Blind DeconvolutionYuqian Zhang, Yenson Lau, Han-Wen Kuo et al.
Blind deconvolution is the problem of recovering a convolutional kernel $\boldsymbol a_0$ and an activation signal $\boldsymbol x_0$ from their convolution $\boldsymbol y = \boldsymbol a_0 \circledast \boldsymbol x_0$. This problem is ill-posed without further constraints or priors. This paper studies the situation where the nonzero entries in the activation signal are sparsely and randomly populated. We normalize the convolution kernel to have unit Frobenius norm and cast the sparse blind deconvolution problem as a nonconvex optimization problem over the sphere. With this spherical constraint, every spurious local minimum turns out to be close to some signed shift truncation of the ground truth, under certain hypotheses. This benign property motivates an effective two stage algorithm that recovers the ground truth from the partial information offered by a suboptimal local minimum. This geometry-inspired algorithm recovers the ground truth for certain microscopy problems, also exhibits promising performance in the more challenging image deblurring problem. Our insights into the global geometry and the two stage algorithm extend to the convolutional dictionary learning problem, where a superposition of multiple convolution signals is observed.
SPJan 2, 2019
Geometry and Symmetry in Short-and-Sparse DeconvolutionHan-Wen Kuo, Yenson Lau, Yuqian Zhang et al.
We study the $\textit{Short-and-Sparse (SaS) deconvolution}$ problem of recovering a short signal $\mathbf a_0$ and a sparse signal $\mathbf x_0$ from their convolution. We propose a method based on nonconvex optimization, which under certain conditions recovers the target short and sparse signals, up to a signed shift symmetry which is intrinsic to this model. This symmetry plays a central role in shaping the optimization landscape for deconvolution. We give a $\textit{regional analysis}$, which characterizes this landscape geometrically, on a union of subspaces. Our geometric characterization holds when the length-$p_0$ short signal $\mathbf a_0$ has shift coherence $μ$, and $\mathbf x_0$ follows a random sparsity model with sparsity rate $θ\in \Bigl[\frac{c_1}{p_0}, \frac{c_2}{p_0\sqrtμ+ \sqrt{p_0}}\Bigr]\cdot\frac{1}{\log^2p_0}$. Based on this geometry, we give a provable method that successfully solves SaS deconvolution with high probability.
SPJun 1, 2018
Structured Local Optima in Sparse Blind DeconvolutionYuqian Zhang, Han-Wen Kuo, John Wright
Blind deconvolution is a ubiquitous problem of recovering two unknown signals from their convolution. Unfortunately, this is an ill-posed problem in general. This paper focuses on the {\em short and sparse} blind deconvolution problem, where the one unknown signal is short and the other one is sparsely and randomly supported. This variant captures the structure of the unknown signals in several important applications. We assume the short signal to have unit $\ell^2$ norm and cast the blind deconvolution problem as a nonconvex optimization problem over the sphere. We demonstrate that (i) in a certain region of the sphere, every local optimum is close to some shift truncation of the ground truth, and (ii) for a generic short signal of length $k$, when the sparsity of activation signal $θ\lesssim k^{-2/3}$ and number of measurements $m\gtrsim poly(k)$, a simple initialization method together with a descent algorithm which escapes strict saddle points recovers a near shift truncation of the ground truth kernel.
CODec 3, 2017
Convolutional Phase Retrieval via Gradient DescentQing Qu, Yuqian Zhang, Yonina C. Eldar et al.
We study the convolutional phase retrieval problem, of recovering an unknown signal $\mathbf x \in \mathbb C^n $ from $m$ measurements consisting of the magnitude of its cyclic convolution with a given kernel $\mathbf a \in \mathbb C^m $. This model is motivated by applications such as channel estimation, optics, and underwater acoustic communication, where the signal of interest is acted on by a given channel/filter, and phase information is difficult or impossible to acquire. We show that when $\mathbf a$ is random and the number of observations $m$ is sufficiently large, with high probability $\mathbf x$ can be efficiently recovered up to a global phase shift using a combination of spectral initialization and generalized gradient descent. The main challenge is coping with dependencies in the measurement operator. We overcome this challenge by using ideas from decoupling theory, suprema of chaos processes and the restricted isometry property of random circulant matrices, and recent analysis of alternating minimization methods.
DSJul 31, 2017
Which Distribution Distances are Sublinearly Testable?Constantinos Daskalakis, Gautam Kamath, John Wright
Given samples from an unknown distribution $p$ and a description of a distribution $q$, are $p$ and $q$ close or far? This question of "identity testing" has received significant attention in the case of testing whether $p$ and $q$ are equal or far in total variation distance. However, in recent work, the following questions have been been critical to solving problems at the frontiers of distribution testing: -Alternative Distances: Can we test whether $p$ and $q$ are far in other distances, say Hellinger? -Tolerance: Can we test when $p$ and $q$ are close, rather than equal? And if so, close in which distances? Motivated by these questions, we characterize the complexity of distribution testing under a variety of distances, including total variation, $\ell_2$, Hellinger, Kullback-Leibler, and $χ^2$. For each pair of distances $d_1$ and $d_2$, we study the complexity of testing if $p$ and $q$ are close in $d_1$ versus far in $d_2$, with a focus on identifying which problems allow strongly sublinear testers (i.e., those with complexity $O(n^{1 - γ})$ for some $γ> 0$ where $n$ is the size of the support of the distributions $p$ and $q$). We provide matching upper and lower bounds for each case. We also study these questions in the case where we only have samples from $q$ (equivalence testing), showing qualitative differences from identity testing in terms of when tolerance can be achieved. Our algorithms fall into the classical paradigm of $χ^2$-statistics, but require crucial changes to handle the challenges introduced by each distance we consider. Finally, we survey other recent results in an attempt to serve as a reference for the complexity of various distribution testing problems.
ITFeb 22, 2016
A Geometric Analysis of Phase RetrievalJu Sun, Qing Qu, John Wright
Can we recover a complex signal from its Fourier magnitudes? More generally, given a set of $m$ measurements, $y_k = |\mathbf a_k^* \mathbf x|$ for $k = 1, \dots, m$, is it possible to recover $\mathbf x \in \mathbb{C}^n$ (i.e., length-$n$ complex vector)? This **generalized phase retrieval** (GPR) problem is a fundamental task in various disciplines, and has been the subject of much recent investigation. Natural nonconvex heuristics often work remarkably well for GPR in practice, but lack clear theoretical explanations. In this paper, we take a step towards bridging this gap. We prove that when the measurement vectors $\mathbf a_k$'s are generic (i.i.d. complex Gaussian) and the number of measurements is large enough ($m \ge C n \log^3 n$), with high probability, a natural least-squares formulation for GPR has the following benign geometric structure: (1) there are no spurious local minimizers, and all global minimizers are equal to the target signal $\mathbf x$, up to a global phase; and (2) the objective function has a negative curvature around each saddle point. This structure allows a number of iterative optimization methods to efficiently find a global minimizer, without special initialization. To corroborate the claim, we describe and analyze a second-order trust-region algorithm.
ITNov 15, 2015
Complete Dictionary Recovery over the Sphere II: Recovery by Riemannian Trust-region MethodJu Sun, Qing Qu, John Wright
We consider the problem of recovering a complete (i.e., square and invertible) matrix $\mathbf A_0$, from $\mathbf Y \in \mathbb{R}^{n \times p}$ with $\mathbf Y = \mathbf A_0 \mathbf X_0$, provided $\mathbf X_0$ is sufficiently sparse. This recovery problem is central to theoretical understanding of dictionary learning, which seeks a sparse representation for a collection of input signals and finds numerous applications in modern signal processing and machine learning. We give the first efficient algorithm that provably recovers $\mathbf A_0$ when $\mathbf X_0$ has $O(n)$ nonzeros per column, under suitable probability model for $\mathbf X_0$. Our algorithmic pipeline centers around solving a certain nonconvex optimization problem with a spherical constraint, and hence is naturally phrased in the language of manifold optimization. In a companion paper (arXiv:1511.03607), we have showed that with high probability our nonconvex formulation has no "spurious" local minimizers and around any saddle point the objective function has a negative directional curvature. In this paper, we take advantage of the particular geometric structure, and describe a Riemannian trust region algorithm that provably converges to a local minimizer with from arbitrary initializations. Such minimizers give excellent approximations to rows of $\mathbf X_0$. The rows are then recovered by linear programming rounding and deflation.
ITNov 11, 2015
Complete Dictionary Recovery over the Sphere I: Overview and the Geometric PictureJu Sun, Qing Qu, John Wright
We consider the problem of recovering a complete (i.e., square and invertible) matrix $\mathbf A_0$, from $\mathbf Y \in \mathbb{R}^{n \times p}$ with $\mathbf Y = \mathbf A_0 \mathbf X_0$, provided $\mathbf X_0$ is sufficiently sparse. This recovery problem is central to theoretical understanding of dictionary learning, which seeks a sparse representation for a collection of input signals and finds numerous applications in modern signal processing and machine learning. We give the first efficient algorithm that provably recovers $\mathbf A_0$ when $\mathbf X_0$ has $O(n)$ nonzeros per column, under suitable probability model for $\mathbf X_0$. In contrast, prior results based on efficient algorithms either only guarantee recovery when $\mathbf X_0$ has $O(\sqrt{n})$ zeros per column, or require multiple rounds of SDP relaxation to work when $\mathbf X_0$ has $O(n^{1-δ})$ nonzeros per column (for any constant $δ\in (0, 1)$). } Our algorithmic pipeline centers around solving a certain nonconvex optimization problem with a spherical constraint. In this paper, we provide a geometric characterization of the objective landscape. In particular, we show that the problem is highly structured: with high probability, (1) there are no "spurious" local minimizers; and (2) around all saddle points the objective has a negative directional curvature. This distinctive structure makes the problem amenable to efficient optimization algorithms. In a companion paper (arXiv:1511.04777), we design a second-order trust-region algorithm over the sphere that provably converges to a local minimizer from arbitrary initializations, despite the presence of saddle points.
OCOct 21, 2015
When Are Nonconvex Problems Not Scary?Ju Sun, Qing Qu, John Wright
In this note, we focus on smooth nonconvex optimization problems that obey: (1) all local minimizers are also global; and (2) around any saddle point or local maximizer, the objective has a negative directional curvature. Concrete applications such as dictionary learning, generalized phase retrieval, and orthogonal tensor decomposition are known to induce such structures. We describe a second-order trust-region algorithm that provably converges to a global minimizer efficiently, without special initializations. Finally we highlight alternatives, and open problems in this direction.
ITApr 26, 2015
Complete Dictionary Recovery over the SphereJu Sun, Qing Qu, John Wright
We consider the problem of recovering a complete (i.e., square and invertible) matrix $\mathbf A_0$, from $\mathbf Y \in \mathbb R^{n \times p}$ with $\mathbf Y = \mathbf A_0 \mathbf X_0$, provided $\mathbf X_0$ is sufficiently sparse. This recovery problem is central to the theoretical understanding of dictionary learning, which seeks a sparse representation for a collection of input signals, and finds numerous applications in modern signal processing and machine learning. We give the first efficient algorithm that provably recovers $\mathbf A_0$ when $\mathbf X_0$ has $O(n)$ nonzeros per column, under suitable probability model for $\mathbf X_0$. In contrast, prior results based on efficient algorithms provide recovery guarantees when $\mathbf X_0$ has only $O(n^{1-δ})$ nonzeros per column for any constant $δ\in (0, 1)$. Our algorithmic pipeline centers around solving a certain nonconvex optimization problem with a spherical constraint, and hence is naturally phrased in the language of manifold optimization. To show this apparently hard problem is tractable, we first provide a geometric characterization of the high-dimensional objective landscape, which shows that with high probability there are no "spurious" local minima. This particular geometric structure allows us to design a Riemannian trust region algorithm over the sphere that provably converges to one local minimizer with an arbitrary initialization, despite the presence of saddle points. The geometric approach we develop here may also shed light on other problems arising from nonconvex recovery of structured signals.
LGJan 31, 2015
A Batchwise Monotone Algorithm for Dictionary LearningHuan Wang, John Wright, Daniel Spielman
We propose a batchwise monotone algorithm for dictionary learning. Unlike the state-of-the-art dictionary learning algorithms which impose sparsity constraints on a sample-by-sample basis, we instead treat the samples as a batch, and impose the sparsity constraint on the whole. The benefit of batchwise optimization is that the non-zeros can be better allocated across the samples, leading to a better approximation of the whole. To accomplish this, we propose procedures to switch non-zeros in both rows and columns in the support of the coefficient matrix to reduce the reconstruction error. We prove in the proposed support switching procedure the objective of the algorithm, i.e., the reconstruction error, decreases monotonically and converges. Furthermore, we introduce a block orthogonal matching pursuit algorithm that also operates on sample batches to provide a warm start. Experiments on both natural image patches and UCI data sets show that the proposed algorithm produces a better approximation with the same sparsity levels compared to the state-of-the-art algorithms.
ITDec 15, 2014
Finding a sparse vector in a subspace: Linear sparsity using alternating directionsQing Qu, Ju Sun, John Wright
Is it possible to find the sparsest vector (direction) in a generic subspace $\mathcal{S} \subseteq \mathbb{R}^p$ with $\mathrm{dim}(\mathcal{S})= n < p$? This problem can be considered a homogeneous variant of the sparse recovery problem, and finds connections to sparse dictionary learning, sparse PCA, and many other problems in signal processing and machine learning. In this paper, we focus on a **planted sparse model** for the subspace: the target sparse vector is embedded in an otherwise random subspace. Simple convex heuristics for this planted recovery problem provably break down when the fraction of nonzero entries in the target sparse vector substantially exceeds $O(1/\sqrt{n})$. In contrast, we exhibit a relatively simple nonconvex approach based on alternating directions, which provably succeeds even when the fraction of nonzero entries is $Ω(1)$. To the best of our knowledge, this is the first practical algorithm to achieve linear scaling under the planted sparse model. Empirically, our proposed algorithm also succeeds in more challenging data models, e.g., sparse dictionary learning.
OCMar 29, 2014
Scalable Robust Matrix Recovery: Frank-Wolfe Meets Proximal MethodsCun Mu, Yuqian Zhang, John Wright et al.
Recovering matrices from compressive and grossly corrupted observations is a fundamental problem in robust statistics, with rich applications in computer vision and machine learning. In theory, under certain conditions, this problem can be solved in polynomial time via a natural convex relaxation, known as Compressive Principal Component Pursuit (CPCP). However, all existing provable algorithms for CPCP suffer from superlinear per-iteration cost, which severely limits their applicability to large scale problems. In this paper, we propose provable, scalable and efficient methods to solve CPCP with (essentially) linear per-iteration cost. Our method combines classical ideas from Frank-Wolfe and proximal methods. In each iteration, we mainly exploit Frank-Wolfe to update the low-rank component with rank-one SVD and exploit the proximal step for the sparse term. Convergence results and implementation details are also discussed. We demonstrate the scalability of the proposed approach with promising numerical experiments on visual data.
MLJul 22, 2013
Square Deal: Lower Bounds and Improved Relaxations for Tensor RecoveryCun Mu, Bo Huang, John Wright et al.
Recovering a low-rank tensor from incomplete information is a recurring problem in signal processing and machine learning. The most popular convex relaxation of this problem minimizes the sum of the nuclear norms of the unfoldings of the tensor. We show that this approach can be substantially suboptimal: reliably recovering a $K$-way tensor of length $n$ and Tucker rank $r$ from Gaussian measurements requires $Ω(r n^{K-1})$ observations. In contrast, a certain (intractable) nonconvex formulation needs only $O(r^K + nrK)$ observations. We introduce a very simple, new convex relaxation, which partially bridges this gap. Our new formulation succeeds with $O(r^{\lfloor K/2 \rfloor}n^{\lceil K/2 \rceil})$ observations. While these results pertain to Gaussian measurements, simulations strongly suggest that the new norm also outperforms the sum of nuclear norms for tensor completion from a random subset of entries. Our lower bound for the sum-of-nuclear-norms model follows from a new result on recovering signals with multiple sparse structures (e.g. sparse, low rank), which perhaps surprisingly demonstrates the significant suboptimality of the commonly used recovery approach via minimizing the sum of individual sparsity inducing norms (e.g. $l_1$, nuclear norm). Our new formulation for low-rank tensor recovery however opens the possibility in reducing the sample complexity by exploiting several structures jointly.
CVJul 4, 2013
Toward Guaranteed Illumination Models for Non-Convex ObjectsYuqian Zhang, Cun Mu, Han-wen Kuo et al.
Illumination variation remains a central challenge in object detection and recognition. Existing analyses of illumination variation typically pertain to convex, Lambertian objects, and guarantee quality of approximation in an average case sense. We show that it is possible to build V(vertex)-description convex cone models with worst-case performance guarantees, for non-convex Lambertian objects. Namely, a natural verification test based on the angle to the constructed cone guarantees to accept any image which is sufficiently well-approximated by an image of the object under some admissible lighting condition, and guarantees to reject any image that does not have a sufficiently good approximation. The cone models are generated by sampling point illuminations with sufficient density, which follows from a new perturbation bound for point images in the Lambertian model. As the number of point images required for guaranteed verification may be large, we introduce a new formulation for cone preserving dimensionality reduction, which leverages tools from sparse and low-rank decomposition to reduce the complexity, while controlling the approximation error with respect to the original cone.
CVAug 2, 2012
Efficient Point-to-Subspace Query in $\ell^1$ with Application to Robust Object Instance RecognitionJu Sun, Yuqian Zhang, John Wright
Motivated by vision tasks such as robust face and object recognition, we consider the following general problem: given a collection of low-dimensional linear subspaces in a high-dimensional ambient (image) space, and a query point (image), efficiently determine the nearest subspace to the query in $\ell^1$ distance. In contrast to the naive exhaustive search which entails large-scale linear programs, we show that the computational burden can be cut down significantly by a simple two-stage algorithm: (1) projecting the query and data-base subspaces into lower-dimensional space by random Cauchy matrix, and solving small-scale distance evaluations (linear programs) in the projection space to locate candidate nearest; (2) with few candidates upon independent repetition of (1), getting back to the high-dimensional space and performing exhaustive search. To preserve the identity of the nearest subspace with nontrivial probability, the projection dimension typically is low-order polynomial of the subspace dimension multiplied by logarithm of number of the subspaces (Theorem 2.1). The reduced dimensionality and hence complexity renders the proposed algorithm particularly relevant to vision application such as robust face and object instance recognition that we investigate empirically.
LGJun 26, 2012
Exact Recovery of Sparsely-Used DictionariesDaniel A. Spielman, Huan Wang, John Wright
We consider the problem of learning sparsely used dictionaries with an arbitrary square dictionary and a random, sparse coefficient matrix. We prove that $O (n \log n)$ samples are sufficient to uniquely determine the coefficient matrix. Based on this proof, we design a polynomial-time algorithm, called Exact Recovery of Sparsely-Used Dictionaries (ER-SpUD), and prove that it probably recovers the dictionary and coefficient matrix when the coefficient matrix is sufficiently sparse. Simulation results show that ER-SpUD reveals the true dictionary as well as the coefficients with probability higher than many state-of-the-art algorithms.