MLMar 18, 2022
Generative Principal Component AnalysisZhaoqiang Liu, Jiulong Liu, Subhroshekhar Ghosh et al.
In this paper, we study the problem of principal component analysis with generative modeling assumptions, adopting a general model for the observed matrix that encompasses notable special cases, including spiked matrix recovery and phase retrieval. The key assumption is that the underlying signal lies near the range of an $L$-Lipschitz continuous generative model with bounded $k$-dimensional inputs. We propose a quadratic estimator, and show that it enjoys a statistical rate of order $\sqrt{\frac{k\log L}{m}}$, where $m$ is the number of samples. We also provide a near-matching algorithm-independent lower bound. Moreover, we provide a variant of the classic power method, which projects the calculated data onto the range of the generative model during each iteration. We show that under suitable conditions, this method converges exponentially fast to a point achieving the above-mentioned statistical rate. We perform experiments on various image datasets for spiked matrix and phase retrieval models, and illustrate performance gains of our method to the classic power method and the truncated power method devised for sparse principal component analysis.
MLMay 13
State-of-art minibatches via novel DPP kernels: discretization, wavelets, and rough objectivesHoang-Son Tran, Pranav Gupta, Rémi Bardenet et al.
Determinantal point processes (DPPs) have emerged as a kernelized alternative to vanilla independent sampling for generating efficient minibatches, coresets and other parsimonious representations of large-scale datasets. While theoretical foundations and promising empirical performance have been demonstrated, there are two challenges for current proposals for DPP-based coresets or minibatches. The first is the need for families of DPPs with certain key variance reduction properties, usually constructed in a continuous setting, of which there are few known examples. The second is the need for an ad-hoc construction of a discrete DPP defined on a given dataset, that inherits such variance reduction. In this work, we contribute to the programme of establishing DPPs as a subsampling toolbox for ML by advancing on these two fronts. First, we propose new DPPs on the Euclidean space based on wavelets, with provably better accuracy guarantees than the best known rates. Second, we introduce a general method to convert such continuous DPPs, which are more amenable to proving analytical statements, into discrete kernels, which are pertinent for subsampling tasks such as minibatch and coreset constructions. This conversion mechanism simultaneously preserves the desired variance decay and reveals a low-rank decomposition of the discrete kernel, which makes sampling the corresponding DPP computationally inexpensive. En route, we enlarge the class of ML tasks amenable to improvements via DPP-based minibatches and coresets to include objective functions with arbitrarily low regularity, and rate guarantees that explicitly adapt to this regularity.
MLNov 1, 2024
Small coresets via negative dependence: DPPs, linear statistics, and concentrationRémi Bardenet, Subhroshekhar Ghosh, Hugo Simon-Onfroy et al.
Determinantal point processes (DPPs) are random configurations of points with tunable negative dependence. Because sampling is tractable, DPPs are natural candidates for subsampling tasks, such as minibatch selection or coreset construction. A \emph{coreset} is a subset of a (large) training set, such that minimizing an empirical loss averaged over the coreset is a controlled replacement for the intractable minimization of the original empirical loss. Typically, the control takes the form of a guarantee that the average loss over the coreset approximates the total loss uniformly across the parameter space. Recent work has provided significant empirical support in favor of using DPPs to build randomized coresets, coupled with interesting theoretical results that are suggestive but leave some key questions unanswered. In particular, the central question of whether the cardinality of a DPP-based coreset is fundamentally smaller than one based on independent sampling remained open. In this paper, we answer this question in the affirmative, demonstrating that \emph{DPPs can provably outperform independently drawn coresets}. In this vein, we contribute a conceptual understanding of coreset loss as a \emph{linear statistic} of the (random) coreset. We leverage this structural observation to connect the coresets problem to a more general problem of concentration phenomena for linear statistics of DPPs, wherein we obtain \emph{effective concentration inequalities that extend well-beyond the state-of-the-art}, encompassing general non-projection, even non-symmetric kernels. The latter have been recently shown to be of interest in machine learning beyond coresets, but come with a limited theoretical toolbox, to the extension of which our result contributes. Finally, we are also able to address the coresets problem for vector-valued objective functions, a novelty in the coresets literature.
MLMar 5
On the Statistical Optimality of Optimal Decision TreesZineng Xu, Subhroshekhar Ghosh, Yan Shuo Tan
While globally optimal empirical risk minimization (ERM) decision trees have become computationally feasible and empirically successful, rigorous theoretical guarantees for their statistical performance remain limited. In this work, we develop a comprehensive statistical theory for ERM trees under random design in both high-dimensional regression and classification. We first establish sharp oracle inequalities that bound the excess risk of the ERM estimator relative to the best possible approximation achievable by any tree with at most $L$ leaves, thereby characterizing the interpretability-accuracy trade-off. We derive these results using a novel uniform concentration framework based on empirically localized Rademacher complexity. Furthermore, we derive minimax optimal rates over a novel function class: the piecewise sparse heterogeneous anisotropic Besov (PSHAB) space. This space explicitly captures three key structural features encountered in practice: sparsity, anisotropic smoothness, and spatial heterogeneity. While our main results are established under sub-Gaussianity, we also provide robust guarantees that hold under heavy-tailed noise settings. Together, these findings provide a principled foundation for the optimality of ERM trees and introduce empirical process tools broadly applicable to other highly adaptive, data-driven procedures.
MLJul 20, 2025
Learning under Latent Group Sparsity via Diffusion on NetworksSubhroshekhar Ghosh, Soumendu Sundar Mukherjee
Group or cluster structure on explanatory variables in machine learning problems is a very general phenomenon, which has attracted broad interest from practitioners and theoreticians alike. In this work we contribute an approach to sparse learning under such group structure, that does not require prior information on the group identities. Our paradigm is motivated by the Laplacian geometry of an underlying network with a related community structure, and proceeds by directly incorporating this into a penalty that is effectively computed via a heat-flow-based local network dynamics. The proposed penalty interpolates between the lasso and the group lasso penalties, the runtime of the heat-flow dynamics being the interpolating parameter. As such it can automatically default to lasso when the group structure reflected in the Laplacian is weak. In fact, we demonstrate a data-driven procedure to construct such a network based on the available data. Notably, we dispense with computationally intensive pre-processing involving clustering of variables, spectral or otherwise. Our technique is underpinned by rigorous theorems that guarantee its effective performance and provide bounds on its sample complexity. In particular, in a wide range of settings, it provably suffices to run the diffusion for time that is only logarithmic in the problem dimensions. We explore in detail the interfaces of our approach with key statistical physics models in network science, such as the Gaussian Free Field and the Stochastic Block Model. Our work raises the possibility of applying similar diffusion-based techniques to classical learning tasks, exploiting the interplay between geometric, dynamical and stochastic structures underlying the data.
MLFeb 11, 2025
Negative Dependence as a toolbox for machine learning : review and new developmentsHoang-Son Tran, Vladimir Petrovic, Remi Bardenet et al.
Negative dependence is becoming a key driver in advancing learning capabilities beyond the limits of traditional independence. Recent developments have evidenced support towards negatively dependent systems as a learning paradigm in a broad range of fundamental machine learning challenges including optimization, sampling, dimensionality reduction and sparse signal recovery, often surpassing the performance of current methods based on statistical independence. The most popular negatively dependent model has been that of determinantal point processes (DPPs), which have their origins in quantum theory. However, other models, such as perturbed lattice models, strongly Rayleigh measures, zeros of random functions have gained salience in various learning applications. In this article, we review this burgeoning field of research, as it has developed over the past two decades or so. We also present new results on applications of DPPs to the parsimonious representation of neural networks. In the limited scope of the article, we mostly focus on aspects of this area to which the authors contributed over the recent years, including applications to Monte Carlo methods, coresets and stochastic gradient descent, stochastic networks, signal processing and connections to quantum computation. However, starting from basics of negative dependence for the uninitiated reader, extensive references are provided to a broad swath of related developments which could not be covered within our limited scope. While existing works and reviews generally focus on specific negatively dependent models (e.g. DPPs), a notable feature of this article is that it addresses negative dependence as a machine learning methodology as a whole. In this vein, it covers within its span an array of negatively dependent models and their applications well beyond DPPs, thereby putting forward a very general and rather unique perspective.
OCMay 31, 2023
Dictionary Learning under Symmetries via Group RepresentationsSubhroshekhar Ghosh, Aaron Y. R. Low, Yong Sheng Soh et al.
The dictionary learning problem can be viewed as a data-driven process to learn a suitable transformation so that data is sparsely represented directly from example data. In this paper, we examine the problem of learning a dictionary that is invariant under a pre-specified group of transformations. Natural settings include Cryo-EM, multi-object tracking, synchronization, pose estimation, etc. We specifically study this problem under the lens of mathematical representation theory. Leveraging the power of non-abelian Fourier analysis for functions over compact groups, we prescribe an algorithmic recipe for learning dictionaries that obey such invariances. We relate the dictionary learning problem in the physical domain, which is naturally modelled as being infinite dimensional, with the associated computational problem, which is necessarily finite dimensional. We establish that the dictionary learning problem can be effectively understood as an optimization instance over certain matrix orbitopes having a particular block-diagonal structure governed by the irreducible representations of the group of symmetries. This perspective enables us to introduce a band-limiting procedure which obtains dimensionality reduction in applications. We provide guarantees for our computational ansatz to provide a desirable dictionary learning outcome. We apply our paradigm to investigate the dictionary learning problem for the groups SO(2) and SO(3). While the SO(2)-orbitope admits an exact spectrahedral description, substantially less is understood about the SO(3)-orbitope. We describe a tractable spectrahedral outer approximation of the SO(3)-orbitope, and contribute an alternating minimization paradigm to perform optimization in this setting. We provide numerical experiments to highlight the efficacy of our approach in learning SO(3)-invariant dictionaries, both on synthetic and on real world data.
MEJan 20, 2022
Learning with latent group sparsity via heat flow dynamics on networksSubhroshekhar Ghosh, Soumendu Sundar Mukherjee
Group or cluster structure on explanatory variables in machine learning problems is a very general phenomenon, which has attracted broad interest from practitioners and theoreticians alike. In this work we contribute an approach to learning under such group structure, that does not require prior information on the group identities. Our paradigm is motivated by the Laplacian geometry of an underlying network with a related community structure, and proceeds by directly incorporating this into a penalty that is effectively computed via a heat flow-based local network dynamics. In fact, we demonstrate a procedure to construct such a network based on the available data. Notably, we dispense with computationally intensive pre-processing involving clustering of variables, spectral or otherwise. Our technique is underpinned by rigorous theorems that guarantee its effective performance and provide bounds on its sample complexity. In particular, in a wide range of settings, it provably suffices to run the heat flow dynamics for time that is only logarithmic in the problem dimensions. We explore in detail the interfaces of our approach with key statistical physics models in network science, such as the Gaussian Free Field and the Stochastic Block Model. We validate our approach by successful applications to real-world data from a wide array of application domains, including computer science, genetics, climatology and economics. Our work raises the possibility of applying similar diffusion-based techniques to classical learning tasks, exploiting the interplay between geometric, dynamical and stochastic structures underlying the data.
LGAug 8, 2021
Robust 1-bit Compressive Sensing with Partial Gaussian Circulant Matrices and Generative PriorsZhaoqiang Liu, Subhroshekhar Ghosh, Jun Han et al.
In 1-bit compressive sensing, each measurement is quantized to a single bit, namely the sign of a linear function of an unknown vector, and the goal is to accurately recover the vector. While it is most popular to assume a standard Gaussian sensing matrix for 1-bit compressive sensing, using structured sensing matrices such as partial Gaussian circulant matrices is of significant practical importance due to their faster matrix operations. In this paper, we provide recovery guarantees for a correlation-based optimization algorithm for robust 1-bit compressive sensing with randomly signed partial Gaussian circulant matrices and generative models. Under suitable assumptions, we match guarantees that were previously only known to hold for i.i.d.~Gaussian matrices that require significantly more computation. We make use of a practical iterative algorithm, and perform numerical experiments on image datasets to corroborate our theoretical results.
MLJun 29, 2021
Towards Sample-Optimal Compressive Phase Retrieval with Sparse and Generative PriorsZhaoqiang Liu, Subhroshekhar Ghosh, Jonathan Scarlett
Compressive phase retrieval is a popular variant of the standard compressive sensing problem in which the measurements only contain magnitude information. In this paper, motivated by recent advances in deep generative models, we provide recovery guarantees with near-optimal sample complexity for phase retrieval with generative priors. We first show that when using i.i.d. Gaussian measurements and an $L$-Lipschitz continuous generative model with bounded $k$-dimensional inputs, roughly $O(k \log L)$ samples suffice to guarantee that any signal minimizing an amplitude-based empirical loss function is close to the true signal. Attaining this sample complexity with a practical algorithm remains a difficult challenge, and finding a good initialization for gradient-based methods has been observed to pose a major bottleneck. To partially address this, we further show that roughly $O(k \log L)$ samples ensure sufficient closeness between the underlying signal and any {\em globally optimal} solution to an optimization problem designed for spectral initialization (though finding such a solution may still be challenging). We also adapt this result to sparse phase retrieval, and show that $O(s \log n)$ samples are sufficient for a similar guarantee when the underlying signal is $s$-sparse and $n$-dimensional, matching an information-theoretic lower bound. While these guarantees do not directly correspond to a practical algorithm, we propose a practical spectral initialization method motivated by our findings, and experimentally observe performance gains over various existing spectral initialization methods for sparse phase retrieval.
SPMay 6, 2021
Signal Analysis via the Stochastic Geometry of Spectrogram Level SetsSubhroshekhar Ghosh, Meixia Lin, Dongfang Sun
Spectrograms are fundamental tools in time-frequency analysis, being the squared magnitude of the so-called short time Fourier transform (STFT). Signal analysis via spectrograms has traditionally explored their peaks, i.e. their maxima. This is complemented by a recent interest in their zeros or minima, following seminal work by Flandrin and others, which exploits connections with Gaussian analytic functions (GAFs). However, the zero sets (or extrema) of GAFs have a complicated stochastic structure, complicating any direct theoretical analysis. Standard techniques largely rely on statistical observables from the analysis of spatial data, whose distributional properties for spectrograms are mostly understood only at an empirical level. In this work, we investigate spectrogram analysis via an examination of the stochastic geometric properties of their level sets. We obtain rigorous theorems demonstrating the efficacy of a spectrogram level sets based approach to the detection and estimation of signals, framed in a concrete inferential set-up. Exploiting these ideas as theoretical underpinnings, we propose a level sets based algorithm for signal analysis that is intrinsic to given spectrogram data, and substantiate its effectiveness via extensive empirical studies. Our results also have theoretical implications for spectrogram zero based approaches to signal analysis. To our knowledge, these results are arguably among the first to provide a rigorous statistical understanding of signal detection and reconstruction in this set up, complemented with provable guarantees on detection thresholds and rates of convergence.
MENov 12, 2020
On a Variational Approximation based Empirical Likelihood ABC MethodSanjay Chaudhuri, Subhroshekhar Ghosh, David J. Nott et al.
Many scientifically well-motivated statistical models in natural, engineering, and environmental sciences are specified through a generative process. However, in some cases, it may not be possible to write down the likelihood for these models analytically. Approximate Bayesian computation (ABC) methods allow Bayesian inference in such situations. The procedures are nonetheless typically computationally intensive. Recently, computationally attractive empirical likelihood-based ABC methods have been suggested in the literature. All of these methods rely on the availability of several suitable analytically tractable estimating equations, and this is sometimes problematic. We propose an easy-to-use empirical likelihood ABC method in this article. First, by using a variational approximation argument as a motivation, we show that the target log-posterior can be approximated as a sum of an expected joint log-likelihood and the differential entropy of the data generating density. The expected log-likelihood is then estimated by an empirical likelihood where the only inputs required are a choice of summary statistic, it's observed value, and the ability to simulate the chosen summary statistics for any parameter value under the model. The differential entropy is estimated from the simulated summaries using traditional methods. Posterior consistency is established for the method, and we discuss the bounds for the required number of simulated summaries in detail. The performance of the proposed method is explored in various examples.
MLAug 7, 2020
Fractal Gaussian Networks: A sparse random graph model based on Gaussian Multiplicative ChaosSubhroshekhar Ghosh, Krishnakumar Balasubramanian, Xiaochuan Yang
We propose a novel stochastic network model, called Fractal Gaussian Network (FGN), that embodies well-defined and analytically tractable fractal structures. Such fractal structures have been empirically observed in diverse applications. FGNs interpolate continuously between the popular purely random geometric graphs (a.k.a. the Poisson Boolean network), and random graphs with increasingly fractal behavior. In fact, they form a parametric family of sparse random geometric graphs that are parametrized by a fractality parameter which governs the strength of the fractal structure. FGNs are driven by the latent spatial geometry of Gaussian Multiplicative Chaos (GMC), a canonical model of fractality in its own right. We asymptotically characterize the expected number of edges, triangles, cliques and hub-and-spoke motifs in FGNs, unveiling a distinct pattern in their scaling with the size parameter of the network. We then examine the natural question of detecting the presence of fractality and the problem of parameter estimation based on observed network data, in addition to fundamental properties of the FGN as a random graph model. We also explore fractality in community structures by unveiling a natural stochastic block model in the setting of FGNs. Finally, we substantiate our results with phenomenological analysis of the FGN in the context of available scientific literature for fractality in networks, including applications to real-world massive network data.
MLJul 8, 2020
Learning from DPPs via Sampling: Beyond HKPV and symmetryRémi Bardenet, Subhroshekhar Ghosh
Determinantal point processes (DPPs) have become a significant tool for recommendation systems, feature selection, or summary extraction, harnessing the intrinsic ability of these probabilistic models to facilitate sample diversity. The ability to sample from DPPs is paramount to the empirical investigation of these models. Most exact samplers are variants of a spectral meta-algorithm due to Hough, Krishnapur, Peres and Virág (henceforth HKPV), which is in general time and resource intensive. For DPPs with symmetric kernels, scalable HKPV samplers have been proposed that either first downsample the ground set of items, or force the kernel to be low-rank, using e.g. Nyström-type decompositions. In the present work, we contribute a radically different approach than HKPV. Exploiting the fact that many statistical and learning objectives can be effectively accomplished by only sampling certain key observables of a DPP (so-called linear statistics), we invoke an expression for the Laplace transform of such an observable as a single determinant, which holds in complete generality. Combining traditional low-rank approximation techniques with Laplace inversion algorithms from numerical analysis, we show how to directly approximate the distribution function of a linear statistic of a DPP. This distribution function can then be used in hypothesis testing or to actually sample the linear statistic, as per requirement. Our approach is scalable and applies to very general DPPs, beyond traditional symmetric kernels.
PRFeb 17, 2020
Transmission and navigation on disordered lattice networks, directed spanning forests and Brownian webSubhroshekhar Ghosh, Kumarjit Saha
Stochastic networks based on random point sets as nodes have attracted considerable interest in many applications, particularly in communication networks, including wireless sensor networks, peer-to-peer networks and so on. The study of such networks generally requires the nodes to be independently and uniformly distributed as a Poisson point process. In this work, we venture beyond this standard paradigm and investigate the stochastic geometry of networks obtained from \textit{directed spanning forests} (DSF) based on randomly perturbed lattices, which have desirable statistical properties as a models of spatially dependent point fields. In the regime of low disorder, we show in 2D and 3D that the DSF almost surely consists of a single tree. In 2D, we further establish that the DSF, as a collection of paths, converges under diffusive scaling to the Brownian web.