Wojciech Szpankowski

h-index48

18papers

91citations

Novelty62%

AI Score32

Ranked #136,928 of 201,326 authors (top 68%)#30,147 in LG (top 71%)

18 Papers

LGSep 9, 2022

Expected Worst Case Regret via Stochastic Sequential Covering

Changlong Wu, Mohsen Heidari, Ananth Grama et al.

We study the problem of sequential prediction and online minimax regret with stochastically generated features under a general loss function. We introduce a notion of expected worst case minimax regret that generalizes and encompasses prior known minimax regrets. For such minimax regrets we establish tight upper bounds via a novel concept of stochastic global sequential covering. We show that for a hypothesis class of VC-dimension $\mathsf{VC}$ and $i.i.d.$ generated features of length $T$, the cardinality of the stochastic global sequential covering can be upper bounded with high probability (whp) by $e^{O(\mathsf{VC} \cdot \log^2 T)}$. We then improve this bound by introducing a new complexity measure called the Star-Littlestone dimension, and show that classes with Star-Littlestone dimension $\mathsf{SL}$ admit a stochastic global sequential covering of order $e^{O(\mathsf{SL} \cdot \log T)}$. We further establish upper bounds for real valued classes with finite fat-shattering numbers. Finally, by applying information-theoretic tools of the fixed design minimax regrets, we provide lower bounds for the expected worst case minimax regret. We demonstrate the effectiveness of our approach by establishing tight bounds on the expected worst case minimax regrets for logarithmic loss and general mixable losses.

QUANT-PHMar 22, 2022

Toward Physically Realizable Quantum Neural Networks

Mohsen Heidari, Ananth Grama, Wojciech Szpankowski

There has been significant recent interest in quantum neural networks (QNNs), along with their applications in diverse domains. Current solutions for QNNs pose significant challenges concerning their scalability, ensuring that the postulates of quantum mechanics are satisfied and that the networks are physically realizable. The exponential state space of QNNs poses challenges for the scalability of training procedures. The no-cloning principle prohibits making multiple copies of training samples, and the measurement postulates lead to non-deterministic loss functions. Consequently, the physical realizability and efficiency of existing approaches that rely on repeated measurement of several copies of each sample for training QNNs are unclear. This paper presents a new model for QNNs that relies on band-limited Fourier expansions of transfer functions of quantum perceptrons (QPs) to design scalable training procedures. This training procedure is augmented with a randomized quantum stochastic gradient descent technique that eliminates the need for sample replication. We show that this training procedure converges to the true minima in expectation, even in the presence of non-determinism due to quantum measurement. Our solution has a number of important benefits: (i) using QPs with concentrated Fourier power spectrum, we show that the training procedure for QNNs can be made scalable; (ii) it eliminates the need for resampling, thus staying consistent with the no-cloning rule; and (iii) enhanced data efficiency for the overall training process since each data sample is processed once per epoch. We present a detailed theoretical foundation for our models and methods' scalability, accuracy, and data efficiency. We also validate the utility of our approach through a series of numerical experiments.

LGMay 7, 2022

Precise Regret Bounds for Log-loss via a Truncated Bayesian Algorithm

Changlong Wu, Mohsen Heidari, Ananth Grama et al.

We study the sequential general online regression, known also as the sequential probability assignments, under logarithmic loss when compared against a broad class of experts. We focus on obtaining tight, often matching, lower and upper bounds for the sequential minimax regret that are defined as the excess loss it incurs over a class of experts. After proving a general upper bound, we consider some specific classes of experts from Lipschitz class to bounded Hessian class and derive matching lower and upper bounds with provably optimal constants. Our bounds work for a wide range of values of the data dimension and the number of rounds. To derive lower bounds, we use tools from information theory (e.g., Shtarkov sum) and for upper bounds, we resort to new "smooth truncated covering" of the class of experts. This allows us to find constructive proofs by applying a simple and novel truncated Bayesian algorithm. Our proofs are substantially simpler than the existing ones and yet provide tighter (and often optimal) bounds.

LGJan 31, 2023

Online Learning in Dynamically Changing Environments

Changlong Wu, Ananth Grama, Wojciech Szpankowski

We study the problem of online learning and online regret minimization when samples are drawn from a general unknown non-stationary process. We introduce the concept of a dynamic changing process with cost $K$, where the conditional marginals of the process can vary arbitrarily, but that the number of different conditional marginals is bounded by $K$ over $T$ rounds. For such processes we prove a tight (upto $\sqrt{\log T}$ factor) bound $O(\sqrt{KT\cdot\mathsf{VC}(\mathcal{H})\log T})$ for the expected worst case regret of any finite VC-dimensional class $\mathcal{H}$ under absolute loss (i.e., the expected miss-classification loss). We then improve this bound for general mixable losses, by establishing a tight (up to $\log^3 T$ factor) regret bound $O(K\cdot\mathsf{VC}(\mathcal{H})\log^3 T)$. We extend these results to general smooth adversary processes with unknown reference measure by showing a sub-linear regret bound for $1$-dimensional threshold functions under a general bounded convex loss. Our results can be viewed as a first step towards regret analysis with non-stationary samples in the distribution blind (universal) regime. This also brings a new viewpoint that shifts the study of complexity of the hypothesis classes to the study of the complexity of processes generating data.

LGSep 4, 2023

Robust Online Classification: From Estimation to Denoising

Changlong Wu, Ananth Grama, Wojciech Szpankowski

We study online classification of features into labels with general hypothesis classes. In our setting, true labels are determined by some function within the hypothesis class but are corrupted by unknown stochastic noise, and the features are generated adversarially. Predictions are made using observed noisy labels and noiseless features, while the performance is measured via minimax risk when comparing against true labels. The noise mechanism is modeled via a general noise kernel that specifies, for any individual data point, a set of distributions from which the actual noisy label distribution is chosen. We show that minimax risk is tightly characterized (up to a logarithmic factor of the hypothesis class size) by the Hellinger gap of the noisy label distributions induced by the kernel, independent of other properties such as the means and variances of the noise. Our main technique is based on a novel reduction to an online comparison scheme of two hypotheses, along with a new conditional version of Le Cam-Birgé testing suitable for online settings. Our work provides the first comprehensive characterization for noisy online classification with guarantees with respect to the ground truth while addressing general noisy observations.

QUANT-PHAug 22, 2024

New Bounds on Quantum Sample Complexity of Measurement Classes

Mohsen Heidari, Wojciech Szpankowski

This paper studies quantum supervised learning for classical inference from quantum states. In this model, a learner has access to a set of labeled quantum samples as the training set. The objective is to find a quantum measurement that predicts the label of the unseen samples. The hardness of learning is measured via sample complexity under a quantum counterpart of the well-known probably approximately correct (PAC). Quantum sample complexity is expected to be higher than classical one, because of the measurement incompatibility and state collapse. Recent efforts showed that the sample complexity of learning a finite quantum concept class $\mathcal{C}$ scales as $O(|\mathcal{C}|)$. This is significantly higher than the classical sample complexity that grows logarithmically with the class size. This work improves the sample complexity bound to $O(V_{\mathcal{C}^*} \log |\mathcal{C}^*|)$, where $\mathcal{C}^*$ is the set of extreme points of the convex closure of $\mathcal{C}$ and $V_{\mathcal{C}^*}$ is the shadow-norm of this set. We show the tightness of our bound for the class of bounded Hilbert-Schmidt norm, scaling as $O(\log |\mathcal{C}^*|)$. Our approach is based on a new quantum empirical risk minimization (ERM) algorithm equipped with a shadow tomography method.

QUANT-PHOct 10, 2023

Quantum Shadow Gradient Descent for Variational Quantum Algorithms

Mohsen Heidari, Mobasshir A Naved, Zahra Honjani et al.

Gradient-based optimizers have been proposed for training variational quantum circuits in settings such as quantum neural networks (QNNs). The task of gradient estimation, however, has proven to be challenging, primarily due to distinctive quantum features such as state collapse and measurement incompatibility. Conventional techniques, such as the parameter-shift rule, necessitate several fresh samples in each iteration to estimate the gradient due to the stochastic nature of state measurement. Owing to state collapse from measurement, the inability to reuse samples in subsequent iterations motivates a crucial inquiry into whether fundamentally more efficient approaches to sample utilization exist. In this paper, we affirm the feasibility of such efficiency enhancements through a novel procedure called quantum shadow gradient descent (QSGD), which uses a single sample per iteration to estimate all components of the gradient. Our approach is based on an adaptation of shadow tomography that significantly enhances sample efficiency. Through detailed theoretical analysis, we show that QSGD has a significantly faster convergence rate than existing methods under locality conditions. We present detailed numerical experiments supporting all of our theoretical claims.

LGMar 8, 2023

Agnostic PAC Learning of k-juntas Using L2-Polynomial Regression

Mohsen Heidari, Wojciech Szpankowski

Many conventional learning algorithms rely on loss functions other than the natural 0-1 loss for computational efficiency and theoretical tractability. Among them are approaches based on absolute loss (L1 regression) and square loss (L2 regression). The first is proved to be an \textit{agnostic} PAC learner for various important concept classes such as \textit{juntas}, and \textit{half-spaces}. On the other hand, the second is preferable because of its computational efficiency, which is linear in the sample size. However, PAC learnability is still unknown as guarantees have been proved only under distributional restrictions. The question of whether L2 regression is an agnostic PAC learner for 0-1 loss has been open since 1993 and yet has to be answered. This paper resolves this problem for the junta class on the Boolean cube -- proving agnostic PAC learning of k-juntas using L2 polynomial regression. Moreover, we present a new PAC learning algorithm based on the Boolean Fourier expansion with lower computational complexity. Fourier-based algorithms, such as Linial et al. (1993), have been used under distributional restrictions, such as uniform distribution. We show that with an appropriate change, one can apply those algorithms in agnostic settings without any distributional assumption. We prove our results by connecting the PAC learning with 0-1 loss to the minimum mean square estimation (MMSE) problem. We derive an elegant upper bound on the 0-1 loss in terms of the MMSE error and show that the sign of the MMSE is a PAC learner for any concept class containing it.

QUANT-PHApr 7, 2024

Efficient Gradient Estimation of Variational Quantum Circuits with Lie Algebraic Symmetries

Mohsen Heidari, Masih Mozakka, Wojciech Szpankowski

Hybrid quantum-classical optimization and learning strategies are among the most promising approaches to harnessing quantum information or gaining a quantum advantage over classical methods. However, efficient estimation of the gradient of the objective function in such models remains a challenge due to several factors including the exponential dimensionality of the Hilbert spaces, and information loss of quantum measurements. In this work, we developed an efficient framework that makes the Hadamard test efficiently applicable to gradient estimation for a broad range of quantum systems, an advance that had been wanting from the outset. Under certain mild structural assumptions, the gradient is estimated with the measurement shots that scale logarithmically with the number of parameters and with polynomial classical and quantum time. This is an exponential reduction in the measurement cost and polynomial speed up in time compared to existing works. The structural assumptions are (1) the dimension of the dynamical Lie algebra is polynomial in the number of qubits, and (2) the observable has a bounded Hilbert-Schmidt norm.

LGOct 24, 2024

No Free Lunch: Fundamental Limits of Learning Non-Hallucinating Generative Models

Changlong Wu, Ananth Grama, Wojciech Szpankowski

Generative models have shown impressive capabilities in synthesizing high-quality outputs across various domains. However, a persistent challenge is the occurrence of "hallucinations", where the model produces outputs that are plausible but invalid. While empirical strategies have been explored to mitigate this issue, a rigorous theoretical understanding remains elusive. In this paper, we develop a theoretical framework to analyze the learnability of non-hallucinating generative models from a learning-theoretic perspective. Our results reveal that non-hallucinating learning is statistically impossible when relying solely on the training dataset, even for a hypothesis class of size two and when the entire training set is truthful. To overcome these limitations, we show that incorporating inductive biases aligned with the actual facts into the learning process is essential. We provide a systematic approach to achieve this by restricting the facts set to a concept class of finite VC-dimension and demonstrate its effectiveness under various learning paradigms. Although our findings are primarily conceptual, they represent a first step towards a principled approach to addressing hallucinations in learning generative models.

LGJan 27, 2024

Oracle-Efficient Hybrid Online Learning with Unknown Distribution

Changlong Wu, Jin Sima, Wojciech Szpankowski

We study the problem of oracle-efficient hybrid online learning when the features are generated by an unknown i.i.d. process and the labels are generated adversarially. Assuming access to an (offline) ERM oracle, we show that there exists a computationally efficient online predictor that achieves a regret upper bounded by $\tilde{O}(T^{\frac{3}{4}})$ for a finite-VC class, and upper bounded by $\tilde{O}(T^{\frac{p+1}{p+2}})$ for a class with $α$ fat-shattering dimension $α^{-p}$. This provides the first known oracle-efficient sublinear regret bounds for hybrid online learning with an unknown feature generation process. In particular, it confirms a conjecture of Lazaric and Munos (JCSS 2012). We then extend our result to the scenario of shifting distributions with $K$ changes, yielding a regret of order $\tilde{O}(T^{\frac{4}{5}}K^{\frac{1}{5}})$. Finally, we establish a regret of $\tilde{O}((K^{\frac{2}{3}}(\log|\mathcal{H}|)^{\frac{1}{3}}+K)\cdot T^{\frac{4}{5}})$ for the contextual $K$-armed bandits with a finite policy set $\mathcal{H}$, i.i.d. generated contexts from an unknown distribution, and adversarially generated costs.

LGFeb 1, 2024

Online Distribution Learning with Local Private Constraints

Jin Sima, Changlong Wu, Olgica Milenkovic et al.

We study the problem of online conditional distribution estimation with \emph{unbounded} label sets under local differential privacy. Let $\mathcal{F}$ be a distribution-valued function class with unbounded label set. We aim at estimating an \emph{unknown} function $f\in \mathcal{F}$ in an online fashion so that at time $t$ when the context $\boldsymbol{x}_t$ is provided we can generate an estimate of $f(\boldsymbol{x}_t)$ under KL-divergence knowing only a privatized version of the true labels sampling from $f(\boldsymbol{x}_t)$. The ultimate objective is to minimize the cumulative KL-risk of a finite horizon $T$. We show that under $(ε,0)$-local differential privacy of the privatized labels, the KL-risk grows as $\tildeΘ(\frac{1}ε\sqrt{KT})$ upto poly-logarithmic factors where $K=|\mathcal{F}|$. This is in stark contrast to the $\tildeΘ(\sqrt{T\log K})$ bound demonstrated by Wu et al. (2023a) for bounded label sets. As a byproduct, our results recover a nearly tight upper bound for the hypothesis selection problem of gopi et al. (2020) established only for the batch setting.

LGFeb 11, 2021

On Agnostic PAC Learning using $\mathcal{L}_2$-polynomial Regression and Fourier-based Algorithms

Mohsen Heidari, Wojciech Szpankowski

We develop a framework using Hilbert spaces as a proxy to analyze PAC learning problems with structural properties. We consider a joint Hilbert space incorporating the relation between the true label and the predictor under a joint distribution $D$. We demonstrate that agnostic PAC learning with 0-1 loss is equivalent to an optimization in the Hilbert space domain. With our model, we revisit the PAC learning problem using methods based on least-squares such as $\mathcal{L}_2$ polynomial regression and Linial's low-degree algorithm. We study learning with respect to several hypothesis classes such as half-spaces and polynomial-approximated classes (i.e., functions approximated by a fixed-degree polynomial). We prove that (under some distributional assumptions) such methods obtain generalization error up to $2opt$ with $opt$ being the optimal error of the class. Hence, we show the tightest bound on generalization error when $opt\leq 0.2$.

LGJan 28, 2021

Low Complexity Approximate Bayesian Logistic Regression for Sparse Online Learning

Gil I. Shamir, Wojciech Szpankowski

Theoretical results show that Bayesian methods can achieve lower bounds on regret for online logistic regression. In practice, however, such techniques may not be feasible especially for very large feature sets. Various approximations that, for huge sparse feature sets, diminish the theoretical advantages, must be used. Often, they apply stochastic gradient methods with hyper-parameters that must be tuned on some surrogate loss, defeating theoretical advantages of Bayesian methods. The surrogate loss, defined to approximate the mixture, requires techniques as Monte Carlo sampling, increasing computations per example. We propose low complexity analytical approximations for sparse online logistic and probit regressions. Unlike variational inference and other methods, our methods use analytical closed forms, substantially lowering computations. Unlike dense solutions, as Gaussian Mixtures, our methods allow for sparse problems with huge feature sets without increasing complexity. With the analytical closed forms, there is also no need for applying stochastic gradient methods on surrogate losses, and for tuning and balancing learning and regularization hyper-parameters. Empirical results top the performance of the more computationally involved methods. Like such methods, our methods still reveal per feature and per example uncertainty measures.

ITJan 25, 2021

On maximum-likelihood estimation in the all-or-nothing regime

Luca Corinzia, Paolo Penna, Wojciech Szpankowski et al.

We study the problem of estimating a rank-1 additive deformation of a Gaussian tensor according to the \emph{maximum-likelihood estimator} (MLE). The analysis is carried out in the sparse setting, where the underlying signal has a support that scales sublinearly with the total number of dimensions. We show that for Bernoulli distributed signals, the MLE undergoes an \emph{all-or-nothing} (AoN) phase transition, already established for the minimum mean-square-error estimator (MMSE) in the same problem. The result follows from two main technical points: (i) the connection established between the MLE and the MMSE, using the first and second-moment methods in the constrained signal space, (ii) a recovery regime for the MMSE stricter than the simple error vanishing characterization given in the standard AoN, that is here proved as a general result.

LGNov 23, 2020

Statistical and computational thresholds for the planted $k$-densest sub-hypergraph problem

Luca Corinzia, Paolo Penna, Wojciech Szpankowski et al.

In this work, we consider the problem of recovery a planted $k$-densest sub-hypergraph on $d$-uniform hypergraphs. This fundamental problem appears in different contexts, e.g., community detection, average-case complexity, and neuroscience applications as a structural variant of tensor-PCA problem. We provide tight \emph{information-theoretic} upper and lower bounds for the exact recovery threshold by the maximum-likelihood estimator, as well as \emph{algorithmic} bounds based on approximate message passing algorithms. The problem exhibits a typical statistical-to-computational gap observed in analogous sparse settings that widen with increasing sparsity of the problem. The bounds show that the signal structure impacts the location of the statistical and computational phase transition that the known existing bounds for the tensor-PCA model do not capture. This effect is due to the generic planted signal prior that this latter model addresses.

SIMay 2, 2019

Temporal Ordered Clustering in Dynamic Networks: Unsupervised and Semi-supervised Learning Algorithms

Krzysztof Turowski, Jithin K. Sreedharan, Wojciech Szpankowski

In temporal ordered clustering, given a single snapshot of a dynamic network in which nodes arrive at distinct time instants, we aim at partitioning its nodes into $K$ ordered clusters $\mathcal{C}_1 \prec \cdots \prec \mathcal{C}_K$ such that for $i<j$, nodes in cluster $\mathcal{C}_i$ arrived before nodes in cluster $\mathcal{C}_j$, with $K$ being a data-driven parameter and not known upfront. Such a problem is of considerable significance in many applications ranging from tracking the expansion of fake news to mapping the spread of information. We first formulate our problem for a general dynamic graph, and propose an integer programming framework that finds the optimal clustering, represented as a strict partial order set, achieving the best precision (i.e., fraction of successfully ordered node pairs) for a fixed density (i.e., fraction of comparable node pairs). We then develop a sequential importance procedure and design unsupervised and semi-supervised algorithms to find temporal ordered clusters that efficiently approximate the optimal solution. To illustrate the techniques, we apply our methods to the vertex copying (duplication-divergence) model which exhibits some edge-case challenges in inferring the clusters as compared to other network models. Finally, we validate the performance of the proposed algorithms on synthetic and real-world networks.

MLApr 6, 2019

Toward Universal Testing of Dynamic Network Models

Abram Magner, Wojciech Szpankowski

Numerous networks in the real world change over time, in the sense that nodes and edges enter and leave the networks. Various dynamic random graph models have been proposed to explain the macroscopic properties of these systems and to provide a foundation for statistical inferences and predictions. It is of interest to have a rigorous way to determine how well these models match observed networks. We thus ask the following goodness of fit question: given a sequence of observations/snapshots of a growing random graph, along with a candidate model M, can we determine whether the snapshots came from M or from some arbitrary alternative model that is well-separated from M in some natural metric? We formulate this problem precisely and boil it down to goodness of fit testing for graph-valued, infinite-state Markov processes and exhibit and analyze a universal test based on non-stationary sampling for a natural class of models.