Anatoli Juditsky

9papers

274citations

Novelty48%

AI Score26

Ranked #165,088 of 205,806 authors (top 80%)#617 in OC (top 71%)

9 Papers

MEApr 26, 2023

Generalized generalized linear models: Convex estimation and online bounds

Anatoli Juditsky, Arkadi Nemirovski, Yao Xie et al. · gatech

We introduce a new computational framework for estimating parameters in generalized generalized linear models (GGLM), a class of models that extends the popular generalized linear models (GLM) to account for dependencies among observations in spatio-temporal data. The proposed approach uses a monotone operator-based variational inequality method to overcome non-convexity in parameter estimation and provide guarantees for parameter recovery. The results can be applied to GLM and GGLM, focusing on spatio-temporal models. We also present online instance-based bounds using martingale concentrations inequalities. Finally, we demonstrate the performance of the algorithm using numerical simulations and a real data example for wildfire incidents.

OCJul 4, 2023

Accelerated stochastic approximation with state-dependent noise

Sasila Ilandarideva, Anatoli Juditsky, Guanghui Lan et al.

We consider a class of stochastic smooth convex optimization problems under rather general assumptions on the noise in the stochastic gradient observation. As opposed to the classical problem setting in which the variance of noise is assumed to be uniformly bounded, herein we assume that the variance of stochastic gradients is related to the "sub-optimality" of the approximate solutions delivered by the algorithm. Such problems naturally arise in a variety of applications, in particular, in the well-known generalized linear regression problem in statistics. However, to the best of our knowledge, none of the existing stochastic approximation algorithms for solving this class of problems attain optimality in terms of the dependence on accuracy, problem parameters, and mini-batch size. We discuss two non-Euclidean accelerated stochastic approximation routines--stochastic accelerated gradient descent (SAGD) and stochastic gradient extrapolation (SGE)--which carry a particular duality relationship. We show that both SAGD and SGE, under appropriate conditions, achieve the optimal convergence rate, attaining the optimal iteration and sample complexities simultaneously. However, corresponding assumptions for the SGE algorithm are more general; they allow, for instance, for efficient application of the SGE to statistical estimation problems under heavy tail noises and discontinuous score functions. We also discuss the application of the SGE to problems satisfying quadratic growth conditions, and show how it can be used to recover sparse solutions. Finally, we report on some simulation experiments to illustrate numerical performance of our proposed algorithms in high-dimensional settings.

MLOct 23, 2022

Stochastic Mirror Descent for Large-Scale Sparse Recovery

Sasila Ilandarideva, Yannis Bekri, Anatoli Juditsky et al.

In this paper we discuss an application of Stochastic Approximation to statistical estimation of high-dimensional sparse parameters. The proposed solution reduces to resolving a penalized stochastic optimization problem on each stage of a multistage algorithm; each problem being solved to a prescribed accuracy by the non-Euclidean Composite Stochastic Mirror Descent (CSMD) algorithm. Assuming that the problem objective is smooth and quadratically minorated and stochastic perturbations are sub-Gaussian, our analysis prescribes the method parameters which ensure fast convergence of the estimation error (the radius of a confidence ball of a given norm around the approximate solution). This convergence is linear during the first "preliminary" phase of the routine and is sublinear during the second "asymptotic" phase. We consider an application of the proposed approach to sparse Generalized Linear Regression problem. In this setting, we show that the proposed algorithm attains the optimal convergence of the estimation error under weak assumptions on the regressor distribution. We also present a numerical study illustrating the performance of the algorithm on high-dimensional simulation data.

MLJun 11, 2020

Sparse recovery by reduced variance stochastic approximation

Anatoli Juditsky, Andrei Kulunchakov, Hlib Tsyntseus

In this paper, we discuss application of iterative Stochastic Optimization routines to the problem of sparse signal recovery from noisy observation. Using Stochastic Mirror Descent algorithm as a building block, we develop a multistage procedure for recovery of sparse solutions to Stochastic Optimization problem under assumption of smoothness and quadratic minoration on the expected objective. An interesting feature of the proposed algorithm is linear convergence of the approximate solution during the preliminary phase of the routine when the component of stochastic error in the gradient observation which is due to bad initial approximation of the optimal solution is larger than the "ideal" asymptotic error component owing to observation noise "at the optimal solution." We also show how one can straightforwardly enhance reliability of the corresponding solution by using Median-of-Means like techniques. We illustrate the performance of the proposed algorithms in application to classical problems of recovery of sparse and low rank signals in generalized linear regression framework. We show, under rather weak assumption on the regressor and noise distributions, how they lead to parameter estimates which obey (up to factors which are logarithmic in problem dimension and confidence level) the best known to us accuracy bounds.

STMar 29, 2020

Convex Parameter Recovery for Interacting Marked Processes

Anatoli Juditsky, Arkadi Nemirovski, Liyan Xie et al.

We introduce a new general modeling approach for multivariate discrete event data with categorical interacting marks, which we refer to as marked Bernoulli processes. In the proposed model, the probability of an event of a specific category to occur in a location may be influenced by past events at this and other locations. We do not restrict interactions to be positive or decaying over time as it is commonly adopted, allowing us to capture an arbitrary shape of influence from historical events, locations, and events of different categories. In our modeling, prior knowledge is incorporated by allowing general convex constraints on model parameters. We develop two parameter estimation procedures utilizing the constrained Least Squares (LS) and Maximum Likelihood (ML) estimation, which are solved using variational inequalities with monotone operators. We discuss different applications of our approach and illustrate the performance of proposed recovery routines on synthetic examples and a real-world police dataset.

OCOct 30, 2019

Unifying mirror descent and dual averaging

Anatoli Juditsky, Joon Kwon, Éric Moulines

We introduce and analyze a new family of first-order optimization algorithms which generalizes and unifies both mirror descent and dual averaging. Within the framework of this family, we define new algorithms for constrained optimization that combines the advantages of mirror descent and dual averaging. Our preliminary simulation study shows that these new algorithms significantly outperform available methods in some situations.

STJun 11, 2018

Adaptive Denoising of Signals with Local Shift-Invariant Structure

Zaid Harchaoui, Anatoli Juditsky, Arkadi Nemirovski et al.

We discuss the problem of adaptive discrete-time signal denoising in the situation where the signal to be recovered admits a "linear oracle" -- an unknown linear estimate that takes the form of convolution of observations with a time-invariant filter. It was shown by Juditsky and Nemirovski (2009) that when the $\ell_2$-norm of the oracle filter is small enough, such oracle can be "mimicked" by an efficiently computable adaptive estimate of the same structure with an observation-driven filter. The filter in question was obtained as a solution to the optimization problem in which the $\ell_\infty$-norm of the Discrete Fourier Transform (DFT) of the estimation residual is minimized under constraint on the $\ell_1$-norm of the filter DFT. In this paper, we discuss a new family of adaptive estimates which rely upon minimizing the $\ell_2$-norm of the estimation residual. We show that such estimators possess better statistical properties than those based on $\ell_\infty$-fit; in particular, we prove oracle inequalities for their $\ell_2$-loss and improved bounds for $\ell_2$- and pointwise losses. The oracle inequalities rely on the "approximate shift-invariance" assumption stating that the signal to be recovered is close to an (unknown) shift-invariant subspace. We also study the relationship of the approximate shift-invariance assumption with the "signal simplicity" assumption introduced in Juditsky and Nemirovski (2009) and discuss the application of the proposed approach to harmonic oscillations denoising.

OCFeb 10, 2013

Conditional Gradient Algorithms for Norm-Regularized Smooth Convex Optimization

Zaid Harchaoui, Anatoli Juditsky, Arkadi Nemirovski

Motivated by some applications in signal processing and machine learning, we consider two convex optimization problems where, given a cone $K$, a norm $\|\cdot\|$ and a smooth convex function $f$, we want either 1) to minimize the norm over the intersection of the cone and a level set of $f$, or 2) to minimize over the cone the sum of $f$ and a multiple of the norm. We focus on the case where (a) the dimension of the problem is too large to allow for interior point algorithms, (b) $\|\cdot\|$ is "too complicated" to allow for computationally cheap Bregman projections required in the first-order proximal gradient algorithms. On the other hand, we assume that {it is relatively easy to minimize linear forms over the intersection of $K$ and the unit $\|\cdot\|$-ball}. Motivating examples are given by the nuclear norm with $K$ being the entire space of matrices, or the positive semidefinite cone in the space of symmetric matrices, and the Total Variation norm on the space of 2D images. We discuss versions of the Conditional Gradient algorithm capable to handle our problems of interest, provide the related theoretical efficiency estimates and outline some applications.

OCJul 4, 2012

On unified view of nullspace-type conditions for recoveries associated with general sparsity structures

Anatoli Juditsky, Fatma Kilinc Karzan, Arkadi Nemirovski

We discuss a general notion of "sparsity structure" and associated recoveries of a sparse signal from its linear image of reduced dimension possibly corrupted with noise. Our approach allows for unified treatment of (a) the "usual sparsity" and "usual $\ell_1$ recovery," (b) block-sparsity with possibly overlapping blocks and associated block-$\ell_1$ recovery, and (c) low-rank-oriented recovery by nuclear norm minimization. The proposed recovery routines are natural extensions of the usual $\ell_1$ minimization used in Compressed Sensing. Specifically we present nullspace-type sufficient conditions for the recovery to be precise on sparse signals in the noiseless case. Then we derive error bounds for imperfect (nearly sparse signal, presence of observation noise, etc.) recovery under these conditions. In all of these cases, we present efficiently verifiable sufficient conditions for the validity of the associated nullspace properties.