Alon Gonen

h-index8

13papers

534citations

Novelty51%

AI Score26

Ranked #160,067 of 194,257 authors (top 82%)#35,072 in LG (top 87%)

13 Papers

7.9LGFeb 8, 2020

Towards a combinatorial characterization of bounded memory learning

Alon Gonen, Shachar Lovett, Michal Moshkovitz

Combinatorial dimensions play an important role in the theory of machine learning. For example, VC dimension characterizes PAC learning, SQ dimension characterizes weak learning with statistical queries, and Littlestone dimension characterizes online learning. In this paper we aim to develop combinatorial dimensions that characterize bounded memory learning. We propose a candidate solution for the case of realizable strong learning under a known distribution, based on the SQ dimension of neighboring distributions. We prove both upper and lower bounds for our candidate solution, that match in some regime of parameters. In this parameter regime there is an equivalence between bounded memory and SQ learning. We conjecture that our characterization holds in a much wider regime of parameters.

9.6LGJan 31, 2020

Boosting Simple Learners

Noga Alon, Alon Gonen, Elad Hazan et al.

Boosting is a celebrated machine learning approach which is based on the idea of combining weak and moderately inaccurate hypotheses to a strong and accurate one. We study boosting under the assumption that the weak hypotheses belong to a class of bounded capacity. This assumption is inspired by the common convention that weak hypotheses are "rules-of-thumbs" from an "easy-to-learn class". (Schapire and Freund~'12, Shalev-Shwartz and Ben-David '14.) Formally, we assume the class of weak hypotheses has a bounded VC dimension. We focus on two main questions: (i) Oracle Complexity: How many weak hypotheses are needed to produce an accurate hypothesis? We design a novel boosting algorithm and demonstrate that it circumvents a classical lower bound by Freund and Schapire ('95, '12). Whereas the lower bound shows that $Ω({1}/{γ^2})$ weak hypotheses with $γ$-margin are sometimes necessary, our new method requires only $\tilde{O}({1}/γ)$ weak hypothesis, provided that they belong to a class of bounded VC dimension. Unlike previous boosting algorithms which aggregate the weak hypotheses by majority votes, the new boosting algorithm uses more complex ("deeper") aggregation rules. We complement this result by showing that complex aggregation rules are in fact necessary to circumvent the aforementioned lower bound. (ii) Expressivity: Which tasks can be learned by boosting weak hypotheses from a bounded VC class? Can complex concepts that are "far away" from the class be learned? Towards answering the first question we {introduce combinatorial-geometric parameters which capture expressivity in boosting.} As a corollary we provide an affirmative answer to the second question for well-studied classes, including half-spaces and decision stumps. Along the way, we establish and exploit connections with Discrepancy Theory.

9.5LGMay 27, 2019

Private Learning Implies Online Learning: An Efficient Reduction

Alon Gonen, Elad Hazan, Shay Moran

We study the relationship between the notions of differentially private learning and online learning in games. Several recent works have shown that differentially private learning implies online learning, but an open problem of Neel, Roth, and Wu \cite{NeelAaronRoth2018} asks whether this implication is {\it efficient}. Specifically, does an efficient differentially private learner imply an efficient online learner? In this paper we resolve this open question in the context of pure differential privacy. We derive an efficient black-box reduction from differentially private learning to online learning from expert advice.

18.7LGOct 17, 2018

Learning in Non-convex Games with an Optimization Oracle

Naman Agarwal, Alon Gonen, Elad Hazan

We consider online learning in an adversarial, non-convex setting under the assumption that the learner has an access to an offline optimization oracle. In the general setting of prediction with expert advice, Hazan et al. (2016) established that in the optimization-oracle model, online learning requires exponentially more computation than statistical learning. In this paper we show that by slightly strengthening the oracle model, the online and the statistical learning models become computationally equivalent. Our result holds for any Lipschitz and bounded (but not necessarily convex) function. As an application we demonstrate how the offline oracle enables efficient computation of an equilibrium in non-convex games, that include GAN (generative adversarial networks) as a special case.

4.1LGMay 21, 2018

Optimal Sketching Bounds for Exp-concave Stochastic Minimization

Naman Agarwal, Alon Gonen

We derive optimal statistical and computational complexity bounds for exp-concave stochastic minimization in terms of the effective dimension. For common eigendecay patterns of the population covariance matrix, this quantity is significantly smaller than the ambient dimension. Our results reveal interesting connections to sketching results in numerical linear algebra. In particular, our statistical analysis highlights a novel and natural relationship between algorithmic stability of empirical risk minimization and ridge leverage scores, which play significant role in sketching-based methods. Our main computational result is a fast implementation of a sketch-to-precondition approach in the context of exp-concave empirical risk minimization.

3.7LGOct 29, 2017

Smooth Sensitivity Based Approach for Differentially Private Principal Component Analysis

Ran Gilad-Bachrach, Alon Gonen

Currently known methods for this task either employ the computationally intensive \emph{exponential mechanism} or require an access to the covariance matrix, and therefore fail to utilize potential sparsity of the data. The problem of designing simpler and more efficient methods for this task has been raised as an open problem in \cite{kapralov2013differentially}. In this paper we address this problem by employing the output perturbation mechanism. Despite being arguably the simplest and most straightforward technique, it has been overlooked due to the large \emph{global sensitivity} associated with publishing the leading eigenvector. We tackle this issue by adopting a \emph{smooth sensitivity} based approach, which allows us to establish differential privacy (in a worst-case manner) and near-optimal sample complexity results under eigengap assumption. We consider both the pure and the approximate notions of differential privacy, and demonstrate a tradeoff between privacy level and sample complexity. We conclude by suggesting how our results can be extended to related problems.

12.7LGJan 16, 2017

Fast Rates for Empirical Risk Minimization of Strict Saddle Problems

Alon Gonen, Shai Shalev-Shwartz

We derive bounds on the sample complexity of empirical risk minimization (ERM) in the context of minimizing non-convex risks that admit the strict saddle property. Recent progress in non-convex optimization has yielded efficient algorithms for minimizing such functions. Our results imply that these efficient algorithms are statistically stable and also generalize well. In particular, we derive fast rates which resemble the bounds that are often attained in the strongly convex setting. We specify our bounds to Principal Component Analysis and Independent Component Analysis. Our results and techniques may pave the way for statistical analyses of additional strict saddle problems.

13.1LGFeb 7, 2016

Solving Ridge Regression using Sketched Preconditioned SVRG

Alon Gonen, Francesco Orabona, Shai Shalev-Shwartz

We develop a novel preconditioning method for ridge regression, based on recent linear sketching methods. By equipping Stochastic Variance Reduced Gradient (SVRG) with this preconditioning process, we obtain a significant speed-up relative to fast stochastic methods such as SVRG, SDCA and SAG.

11.0LGJan 15, 2016

Average Stability is Invariant to Data Preconditioning. Implications to Exp-concave Empirical Risk Minimization

Alon Gonen, Shai Shalev-Shwartz

We show that the average stability notion introduced by \cite{kearns1999algorithmic, bousquet2002stability} is invariant to data preconditioning, for a wide class of generalized linear models that includes most of the known exp-concave losses. In other words, when analyzing the stability rate of a given algorithm, we may assume the optimal preconditioning of the data. This implies that, at least from a statistical perspective, explicit regularization is not required in order to compensate for ill-conditioned data, which stands in contrast to a widely common approach that includes a regularization for analyzing the sample complexity of generalized linear models. Several important implications of our findings include: a) We demonstrate that the excess risk of empirical risk minimization (ERM) is controlled by the preconditioned stability rate. This immediately yields a relatively short and elegant proof for the fast rates attained by ERM in our context. b) We strengthen the recent bounds of \cite{hardt2015train} on the stability rate of the Stochastic Gradient Descent algorithm.

8.0NAJun 8, 2015

Faster SGD Using Sketched Conditioning

Alon Gonen, Shai Shalev-Shwartz

We propose a novel method for speeding up stochastic optimization algorithms via sketching methods, which recently became a powerful tool for accelerating algorithms for numerical linear algebra. We revisit the method of conditioning for accelerating first-order methods and suggest the use of sketching methods for constructing a cheap conditioner that attains a significant speedup with respect to the Stochastic Gradient Descent (SGD) algorithm. While our theoretical guarantees assume convexity, we discuss the applicability of our method to deep neural networks, and experimentally demonstrate its merits.

26.5LGFeb 25, 2015

Strongly Adaptive Online Learning

Amit Daniely, Alon Gonen, Shai Shalev-Shwartz

Strongly adaptive algorithms are algorithms whose performance on every time interval is close to optimal. We present a reduction that can transform standard low-regret algorithms to strongly adaptive. As a consequence, we derive simple, yet efficient, strongly adaptive algorithms for a handful of problems.

11.1LGFeb 19, 2014

Subspace Learning with Partial Information

Alon Gonen, Dan Rosenbaum, Yonina Eldar et al.

The goal of subspace learning is to find a $k$-dimensional subspace of $\mathbb{R}^d$, such that the expected squared distance between instance vectors and the subspace is as small as possible. In this paper we study subspace learning in a partial information setting, in which the learner can only observe $r \le d$ attributes from each instance vector. We propose several efficient algorithms for this task, and analyze their sample complexity

17.6LGAug 17, 2012

Efficient Active Learning of Halfspaces: an Aggressive Approach

Alon Gonen, Sivan Sabato, Shai Shalev-Shwartz

We study pool-based active learning of half-spaces. We revisit the aggressive approach for active learning in the realizable case, and show that it can be made efficient and practical, while also having theoretical guarantees under reasonable assumptions. We further show, both theoretically and experimentally, that it can be preferable to mellow approaches. Our efficient aggressive active learner of half-spaces has formal approximation guarantees that hold when the pool is separable with a margin. While our analysis is focused on the realizable setting, we show that a simple heuristic allows using the same algorithm successfully for pools with low error as well. We further compare the aggressive approach to the mellow approach, and prove that there are cases in which the aggressive approach results in significantly better label complexity compared to the mellow approach. We demonstrate experimentally that substantial improvements in label complexity can be achieved using the aggressive approach, for both realizable and low-error settings.