LGMar 22, 2023
Stability is Stable: Connections between Replicability, Privacy, and Adaptive GeneralizationMark Bun, Marco Gaboardi, Max Hopkins et al.
The notion of replicable algorithms was introduced in Impagliazzo et al. [STOC '22] to describe randomized algorithms that are stable under the resampling of their inputs. More precisely, a replicable algorithm gives the same output with high probability when its randomness is fixed and it is run on a new i.i.d. sample drawn from the same distribution. Using replicable algorithms for data analysis can facilitate the verification of published results by ensuring that the results of an analysis will be the same with high probability, even when that analysis is performed on a new data set. In this work, we establish new connections and separations between replicability and standard notions of algorithmic stability. In particular, we give sample-efficient algorithmic reductions between perfect generalization, approximate differential privacy, and replicability for a broad class of statistical problems. Conversely, we show any such equivalence must break down computationally: there exist statistical problems that are easy under differential privacy, but that cannot be solved replicably without breaking public-key cryptography. Furthermore, these results are tight: our reductions are statistically optimal, and we show that any computational separation between DP and replicability must imply the existence of one-way functions. Our statistical reductions give a new algorithmic framework for translating between notions of stability, which we instantiate to answer several open questions in replicability and privacy. This includes giving sample-efficient replicable algorithms for various PAC learning, distribution estimation, and distribution testing problems, algorithmic amplification of $δ$ in approximate DP, conversions from item-level to user-level privacy, and the existence of private agnostic-to-realizable learning reductions under structured distributions.
LGOct 2, 2022
Robust Empirical Risk Minimization with ToleranceRobi Bhattacharjee, Max Hopkins, Akash Kumar et al.
Developing simple, sample-efficient learning algorithms for robust classification is a pressing issue in today's tech-dominated world, and current theoretical techniques requiring exponential sample complexity and complicated improper learning rules fall far from answering the need. In this work we study the fundamental paradigm of (robust) $\textit{empirical risk minimization}$ (RERM), a simple process in which the learner outputs any hypothesis minimizing its training error. RERM famously fails to robustly learn VC classes (Montasser et al., 2019a), a bound we show extends even to `nice' settings such as (bounded) halfspaces. As such, we study a recent relaxation of the robust model called $\textit{tolerant}$ robust learning (Ashtiani et al., 2022) where the output classifier is compared to the best achievable error over slightly larger perturbation sets. We show that under geometric niceness conditions, a natural tolerant variant of RERM is indeed sufficient for $γ$-tolerant robust learning VC classes over $\mathbb{R}^d$, and requires only $\tilde{O}\left( \frac{VC(H)d\log \frac{D}{γδ}}{ε^2}\right)$ samples for robustness regions of (maximum) diameter $D$.
LGFeb 13, 2023
Do PAC-Learners Learn the Marginal Distribution?Max Hopkins, Daniel M. Kane, Shachar Lovett et al.
The Fundamental Theorem of PAC Learning asserts that learnability of a concept class $H$ is equivalent to the $\textit{uniform convergence}$ of empirical error in $H$ to its mean, or equivalently, to the problem of $\textit{density estimation}$, learnability of the underlying marginal distribution with respect to events in $H$. This seminal equivalence relies strongly on PAC learning's `distribution-free' assumption, that the adversary may choose any marginal distribution over data. Unfortunately, the distribution-free model is known to be overly adversarial in practice, failing to predict the success of modern machine learning algorithms, but without the Fundamental Theorem our theoretical understanding of learning under distributional constraints remains highly limited. In this work, we revisit the connection between PAC learning, uniform convergence, and density estimation beyond the distribution-free setting when the adversary is restricted to choosing a marginal distribution from a known family $\mathscr{P}$. We prove that while the traditional Fundamental Theorem indeed fails, a finer-grained connection between the three fundamental notions continues to hold: 1. PAC-Learning is strictly sandwiched between two refined models of density estimation, differing only in whether the learner $\textit{knows}$ the set of well-estimated events in $H$. 2. Under reasonable assumptions on $H$ and $\mathscr{P}$, density estimation is equivalent to $\textit{uniform estimation}$, a relaxation of uniform convergence allowing non-empirical estimators. Together, our results give a clearer picture of how the Fundamental Theorem extends beyond the distribution-free setting and shed new light on the classically challenging problem of learning under arbitrary distributional assumptions.
98.3DSApr 2
Non-Signaling Locality Lower Bounds for Dominating SetNoah Fleming, Max Hopkins, Yuichi Yoshida
Minimum dominating set is a basic local covering problem and a core task in distributed computing. Despite extensive study, in the classic LOCAL model there exist significant gaps between known algorithms and lower bounds. Chang and Li prove an $Ω(\log n)$-locality lower bound for a constant factor approximation, while Kuhn--Moscibroda--Wattenhofer gave an algorithm beating this bound beyond $\log Î$-approximation, along with a weaker lower bound for this degree-dependent setting scaling roughly with $\min\{\log Î/\log\log Î,\sqrt{\log n/\log\log n}\}$. Unfortunately, this latter bound is weak for small $Î$, and never recovers the Chang--Li bound, leaving central questions: does $O(\log Î)$-approximation require $Ω(\log n)$ locality, and do such bounds extend beyond LOCAL? In this work, we take a major step toward answering these questions in the non-signaling model, which strictly subsumes the LOCAL, quantum-LOCAL, and bounded-dependence settings. We prove every $O(\logÎ)$-approximate non-signaling distribution for dominating set requires locality $Ω(\log n/(\logÎ\cdot \mathrm{poly}\log\logÎ))$. Further, we show for some $β\in (0,1)$, every $O(\log^βÎ)$-approximate non-signaling distribution requires locality $Ω(\log n/\logÎ)$, which combined with the KMW bound yields a degree-independent $Ω(\sqrt{\log n/\log\log n})$ quantum-LOCAL lower bound for $O(\log^βÎ)$-approximation algorithms. The proof is based on two new low-soundness sensitivity lower bounds for label cover, one via Impagliazzo--Kabanets--Wigderson-style parallel repetition with degree reduction and one from a sensitivity-preserving reworking of the Dinur--Harsha framework, together with the reductions from label cover to set cover to dominating set and the sensitivity-to-locality transfer theorem of Fleming and Yoshida.
94.7COMay 21
A Simple Sub-Polynomial Degree Coboundary ExpanderMax Hopkins, Arka Ray
High dimensional expanders simultaneously satisfying spectral and combinatorial (coboundary) expansion have recently played a major role in breakthroughs in PCP and coding theory, but the only known construction of such complexes is extremely involved, requiring deep algebraic number theory. In this work, we give an extremely simple combinatorial construction of a sub-polynomial degree complex based on projections of the flags complex (subspace chains) that is (i) a local spectral expander, (ii) a coboundary expander, and (iii) a swap coboundary expander. As a corollary, we also give the first near-linear size combinatorial hypergraphs with good agreement tests in the '1%' regime, and a simple PCP construction with near-linear size.
LGFeb 11, 2025
The Role of Randomness in StabilityMax Hopkins, Shay Moran
Stability is a central property in learning and statistics promising the output of an algorithm $A$ does not change substantially when applied to similar datasets $S$ and $S'$. It is an elementary fact that any sufficiently stable algorithm (e.g.\ one returning the same result with high probability, satisfying privacy guarantees, etc.) must be randomized. This raises a natural question: can we quantify how much randomness is needed for algorithmic stability? We study the randomness complexity of two influential notions of stability in learning: replicability, which promises $A$ usually outputs the same result when run over samples from the same distribution (and shared random coins), and differential privacy, which promises the output distribution of $A$ remains similar under neighboring datasets. The randomness complexity of these notions was studied recently in (Dixon et al. ICML 2024) and (Cannone et al. ITCS 2024) for basic $d$-dimensional tasks (e.g. estimating the bias of $d$ coins), but little is known about the measures more generally or in complex settings like classification. Toward this end, we prove a `weak-to-strong' boosting theorem for stability: the randomness complexity of a task $M$ (either under replicability or DP) is tightly controlled by the best replication probability of any deterministic algorithm solving the task, a weak measure called `global stability' that is universally capped at $\frac{1}{2}$ (Chase et al. FOCS 2023). Using this, we characterize the randomness complexity of PAC Learning: a class has bounded randomness complexity iff it has finite Littlestone dimension, and moreover scales at worst logarithmically in the excess error of the learner. This resolves a question of (Chase et al. STOC 2024) who asked for such a characterization in the equivalent language of (error-dependent) `list-replicability'.
LGJul 16, 2025
From Generative to Episodic: Sample-Efficient Replicable Reinforcement LearningMax Hopkins, Sihan Liu, Christopher Ye et al.
The epidemic failure of replicability across empirical science and machine learning has recently motivated the formal study of replicable learning algorithms [Impagliazzo et al. (2022)]. In batch settings where data comes from a fixed i.i.d. source (e.g., hypothesis testing, supervised learning), the design of data-efficient replicable algorithms is now more or less understood. In contrast, there remain significant gaps in our knowledge for control settings like reinforcement learning where an agent must interact directly with a shifting environment. Karbasi et. al show that with access to a generative model of an environment with $S$ states and $A$ actions (the RL 'batch setting'), replicably learning a near-optimal policy costs only $\tilde{O}(S^2A^2)$ samples. On the other hand, the best upper bound without a generative model jumps to $\tilde{O}(S^7 A^7)$ [Eaton et al. (2024)] due to the substantial difficulty of environment exploration. This gap raises a key question in the broader theory of replicability: Is replicable exploration inherently more expensive than batch learning? Is sample-efficient replicable RL even possible? In this work, we (nearly) resolve this problem (for low-horizon tabular MDPs): exploration is not a significant barrier to replicable learning! Our main result is a replicable RL algorithm on $\tilde{O}(S^2A)$ samples, bridging the gap between the generative and episodic settings. We complement this with a matching $\tildeΩ(S^2A)$ lower bound in the generative setting (under the common parallel sampling assumption) and an unconditional lower bound in the episodic setting of $\tildeΩ(S^2)$ showcasing the near-optimality of our algorithm with respect to the state space $S$.
LGOct 23, 2025
Approximate Replicability in LearningMax Hopkins, Russell Impagliazzo, Christopher Ye
Replicability, introduced by (Impagliazzo et al. STOC '22), is the notion that algorithms should remain stable under a resampling of their inputs (given access to shared randomness). While a strong and interesting notion of stability, the cost of replicability can be prohibitive: there is no replicable algorithm, for instance, for tasks as simple as threshold learning (Bun et al. STOC '23). Given such strong impossibility results we ask: under what approximate notions of replicability is learning possible? In this work, we propose three natural relaxations of replicability in the context of PAC learning: (1) Pointwise: the learner must be consistent on any fixed input, but not across all inputs simultaneously, (2) Approximate: the learner must output hypotheses that classify most of the distribution consistently, (3) Semi: the algorithm is fully replicable, but may additionally use shared unlabeled samples. In all three cases, for constant replicability parameters, we obtain sample-optimal agnostic PAC learners: (1) and (2) are achievable for ``free" using $Θ(d/α^2)$ samples, while (3) requires $Θ(d^2/α^2)$ labeled samples.
MLJun 4, 2024
Replicability in High Dimensional StatisticsMax Hopkins, Russell Impagliazzo, Daniel Kane et al.
The replicability crisis is a major issue across nearly all areas of empirical science, calling for the formal study of replicability in statistics. Motivated in this context, [Impagliazzo, Lei, Pitassi, and Sorrell STOC 2022] introduced the notion of replicable learning algorithms, and gave basic procedures for $1$-dimensional tasks including statistical queries. In this work, we study the computational and statistical cost of replicability for several fundamental high dimensional statistical tasks, including multi-hypothesis testing and mean estimation. Our main contribution establishes a computational and statistical equivalence between optimal replicable algorithms and high dimensional isoperimetric tilings. As a consequence, we obtain matching sample complexity upper and lower bounds for replicable mean estimation of distributions with bounded covariance, resolving an open problem of [Bun, Gaboardi, Hopkins, Impagliazzo, Lei, Pitassi, Sivakumar, and Sorrell, STOC2023] and for the $N$-Coin Problem, resolving a problem of [Karbasi, Velegkas, Yang, and Zhou, NeurIPS2023] up to log factors. While our equivalence is computational, allowing us to shave log factors in sample complexity from the best known efficient algorithms, efficient isoperimetric tilings are not known. To circumvent this, we introduce several relaxed paradigms that do allow for sample and computationally efficient algorithms, including allowing pre-processing, adaptivity, and approximate replicability. In these cases we give efficient algorithms matching or beating the best known sample complexity for mean estimation and the coin problem, including a generic procedure that reduces the standard quadratic overhead of replicability to linear in expectation.
LGJan 24, 2022
Active Learning Polynomial Threshold FunctionsOmri Ben-Eliezer, Max Hopkins, Chutong Yang et al.
We initiate the study of active learning polynomial threshold functions (PTFs). While traditional lower bounds imply that even univariate quadratics cannot be non-trivially actively learned, we show that allowing the learner basic access to the derivatives of the underlying classifier circumvents this issue and leads to a computationally efficient algorithm for active learning degree-$d$ univariate PTFs in $\tilde{O}(d^3\log(1/\varepsilonδ))$ queries. We also provide near-optimal algorithms and analyses for active learning PTFs in several average case settings. Finally, we prove that access to derivatives is insufficient for active learning multivariate PTFs, even those of just two variables.
LGNov 8, 2021
Realizable Learning is All You NeedMax Hopkins, Daniel M. Kane, Shachar Lovett et al.
The equivalence of realizable and agnostic learnability is a fundamental phenomenon in learning theory. With variants ranging from classical settings like PAC learning and regression to recent trends such as adversarially robust learning, it's surprising that we still lack a unified theory; traditional proofs of the equivalence tend to be disparate, and rely on strong model-specific assumptions like uniform convergence and sample compression. In this work, we give the first model-independent framework explaining the equivalence of realizable and agnostic learnability: a three-line blackbox reduction that simplifies, unifies, and extends our understanding across a wide variety of settings. This includes models with no known characterization of learnability such as learning with arbitrary distributional assumptions and more general loss functions, as well as a host of other popular settings such as robust learning, partial learning, fair learning, and the statistical query model. More generally, we argue that the equivalence of realizable and agnostic learning is actually a special case of a broader phenomenon we call property generalization: any desirable property of a learning algorithm (e.g. noise tolerance, privacy, stability) that can be satisfied over finite hypothesis classes extends (possibly in some variation) to any learnable hypothesis class.
LGFeb 9, 2021
Bounded Memory Active Learning through Enriched QueriesMax Hopkins, Daniel Kane, Shachar Lovett et al.
The explosive growth of easily-accessible unlabeled data has lead to growing interest in active learning, a paradigm in which data-hungry learning algorithms adaptively select informative examples in order to lower prohibitively expensive labeling costs. Unfortunately, in standard worst-case models of learning, the active setting often provides no improvement over non-adaptive algorithms. To combat this, a series of recent works have considered a model in which the learner may ask enriched queries beyond labels. While such models have seen success in drastically lowering label costs, they tend to come at the expense of requiring large amounts of memory. In this work, we study what families of classifiers can be learned in bounded memory. To this end, we introduce a novel streaming-variant of enriched-query active learning along with a natural combinatorial parameter called lossless sample compression that is sufficient for learning not only with bounded memory, but in a query-optimal and computationally efficient manner as well. Finally, we give three fundamental examples of classifier families with small, easy to compute lossless compression schemes when given access to basic enriched queries: axis-aligned rectangles, decision trees, and halfspaces in two dimensions.
CGApr 23, 2020
Point Location and Active Learning: Learning Halfspaces Almost OptimallyMax Hopkins, Daniel M. Kane, Shachar Lovett et al.
Given a finite set $X \subset \mathbb{R}^d$ and a binary linear classifier $c: \mathbb{R}^d \to \{0,1\}$, how many queries of the form $c(x)$ are required to learn the label of every point in $X$? Known as \textit{point location}, this problem has inspired over 35 years of research in the pursuit of an optimal algorithm. Building on the prior work of Kane, Lovett, and Moran (ICALP 2018), we provide the first nearly optimal solution, a randomized linear decision tree of depth $\tilde{O}(d\log(|X|))$, improving on the previous best of $\tilde{O}(d^2\log(|X|))$ from Ezra and Sharir (Discrete and Computational Geometry, 2019). As a corollary, we also provide the first nearly optimal algorithm for actively learning halfspaces in the membership query model. En route to these results, we prove a novel characterization of Barthe's Theorem (Inventiones Mathematicae, 1998) of independent interest. In particular, we show that $X$ may be transformed into approximate isotropic position if and only if there exists no $k$-dimensional subspace with more than a $k/d$-fraction of $X$, and provide a similar characterization for exact isotropic position.
LGJan 15, 2020
Noise-tolerant, Reliable Active Classification with Comparison QueriesMax Hopkins, Daniel Kane, Shachar Lovett et al.
With the explosion of massive, widely available unlabeled data in the past years, finding label and time efficient, robust learning algorithms has become ever more important in theory and in practice. We study the paradigm of active learning, in which algorithms with access to large pools of data may adaptively choose what samples to label in the hope of exponentially increasing efficiency. By introducing comparisons, an additional type of query comparing two points, we provide the first time and query efficient algorithms for learning non-homogeneous linear separators robust to bounded (Massart) noise. We further provide algorithms for a generalization of the popular Tsybakov low noise condition, and show how comparisons provide a strong reliability guarantee that is often impractical or impossible with only labels - returning a classifier that makes no errors with high probability.
COOct 17, 2019
A Novel CMB Component Separation Method: Hierarchical Generalized Morphological Component AnalysisSebastian Wagner-Carena, Max Hopkins, Ana Diaz Rivero et al.
We present a novel technique for Cosmic Microwave Background (CMB) foreground subtraction based on the framework of blind source separation. Inspired by previous work incorporating local variation to Generalized Morphological Component Analysis (GMCA), we introduce Hierarchical GMCA (HGMCA), a Bayesian hierarchical graphical model for source separation. We test our method on $N_{\rm side}=256$ simulated sky maps that include dust, synchrotron, free-free and anomalous microwave emission, and show that HGMCA reduces foreground contamination by $25\%$ over GMCA in both the regions included and excluded by the Planck UT78 mask, decreases the error in the measurement of the CMB temperature power spectrum to the $0.02-0.03\%$ level at $\ell>200$ (and $<0.26\%$ for all $\ell$), and reduces correlation to all the foregrounds. We find equivalent or improved performance when compared to state-of-the-art Internal Linear Combination (ILC)-type algorithms on these simulations, suggesting that HGMCA may be a competitive alternative to foreground separation techniques previously applied to observed CMB data. Additionally, we show that our performance does not suffer when we perturb model parameters or alter the CMB realization, which suggests that our algorithm generalizes well beyond our simplified simulations. Our results open a new avenue for constructing CMB maps through Bayesian hierarchical analysis.
LGJul 8, 2019
The Power of Comparisons for Actively Learning Linear ClassifiersMax Hopkins, Daniel M. Kane, Shachar Lovett
In the world of big data, large but costly to label datasets dominate many fields. Active learning, a semi-supervised alternative to the standard PAC-learning model, was introduced to explore whether adaptive labeling could learn concepts with exponentially fewer labeled samples. While previous results show that active learning performs no better than its supervised alternative for important concept classes such as linear separators, we show that by adding weak distributional assumptions and allowing comparison queries, active learning requires exponentially fewer samples. Further, we show that these results hold as well for a stronger model of learning called Reliable and Probably Useful (RPU) learning. In this model, our learner is not allowed to make mistakes, but may instead answer "I don't know." While previous negative results showed this model to have intractably large sample complexity for label queries, we show that comparison queries make RPU-learning at worst logarithmically more expensive in both the passive and active regimes.
MMSep 3, 2017
Simulated Annealing for JPEG QuantizationMax Hopkins, Michael Mitzenmacher, Sebastian Wagner-Carena
JPEG is one of the most widely used image formats, but in some ways remains surprisingly unoptimized, perhaps because some natural optimizations would go outside the standard that defines JPEG. We show how to improve JPEG compression in a standard-compliant, backward-compatible manner, by finding improved default quantization tables. We describe a simulated annealing technique that has allowed us to find several quantization tables that perform better than the industry standard, in terms of both compressed size and image fidelity. Specifically, we derive tables that reduce the FSIM error by over 10% while improving compression by over 20% at quality level 95 in our tests; we also provide similar results for other quality levels. While we acknowledge our approach can in some images lead to visible artifacts under large magnification, we believe use of these quantization tables, or additional tables that could be found using our methodology, would significantly reduce JPEG file sizes with improved overall image quality.