Reese Pathak

h-index: 6

7 papers

346 citations

Novelty: 54%

AI Score: 45

Ranked #40,393 of 194,257 authors (top 21%) · #9,411 in LG (top 23%)

7 Papers

20.8 · ST · May 6, 2022

Optimally tackling covariate shift in RKHS-based nonparametric regression

Cong Ma, Reese Pathak, Martin J. Wainwright

We study the covariate shift problem in the context of nonparametric regression over a reproducing kernel Hilbert space (RKHS). We focus on two natural families of covariate shift problems defined using the likelihood ratios between the source and target distributions. When the likelihood ratios are uniformly bounded, we prove that the kernel ridge regression (KRR) estimator with a carefully chosen regularization parameter is minimax rate-optimal (up to a log factor) for a large family of RKHSs with regular kernel eigenvalues. Interestingly, KRR does not require full knowledge of likelihood ratios apart from an upper bound on them. In striking contrast to the standard statistical setting without covariate shift, we also demonstrate that a naive estimator, which minimizes the empirical risk over the function class, is strictly sub-optimal under covariate shift as compared to KRR. We then address the larger class of covariate shift problems where the likelihood ratio is possibly unbounded yet has a finite second moment. Here, we propose a reweighted KRR estimator that weights samples based on a careful truncation of the likelihood ratios. Again, we are able to show that this estimator is minimax rate-optimal, up to logarithmic factors.
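
The reweighted estimator described in this abstract can be illustrated with a minimal sketch. It assumes a Gaussian kernel, likelihood ratios that are known exactly, and truncation by simple clipping at a level `tau`; the abstract does not specify the truncation rule or how `tau` and `lam` should scale, so those choices (and all function names) are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, bandwidth=0.5):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * bandwidth**2))

def truncated_weights(ratios, tau):
    """Clip likelihood ratios at level tau (an assumed truncation rule)."""
    return np.minimum(ratios, tau)

def weighted_krr_fit(X, y, weights, lam, bandwidth=0.5):
    """Dual coefficients alpha minimizing
    (1/n) * sum_i w_i * (y_i - f(x_i))^2 + lam * ||f||_H^2,
    where f(x) = sum_i alpha_i * k(x_i, x)."""
    n = len(y)
    K = rbf_kernel(X, X, bandwidth)
    # First-order condition: (W K + n * lam * I) alpha = W y, with W = diag(weights).
    return np.linalg.solve(weights[:, None] * K + n * lam * np.eye(n), weights * y)

def krr_predict(X_train, alpha, X_new, bandwidth=0.5):
    """Evaluate the fitted function at the rows of X_new."""
    return rbf_kernel(X_new, X_train, bandwidth) @ alpha

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(0.0, 1.0, size=(200, 1))        # source covariates: N(0, 1)
    y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=200)
    ratios = np.exp(X[:, 0] - 0.5)                 # density ratio N(1, 1) / N(0, 1)
    w = truncated_weights(ratios, tau=5.0)         # tau chosen arbitrarily here
    alpha = weighted_krr_fit(X, y, w, lam=1e-3)
    print(krr_predict(X, alpha, np.array([[1.0]])))
```

Setting all weights to one recovers ordinary kernel ridge regression, which corresponds to the bounded-likelihood-ratio case in the abstract, where only the regularization parameter is tuned.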

15.5 · LG · Nov 14, 2023

Transformers can optimally learn regression mixture models

Reese Pathak, Rajat Sen, Weihao Kong et al.

Mixture models arise in many regression problems, but most methods have seen limited adoption partly due to these algorithms' highly tailored and model-specific nature. On the other hand, transformers are flexible, neural sequence models that present the intriguing possibility of providing general-purpose prediction methods, even in this mixture setting. In this work, we investigate the hypothesis that transformers can learn an optimal predictor for mixtures of regressions. We construct a generative process for a mixture of linear regressions for which the decision-theoretic optimal procedure is given by data-driven exponential weights on a finite set of parameters. We observe that transformers achieve low mean-squared error on data generated via this process. By probing the transformer's output at inference time, we also show that transformers typically make predictions that are close to the optimal predictor. Our experiments also demonstrate that transformers can learn mixtures of regressions in a sample-efficient fashion and are somewhat robust to distribution shifts. We complement our experimental observations by proving constructively that the decision-theoretic optimal procedure is indeed implementable by a transformer.
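
The "data-driven exponential weights" predictor mentioned in this abstract can be sketched as a posterior mean over a finite parameter set. The sketch assumes a uniform prior over the candidate parameters, Gaussian noise with known standard deviation `sigma`, and squared-error loss; these modeling choices and the function name are assumptions, not details taken from the paper.

```python
import numpy as np

def exp_weights_predict(W, X, y, x_new, sigma=1.0):
    """Posterior-mean prediction under a uniform prior on the rows of W.

    W: (m, d) candidate regression vectors
    X: (n, d) observed covariates, y: (n,) observed responses
    x_new: (d,) query point
    Returns the prediction and the posterior weights over the m candidates.
    """
    resid = y[None, :] - W @ X.T                          # (m, n) residuals
    log_w = -0.5 * np.sum(resid**2, axis=1) / sigma**2    # Gaussian log-likelihoods
    log_w -= log_w.max()                                  # stabilize before exponentiating
    p = np.exp(log_w)
    p /= p.sum()
    return p @ (W @ x_new), p

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(5, 3))                  # five hypothetical mixture components
    X = rng.normal(size=(30, 3))
    y = X @ W[2] + 0.5 * rng.normal(size=30)     # data drawn from component 2
    pred, p = exp_weights_predict(W, X, y, rng.normal(size=3), sigma=0.5)
    print(pred, p.round(3))
```

With no observations the weights stay uniform and the prediction is the average of the candidates' predictions; as data accumulate, the weights concentrate on the components that explain the responses best.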

10.0 · PR · Jun 3

A remark on the majorizing measures theorem for general processes

Reese Pathak, Nikita Zhivotovskiy · eth-zurich

We show that the lower bound in the majorizing measures theorem holds for a large class of random vectors. Specifically, suppose $X \sim \mu$ is a centered random vector in $\mathbf{R}^n$ with
\[ C_{\mathrm{KL}}(\mu) = \sup_{\substack{\theta \neq \eta \\ \theta, \eta \in \mathbf{R}^n}} \frac{\mathrm{KL}(\mu_\theta \| \mu_\eta)}{\|\theta - \eta\|_2^2} < \infty, \]
where $\mu_\theta$ denotes the law of the translate $\theta + X$. Then, for every nonempty, bounded $T \subset \mathbf{R}^n$,
\[ \sqrt{C_{\mathrm{KL}}(\mu)}\, \mathbf{E}_\mu\Big[\sup_{t \in T} \, \langle X, t \rangle \Big] \gtrsim \gamma_2(T), \]
where the right-hand side denotes Talagrand's generic chaining functional. This result recovers, as a special case, the lower bound in the majorizing measures theorem for centered Gaussian processes. Our argument critically relies on the rate-distortion integral, recently introduced by J. Liu.

2.3 · ME · Mar 28, 2024 · Code

Data-Adaptive Tradeoffs among Multiple Risks in Distribution-Free Prediction

Drew T. Nguyen, Reese Pathak, Anastasios N. Angelopoulos et al. · berkeley

Decision-making pipelines are generally characterized by tradeoffs among various risk functions. It is often desirable to manage such tradeoffs in a data-adaptive manner. As we demonstrate, if this is done naively, state-of-the-art uncertainty quantification methods can lead to significant violations of putative risk guarantees. To address this issue, we develop methods that permit valid control of risk when threshold and tradeoff parameters are chosen adaptively. Our methodology supports monotone and nearly-monotone risks, but otherwise makes no distributional assumptions. To illustrate the benefits of our approach, we carry out numerical experiments on synthetic data and the large-scale vision dataset MS-COCO.

13.8 · ST · Feb 6, 2022

Reese Pathak, Cong Ma, Martin J. Wainwright

We study covariate shift in the context of nonparametric regression. We introduce a new measure of distribution mismatch between the source and target distributions that is based on the integrated ratio of probabilities of balls at a given radius. We use the scaling of this measure with respect to the radius to characterize the minimax rate of estimation over a family of Hölder continuous functions under covariate shift. In comparison to the recently proposed notion of transfer exponent, this measure leads to a sharper rate of convergence and is more fine-grained. We accompany our theory with concrete instances of covariate shift that illustrate this sharp difference.

3.1 · LG · Oct 26, 2021

Cluster-and-Conquer: A Framework For Time-Series Forecasting

Reese Pathak, Rajat Sen, Nikhil Rao et al.

We propose a three-stage framework for forecasting high-dimensional time-series data. Our method first estimates parameters for each univariate time series. Next, we use these parameters to cluster the time series. These clusters can be viewed as multivariate time series, for which we then compute parameters. The forecasted values of a single time series can depend on the history of other time series in the same cluster, accounting for intra-cluster similarity while minimizing potential noise in predictions by ignoring inter-cluster effects. Our framework -- which we refer to as "cluster-and-conquer" -- is highly general, allowing for any time-series forecasting and clustering method to be used in each step. It is computationally efficient and embarrassingly parallel. We motivate our framework with a theoretical analysis in an idealized mixed linear regression setting, where we provide guarantees on the quality of the estimates. We accompany these guarantees with experimental results that demonstrate the advantages of our framework: when instantiated with simple linear autoregressive models, we are able to achieve state-of-the-art results on several benchmark datasets, sometimes outperforming deep-learning-based approaches.
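
The three stages described in this abstract can be sketched as follows. The abstract leaves the forecasting and clustering methods open, so this sketch makes its own choices: least-squares autoregressions without intercept for the per-series and per-cluster models, and a small k-means with farthest-point initialization for the clustering step. All function names are hypothetical.

```python
import numpy as np

def lagged(Y, p):
    """Y has shape (T, d). Returns lag features (T-p, d*p) and targets (T-p, d)."""
    T = Y.shape[0]
    Z = np.hstack([Y[p - k - 1 : T - k - 1] for k in range(p)])  # lags 1..p
    return Z, Y[p:]

def fit_linear(Y, p):
    """Least-squares (vector) autoregression of order p, without intercept."""
    Z, target = lagged(Y, p)
    B, *_ = np.linalg.lstsq(Z, target, rcond=None)
    return B  # shape (d*p, d)

def kmeans(P, k, iters=50):
    """Plain Lloyd iterations with deterministic farthest-point initialization."""
    centers = [P[0]]
    for _ in range(1, k):
        d = np.min([np.sum((P - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(P[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dist = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = P[labels == j].mean(axis=0)
    return labels

def cluster_and_conquer(Y, p=2, k=2):
    """Returns cluster labels and a one-step-ahead forecast for every series."""
    T, d = Y.shape
    # Stage 1: estimate AR(p) parameters for each univariate series.
    params = np.stack([fit_linear(Y[:, [j]], p)[:, 0] for j in range(d)])
    # Stage 2: cluster the series by their estimated parameters.
    labels = kmeans(params, k)
    # Stage 3: fit one multivariate model per cluster and forecast from it.
    forecast = np.zeros(d)
    for c in range(k):
        idx = np.flatnonzero(labels == c)
        if idx.size == 0:
            continue
        B = fit_linear(Y[:, idx], p)
        z = np.hstack([Y[T - 1 - j, idx] for j in range(p)])  # most recent p lags
        forecast[idx] = z @ B
    return labels, forecast
```

Because stage 1 runs independently per series and stage 3 independently per cluster, both loops could be distributed across workers, which is the sense in which the abstract calls the framework embarrassingly parallel.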

29.4 · LG · May 11, 2020

FedSplit: An algorithmic framework for fast federated optimization

Reese Pathak, Martin J. Wainwright

Motivated by federated learning, we consider the hub-and-spoke model of distributed optimization in which a central authority coordinates the computation of a solution among many agents while limiting communication. We first study some past procedures for federated optimization, and show that their fixed points need not correspond to stationary points of the original optimization problem, even in simple convex settings with deterministic updates. In order to remedy these issues, we introduce FedSplit, a class of algorithms based on operator splitting procedures for solving distributed convex minimization with additive structure. We prove that these procedures have the correct fixed points, corresponding to optima of the original optimization problem, and we characterize their convergence rates under different settings. Our theory shows that these methods are provably robust to inexact computation of intermediate local quantities. We complement our theory with some simple experiments that demonstrate the benefits of our methods in practice.
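
The operator-splitting idea in this abstract can be illustrated with a minimal sketch. The abstract does not spell out the updates, so the code below assumes a Peaceman-Rachford-style scheme on the consensus problem: each client takes a proximal step on a reflected point, updates its local variable, and the server averages. It further assumes least-squares client losses so that the proximal step has a closed form; the function names and the step size `s` are hypothetical.

```python
import numpy as np

def prox_least_squares(v, A, b, s):
    """prox_{s f}(v) for f(x) = 0.5 * ||A x - b||^2, computed exactly."""
    d = A.shape[1]
    return np.linalg.solve(np.eye(d) + s * A.T @ A, v + s * A.T @ b)

def fedsplit(clients, s, rounds):
    """clients: list of (A_j, b_j) pairs. Returns the server iterate after `rounds`."""
    d = clients[0][0].shape[1]
    z = [np.zeros(d) for _ in clients]   # one local variable per client
    x = np.zeros(d)                      # server average
    for _ in range(rounds):
        for j, (A, b) in enumerate(clients):
            half = prox_least_squares(2 * x - z[j], A, b, s)  # local prox step
            z[j] = z[j] + 2 * (half - x)                      # local centering step
        x = np.mean(z, axis=0)                                # server averaging
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]
    x_fed = fedsplit(clients, s=0.05, rounds=300)
    A_all = np.vstack([A for A, _ in clients])
    b_all = np.concatenate([b for _, b in clients])
    x_star = np.linalg.lstsq(A_all, b_all, rcond=None)[0]
    print(np.linalg.norm(x_fed - x_star))
```

At a fixed point of these updates, every local prox output equals the server average `x`, and summing the prox optimality conditions over clients shows that the gradients of the client losses at `x` sum to zero. This is the "correct fixed points" property that the abstract contrasts with earlier federated procedures.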