Haichen Hu

LG
h-index17
7papers
19citations
Novelty71%
AI Score58

7 Papers

LGMar 15
Interleaved Resampling and Refitting: Data and Compute-Efficient Evaluation of Black-Box Predictors

Haichen Hu, David Simchi-Levi

We study the problem of evaluating the excess risk of large-scale empirical risk minimization under the square loss. Leveraging the idea of wild refitting and resampling, we assume only black-box access to the training algorithm and develop an efficient procedure for estimating the excess risk. Our evaluation algorithm is both computationally and data efficient. In particular, it requires access to only a single dataset and does not rely on any additional validation data. Computationally, it only requires refitting the model on several much smaller datasets obtained through sequential resampling, in contrast to previous wild refitting methods that require full-scale retraining and might therefore be unsuitable for large-scale trained predictors. Our algorithm has an interleaved sequential resampling-and-refitting structure. We first construct pseudo-responses through a randomized residual symmetrization procedure. At each round, we thus resample two sub-datasets from the resulting covariate pseudo-response pairs. Finally, we retrain the model separately on these two small artificial datasets. We establish high probability excess risk guarantees under both fixed design and random design settings, showing that with a suitably chosen noise scale, our interleaved resampling and refitting algorithm yields an upper bound on the prediction error. Our theoretical analysis draws on tools from empirical process theory, harmonic analysis, Toeplitz operator theory, and sharp tensor concentration inequalities.

LGMay 1
Model-Based Reinforcement Learning with Double Oracle Efficiency in Policy Optimization and Offline Estimation

Haichen Hu, Jian Qian, David Simchi-Levi

Reinforcement learning (RL) in large environments often suffers from severe computational bottlenecks, as conventional regret minimization algorithms require repeated, costly calls to planning and statistical estimation oracles. While recent advances have explored offline oracle-efficient algorithms, their computational complexity typically scales with the cardinality of the state and action spaces, rendering them intractable for large-scale or continuous environments. In this paper, we address this fundamental limitation by studying offline oracle-efficient episodic RL through the lens of log-barrier and log-determinant regularization. Specifically, for tabular Markov Decision Processes (MDPs), we propose a novel algorithm that achieves the optimal $\tilde{O}(\sqrt{T})$ regret bound while requiring only $O(H\log\log T)$ calls to both the offline statistical estimation and planning oracles when $T$ is known and $O(H\log T)$ calls when $T$ is unknown. Crucially, this oracle complexity is entirely independent of the size of the state and action spaces. This strict independence drastically reduces the planning oracle complexity, representing a substantial improvement over existing offline oracle-efficient algorithms (Qian et al., 2024). Furthermore, we demonstrate the versatility of our framework by generalizing the algorithm to linear MDPs featuring infinite state spaces and arbitrary action spaces. We prove that this generalized approach successfully attains meaningful sub-linear regret. Consequently, our work yields the first doubly oracle-efficient (i.e., efficient with respect to both statistical estimation and policy optimization) regret minimization algorithm capable of solving MDPs with infinite state and action spaces, significantly expanding the boundaries of computationally tractable RL.

MLJan 30, 2025
Contextual Online Decision Making with Infinite-Dimensional Functional Regression

Haichen Hu, Rui Ai, Stephen Bates et al.

Contextual sequential decision-making problems play a crucial role in machine learning, encompassing a wide range of downstream applications such as bandits, sequential hypothesis testing and online risk control. These applications often require different statistical measures, including expectation, variance and quantiles. In this paper, we provide a universal admissible algorithm framework for dealing with all kinds of contextual online decision-making problems that directly learns the whole underlying unknown distribution instead of focusing on individual statistics. This is much more difficult because the dimension of the regression is uncountably infinite, and any existing linear contextual bandits algorithm will result in infinite regret. To overcome this issue, we propose an efficient infinite-dimensional functional regression oracle for contextual cumulative distribution functions (CDFs), where each data point is modeled as a combination of context-dependent CDF basis functions. Our analysis reveals that the decay rate of the eigenvalue sequence of the design integral operator governs the regression error rate and, consequently, the utility regret rate. Specifically, when the eigenvalue sequence exhibits a polynomial decay of order $\frac{1}γ\ge 1$, the utility regret is bounded by $\tilde{\mathcal{O}}\Big(T^{\frac{3γ+2}{2(γ+2)}}\Big)$. By setting $γ=0$, this recovers the existing optimal regret rate for contextual bandits with finite-dimensional regression and is optimal under a stronger exponential decay assumption. Additionally, we provide a numerical method to compute the eigenvalue sequence of the integral operator, enabling the practical implementation of our framework.

LGJul 10, 2025
Pre-Trained AI Model Assisted Online Decision-Making under Missing Covariates: A Theoretical Perspective

Haichen Hu, David Simchi-Levi

We study a sequential contextual decision-making problem in which certain covariates are missing but can be imputed using a pre-trained AI model. From a theoretical perspective, we analyze how the presence of such a model influences the regret of the decision-making process. We introduce a novel notion called "model elasticity", which quantifies the sensitivity of the reward function to the discrepancy between the true covariate and its imputed counterpart. This concept provides a unified way to characterize the regret incurred due to model imputation, regardless of the underlying missingness mechanism. More surprisingly, we show that under the missing at random (MAR) setting, it is possible to sequentially calibrate the pre-trained model using tools from orthogonal statistical learning and doubly robust regression. This calibration significantly improves the quality of the imputed covariates, leading to much better regret guarantees. Our analysis highlights the practical value of having an accurate pre-trained model in sequential decision-making tasks and suggests that model elasticity may serve as a fundamental metric for understanding and improving the integration of pre-trained models in a wide range of data-driven decision-making problems.

LGNov 24, 2025
Doubly Wild Refitting: Model-Free Evaluation of High Dimensional Black-Box Predictions under Convex Losses

Haichen Hu, David Simchi-Levi

We study the problem of excess risk evaluation for empirical risk minimization (ERM) under general convex loss functions. Our contribution is an efficient refitting procedure that computes the excess risk and provides high-probability upper bounds under the fixed-design setting. Assuming only black-box access to the training algorithm and a single dataset, we begin by generating two sets of artificially modified pseudo-outcomes termed wild response, created by stochastically perturbing the gradient vectors with carefully chosen scaling. Using these two pseudo-labeled datasets, we then refit the black-box procedure twice to obtain two corresponding wild predictors. Finally, leveraging the original predictor, the two wild predictors, and the constructed wild responses, we derive an efficient excess risk upper bound. A key feature of our analysis is that it requires no prior knowledge of the complexity of the underlying function class. As a result, the method is essentially model-free and holds significant promise for theoretically evaluating modern opaque machine learning system--such as deep nerral networks and generative model--where traditional capacity-based learning theory becomes infeasible due to the extreme complexity of the hypothesis class.

MLSep 2, 2025
Perturbing the Derivative: Wild Refitting for Model-Free Evaluation of Machine Learning Models under Bregman Losses

Haichen Hu, David Simchi-Levi

We study the excess risk evaluation of classical penalized empirical risk minimization (ERM) with Bregman losses. We show that by leveraging the idea of wild refitting, one can efficiently upper bound the excess risk through the so-called "wild optimism," without relying on the global structure of the underlying function class. This property makes our approach inherently model-free. Unlike conventional analysis, our framework operates with just one dataset and black-box access to the training procedure. The method involves randomized Rademacher symmetrization and constructing artificially modified outputs by perturbation in the derivative space with appropriate scaling, upon which we retrain a second predictor for excess risk estimation. We establish high-probability performance guarantees both under the fixed design setting and the random design setting, demonstrating that wild refitting under Bregman losses, with an appropriately chosen wild noise scale, yields a valid upper bound on the excess risk. Thus, our work is promising for theoretically evaluating modern opaque ML models, such as deep neural networks and generative models, where the function class is too complex for classical learning theory and empirical process techniques.

MLMay 11, 2025
Constrained Online Decision-Making: A Unified Framework

Haichen Hu, David Simchi-Levi, Navid Azizan · mit

Contextual online decision-making problems with constraints appear in a wide range of real-world applications, such as adaptive experimental design under safety constraints, personalized recommendation with resource limits, and dynamic pricing under fairness requirements. In this paper, we investigate a general formulation of sequential decision-making with stage-wise feasibility constraints, where at each round, the learner must select an action based on observed context while ensuring that a problem-specific feasibility criterion is satisfied. We propose a unified algorithmic framework that captures many existing constrained learning problems, including constrained bandits, active learning with label budgets, online hypothesis testing with Type I error control, and model calibration. Central to our approach is the concept of upper counterfactual confidence bounds, which enables the design of practically efficient online algorithms with strong theoretical guarantees using any offline conditional density estimation oracle. To handle feasibility constraints in complex environments, we introduce a generalized notion of the eluder dimension, extending it from the classical setting based on square loss to a broader class of metric-like probability divergences. This allows us to capture the complexity of various density function classes and characterize the utility regret incurred due to feasibility constraint uncertainty. Our result offers a principled foundation for constrained sequential decision-making in both theory and practice.