Ethan Che

LG
h-index37
6papers
97citations
Novelty53%
AI Score37

6 Papers

LGAug 8, 2024Code
AExGym: Benchmarks and Environments for Adaptive Experimentation

Jimmy Wang, Ethan Che, Daniel R. Jiang et al.

Innovations across science and industry are evaluated using randomized trials (a.k.a. A/B tests). While simple and robust, such static designs are inefficient or infeasible for testing many hypotheses. Adaptive designs can greatly improve statistical power in theory, but they have seen limited adoption due to their fragility in practice. We present a benchmark for adaptive experimentation based on real-world datasets, highlighting prominent practical challenges to operationalizing adaptivity: non-stationarity, batched/delayed feedback, multiple outcomes and objectives, and external validity. Our benchmark aims to spur methodological development that puts practical performance (e.g., robustness) as a central concern, rather than mathematical guarantees on contrived instances. We release an open source library, AExGym, which is designed with modularity and extensibility in mind to allow experimentation practitioners to develop custom environments and algorithms.

LGMar 21, 2023
Adaptive Experimentation at Scale: A Computational Framework for Flexible Batches

Ethan Che, Hongseok Namkoong

Standard bandit algorithms that assume continual reallocation of measurement effort are challenging to implement due to delayed feedback and infrastructural/organizational difficulties. Motivated by practical instances involving a handful of reallocation epochs in which outcomes are measured in batches, we develop a computation-driven adaptive experimentation framework that can flexibly handle batching. Our main observation is that normal approximations, which are universal in statistical inference, can also guide the design of adaptive algorithms. By deriving a Gaussian sequential experiment, we formulate a dynamic program that can leverage prior information on average rewards. Instead of the typical theory-driven paradigm, we leverage computational tools and empirical benchmarking for algorithm development. In particular, our empirical analysis highlights a simple yet effective algorithm, Residual Horizon Optimization, which iteratively solves a planning problem using stochastic gradient descent. Our approach significantly improves statistical power over standard methods, even when compared to Bayesian bandit algorithms (e.g., Thompson sampling) that require full distributional knowledge of individual rewards. Overall, we expand the scope of adaptive experimentation to settings that are difficult for standard methods, involving limited adaptivity, low signal-to-noise ratio, and unknown reward distributions.

LGSep 5, 2024
Differentiable Discrete Event Simulation for Queuing Network Control

Ethan Che, Jing Dong, Hongseok Namkoong

Queuing network control is essential for managing congestion in job-processing systems such as service systems, communication networks, and manufacturing processes. Despite growing interest in applying reinforcement learning (RL) techniques, queueing network control poses distinct challenges, including high stochasticity, large state and action spaces, and lack of stability. To tackle these challenges, we propose a scalable framework for policy optimization based on differentiable discrete event simulation. Our main insight is that by implementing a well-designed smoothing technique for discrete event dynamics, we can compute pathwise policy gradients for large-scale queueing networks using auto-differentiation software (e.g., Tensorflow, PyTorch) and GPU parallelization. Through extensive empirical experiments, we observe that our policy gradient estimators are several orders of magnitude more accurate than typical REINFORCE-based estimators. In addition, We propose a new policy architecture, which drastically improves stability while maintaining the flexibility of neural-network policies. In a wide variety of scheduling and admission control tasks, we demonstrate that training control policies with pathwise gradients leads to a 50-1000x improvement in sample efficiency over state-of-the-art RL methods. Unlike prior tailored approaches to queueing, our methods can flexibly handle realistic scenarios, including systems operating in non-stationary environments and those with non-exponential interarrival/service times.

LGAug 8, 2024
Optimization-Driven Adaptive Experimentation

Ethan Che, Daniel R. Jiang, Hongseok Namkoong et al.

Real-world experiments involve batched & delayed feedback, non-stationarity, multiple objectives & constraints, and (often some) personalization. Tailoring adaptive methods to address these challenges on a per-problem basis is infeasible, and static designs remain the de facto standard. Focusing on short-horizon ($\le 10$) adaptive experiments, we move away from bespoke algorithms and present a mathematical programming formulation that can flexibly incorporate a wide range of objectives, constraints, and statistical procedures. We formulating a dynamic program based on central limit approximations, which enables the use of scalable optimization methods based on auto-differentiation and GPU parallelization. To evaluate our framework, we implement a simple heuristic planning method ("solver") and benchmark it across hundreds of problem instances involving non-stationarity, personalization, and multiple objectives & constraints. Unlike bespoke methods (e.g., Thompson sampling variants), our mathematical programming framework provides consistent gains over static randomized control trials and exhibits robust performance across problem instances.

CLMar 3, 2025
How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach

Ayeong Lee, Ethan Che, Tianyi Peng

Chain-of-thought prompting has emerged as a powerful technique for enabling large language models (LLMs) to solve complex reasoning tasks. However, these reasoning chains can be verbose, raising concerns about efficiency. In response, recent works have sought to decrease response lengths through simple prompting strategies (e.g. 'be concise'). In this work, we conduct the first systematic study of the relationship between reasoning length and model performance across a diverse range of compression instructions (e.g. 'use 10 words or less' or 'remove all punctuation'). In doing so, we discover a universal tradeoff between reasoning length and accuracy that persists across even very distinct reasoning chains. We demonstrate that this tradeoff emerges from a sharp threshold behavior at the question level: each task has an intrinsic 'token complexity' - a minimal number of tokens required for successful problem-solving. We show how token complexity enables us to compute information-theoretic limits on the accuracy-compression tradeoff, and find that prompt-based compression strategies operate far from these theoretical limits. This suggests there may be significant room for improvement and our framework provides a benchmark to help researchers evaluate progress in reasoning efficiency. Our work also highlights the importance of adaptive compression -- giving shorter responses for easier questions -- and we show that token complexity is a useful tool for measuring this capability.

LGJun 3, 2025
Optimization of Epsilon-Greedy Exploration

Ethan Che, Hakan Ceylan, James McInerney et al.

Modern recommendation systems rely on exploration to learn user preferences for new items, typically implementing uniform exploration policies (e.g., epsilon-greedy) due to their simplicity and compatibility with machine learning (ML) personalization models. Within these systems, a crucial consideration is the rate of exploration - what fraction of user traffic should receive random item recommendations and how this should evolve over time. While various heuristics exist for navigating the resulting exploration-exploitation tradeoff, selecting optimal exploration rates is complicated by practical constraints including batched updates, time-varying user traffic, short time horizons, and minimum exploration requirements. In this work, we propose a principled framework for determining the exploration schedule based on directly minimizing Bayesian regret through stochastic gradient descent (SGD), allowing for dynamic exploration rate adjustment via Model-Predictive Control (MPC). Through extensive experiments with recommendation datasets, we demonstrate that variations in the batch size across periods significantly influence the optimal exploration strategy. Our optimization methods automatically calibrate exploration to the specific problem setting, consistently matching or outperforming the best heuristic for each setting.