Kevin Guo

19.5 · CL · Mar 17

Learning When to Sample: Confidence-Aware Self-Consistency for Efficient LLM Chain-of-Thought Reasoning

Juming Xiong, Kevin Guo, Congning Ni et al.

Large language models (LLMs) achieve strong reasoning performance through chain-of-thought (CoT) reasoning, yet often generate unnecessarily long reasoning paths that incur high inference cost. Recent self-consistency-based approaches further improve accuracy but require sampling and aggregating multiple reasoning trajectories, leading to substantial additional computational overhead. This paper introduces a confidence-aware decision framework that analyzes a single completed reasoning trajectory to adaptively select between single-path and multi-path reasoning. The framework is trained using sentence-level numeric and linguistic features extracted from intermediate reasoning states in the MedQA dataset and generalizes effectively to MathQA, MedMCQA, and MMLU without additional fine-tuning. Experimental results show that the proposed method maintains accuracy comparable to multi-path baselines while using up to 80% fewer tokens. These findings demonstrate that reasoning trajectories contain rich signals for uncertainty estimation, enabling a simple, transferable mechanism to balance accuracy and efficiency in LLM reasoning.
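The single-path versus multi-path decision described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `adaptive_answer`, `majority_vote`, and `sample_more`, the threshold of 0.8, and the number of extra samples are all hypothetical, and the confidence score is simply passed in rather than computed from the sentence-level features the paper uses.

```python
from collections import Counter
from typing import Callable, List, Tuple


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def adaptive_answer(
    first_answer: str,
    confidence: float,
    sample_more: Callable[[], str],
    threshold: float = 0.8,   # assumed cutoff, not taken from the paper
    n_extra: int = 4,         # assumed number of extra trajectories
) -> Tuple[str, int]:
    """Keep the single-path answer when confidence is high enough;
    otherwise sample extra trajectories and aggregate by majority vote.

    Returns the chosen answer and the number of trajectories used.
    """
    if confidence >= threshold:
        return first_answer, 1
    answers = [first_answer] + [sample_more() for _ in range(n_extra)]
    return majority_vote(answers), len(answers)
```

In this sketch the token savings come from the first branch: whenever the gate accepts the initial trajectory, no additional samples are drawn.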

ME · Dec 21, 2021

Doubly-Valid/Doubly-Sharp Sensitivity Analysis for Causal Inference with Unmeasured Confounding

Jacob Dorn, Kevin Guo, Nathan Kallus

We consider the problem of constructing bounds on the average treatment effect (ATE) when unmeasured confounders exist but have bounded influence. Specifically, we assume that omitted confounders could not change the odds of treatment for any unit by more than a fixed factor. We derive the sharp partial identification bounds implied by this assumption by leveraging distributionally robust optimization, and we propose estimators of these bounds with several novel robustness properties. The first is double sharpness: our estimators consistently estimate the sharp ATE bounds when one of two nuisance parameters is misspecified and achieve semiparametric efficiency when all nuisance parameters are suitably consistent. The second is double validity: even when most nuisance parameters are misspecified, our estimators still provide valid but possibly conservative bounds for the ATE and our Wald confidence intervals remain valid even when our estimators are not asymptotically normal. As a result, our estimators provide a highly credible method for sensitivity analysis of causal inferences.
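The bounded-odds assumption stated in this abstract can be written out as follows. This is one common formalization and may differ in detail from the paper's; the notation is an assumption introduced here: $Z$ is the treatment indicator, $X$ the observed covariates, $U$ the unmeasured confounders, and $\Lambda \ge 1$ the fixed factor.

```latex
% Assumed notation: e(x) = P(Z = 1 \mid X = x), e(x, u) = P(Z = 1 \mid X = x, U = u)
\[
  \Lambda^{-1}
  \;\le\;
  \frac{e(x,u) \,/\, \bigl(1 - e(x,u)\bigr)}{e(x) \,/\, \bigl(1 - e(x)\bigr)}
  \;\le\;
  \Lambda
  \qquad \text{for all } x, u .
\]
```

Setting $\Lambda = 1$ forces $e(x,u) = e(x)$, which recovers the case of no unmeasured confounding; larger values of $\Lambda$ allow stronger confounding and therefore wider bounds on the ATE.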

2 Papers