98.5 LG Apr 1 Code
Online Reasoning Calibration: Test-Time Training Enables Generalizable Conformal LLM Reasoning
Cai Zhou, Zekai Wang, Menghua Wu et al.
This work addresses inefficiencies in LLM deployment for reasoning tasks, offering a method to reduce compute costs while maintaining performance, though it is incremental as it builds on conformal prediction and test-time training.
96.6 CL May 26
Probing Minimalist Phase Structure in LLMs: What Universal Dependencies Cannot Represent
Yuanhao Chen, Peter Chin
For computational linguists and cognitive scientists, this demonstrates that LLMs learn formal-syntactic abstractions invisible to UD-based probes, revealing that UD provides a lower bound on syntactic encoding.
97.4 ME May 26
When prompt perturbations break your A/B test: A valid statistical test for generative surveying
Hayden Helm, Carey Priebe
For researchers and practitioners using LLM-based surveys for market research, this work addresses the critical issue of prompt sensitivity invalidating standard statistical tests, offering a valid alternative.
100.0 AP May 22
Distributionally Robust Transfer Learning with Structurally Missing Covariates, with Application to Cross-National Cardiac Arrest Prediction
Siqi Li, Chuan Hong, Ziye Tian et al.
For healthcare systems deploying models across settings with incomplete data, DRUM provides a robust transfer learning method without requiring imputation or target labels.
98.1 AP Jun 2
A Latent Variable Framework for Scaling Laws in Large Language Models
Peiyao Cai, Chengyu Cui, Felipe Maia Polo et al.
This work addresses the need for flexible scaling laws that account for heterogeneity across LLM families and benchmarks, offering a more nuanced tool for model development and evaluation.
96.3 AP Mar 26
Efficient Detection of Bad Benchmark Items with Novel Scalability Coefficients
Michael Hardy, Joshua Gilbert, Benjamin Domingue
This method addresses the challenge of ensuring assessment validity for AI researchers and educators by providing a scalable, model-agnostic tool to reduce reviewer effort in vetting large item sets.
94.6 SI Apr 2
Structural Diversity Drives Disruptive Scientific Innovation
Yichun Peng, Saike He, Peijie Zhang et al.
For researchers and policymakers in science of science, this provides a new, actionable principle for organizing teams to foster disruptive innovation, though it is an incremental extension of network-based team metrics.
96.5 ME Apr 3
Eligibility-Aware Evidence Synthesis: An Agentic Framework for Clinical Trial Meta-Analysis
Yao Zhao, Zhiyue Zhang, Yanxun Xu
This addresses the need for scalable and reproducible evidence synthesis in precision medicine, representing a novel integration rather than an incremental improvement.
95.6 ME Mar 25
Conformal Selective Prediction with General Risk Control
Tian Bai, Ying Jin
This addresses the need for reliable uncertainty quantification in high-stakes applications like drug discovery and health prediction, offering a novel method for risk control.
94.4 AP May 13
Steer-to-Detect: Probing Hidden Representations for Detection of LLM-Generated Texts
Luxu Liang, Xiang Li
For practitioners needing robust detection of LLM-generated text, S2D offers a method that improves discriminative power over raw hidden representations, though it is an incremental improvement over existing representation-based detectors.
92.7 ML May 7
Super-Level-Set Regression: Conditional Quantiles via Volume Minimization
Sacha Braun, Michael I. Jordan, Francis Bach
For practitioners needing reliable prediction regions in multivariate regression, SLS offers a direct optimization alternative to two-step density-based methods, which are often difficult and computationally expensive.
92.6 AP Apr 21
Ground-Level Near Real-Time Modeling for PM2.5 Pollution Prediction
Zachary R. Fox, Janet O. Agbaje, Dakotah Maguire et al.
For public health researchers and policymakers, this model enables timely and accurate air quality assessments, addressing the bottleneck of near real-time pollution exposure estimation.
93.8 GN Apr 12
Unveiling contrasting impacts of heat mitigation and adaptation policies on U.S. internal migration
Chao Li, Xing Su, Chao Fan et al.
For policymakers, this reveals that heat mitigation and adaptation policies have opposing effects on internal migration, which is crucial for understanding population dynamics under climate change.
90.0 LG Mar 18
Lightweight Adaptation for LLM-based Technical Service Agent: Latent Logic Augmentation and Robust Noise Reduction
Yi Yu, Junzhuo Ma, Chenghuang Shen et al.
This work provides a practical solution for deploying efficient technical service agents, though it appears incremental as it builds on existing adaptation methods.
90.7 AP Mar 24
Wafer-Level Etch Spatial Profiling for Process Monitoring from Time-Series with Time-LLM
Hyunwoo Kim, Munyoung Lee, Seung Hyub Jeon et al.
This work addresses wafer-level spatial monitoring for plasma etching processes; it is incremental in that it extends LLM reprogramming to spatial estimation.
88.9 AP May 7
Correcting heterogeneous diagnostic bias when developing clinical prediction models using causal hidden Markov models
Jose Benitez-Aurioles, Ricardo Silva, Brian McMillan et al.
This method corrects label bias due to differential testing frequency in clinical prediction models, improving fairness and accuracy for underdiagnosed groups.
87.6 ML Mar 20 Code
Deep Autocorrelation Modeling for Time-Series Forecasting: Progress and Prospects
Hao Wang, Licheng Pan, Qingsong Wen et al.
This survey offers researchers in time-series forecasting a systematic overview, though it is incremental in that it synthesizes existing literature.
90.2 SY Mar 31
Agentic AI and Occupational Displacement: A Multi-Regional Task Exposure Analysis of Emerging Labor Market Disruption
Ravish Gupta, Saket Kumar
This study addresses the problem of occupational displacement due to agentic AI for policymakers and regional planners, providing a novel framework to assess risks and opportunities in information-intensive sectors.
87.0 AP May 27
Day-Ahead Electricity Price Forecasting Using a Multivariate Group Lasso Method
Keyi Wang, Jiaxiang Ji, Mahan Mansouri et al.
For electricity market operators and participants, the method offers an interpretable, low-complexity forecasting approach that outperforms existing methods on real-world data.
87.2 RO Mar 13
Beyond Binary Success: Sample-Efficient and Statistically Rigorous Robot Policy Comparison
David Snyder, Apurva Badithela, Nikolai Matni et al.
This work addresses the need for reliable and efficient policy comparison in robotics, particularly for generalist manipulation policies. It provides a unified approach that handles metrics beyond binary success, though it is incremental in that it builds on existing sequential inference methods.