34.0 | GT | Jun 2
Breaking $1/ε$ Barrier in Quantum Zero-Sum Games: Generalizing Metric Subregularity for Spectraplexes
Yiheng Su, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Pucheng Xiong
Quantum zero-sum games provide a framework for non-local games, quantum interactive proofs, and quantum machine learning, where players optimize a bilinear payoff over quantum states. In contrast to classical bilinear games over polyhedral domains, for which gradient methods achieve linear last-iterate convergence, comparable guarantees over spectraplexes have remained open. Recent work achieved only an $O(1/\varepsilon)$ average-iterate rate and suggested that semidefinite geometry may preclude classical-style linear rates. We refute this obstruction. We prove that quantum zero-sum games admit algorithms with $O(\log(1/\varepsilon))$ last-iterate convergence to Nash equilibrium. In particular, matrix variants of Nesterov's iterative smoothing and Optimistic Gradient Descent--Ascent match the asymptotic rate of the classical polyhedral case. The key technical ingredient is a new error-bound theory for semidefinite games, establishing metric subregularity of the relevant monotone operator over spectrahedra despite the absence of polyhedral structure. We also give a geometric characterization of Nash equilibria via slack operators, classifying strategic directions as essential, neutral, or non-essential. Under strict complementarity or nondegeneracy, this reduces to a sharp classical-style dichotomy. Finally, we revisit Optimistic Matrix Multiplicative Weights Update. By extending the Quantal Response Equilibrium framework to spectraplex games, we prove an $\widetilde O(1/\varepsilon)$ last-iterate guarantee, while showing that any $O(\log(1/\varepsilon))$ speedup for this method must depend on a natural, dimension-dependent condition number. Experiments support the theoretical picture, with Optimistic Gradient Descent--Ascent outperforming Optimistic Matrix Multiplicative Weights Update in the regimes studied.
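As a rough illustration of the Optimistic Gradient Descent-Ascent update named in the abstract, the sketch below runs it on the classical special case in which both players' states are diagonal, so each spectraplex collapses to a probability simplex. This is not the paper's matrix algorithm: the payoff matrix `A`, the step size, the starting points, and the two-sequence (predict/update) form are illustrative assumptions. A matrix variant would apply the same kind of projection to the eigenvalues of a Hermitian iterate.

```python
def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = sorted(v, reverse=True)
    running, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        running += ui
        t = (running - 1.0) / i
        if ui - t > 0:
            theta = t  # the last index passing this test sets the threshold
    return [max(x - theta, 0.0) for x in v]


def duality_gap(A, x, y):
    """max_j (A^T x)_j - min_i (A y)_i, which is zero at a Nash equilibrium."""
    Ay = [sum(a * yj for a, yj in zip(row, y)) for row in A]
    ATx = [sum(A[i][j] * x[i] for i in range(len(x))) for j in range(len(y))]
    return max(ATx) - min(Ay)


def ogda(A, x0, y0, eta, steps):
    """Projected optimistic GDA for min_x max_y x^T A y."""
    x_hat, y_hat = list(x0), list(y0)  # base sequence
    x, y = list(x0), list(y0)          # played sequence
    for _ in range(steps):
        # Gradients are evaluated at the played point (x, y).
        gx = [sum(a * yj for a, yj in zip(row, y)) for row in A]
        gy = [sum(A[i][j] * x[i] for i in range(len(x))) for j in range(len(y))]
        # Update the base sequence, then take the optimistic step that
        # reuses the same gradient as a prediction of the next one.
        x_hat = project_simplex([p - eta * g for p, g in zip(x_hat, gx)])
        y_hat = project_simplex([q + eta * g for q, g in zip(y_hat, gy)])
        x = project_simplex([p - eta * g for p, g in zip(x_hat, gx)])
        y = project_simplex([q + eta * g for q, g in zip(y_hat, gy)])
    return x, y


# Matching pennies (hypothetical test game); its unique equilibrium is uniform play.
A = [[1.0, -1.0], [-1.0, 1.0]]
x0, y0 = [0.7, 0.3], [0.4, 0.6]
initial_gap = duality_gap(A, x0, y0)
x_final, y_final = ogda(A, x0, y0, eta=0.05, steps=5000)
final_gap = duality_gap(A, x_final, y_final)
print(initial_gap, final_gap)
```

Reporting the duality gap of the final iterate, rather than of the running average, mirrors the last-iterate notion of convergence that the abstract emphasizes.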
LG | Nov 15, 2023 | Code
Wrapper Boxes: Faithful Attribution of Model Predictions to Training Data
Yiheng Su, Junyi Jessy Li, Matthew Lease
Can we preserve the accuracy of neural models while also faithfully attributing model decisions to training data? We propose a "wrapper box" pipeline: train a neural model as usual, then use its learned feature representation in classic, interpretable models to perform prediction. Across seven language models of varying sizes, including four large language models (LLMs), two datasets at different scales, three classic models, and four evaluation metrics, we first show that the predictive performance of wrapper classic models is largely comparable to that of the original neural models. Because classic models are transparent, each model decision is determined by a known set of training examples that can be directly shown to users. Our pipeline thus preserves the predictive performance of neural language models while faithfully attributing classic model decisions to training data. Among other use cases, such attribution enables model decisions to be contested based on the responsible training instances. Compared to prior work, our approach achieves higher coverage and correctness in identifying which training data to remove to change a model decision. To support reproduction of our findings, our source code is available at https://github.com/SamSoup/WrapperBox.
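The wrapper idea can be sketched with a nearest-neighbor classifier over feature vectors. This is a minimal illustration, not the authors' implementation: k-nearest neighbors is assumed here as one example of a classic, transparent model (the abstract does not name the three it evaluates), and the toy feature vectors stand in for representations that would come from a trained neural model.

```python
import math
from collections import Counter


def knn_predict_with_attribution(train_feats, train_labels, query, k=3):
    """Predict by majority vote and return the training indices that voted."""
    ranked = sorted((math.dist(f, query), i) for i, f in enumerate(train_feats))
    neighbors = [i for _, i in ranked[:k]]
    votes = Counter(train_labels[i] for i in neighbors)
    label = votes.most_common(1)[0][0]
    return label, neighbors


# Hypothetical 2-D "learned features"; real ones would be extracted from the network.
train_feats = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (2.0, 2.1), (2.2, 1.9)]
train_labels = ["neg", "neg", "neg", "pos", "pos"]

label, responsible = knn_predict_with_attribution(
    train_feats, train_labels, query=(0.1, 0.1), k=3
)
print(label, responsible)
```

The returned indices are exactly the training examples that determined the decision, which is the sense in which a transparent model's prediction can be shown to, and contested by, a user.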
79.6 | GT | Apr 6
On the Exploitability of FTRL Dynamics
Yiheng Su, Emmanouil-Vasileios Vlatakis-Gkaragkounis
In this paper we investigate the exploitability of a Follow-the-Regularized-Leader (FTRL) learner with constant step size $η$ in $n\times m$ two-player zero-sum games played over $T$ rounds against a clairvoyant optimizer. In contrast with prior analyses, we show that exploitability is an inherent feature of the FTRL family rather than an artifact of specific instantiations. First, for a fixed optimizer, we establish a sweeping law of order $Ω(N/η)$, proving that exploitation scales with the number $N$ of the learner's suboptimal actions and vanishes in their absence. Second, for an alternating optimizer, a surplus of $Ω(ηT/\mathrm{poly}(n,m))$ can be guaranteed with high probability in random games, regardless of the equilibrium structure. Our analysis once more uncovers a sharp geometric dichotomy: non-steep regularizers allow the optimizer to extract maximum surplus via finite-time elimination of suboptimal actions, whereas steep ones introduce a vanishing correction that may delay exploitation. Finally, we discuss whether this leverage persists under bilateral payoff uncertainty, and we propose a susceptibility measure to quantify which regularizers are most vulnerable to strategic manipulation.
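The steep versus non-steep dichotomy can be seen in a toy run of FTRL against a fixed opponent. The sketch below uses two textbook regularizers: the Euclidean one (non-steep, where the FTRL choice map is a simplex projection) and the negative entropy (steep, where the choice map is a softmax). The two-action payoff vector, the step size, and the horizon are illustrative assumptions, not settings from the paper.

```python
import math


def project_simplex(v):
    """Choice map of FTRL with the Euclidean regularizer."""
    u = sorted(v, reverse=True)
    running, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        running += ui
        t = (running - 1.0) / i
        if ui - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def softmax(v):
    """Choice map of FTRL with the entropic regularizer."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def ftrl_trajectory(payoffs, eta, rounds, choice_map):
    """Play FTRL with constant step size eta against a fixed payoff vector."""
    cumulative = [0.0] * len(payoffs)
    history = []
    for _ in range(rounds):
        history.append(choice_map([eta * c for c in cumulative]))
        cumulative = [c + p for c, p in zip(cumulative, payoffs)]
    return history


# Against a fixed opponent, action 0 earns 1 per round and action 1 earns 0.
payoffs = [1.0, 0.0]
euclid = ftrl_trajectory(payoffs, 0.1, 30, project_simplex)
entropic = ftrl_trajectory(payoffs, 0.1, 30, softmax)

# Weight on the suboptimal action after 15 rounds under each regularizer.
print(euclid[15][1], entropic[15][1])
```

In this toy instance the Euclidean learner drops the suboptimal action entirely once the scaled payoff gap reaches 1, that is after about $1/η$ rounds, while the entropic learner keeps a positive but shrinking weight on it at every round.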
35.6 | CL | Apr 1
LLM REgression with a Latent Iterative State Head
Yiheng Su, Matthew Lease
We present RELISH (REgression with a Latent Iterative State Head), a novel, lightweight architecture designed for text regression with large language models. Rather than decoding numeric targets as text or aggregating multiple generated outputs, RELISH predicts scalar values directly from frozen LLM representations by iteratively refining a learned latent state through cross-attention over token-level representations, and then mapping the final state to a point estimate with a linear regressor. Across five datasets, four LLM backbones, and two LLM training regimes, RELISH consistently outperforms prior baselines from all three major LLM regression families, including autoregressive decoding, regression-aware inference, and existing predictive head methods. Despite these gains, RELISH remains highly parameter-efficient, requiring only 3.4-3.7M trainable parameters across frozen LLM backbones (only 0.01-0.04% additional overhead), far less than LoRA-based alternatives that grow with model size (0.26-0.42%).
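The general shape of such a head can be sketched in a few lines: a latent state queries the token representations via cross-attention, is updated over several steps, and is finally mapped to a scalar by a linear layer. The sketch is an assumption-laden simplification rather than the RELISH architecture itself: it uses a single attention head, no learned query, key, or value projections, a plain residual update, and hand-set weights where a trained model would use learned ones.

```python
import math


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def cross_attend(state, tokens):
    """Single-head cross-attention: the state is the query, tokens are keys and values."""
    scale = math.sqrt(len(state))
    scores = [sum(s * t for s, t in zip(state, tok)) / scale for tok in tokens]
    weights = softmax(scores)
    dim = len(state)
    return [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]


def latent_head(tokens, init_state, readout_w, readout_b, steps=3):
    """Refine a latent state over `steps` attention reads, then regress a scalar."""
    state = list(init_state)
    for _ in range(steps):
        context = cross_attend(state, tokens)
        state = [s + c for s, c in zip(state, context)]  # residual refinement
    return sum(w * s for w, s in zip(readout_w, state)) + readout_b


# Hypothetical frozen token representations (4 tokens, hidden size 2).
tokens = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
prediction = latent_head(tokens, init_state=[0.0, 0.0],
                         readout_w=[1.0, 1.0], readout_b=0.5, steps=3)
print(prediction)
```

Only the initial state and the readout weights would be trainable in this simplified form, which is one way to see why a head of this kind adds few parameters on top of a frozen backbone.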
CL | Sep 9, 2025
Instance-level Performance Prediction for Long-form Generation Tasks
Chi-Yang Hsu, Alexander Braylan, Yiheng Su et al.
We motivate and share a new benchmark for instance-level performance prediction of long-form generation tasks having multi-faceted, fine-grained quality metrics. Our task-, model- and metric-agnostic formulation predicts continuous evaluation metric scores given only black-box model inputs and outputs. Beyond predicting point estimates of metric scores, the benchmark also requires inferring prediction intervals to quantify uncertainty around point estimates. Evaluation spans 11 long-form datasets/tasks with multiple LLMs, baselines, and metrics per task. We show that scores can be effectively predicted across long-form generation tasks using as few as 16 training examples. Overall, we introduce a novel and useful task, a valuable benchmark to drive progress, and baselines ready for practical adoption today.
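One generic way to turn point predictions of a metric score into prediction intervals is split conformal prediction, sketched below. It is a swapped-in illustration and is not claimed to be one of the benchmark's baselines: the calibration data are made up, and the only ingredients are held-out absolute residuals and a finite-sample quantile.

```python
import math


def conformal_halfwidth(cal_preds, cal_targets, alpha=0.1):
    """Half-width giving roughly (1 - alpha) coverage under exchangeability."""
    residuals = sorted(abs(y - p) for p, y in zip(cal_preds, cal_targets))
    n = len(residuals)
    rank = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return residuals[rank - 1]


def predict_interval(point_pred, halfwidth):
    """Symmetric interval around a point estimate."""
    return (point_pred - halfwidth, point_pred + halfwidth)


# Hypothetical calibration set: nine predictions with absolute errors 1..9.
cal_preds = [0.0] * 9
cal_targets = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
halfwidth = conformal_halfwidth(cal_preds, cal_targets, alpha=0.5)
print(predict_interval(2.0, halfwidth))
```

With very small calibration sets, such as the 16 training examples mentioned in the abstract, a procedure of this kind can only support modest coverage levels before the half-width becomes infinite.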
HC | Jun 27, 2024
Harnessing LLMs for Automated Video Content Analysis: An Exploratory Workflow of Short Videos on Depression
Jiaying Lizzy Liu, Yunlong Wang, Yao Lyu et al.
Despite the growing interest in leveraging Large Language Models (LLMs) for content analysis, current studies have primarily focused on text-based content. In the present work, we explored the potential of LLMs to assist video content analysis by conducting a case study that followed a new workflow of LLM-assisted multimodal content analysis. The workflow encompasses codebook design, prompt engineering, LLM processing, and human evaluation. We strategically crafted annotation prompts to obtain LLM Annotations in structured form, and explanation prompts to generate LLM Explanations that make the LLM's reasoning more understandable and transparent. To test the LLM's video annotation capabilities, we analyzed 203 keyframes extracted from 25 YouTube short videos about depression. We compared the LLM Annotations with those of two human coders and found that the LLM achieved higher accuracy on object and activity Annotations than on emotion and genre Annotations. Moreover, we identified the potential and limitations of the LLM's capabilities in annotating videos. Based on these findings, we explore opportunities and challenges for future research and improvements to the workflow. We also discuss ethical concerns surrounding future studies based on LLM-assisted video analysis.