12.6GTJun 2
Second-Best Bilateral Trade is $1/2$ EfficientZhengyang Liu, Ying Qin, Zeyu Ren et al.
The landmark Myerson-Satterthwaite Theorem establishes a fundamental impossibility in bilateral trade: no Bayesian incentive-compatible mechanism can simultaneously achieve ex-post efficiency, individual rationality, and strong budget balance. We resolve a long-standing open question regarding the efficiency loss imposed by these constraints. Specifically, we prove that the Bayesian-optimal (second-best) mechanism always captures at least half of the first-best gains from trade ($\mathrm{SB}\ge\frac{1}{2}\mathrm{FB}$). This result is tight, definitively closing the gap between the previously best-known bounds of $0.317$ and $0.736$.
10.5GTJun 4
Constant Approximation for Hylland--Zeckhauser EquilibriaYonglei Yan, Zhengyang Liu
We present a polynomial-time algorithm for computing a $1/e$-approximate Hylland--Zeckhauser (HZ) equilibrium. This establishes the \emph{first} efficient approximation guarantee for HZ equilibria in settings with multi-valued utilities. Our main technical contribution is a novel utility stratification technique that reduces the original multi-valued market to a structured bi-valued instance. This reduction allows us to efficiently compute the approximation by leveraging the exact algorithm of Vazirani and Yannakakis.
29.1CLApr 7
EpiBench: Benchmarking Multi-turn Research Workflows for Multimodal AgentsXuan Dong, Huanyang Zheng, Tianhao Niu et al.
Scientific research follows multi-turn, multi-step workflows that require proactively searching the literature, consulting figures and tables, and integrating evidence across papers to align experimental settings and support reproducible conclusions. This joint capability is not systematically assessed in existing benchmarks, which largely under-evaluate proactive search, multi-evidence integration and sustained evidence use over time. In this work, we introduce EpiBench, an episodic multi-turn multimodal benchmark that instantiates short research workflows. Given a research task, agents must navigate across papers over multiple turns, align evidence from figures and tables, and use the accumulated evidence in the memory to answer objective questions that require cross paper comparisons and multi-figure integration. EpiBench introduces a process-level evaluation framework for fine-grained testing and diagnosis of research agents. Our experiments show that even the leading model achieves an accuracy of only 29.23% on the hard split, indicating substantial room for improvement in multi-turn, multi-evidence research workflows, providing an evaluation platform for verifiable and reproducible research agents.
CLSep 5, 2025Code
PRIM: Towards Practical In-Image Multilingual Machine TranslationYanzhi Tian, Zeming Liu, Zhengyang Liu et al.
In-Image Machine Translation (IIMT) aims to translate images containing texts from one language to another. Current research of end-to-end IIMT mainly conducts on synthetic data, with simple background, single font, fixed text position, and bilingual translation, which can not fully reflect real world, causing a significant gap between the research and practical conditions. To facilitate research of IIMT in real-world scenarios, we explore Practical In-Image Multilingual Machine Translation (IIMMT). In order to convince the lack of publicly available data, we annotate the PRIM dataset, which contains real-world captured one-line text images with complex background, various fonts, diverse text positions, and supports multilingual translation directions. We propose an end-to-end model VisTrans to handle the challenge of practical conditions in PRIM, which processes visual text and background information in the image separately, ensuring the capability of multilingual translation while improving the visual quality. Experimental results indicate the VisTrans achieves a better translation quality and visual effect compared to other models. The code and dataset are available at: https://github.com/BITHLP/PRIM.
8.0GTMay 10
Pacing Equilibria in Second-Price Auctions with Few GoodsYiyang Huang, Yonglei Yan, Zihe Wang et al.
In this paper, we investigate the computation of second-price pacing equilibria (SPPEs), a foundational model in online advertising auctions. We present a polynomial-time algorithm for computing exact SPPEs in instances with a constant number of goods. Our core technique maps buyers' pacing multipliers to the highest bids on each good, effectively partitioning the parameter space into a set of distinct geometric cells. By enumerating these cells, we fix the relative ordering of the bids and reduce the problem of equilibrium computation to a linear feasibility program. Finally, we demonstrate that this tractability extends to large-scale markets with an arbitrary number of goods, provided the goods can be aggregated into a constant number of valuation types.
LGFeb 12, 2024
Online Sequential Decision-Making with Unknown DelaysPing Wu, Heyan Huang, Zhengyang Liu
In the field of online sequential decision-making, we address the problem with delays utilizing the framework of online convex optimization (OCO), where the feedback of a decision can arrive with an unknown delay. Unlike previous research that is limited to Euclidean norm and gradient information, we propose three families of delayed algorithms based on approximate solutions to handle different types of received feedback. Our proposed algorithms are versatile and applicable to universal norms. Specifically, we introduce a family of Follow the Delayed Regularized Leader algorithms for feedback with full information on the loss function, a family of Delayed Mirror Descent algorithms for feedback with gradient information on the loss function and a family of Simplified Delayed Mirror Descent algorithms for feedback with the value information of the loss function's gradients at corresponding decision points. For each type of algorithm, we provide corresponding regret bounds under cases of general convexity and relative strong convexity, respectively. We also demonstrate the efficiency of each algorithm under different norms through concrete examples. Furthermore, our theoretical results are consistent with the current best bounds when degenerated to standard settings.
CLMay 21, 2025
Exploring In-Image Machine Translation with Real-World BackgroundYanzhi Tian, Zeming Liu, Zhengyang Liu et al.
In-Image Machine Translation (IIMT) aims to translate texts within images from one language to another. Previous research on IIMT was primarily conducted on simplified scenarios such as images of one-line text with black font in white backgrounds, which is far from reality and impractical for applications in the real world. To make IIMT research practically valuable, it is essential to consider a complex scenario where the text backgrounds are derived from real-world images. To facilitate research of complex scenario IIMT, we design an IIMT dataset that includes subtitle text with real-world background. However previous IIMT models perform inadequately in complex scenarios. To address the issue, we propose the DebackX model, which separates the background and text-image from the source image, performs translation on text-image directly, and fuses the translated text-image with the background, to generate the target image. Experimental results show that our model achieves improvements in both translation quality and visual effect.
CLOct 17, 2025
LLMs Judge Themselves: A Game-Theoretic Framework for Human-Aligned EvaluationGao Yang, Yuhang Liu, Siyu Miao et al.
Ideal or real - that is the question.In this work, we explore whether principles from game theory can be effectively applied to the evaluation of large language models (LLMs). This inquiry is motivated by the growing inadequacy of conventional evaluation practices, which often rely on fixed-format tasks with reference answers and struggle to capture the nuanced, subjective, and open-ended nature of modern LLM behavior. To address these challenges, we propose a novel alternative: automatic mutual evaluation, where LLMs assess each other's output through self-play and peer review. These peer assessments are then systematically compared with human voting behavior to evaluate their alignment with human judgment. Our framework incorporates game-theoretic voting algorithms to aggregate peer reviews, enabling a principled investigation into whether model-generated rankings reflect human preferences. Empirical results reveal both convergences and divergences between theoretical predictions and human evaluations, offering valuable insights into the promises and limitations of mutual evaluation. To the best of our knowledge, this is the first work to jointly integrate mutual evaluation, game-theoretic aggregation, and human-grounded validation for evaluating the capabilities of LLMs.
HCAug 16, 2020
From Lost to Found: Discover Missing UI Design Semantics through Recovering Missing TagsChunyang Chen, Sidong Feng, Zhengyang Liu et al.
Design sharing sites provide UI designers with a platform to share their works and also an opportunity to get inspiration from others' designs. To facilitate management and search of millions of UI design images, many design sharing sites adopt collaborative tagging systems by distributing the work of categorization to the community. However, designers often do not know how to properly tag one design image with compact textual description, resulting in unclear, incomplete, and inconsistent tags for uploaded examples which impede retrieval, according to our empirical study and interview with four professional designers. Based on a deep neural network, we introduce a novel approach for encoding both the visual and textual information to recover the missing tags for existing UI examples so that they can be more easily found by text queries. We achieve 82.72% accuracy in the tag prediction. Through a simulation test of 5 queries, our system on average returns hundreds more results than the default Dribbble search, leading to better relatedness, diversity and satisfaction.
OCJun 12, 2020
ACMo: Angle-Calibrated Moment Methods for Stochastic OptimizationXunpeng Huang, Runxin Xu, Hao Zhou et al.
Due to its simplicity and outstanding ability to generalize, stochastic gradient descent (SGD) is still the most widely used optimization method despite its slow convergence. Meanwhile, adaptive methods have attracted rising attention of optimization and machine learning communities, both for the leverage of life-long information and for the profound and fundamental mathematical theory. Taking the best of both worlds is the most exciting and challenging question in the field of optimization for machine learning. Along this line, we revisited existing adaptive gradient methods from a novel perspective, refreshing understanding of second moments. Our new perspective empowers us to attach the properties of second moments to the first moment iteration, and to propose a novel first moment optimizer, \emph{Angle-Calibrated Moment method} (\method). Our theoretical results show that \method is able to achieve the same convergence rate as mainstream adaptive methods. Furthermore, extensive experiments on CV and NLP tasks demonstrate that \method has a comparable convergence to SOTA Adam-type optimizers, and gains a better generalization performance in most cases.
OCFeb 10, 2020
SPAN: A Stochastic Projected Approximate Newton MethodXunpeng Huang, Xianfeng Liang, Zhengyang Liu et al.
Second-order optimization methods have desirable convergence properties. However, the exact Newton method requires expensive computation for the Hessian and its inverse. In this paper, we propose SPAN, a novel approximate and fast Newton method. SPAN computes the inverse of the Hessian matrix via low-rank approximation and stochastic Hessian-vector products. Our experiments on multiple benchmark datasets demonstrate that SPAN outperforms existing first-order and second-order optimization methods in terms of the convergence wall-clock time. Furthermore, we provide a theoretical analysis of the per-iteration complexity, the approximation error, and the convergence rate. Both the theoretical analysis and experimental results show that our proposed method achieves a better trade-off between the convergence rate and the per-iteration efficiency.