Xuancheng Li

IR
h-index: 15
5 papers
5 citations
Novelty: 50%
AI Score: 47

5 Papers

15.5 | IR | May 24
Beyond Exposure: Optimizing Ranking Fairness with Non-linear Time-Income Functions

Xuancheng Li, Tao Yang, Yujia Zhou et al.

Ranking systems in web search and recommendation allocate attention among items and providers, and therefore need to balance relevance-based effectiveness with provider fairness. Existing fair-ranking methods commonly focus on exposure fairness, where cumulative exposure is allocated in proportion to item merit. However, exposure is often only an intermediate signal: the actual utility received by a provider may depend on context-dependent conversion from exposure to income, such as clicks, purchases, or advertising value. This paper studies fair ranking under context-dependent provider utility, which we refer to as income. We formalize income fairness by requiring cumulative provider income to be proportional to relevance, and define an income-unfairness metric based on this proportionality condition. We then propose DIDRF, a Dynamic-Income-Derivative-aware Ranking Fairness algorithm for income-fair ranking. DIDRF uses the quadratic structure of income-fairness violations to derive a state-aware scoring rule that jointly considers ranking effectiveness and the marginal effect of each ranking decision on cumulative income fairness. Experiments on standard learning-to-rank datasets with log-calibrated semi-synthetic income environments based on advertising and e-commerce logs show that DIDRF consistently improves income fairness over representative fair-ranking baselines while preserving competitive ranking effectiveness.
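
As a rough illustration of the proportionality condition described in this abstract, the sketch below scores how far cumulative provider income departs from being proportional to relevance. The abstract does not give the exact metric, so the pairwise squared form, the averaging over pairs, and the function name `income_unfairness` are all assumptions made for illustration, not the paper's definition.

```python
from itertools import combinations


def income_unfairness(income, relevance):
    """Mean squared pairwise violation of income-proportional-to-relevance.

    Hypothetical form (an assumption, not the paper's definition):
    for providers i and j, exact proportionality means
    income[i] * relevance[j] == income[j] * relevance[i],
    so the squared difference of the two sides measures the violation.
    """
    if len(income) != len(relevance):
        raise ValueError("income and relevance must have the same length")
    pairs = list(combinations(range(len(income)), 2))
    if not pairs:
        return 0.0  # a single provider cannot be treated unfairly relative to others
    total = sum(
        (income[i] * relevance[j] - income[j] * relevance[i]) ** 2
        for i, j in pairs
    )
    return total / len(pairs)


# Income exactly proportional to relevance gives zero unfairness.
proportional = income_unfairness([2.0, 4.0], [1.0, 2.0])
# Equal relevance but unequal income gives a positive score.
skewed = income_unfairness([3.0, 1.0], [1.0, 1.0])
```

A violation of this kind is quadratic in cumulative income, which is consistent with the abstract's remark that DIDRF uses the quadratic structure of income-fairness violations; how DIDRF turns that structure into a scoring rule is not shown here.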

CL | Feb 3
ATACompressor: Adaptive Task-Aware Compression for Efficient Long-Context Processing in LLMs

Xuancheng Li, Haitao Li, Yujia Zhou et al.

Long-context inputs in large language models (LLMs) often suffer from the "lost in the middle" problem, where critical information becomes diluted or ignored due to excessive length. Context compression methods aim to address this by reducing input size, but existing approaches struggle to balance information preservation and compression efficiency. We propose Adaptive Task-Aware Compressor (ATACompressor), which dynamically adjusts compression based on the specific requirements of the task. ATACompressor employs a selective encoder that compresses only the task-relevant portions of long contexts, ensuring that essential information is preserved while reducing unnecessary content. Its adaptive allocation controller perceives the length of relevant content and adjusts the compression rate accordingly, optimizing resource utilization. We evaluate ATACompressor on three QA datasets (HotpotQA, MS MARCO, and SQuAD), showing that it outperforms existing methods in terms of both compression efficiency and task performance. Our approach provides a scalable solution for long-context processing in LLMs. Furthermore, we perform a range of ablation studies and analysis experiments to gain deeper insights into the key components of ATACompressor.

LG | Jan 30
Beyond Experience Retrieval: Learning to Generate Utility-Optimized Structured Experience for Frozen LLMs

Xuancheng Li, Haitao Li, Yujia Zhou et al.

Large language models (LLMs) are largely static and often redo reasoning or repeat mistakes. Prior experience reuse typically relies on external retrieval, which is similarity-based, can introduce noise, and adds latency. We introduce SEAM (Structured Experience Adapter Module), a lightweight, executor-specific plug-in that stores experience in its parameters and generates a structured, instance-tailored experience entry in a single forward pass to guide a frozen LLM executor. SEAM is trained for utility via executor rollouts and GRPO while keeping the executor frozen, and it can be further improved after deployment with supervised fine-tuning on logged successful trajectories. Experiments on mathematical reasoning benchmarks show consistent accuracy gains across executors with low overhead. Extensive ablations and analyses further elucidate the mechanisms underlying SEAM's effectiveness and robustness.

AI | Jan 30
MulFeRL: Enhancing Reinforcement Learning with Verbal Feedback in a Multi-turn Loop

Xuancheng Li, Haitao Li, Yujia Zhou et al.

Reinforcement Learning with Verifiable Rewards (RLVR) is widely used to improve reasoning in multiple domains, yet outcome-only scalar rewards are often sparse and uninformative, especially on failed samples, where they merely indicate failure and provide no insight into why the reasoning fails. In this paper, we investigate how to leverage richer verbal feedback to guide RLVR training on failed samples, and how to convert such feedback into a trainable learning signal. Specifically, we propose a multi-turn feedback-guided reinforcement learning framework. It builds on three mechanisms: (1) dynamic multi-turn regeneration guided by feedback, triggered only on failed samples, (2) two complementary learning signals for within-turn and cross-turn optimization, and (3) structured feedback injection into the model's reasoning process. Trained on sampled OpenR1-Math, the approach outperforms supervised fine-tuning and RLVR baselines in-domain and generalizes well out-of-domain.
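
The first mechanism in this abstract, regeneration guided by feedback and triggered only on failed samples, can be pictured with a minimal control-flow sketch. The abstract does not specify any of the details below: the function name `regenerate_with_feedback`, the callables `generate`, `verify`, and `give_feedback`, the prompt template, and the `max_turns` cap are hypothetical. The two learning signals and the structured feedback injection are not modeled.

```python
def regenerate_with_feedback(prompt, generate, verify, give_feedback, max_turns=3):
    """Collect attempts for one sample, retrying only while verification fails.

    All arguments are hypothetical stand-ins (assumptions for illustration):
      generate(context) -> str          produces a response
      verify(response) -> bool          checks the verifiable reward
      give_feedback(prompt, response)   returns verbal feedback on a failure
    Returns a list of (turn, response, passed) tuples.
    """
    attempts = []
    context = prompt
    for turn in range(max_turns):
        response = generate(context)
        passed = verify(response)
        attempts.append((turn, response, passed))
        if passed:
            break  # successful samples never trigger another turn
        feedback = give_feedback(prompt, response)
        # Append the failed attempt and its feedback for the next turn.
        context = (
            f"{context}\n\nPrevious attempt:\n{response}\n\nFeedback:\n{feedback}"
        )
    return attempts


# Toy stand-ins: the "model" only answers correctly once it has seen feedback.
def toy_generate(context):
    return "42" if "Feedback:" in context else "41"


def toy_verify(response):
    return response == "42"


def toy_feedback(prompt, response):
    return "Recheck the final addition step."


attempts = regenerate_with_feedback("What is 40 + 2?", toy_generate, toy_verify, toy_feedback)
```

In this toy run the first turn fails, feedback is appended to the context, and the second turn passes, so the loop stops after two attempts.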

IR | Sep 22, 2021
Why Don't You Click: Neural Correlates of Non-Click Behaviors in Web Search

Ziyi Ye, Xiaohui Xie, Yiqun Liu et al.

Web search heavily relies on click-through behavior as an essential feedback signal for performance improvement and evaluation. Traditionally, a click is treated as a positive implicit feedback signal of relevance or usefulness, while a non-click (especially a non-click after examination) is regarded as a signal of irrelevance or uselessness. However, there are many cases where users do not click on any search results but still satisfy their information need with the contents of the results shown on the Search Engine Result Page (SERP). This raises the problem of measuring result usefulness and modeling user satisfaction in "Zero-click" search scenarios. Previous works have addressed this issue by (1) detecting user satisfaction for abandoned SERPs with context information and (2) considering result-level click necessity with external assessors' annotations. However, few works have investigated the reasons behind non-click behavior or estimated the usefulness of non-click results. A challenge for this research question is how to collect valuable feedback for non-click results. With neuroimaging technologies, we design a lab-based user study and reveal differences in brain signals while examining non-click search results with different usefulness levels. The findings in significant brain regions and the electroencephalogram (EEG) spectrum also suggest that the process of usefulness judgment might involve cognitive functions similar to those of relevance perception and satisfaction decoding. Inspired by these findings, we conduct supervised learning tasks to estimate the usefulness of non-click results with brain signals and conventional information (i.e., content and context factors). Results show that it is feasible to utilize brain signals to improve usefulness estimation performance and enhance human-computer interactions in "Zero-click" search scenarios.