CVOct 4, 2023Code
ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented BenchmarksZejun Li, Ye Wang, Mengfei Du et al.
Recent years have witnessed remarkable progress in the development of large vision-language models (LVLMs). Benefiting from the strong language backbones and efficient cross-modal alignment strategies, LVLMs exhibit surprising capabilities to perceive visual signals and perform visually grounded reasoning. However, the capabilities of LVLMs have not been comprehensively and quantitatively evaluate. Most existing multi-modal benchmarks require task-oriented input-output formats, posing great challenges to automatically assess the free-form text output of LVLMs. To effectively leverage the annotations available in existing benchmarks and reduce the manual effort required for constructing new benchmarks, we propose to re-formulate existing benchmarks into unified LVLM-compatible formats. Through systematic data collection and reformulation, we present the ReForm-Eval benchmark, offering substantial data for evaluating various capabilities of LVLMs. Based on ReForm-Eval, we conduct extensive experiments, thoroughly analyze the strengths and weaknesses of existing LVLMs, and identify the underlying factors. Our benchmark and evaluation framework will be open-sourced as a cornerstone for advancing the development of LVLMs.
IRJun 23, 2022
Intelligent Request Strategy Design in Recommender SystemXufeng Qian, Yue Xu, Fuyu Lv et al.
Waterfall Recommender System (RS), a popular form of RS in mobile applications, is a stream of recommended items consisting of successive pages that can be browsed by scrolling. In waterfall RS, when a user finishes browsing a page, the edge (e.g., mobile phones) would send a request to the cloud server to get a new page of recommendations, known as the paging request mechanism. RSs typically put a large number of items into one page to reduce excessive resource consumption from numerous paging requests, which, however, would diminish the RSs' ability to timely renew the recommendations according to users' real-time interest and lead to a poor user experience. Intuitively, inserting additional requests inside pages to update the recommendations with a higher frequency can alleviate the problem. However, previous attempts, including only non-adaptive strategies (e.g., insert requests uniformly), would eventually lead to resource overconsumption. To this end, we envision a new learning task of edge intelligence named Intelligent Request Strategy Design (IRSD). It aims to improve the effectiveness of waterfall RSs by determining the appropriate occasions of request insertion based on users' real-time intention. Moreover, we propose a new paradigm of adaptive request insertion strategy named Uplift-based On-edge Smart Request Framework (AdaRequest). AdaRequest 1) captures the dynamic change of users' intentions by matching their real-time behaviors with their historical interests based on attention-based neural networks. 2) estimates the counterfactual uplift of user purchase brought by an inserted request based on causal inference. 3) determines the final request insertion strategy by maximizing the utility function under online resource constraints. We conduct extensive experiments on both offline dataset and online A/B test to verify the effectiveness of AdaRequest.
IRMar 29, 2022
Modeling Users' Contextualized Page-wise Feedback for Click-Through Rate Prediction in E-commerce SearchZhifang Fan, Dan Ou, Yulong Gu et al.
Modeling user's historical feedback is essential for Click-Through Rate Prediction in personalized search and recommendation. Existing methods usually only model users' positive feedback information such as click sequences which neglects the context information of the feedback. In this paper, we propose a new perspective for context-aware users' behavior modeling by including the whole page-wisely exposed products and the corresponding feedback as contextualized page-wise feedback sequence. The intra-page context information and inter-page interest evolution can be captured to learn more specific user preference. We design a novel neural ranking model RACP(i.e., Recurrent Attention over Contextualized Page sequence), which utilizes page-context aware attention to model the intra-page context. A recurrent attention process is used to model the cross-page interest convergence evolution as denoising the interest in the previous pages. Experiments on public and real-world industrial datasets verify our model's effectiveness.
IROct 9, 2022
Multi-Objective Personalized Product Retrieval in Taobao SearchYukun Zheng, Jiang Bian, Guanghao Meng et al.
In large-scale e-commerce platforms like Taobao, it is a big challenge to retrieve products that satisfy users from billions of candidates. This has been a common concern of academia and industry. Recently, plenty of works in this domain have achieved significant improvements by enhancing embedding-based retrieval (EBR) methods, including the Multi-Grained Deep Semantic Product Retrieval (MGDSPR) model [16] in Taobao search engine. However, we find that MGDSPR still has problems of poor relevance and weak personalization compared to other retrieval methods in our online system, such as lexical matching and collaborative filtering. These problems promote us to further strengthen the capabilities of our EBR model in both relevance estimation and personalized retrieval. In this paper, we propose a novel Multi-Objective Personalized Product Retrieval (MOPPR) model with four hierarchical optimization objectives: relevance, exposure, click and purchase. We construct entire-space multi-positive samples to train MOPPR, rather than the single-positive samples for existing EBR models.We adopt a modified softmax loss for optimizing multiple objectives. Results of extensive offline and online experiments show that MOPPR outperforms the baseline MGDSPR on evaluation metrics of relevance estimation and personalized retrieval. MOPPR achieves 0.96% transaction and 1.29% GMV improvements in a 28-day online A/B test. Since the Double-11 shopping festival of 2021, MOPPR has been fully deployed in mobile Taobao search, replacing the previous MGDSPR. Finally, we discuss several advanced topics of our deeper explorations on multi-objective retrieval and ranking to contribute to the community.
AIApr 13
Escaping the Context Bottleneck: Active Context Curation for LLM Agents via Reinforcement LearningXiaozhe Li, Tianyi Lyu, Yizhao Yang et al.
Large Language Models (LLMs) struggle with long-horizon tasks due to the "context bottleneck" and the "lost-in-the-middle" phenomenon, where accumulated noise from verbose environments degrades reasoning over multi-turn interactions. To address this issue, we introduce a symbiotic framework that decouples context management from task execution. Our architecture pairs a lightweight, specialized policy model, ContextCurator, with a powerful frozen foundation model, TaskExecutor. Trained via reinforcement learning, ContextCurator actively reduces information entropy in the working memory. It aggressively prunes environmental noise while preserving reasoning anchors, that is, sparse data points that are critical for future deductions. On WebArena, our framework improves the success rate of Gemini-3.0-flash from 36.4% to 41.2% while reducing token consumption by 8.8% (from 47.4K to 43.3K). On DeepSearch, it achieves a 57.1% success rate, compared with 53.9%, while reducing token consumption by a factor of 8. Remarkably, a 7B ContextCurator matches the context management performance of GPT-4o, providing a scalable and computationally efficient paradigm for autonomous long-horizon agents.
AIMay 9Code
OPT-BENCH: Evaluating the Iterative Self-Optimization of LLM Agents in Large-Scale Search SpacesXiaozhe Li, Jixuan Chen, Xinyu Fang et al.
Large Language Models (LLMs) have demonstrated remarkable capabilities in reasoning and tool use. However, the fundamental cognitive faculties essential for problem solving, including perception, reasoning, and memory, remain the stable core of intelligence. Unlike memorizing specific patterns, humans succeed in novel environments by applying these intrinsic faculties to adapt and optimize. Yet, whether LLMs possess this essential capacity, namely the ability to continuously refine solutions in response to dynamic environmental feedback, remains underexplored. To address this challenge, we introduce OPT-BENCH, a benchmark for evaluating self-improvement capabilities in large-scale search spaces. By combining 20 machine learning tasks with 10 classic NP-hard problems, OPT-BENCH provides a rigorous setting to assess whether agents can adapt through intrinsic self-reflection rather than rote tool application. We further propose OPT-Agent, a framework that emulates human-like cognitive adaptation. It operates through a general perception, memory, and reasoning loop, iteratively refining solutions based on environmental feedback. Through extensive experiments on 19 LLMs from 7 model families, including reasoning models, general models, and open-source models ranging from 3B to 235B parameters, we demonstrate that stronger models are more effective at leveraging feedback signals for self-improvement. However, this upper-bound adaptability remains fundamentally constrained by the models' base capacity, and even the most advanced LLMs still fall short of human expert performance.
LGMay 21
Beyond Euclidean Proximity: Repairing Latent World Models with Horizon-Matched Trajectory Reachability MetricsLiangyu Li, Shengzhi Wang, Qingwen Liu
Latent world models can contain the state needed for control, yet their terminal-cost interface can expose the planner to the wrong decision-relevant information. In common latent MPC, candidate sequences are ranked by Euclidean distance between predicted terminal and goal latent states; this assumes that raw latent distance weights reachability-relevant variables correctly. We propose trajectory reachability metrics (TRM), a post-hoc terminal-ranking method for fixed latent world models. TRM trains a small pairwise head from logged trajectory structure and uses it as a replacement or hybrid cost; the encoder, dynamics, sampler, optimizer, and evaluation manifests remain fixed. The key design choice is horizon-aware supervision: the metric is trained on broad, balanced temporal separations to match the long-horizon terminal candidate ranking problem. On a hard TwoRoom benchmark, raw latent planning with LeWorldModel (LeWM) reaches 7.0% success, while full-horizon TRM reaches 97.0%; shuffled temporal-label controls stay at 0.0%. The same recipe improves a PLDM baseline from 32.7% to 84.0% across three seeds, and a short-horizon TRM variant reaches only 35.0% with the 100,000 pair budget. In TwoRoom, we provide mechanistic evidence for why TRM works: XY position is linearly decodable (R^2=0.998), yet raw latent MSE misranks candidates; the XY-probe rowspace accounts for less than 1% of terminal-goal latent MSE but carries most candidate-quality signal; and SCSA audits show that TRM improves the ordering and selected endpoint seen by the planner. On PushT go50/go75, TRM-style task-state metrics improve SCSA ranking and selected final distance more cleanly than closed-loop success, motivating auxiliary hybrid costs in continuous manipulation. TRM is the planner-facing repair, and audits explain when terminal reachability metrics should replace or augment raw latent proximity.
AIMay 19
Beyond Mode Collapse: Distribution Matching for Diverse ReasoningXiaozhe Li, Yang Li, Xinyu Fang et al.
On-policy reinforcement learning methods like GRPO suffer from mode collapse: they exhibit reduced solution diversity, concentrating probability mass on a single solution once discovered and ceasing exploration of alternative strategies. We show this stems from reverse KL minimization's mode-seeking behavior, which reinforces the first high-reward trajectory found rather than maintaining a distribution over multiple diverse solutions. We propose DMPO (Distribution-Matching Policy Optimization), which prevents mode collapse through principled approximation of forward KL minimization. DMPO constructs a group level target distribution over sampled trajectories proportional to their rewards, then aligns the policy distribution to this target. This provides mode-covering behavior without requiring sampling from the intractable global target distribution, enabling sustained exploration throughout training. We validate DMPO on NP-hard combinatorial optimization, where exponentially many feasible solutions exist but only a few approach optimality, an ideal testbed for evaluating exploration. DMPO achieves 43.9% Quality Ratio on text-based NP-Bench (vs. GRPO's 40.1%) and 43.1% on vision-based NP-Bench (vs. 38.4%), demonstrating 9% and 12% relative improvements respectively. These gains generalize to mathematical reasoning (+2.0%) and out-of-domain tasks (+2.3%), showing that diversity-preserving training enhances general reasoning capabilities across modalities. Our work establishes distribution matching as a practical, principled approach to preventing mode collapse in on-policy RL, with consistent quality improvements demonstrating sustained exploration across diverse reasoning tasks.
AIJun 12, 2025Code
OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization ProblemsXiaozhe Li, Jixuan Chen, Xinyu Fang et al. · pku
Large Language Models (LLMs) have shown remarkable capabilities in solving diverse tasks. However, their proficiency in iteratively optimizing complex solutions through learning from previous feedback remains insufficiently explored. To bridge this gap, we present OPT-BENCH, a comprehensive benchmark designed to evaluate LLM agents on large-scale search space optimization problems. OPT-BENCH includes 20 real-world machine learning tasks sourced from Kaggle and 10 classical NP problems, offering a diverse and challenging environment for assessing LLM agents on iterative reasoning and solution refinement. To enable rigorous evaluation, we introduce OPT-Agent, an end-to-end optimization framework that emulates human reasoning when tackling complex problems by generating, validating, and iteratively improving solutions through leveraging historical feedback. Through extensive experiments on 9 state-of-the-art LLMs from 6 model families, we analyze the effects of optimization iterations, temperature settings, and model architectures on solution quality and convergence. Our results demonstrate that incorporating historical context significantly enhances optimization performance across both ML and NP tasks. All datasets, code, and evaluation tools are open-sourced to promote further research in advancing LLM-driven optimization and iterative reasoning. Project page: \href{https://github.com/OliverLeeXZ/OPT-BENCH}{https://github.com/OliverLeeXZ/OPT-BENCH}.
IRJun 24, 2019Code
Query-based Interactive Recommendation by Meta-Path and Adapted Attention-GRUYu Zhu, Yu Gong, Qingwen Liu et al.
Recently, interactive recommender systems are becoming increasingly popular. The insight is that, with the interaction between users and the system, (1) users can actively intervene the recommendation results rather than passively receive them, and (2) the system learns more about users so as to provide better recommendation. We focus on the single-round interaction, i.e. the system asks the user a question (Step 1), and exploits his feedback to generate better recommendation (Step 2). A novel query-based interactive recommender system is proposed in this paper, where \textbf{personalized questions are accurately generated from millions of automatically constructed questions} in Step 1, and \textbf{the recommendation is ensured to be closely-related to users' feedback} in Step 2. We achieve this by transforming Step 1 into a query recommendation task and Step 2 into a retrieval task. The former task is our key challenge. We firstly propose a model based on Meta-Path to efficiently retrieve hundreds of query candidates from the large query pool. Then an adapted Attention-GRU model is developed to effectively rank these candidates for recommendation. Offline and online experiments on Taobao, a large-scale e-commerce platform in China, verify the effectiveness of our interactive system. The system has already gone into production in the homepage of Taobao App since Nov. 11, 2018 (see https://v.qq.com/x/page/s0833tkp1uo.html on how it works online). Our code and dataset are public in https://github.com/zyody/QueryQR.
AIMay 9
Forge: Quality-Aware Reinforcement Learning for NP-Hard Optimization in LLMsXiaozhe Li, Xinyu Fang, Shengyuan Ding et al.
Large Language Models (LLMs) have achieved remarkable success on reasoning benchmarks through Reinforcement Learning with Verifiable Rewards (RLVR), excelling at tasks such as math, coding, logic, and puzzles. However, existing benchmarks evaluate only correctness, while overlooking optimality, namely the ability to find the best solutions under constraints. We propose OPT-BENCH, the first comprehensive framework for training and evaluating LLMs on NP-hard optimization problems through quality-aware RLVR. OPT-BENCH provides three key components: a scalable training infrastructure with instance generators, quality verifiers, and optimal baselines across 10 tasks; a rigorous benchmark with 1,000 instances evaluating both feasibility, measured by Success Rate, and quality, measured by Quality Ratio; and quality-aware rewards that enable continuous improvement beyond binary correctness. Training on Qwen2.5-7B-Instruct-1M with 15K examples achieves 93.1% SR and 46.6% QR, significantly outperforming GPT-4o, which achieves 29.6% SR and 14.6% QR. Beyond optimization, training on OPT-BENCH transfers to diverse tasks, including mathematics (+2.2%), logic (+1.2%), knowledge (+4.1%), and instruction following (+6.1%). Our analysis reveals that quality-aware rewards improve solutions by 28.8% over binary rewards, and that task diversity drives generalization more than data quantity, offering insights into RLVR scaling for complex reasoning.
IRMar 22
COINBench: Moving Beyond Individual Perspectives to Collective Intent UnderstandingXiaozhe Li, Tianyi Lyu, Siyi Yang et al.
Understanding human intent is a high-level cognitive challenge for Large Language Models (LLMs), requiring sophisticated reasoning over noisy, conflicting, and non-linear discourse. While LLMs excel at following individual instructions, their ability to distill Collective Intent - the process of extracting consensus, resolving contradictions, and inferring latent trends from multi-source public discussions - remains largely unexplored. To bridge this gap, we introduce COIN-BENCH, a dynamic, real-world, live-updating benchmark specifically designed to evaluate LLMs on collective intent understanding within the consumer domain. Unlike traditional benchmarks that focus on transactional outcomes, COIN-BENCH operationalizes intent as a hierarchical cognitive structure, ranging from explicit scenarios to deep causal reasoning. We implement a robust evaluation pipeline that combines a rule-based method with an LLM-as-the-Judge approach. This framework incorporates COIN-TREE for hierarchical cognitive structuring and retrieval-augmented verification (COIN-RAG) to ensure expert-level precision in analyzing raw, collective human discussions. An extensive evaluation of 20 state-of-the-art LLMs across four dimensions - depth, breadth, informativeness, and correctness - reveals that while current models can handle surface-level aggregation, they still struggle with the analytical depth required for complex intent synthesis. COIN-BENCH establishes a new standard for advancing LLMs from passive instruction followers to expert-level analytical agents capable of deciphering the collective voice of the real world. See our project page on COIN-BENCH.
AIDec 24, 2024
SlimGPT: Layer-wise Structured Pruning for Large Language ModelsGui Ling, Ziyang Wang, Yuliang Yan et al.
Large language models (LLMs) have garnered significant attention for their remarkable capabilities across various domains, whose vast parameter scales present challenges for practical deployment. Structured pruning is an effective method to balance model performance with efficiency, but performance restoration under computational resource constraints is a principal challenge in pruning LLMs. Therefore, we present a low-cost and fast structured pruning method for LLMs named SlimGPT based on the Optimal Brain Surgeon framework. We propose Batched Greedy Pruning for rapid and near-optimal pruning, which enhances the accuracy of head-wise pruning error estimation through grouped Cholesky decomposition and improves the pruning efficiency of FFN via Dynamic Group Size, thereby achieving approximate local optimal pruning results within one hour. Besides, we explore the limitations of layer-wise pruning from the perspective of error accumulation and propose Incremental Pruning Ratio, a non-uniform pruning strategy to reduce performance degradation. Experimental results on the LLaMA benchmark show that SlimGPT outperforms other methods and achieves state-of-the-art results.
CVNov 30, 2024
Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-trainingHaicheng Wang, Chen Ju, Weixiong Lin et al.
In rapidly evolving field of vision-language models (VLMs), contrastive language-image pre-training (CLIP) has made significant strides, becoming foundation for various downstream tasks. However, relying on one-to-one (image, text) contrastive paradigm to learn alignment from large-scale messy web data, CLIP faces a serious myopic dilemma, resulting in biases towards monotonous short texts and shallow visual expressivity. To overcome these issues, this paper advances CLIP into one novel holistic paradigm, by updating both diverse data and alignment optimization. To obtain colorful data with low cost, we use image-to-text captioning to generate multi-texts for each image, from multiple perspectives, granularities, and hierarchies. Two gadgets are proposed to encourage textual diversity. To match such (image, multi-texts) pairs, we modify the CLIP image encoder into multi-branch, and propose multi-to-multi contrastive optimization for image-text part-to-part matching. As a result, diverse visual embeddings are learned for each image, bringing good interpretability and generalization. Extensive experiments and ablations across over ten benchmarks indicate that our holistic CLIP significantly outperforms existing myopic CLIP, including image-text retrieval, open-vocabulary classification, and dense visual tasks.
LGMar 18, 2025
Squeeze Out Tokens from Sample for Finer-Grained Data GovernanceWeixiong Lin, Chen Ju, Haicheng Wang et al.
Widely observed data scaling laws, in which error falls off as a power of the training size, demonstrate the diminishing returns of unselective data expansion. Hence, data governance is proposed to downsize datasets through pruning non-informative samples. Yet, isolating the impact of a specific sample on overall model performance is challenging, due to the vast computation required for tryout all sample combinations. Current data governors circumvent this complexity by estimating sample contributions through heuristic-derived scalar scores, thereby discarding low-value ones. Despite thorough sample sieving, retained samples contain substantial undesired tokens intrinsically, underscoring the potential for further compression and purification. In this work, we upgrade data governance from a 'sieving' approach to a 'juicing' one. Instead of scanning for least-flawed samples, our dual-branch DataJuicer applies finer-grained intra-sample governance. It squeezes out informative tokens and boosts image-text alignments. Specifically, the vision branch retains salient image patches and extracts relevant object classes, while the text branch incorporates these classes to enhance captions. Consequently, DataJuicer yields more refined datasets through finer-grained governance. Extensive experiments across datasets demonstrate that DataJuicer significantly outperforms existing DataSieve in image-text retrieval, classification, and dense visual reasoning.
LGDec 16, 2024
No More Tuning: Prioritized Multi-Task Learning with Lagrangian Differential Multiplier MethodsZhengxing Cheng, Yuheng Huang, Zhixuan Zhang et al.
Given the ubiquity of multi-task in practical systems, Multi-Task Learning (MTL) has found widespread application across diverse domains. In real-world scenarios, these tasks often have different priorities. For instance, In web search, relevance is often prioritized over other metrics, such as click-through rates or user engagement. Existing frameworks pay insufficient attention to the prioritization among different tasks, which typically adjust task-specific loss function weights to differentiate task priorities. However, this approach encounters challenges as the number of tasks grows, leading to exponential increases in hyper-parameter tuning complexity. Furthermore, the simultaneous optimization of multiple objectives can negatively impact the performance of high-priority tasks due to interference from lower-priority tasks. In this paper, we introduce a novel multi-task learning framework employing Lagrangian Differential Multiplier Methods for step-wise multi-task optimization. It is designed to boost the performance of high-priority tasks without interference from other tasks. Its primary advantage lies in its ability to automatically optimize multiple objectives without requiring balancing hyper-parameters for different tasks, thereby eliminating the need for manual tuning. Additionally, we provide theoretical analysis demonstrating that our method ensures optimization guarantees, enhancing the reliability of the process. We demonstrate its effectiveness through experiments on multiple public datasets and its application in Taobao search, a large-scale industrial search ranking system, resulting in significant improvements across various business metrics.
CVDec 16, 2024
Can video generation replace cinematographers? Research on the cinematic language of generated videoXiaozhe Li, Kai WU, Siyi Yang et al.
Recent advancements in text-to-video (T2V) generation have leveraged diffusion models to enhance visual coherence in videos synthesized from textual descriptions. However, existing research primarily focuses on object motion, often overlooking cinematic language, which is crucial for conveying emotion and narrative pacing in cinematography. To address this, we propose a threefold approach to improve cinematic control in T2V models. First, we introduce a meticulously annotated cinematic language dataset with twenty subcategories, covering shot framing, shot angles, and camera movements, enabling models to learn diverse cinematic styles. Second, we present CameraDiff, which employs LoRA for precise and stable cinematic control, ensuring flexible shot generation. Third, we propose CameraCLIP, designed to evaluate cinematic alignment and guide multi-shot composition. Building on CameraCLIP, we introduce CLIPLoRA, a CLIP-guided dynamic LoRA composition method that adaptively fuses multiple pre-trained cinematic LoRAs, enabling smooth transitions and seamless style blending. Experimental results demonstrate that CameraDiff ensures stable and precise cinematic control, CameraCLIP achieves an R@1 score of 0.83, and CLIPLoRA significantly enhances multi-shot composition within a single video, bridging the gap between automated video generation and professional cinematography.\textsuperscript{1}
AIOct 18, 2025
NP-Engine: Empowering Optimization Reasoning in Large Language Models with Verifiable Synthetic NP ProblemsXiaozhe Li, Xinyu Fang, Shengyuan Ding et al. · pku
Large Language Models (LLMs) have shown strong reasoning capabilities, with models like OpenAI's O-series and DeepSeek R1 excelling at tasks such as mathematics, coding, logic, and puzzles through Reinforcement Learning with Verifiable Rewards (RLVR). However, their ability to solve more complex optimization problems - particularly NP-hard tasks - remains underexplored. To bridge this gap, we propose NP-ENGINE, the first comprehensive framework for training and evaluating LLMs on NP-hard problems. NP-ENGINE covers 10 tasks across five domains, each equipped with (i) a controllable instance generator, (ii) a rule-based verifier, and (iii) a heuristic solver that provides approximate optimal solutions as ground truth. This generator-verifier-heuristic pipeline enables scalable and verifiable RLVR training under hierarchical difficulties. We also introduce NP-BENCH, a benchmark derived from NP-ENGINE-DATA, specifically designed to evaluate LLMs' ability to tackle NP-hard level reasoning problems, focusing not only on feasibility but also on solution quality. Additionally, we present QWEN2.5-7B-NP, a model trained via zero-RLVR with curriculum learning on Qwen2.5-7B-Instruct, which significantly outperforms GPT-4o on NP-BENCH and achieves SOTA performance with the same model size. Beyond in-domain tasks, we demonstrate that RLVR training on NP-ENGINE-DATA enables strong out-of-domain (OOD) generalization to reasoning tasks (logic, puzzles, math, and knowledge), as well as non-reasoning tasks such as instruction following. We also observe a scaling trend: increasing task diversity improves OOD generalization. These findings suggest that task-rich RLVR training is a promising direction for advancing LLM's reasoning ability, revealing new insights into the scaling laws of RLVR.
CLOct 15, 2025
ConsintBench: Evaluating Language Models on Real-World Consumer Intent UnderstandingXiaozhe Li, TianYi Lyu, Siyi Yang et al.
Understanding human intent is a complex, high-level task for large language models (LLMs), requiring analytical reasoning, contextual interpretation, dynamic information aggregation, and decision-making under uncertainty. Real-world public discussions, such as consumer product discussions, are rarely linear or involve a single user. Instead, they are characterized by interwoven and often conflicting perspectives, divergent concerns, goals, emotional tendencies, as well as implicit assumptions and background knowledge about usage scenarios. To accurately understand such explicit public intent, an LLM must go beyond parsing individual sentences; it must integrate multi-source signals, reason over inconsistencies, and adapt to evolving discourse, similar to how experts in fields like politics, economics, or finance approach complex, uncertain environments. Despite the importance of this capability, no large-scale benchmark currently exists for evaluating LLMs on real-world human intent understanding, primarily due to the challenges of collecting real-world public discussion data and constructing a robust evaluation pipeline. To bridge this gap, we introduce \bench, the first dynamic, live evaluation benchmark specifically designed for intent understanding, particularly in the consumer domain. \bench is the largest and most diverse benchmark of its kind, supporting real-time updates while preventing data contamination through an automated curation pipeline.
IRFeb 21, 2022
GIFT: Graph-guIded Feature Transfer for Cold-Start Video Click-Through Rate PredictionSihao Hu, Yi Cao, Yu Gong et al.
Short video has witnessed rapid growth in the past few years in e-commerce platforms like Taobao. To ensure the freshness of the content, platforms need to release a large number of new videos every day, making conventional click-through rate (CTR) prediction methods suffer from the item cold-start problem. In this paper, we propose GIFT, an efficient Graph-guIded Feature Transfer system, to fully take advantages of the rich information of warmed-up videos to compensate for the cold-start ones. Specifically, we establish a heterogeneous graph that contains physical and semantic linkages to guide the feature transfer process from warmed-up video to cold-start videos. The physical linkages represent explicit relationships, while the semantic linkages measure the proximity of multi-modal representations of two videos. We elaborately design the feature transfer function to make aware of different types of transferred features (e.g., id representations and historical statistics) from different metapaths on the graph. We conduct extensive experiments on a large real-world dataset, and the results show that our GIFT system outperforms SOTA methods significantly and brings a 6.82% lift on CTR in the homepage of Taobao App.
CVJul 11, 2021
A Cloud-Edge-Terminal Collaborative System for Temperature Measurement in COVID-19 PreventionZheyi Ma, Hao Li, Wen Fang et al.
To prevent the spread of coronavirus disease 2019 (COVID-19), preliminary temperature measurement and mask detection in public areas are conducted. However, the existing temperature measurement methods face the problems of safety and deployment. In this paper, to realize safe and accurate temperature measurement even when a person's face is partially obscured, we propose a cloud-edge-terminal collaborative system with a lightweight infrared temperature measurement model. A binocular camera with an RGB lens and a thermal lens is utilized to simultaneously capture image pairs. Then, a mobile detection model based on a multi-task cascaded convolutional network (MTCNN) is proposed to realize face alignment and mask detection on the RGB images. For accurate temperature measurement, we transform the facial landmarks on the RGB images to the thermal images by an affine transformation and select a more accurate temperature measurement area on the forehead. The collected information is uploaded to the cloud in real time for COVID-19 prevention. Experiments show that the detection model is only 6.1M and the average detection speed is 257ms. At a distance of 1m, the error of indoor temperature measurement is about 3%. That is, the proposed system can realize real-time temperature measurement in public areas.
IRApr 2, 2021
GRN: Generative Rerank Network for Context-wise RecommendationYufei Feng, Binbin Hu, Yu Gong et al.
Reranking is attracting incremental attention in the recommender systems, which rearranges the input ranking list into the final rank-ing list to better meet user demands. Most existing methods greedily rerank candidates through the rating scores from point-wise or list-wise models. Despite effectiveness, neglecting the mutual influence between each item and its contexts in the final ranking list often makes the greedy strategy based reranking methods sub-optimal. In this work, we propose a new context-wise reranking framework named Generative Rerank Network (GRN). Specifically, we first design the evaluator, which applies Bi-LSTM and self-attention mechanism to model the contextual information in the labeled final ranking list and predict the interaction probability of each item more precisely. Afterwards, we elaborate on the generator, equipped with GRU, attention mechanism and pointer network to select the item from the input ranking list step by step. Finally, we apply cross-entropy loss to train the evaluator and, subsequently, policy gradient to optimize the generator under the guidance of the evaluator. Empirical results show that GRN consistently and significantly outperforms state-of-the-art point-wise and list-wise methods. Moreover, GRN has achieved a performance improvement of 5.2% on PV and 6.1% on IPV metric after the successful deployment in one popular recommendation scenario of Taobao application.
IRDec 22, 2020
Personalized Adaptive Meta Learning for Cold-start User Preference PredictionRunsheng Yu, Yu Gong, Xu He et al.
A common challenge in personalized user preference prediction is the cold-start problem. Due to the lack of user-item interactions, directly learning from the new users' log data causes serious over-fitting problem. Recently, many existing studies regard the cold-start personalized preference prediction as a few-shot learning problem, where each user is the task and recommended items are the classes, and the gradient-based meta learning method (MAML) is leveraged to address this challenge. However, in real-world application, the users are not uniformly distributed (i.e., different users may have different browsing history, recommended items, and user profiles. We define the major users as the users in the groups with large numbers of users sharing similar user information, and other users are the minor users), existing MAML approaches tend to fit the major users and ignore the minor users. To address this cold-start task-overfitting problem, we propose a novel personalized adaptive meta learning approach to consider both the major and the minor users with three key contributions: 1) We are the first to present a personalized adaptive learning rate meta-learning approach to improve the performance of MAML by focusing on both the major and minor users. 2) To provide better personalized learning rates for each user, we introduce a similarity-based method to find similar users as a reference and a tree-based method to store users' features for fast search. 3) To reduce the memory usage, we design a memory agnostic regularizer to further reduce the space complexity to constant while maintain the performance. Experiments on MovieLens, BookCrossing, and real-world production datasets reveal that our method outperforms the state-of-the-art methods dramatically for both the minor and major users.
IRAug 13, 2020
MTBRN: Multiplex Target-Behavior Relation Enhanced Network for Click-Through Rate PredictionYufei Feng, Fuyu Lv, Binbin Hu et al.
Click-through rate (CTR) prediction is a critical task for many industrial systems, such as display advertising and recommender systems. Recently, modeling user behavior sequences attracts much attention and shows great improvements in the CTR field. Existing works mainly exploit attention mechanism based on embedding product when considering relations between user behaviors and target item. However, this methodology lacks of concrete semantics and overlooks the underlying reasons driving a user to click on a target item. In this paper, we propose a new framework named Multiplex Target-Behavior Relation enhanced Network (MTBRN) to leverage multiplex relations between user behaviors and target item to enhance CTR prediction. Multiplex relations consist of meaningful semantics, which can bring a better understanding on users' interests from different perspectives. To explore and model multiplex relations, we propose to incorporate various graphs (e.g., knowledge graph and item-item similarity graph) to construct multiple relational paths between user behaviors and target item. Then Bi-LSTM is applied to encode each path in the path extractor layer. A path fusion network and a path activation network are devised to adaptively aggregate and finally learn the representation of all paths for CTR prediction. Extensive offline and online experiments clearly verify the effectiveness of our framework.
IRMay 25, 2020
ATBRG: Adaptive Target-Behavior Relational Graph Network for Effective RecommendationYufei Feng, Binbin Hu, Fuyu Lv et al.
Recommender system (RS) devotes to predicting user preference to a given item and has been widely deployed in most web-scale applications. Recently, knowledge graph (KG) attracts much attention in RS due to its abundant connective information. Existing methods either explore independent meta-paths for user-item pairs over KG, or employ graph neural network (GNN) on whole KG to produce representations for users and items separately. Despite effectiveness, the former type of methods fails to fully capture structural information implied in KG, while the latter ignores the mutual effect between target user and item during the embedding propagation. In this work, we propose a new framework named Adaptive Target-Behavior Relational Graph network (ATBRG for short) to effectively capture structural relations of target user-item pairs over KG. Specifically, to associate the given target item with user behaviors over KG, we propose the graph connect and graph prune techniques to construct adaptive target-behavior relational graph. To fully distill structural information from the sub-graph connected by rich relations in an end-to-end fashion, we elaborate on the model design of ATBRG, equipped with relation-aware extractor layer and representation activation layer. We perform extensive experiments on both industrial and benchmark datasets. Empirical results show that ATBRG consistently and significantly outperforms state-of-the-art methods. Moreover, ATBRG has also achieved a performance improvement of 5.1% on CTR metric after successful deployment in one popular recommendation scenario of Taobao APP.
IRMay 18, 2020
EdgeRec: Recommender System on Edge in Mobile TaobaoYu Gong, Ziwen Jiang, Yufei Feng et al.
Recommender system (RS) has become a crucial module in most web-scale applications. Recently, most RSs are in the waterfall form based on the cloud-to-edge framework, where recommended results are transmitted to edge (e.g., user mobile) by computing in advance in the cloud server. Despite effectiveness, network bandwidth and latency between cloud server and edge may cause the delay for system feedback and user perception. Hence, real-time computing on edge could help capture user preferences more preciously and thus make more satisfactory recommendations. Our work, to our best knowledge, is the first attempt to design and implement the novel Recommender System on Edge (EdgeRec), which achieves Real-time User Perception and Real-time System Feedback. Moreover, we propose Heterogeneous User Behavior Sequence Modeling and Context-aware Reranking with Behavior Attention Networks to capture user's diverse interests and adjust recommendation results accordingly. Experimental results on both the offline evaluation and online performance in Taobao home-page feeds demonstrate the effectiveness of EdgeRec.
CVApr 19, 2020
Lightweight Mask R-CNN for Long-Range Wireless Power Transfer SystemsHao Li, Aozhou Wu, Wen Fang et al.
Resonant Beam Charging (RBC) is a wireless charging technology which supports multi-watt power transfer over meter-level distance. The features of safety, mobility and simultaneous charging capability enable RBC to charge multiple mobile devices safely at the same time. To detect the devices that need to be charged, a Mask R-CNN based dection model is proposed in previous work. However, considering the constraints of the RBC system, it's not easy to apply Mask R-CNN in lightweight hardware-embedded devices because of its heavy model and huge computation. Thus, we propose a machine learning detection approach which provides a lighter and faster model based on traditional Mask R-CNN. The proposed approach makes the object detection much easier to be transplanted on mobile devices and reduce the burden of hardware computation. By adjusting the structure of the backbone and the head part of Mask R-CNN, we reduce the average detection time from $1.02\mbox{s}$ per image to $0.6132\mbox{s}$, and reduce the model size from $245\mbox{MB}$ to $47.1\mbox{MB}$. The improved model is much more suitable for the application in the RBC system.
IRMay 17, 2019
Exact-K Recommendation via Maximal Clique OptimizationYu Gong, Yu Zhu, Lu Duan et al.
This paper targets to a novel but practical recommendation problem named exact-K recommendation. It is different from traditional top-K recommendation, as it focuses more on (constrained) combinatorial optimization which will optimize to recommend a whole set of K items called card, rather than ranking optimization which assumes that "better" items should be put into top positions. Thus we take the first step to give a formal problem definition, and innovatively reduce it to Maximum Clique Optimization based on graph. To tackle this specific combinatorial optimization problem which is NP-hard, we propose Graph Attention Networks (GAttN) with a Multi-head Self-attention encoder and a decoder with attention mechanism. It can end-to-end learn the joint distribution of the K items and generate an optimal card rather than rank individual items by prediction scores. Then we propose Reinforcement Learning from Demonstrations (RLfD) which combines the advantages in behavior cloning and reinforcement learning, making it sufficient- and-efficient to train the model. Extensive experiments on three datasets demonstrate the effectiveness of our proposed GAttN with RLfD method, it outperforms several strong baselines with a relative improvement of 7.7% and 4.7% on average in Precision and Hit Ratio respectively, and achieves state-of-the-art (SOTA) performance for the exact-K recommendation problem.
SYSep 25, 2018
Adaptive Resonant Beam Charging for Intelligent Wireless Power TransferQingqing Zhang, Wen Fang, Mingliang Xiong et al.
As a long-range high-power wireless power transfer (WPT) technology, resonant beam charging (RBC) can transmit Watt-level power over long distance for the devices in the internet of things (IoT). Due to its open-loop architecture, RBC faces the challenge of providing dynamic current and voltage to optimize battery charging performance. In RBC, battery overcharge may cause energy waste, thermal effects, and even safety issues. On the other hand, battery undercharge may lead to charging time extension and significant battery capacity reduction. In this paper, we present an adaptive resonant beam charging (ARBC) system for battery charging optimization. Based on RBC, ARBC uses a feedback system to control the supplied power dynamically according to the battery preferred charging values. Moreover, in order to transform the received current and voltage to match the battery preferred charging values, ARBC adopts a direct current to direct current (DC-DC) conversion circuit. Relying on the analytical models for RBC power transmission, we obtain the end-to-end power transfer relationship in the approximate linear closed-form of ARBC. Thus, the battery preferred charging power at the receiver can be mapped to the supplied power at the transmitter for feedback control. Numerical evaluation demonstrates that ARBC can save 61% battery charging energy and 53%-60% supplied energy compared with RBC. Furthermore, ARBC has high energy-saving gain over RBC when the WPT is unefficient. ARBC in WPT is similar to link adaption in wireless communications. Both of them play the important roles in their respective areas.
SYSep 25, 2018
Optimal Resonant Beam Charging for Electronic Vehicles in Internet of Intelligent VehiclesQingqing Zhang, Mingqing Liu, Xing Lin et al.
To enable electric vehicles (EVs) to access to the internet of intelligent vehicles (IoIV), charging EVs wirelessly anytime and anywhere becomes an urgent need. The resonant beam charging (RBC) technology can provide high-power and long-range wireless energy for EVs. However, the RBC system is unefficient. To improve the RBC power transmission efficiency, the adaptive resonant beam charging (ARBC) technology was introduced. In this paper, after analyzing the modular model of the ARBC system, we obtain the closed-form formula of the end-to-end power transmission efficiency. Then, we prove that the optimal power transmission efficiency uniquely exists. Moreover, we analyze the relationships among the optimal power transmission efficiency, the source power, the output power, and the beam transmission efficiency, which provide the guidelines for the optimal ARBC system design and implementation. Hence, perpetual energy can be supplied to EVs in IoIV virtually.
SYSep 25, 2018
Fair Scheduling in Resonant Beam Charging for IoT DevicesWen Fang, Qingqing Zhang, Qingwen Liu et al.
Resonant Beam Charging (RBC) is the Wireless Power Transfer (WPT) technology, which can provide high-power, long-distance, mobile, and safe wireless charging for Internet of Things (IoT) devices. Supporting multiple IoT devices charging simultaneously is a significant feature of the RBC system. To optimize the multi-user charging performance, the transmitting power should be scheduled for charging all IoT devices simultaneously. In order to keep all IoT devices working as long as possible for fairness, we propose the First Access First Charge (FAFC) scheduling algorithm. Then, we formulate the scheduling parameters quantitatively for algorithm implementation. Finally, we analyze the performance of FAFC scheduling algorithm considering the impacts of the receiver number, the transmitting power and the charging time. Based on the analysis, we summarize the methods of improving the WPT performance for multiple IoT devices, which include limiting the receiver number, increasing the transmitting power, prolonging the charging time and improving the single-user's charging efficiency. The FAFC scheduling algorithm design and analysis provide a fair WPT solution for the multi-user RBC system.
SYSep 25, 2018
Earning Maximization with Quality of Charging Service Guarantee for IoT DevicesWen Fang, Qingqing Zhang, Mingqing Liu et al.
Resonant Beam Charging (RBC) is a promising Wireless Power Transfer (WPT) technology to provide long-range, high-power, mobile and safe wireless power for the Internet of Things (IoT) devices. The Point-to-Multipoint (PtMP) RBC system can charge multiple receivers simultaneously similar to WiFi communications. To guarantee the Quality of Charging Service (QoCS) for each receiver and maximize the overall earning in the PtMP RBC service, we specify the Charging Pricing Strategy (CPS) and develop the High Priority Charge (HPC) scheduling algorithm to control the charging order and power allocation. Each receiver is assigned a priority, which is updated dynamically based on its State of Charging (SOC) and specified charging power. The receivers with high priorities are scheduled to be charged in each time slot. We present the pseudo code of the HPC algorithm based on quantifying the receiver's SOC, discharging energy and various relevant parameters. Relying on simulation analysis, we demonstrate that the HPC algorithm can achieve better QoCS and earning than the Round-Robin Charge (RRC) scheduling algorithm. Based on the performance evaluation, we illustrate that the methods to improve the PtMP RBC service are: 1) limiting the receiver number within a reasonable range and 2) prolonging the charging duration as long as possible. In summary, the HPC scheduling algorithm provides a practical strategy to maximize the earning of the PtMP RBC service with each receiver's QoCS guarantee.