Ying Qi

AI
h-index5
3papers
6citations
Novelty53%
AI Score44

3 Papers

LGJan 29
SAGE: Sequence-level Adaptive Gradient Evolution for Generative Recommendation

Yu Xie, Xing Kai Ren, Ying Qi et al.

Reinforcement learning-based preference optimization is increasingly used to align list-wise generative recommenders with complex, multi-objective user feedback, yet existing optimizers such as Gradient-Bounded Policy Optimization (GBPO) exhibit structural limitations in recommendation settings. We identify a Symmetric Conservatism failure mode in which symmetric update bounds suppress learning from rare positive signals (e.g., cold-start items), static negative-sample constraints fail to prevent diversity collapse under rejection-dominated feedback, and group-normalized multi-objective rewards lead to low-resolution training signals. To address these issues, we propose SAGE (Sequence-level Adaptive Gradient Evolution), a unified optimizer designed for list-wise generative recommendation. SAGE introduces sequence-level signal alignment via a geometric-mean importance ratio and a decoupled multi-objective advantage estimator to reduce token-level variance and mitigate reward collapse, together with asymmetric adaptive bounding that applies positive Boost updates to successful slates and an entropy-aware penalty to discourage low-diversity failures. Experiments on Amazon Product Reviews and the large-scale RecIF-Bench demonstrate consistent improvements in top-K accuracy, cold-start recall, and diversity across both Semantic-ID and native-text action spaces, while preserving numerical stability during training. These results suggest that asymmetric, sequence-aware policy optimization provides a principled and effective framework for addressing optimization failures in generative recommendation.

AIJun 24, 2025
RecLLM-R1: A Two-Stage Training Paradigm with Reinforcement Learning and Chain-of-Thought v1

Yu Xie, Xingkai Ren, Ying Qi et al.

Traditional recommendation systems often grapple with "filter bubbles", underutilization of external knowledge, and a disconnect between model optimization and business policy iteration. To address these limitations, this paper introduces RecLLM-R1, a novel recommendation framework leveraging Large Language Models (LLMs) and drawing inspiration from the DeepSeek R1 methodology. The framework initiates by transforming user profiles, historical interactions, and multi-faceted item attributes into LLM-interpretable natural language prompts through a carefully engineered data construction process. Subsequently, a two-stage training paradigm is employed: the initial stage involves Supervised Fine-Tuning (SFT) to imbue the LLM with fundamental recommendation capabilities. The subsequent stage utilizes Group Relative Policy Optimization (GRPO), a reinforcement learning technique, augmented with a Chain-of-Thought (CoT) mechanism. This stage guides the model through multi-step reasoning and holistic decision-making via a flexibly defined reward function, aiming to concurrently optimize recommendation accuracy, diversity, and other bespoke business objectives. Empirical evaluations on a real-world user behavior dataset from a large-scale social media platform demonstrate that RecLLM-R1 significantly surpasses existing baseline methods across a spectrum of evaluation metrics, including accuracy, diversity, and novelty. It effectively mitigates the filter bubble effect and presents a promising avenue for the integrated optimization of recommendation models and policies under intricate business goals.

42.2HCMar 30
From Passive Feeds to Guided Discovery: AI-Initiated Interaction for Vague Intent in Content Exploration

Yu Xie, Ying Qi

Recommendation feeds work well when people are simply browsing, and search works well when they can formulate a query. Between these two cases is a common but poorly supported state: users feel that their feed has become repetitive, yet cannot clearly specify what they want instead. We refer to this state as vague intent. We present Red-Rec, an AI-supported exploration interface for this middle ground. After a period of browsing, the system summarizes patterns in the current feed (e.g., dominant content categories and possible latent interests), offers clickable exploration options, asks at most one follow-up question, and then gradually blends new content into the feed. The design is motivated by a formative study which found that users often recognize feed staleness but struggle to articulate alternatives, suggesting the need for proactive and low-effort interaction.We evaluated Red-Rec in a mixed-design lab study against three comparison conditions: a passive feed, search, and a user-initiated chat interface. Compared with user-initiated chat, Red-Rec led to broader exploration, higher serendipity ratings, and lower interaction effort. Participants in the AI-initiated condition typed very little , relying mainly on option selection, whereas participants in the user-initiated chat condition typed substantially more . We discuss how proactive, option-based AI support can help users move beyond repetitive feeds without undermining their sense of control, and we outline design implications for recommendation interfaces that support open-ended exploration.