Jizhi Zhang

IR
h-index28
15papers
1,352citations
Novelty51%
AI Score57

15 Papers

IRMay 30
Trustworthy Recommendation in the Era of Large Language Models: Opportunities and Challenges

Bohao Wang, Yu Cui, Zhenxiang Xu et al.

The field of recommender systems (RS) is currently undergoing two profound paradigm shifts. From the perspective of objectives, the goal has shifted beyond mere recommendation accuracy to comprehensive trustworthiness, encompassing multiple dimensions such as robustness, fairness, and privacy preservation. From a technical perspective, Large Language Models (LLMs) have been extensively integrated into RS, reshaping the foundations of recommendation through richer semantic understanding, stronger intent reasoning, and more flexible user interactions. The convergence of these two shifts prompts a timely and pivotal question: how does the integration of LLMs reshape the landscape of trustworthy recommendation? In this work, we present a systematic review of trustworthy LLM-empowered recommendation. By comprehensively analyzing over 200 recent studies, we reveal that the introduction of LLMs acts as a double-edged sword. While their advanced mechanisms and user-friendly interfaces offer unprecedented opportunities to enhance trustworthiness, they simultaneously introduce new risks, such as novel forms of bias and hallucination-induced issues. To characterize this dual impact, we systematically identify 13 opportunities and 18 challenges across six fundamental dimensions of trustworthiness, and accordingly organize the existing literature into a novel taxonomy. We also provide a comprehensive review of commonly used datasets and evaluation metrics to facilitate empirical validation. Finally, we identify critical open challenges and outline future directions, hoping to inspire future research on this emerging topic.

IRFeb 7, 2023
On the Theories Behind Hard Negative Sampling for Recommendation

Wentao Shi, Jiawei Chen, Fuli Feng et al.

Negative sampling has been heavily used to train recommender models on large-scale data, wherein sampling hard examples usually not only accelerates the convergence but also improves the model accuracy. Nevertheless, the reasons for the effectiveness of Hard Negative Sampling (HNS) have not been revealed yet. In this work, we fill the research gap by conducting thorough theoretical analyses on HNS. Firstly, we prove that employing HNS on the Bayesian Personalized Ranking (BPR) learner is equivalent to optimizing One-way Partial AUC (OPAUC). Concretely, the BPR equipped with Dynamic Negative Sampling (DNS) is an exact estimator, while with softmax-based sampling is a soft estimator. Secondly, we prove that OPAUC has a stronger connection with Top-K evaluation metrics than AUC and verify it with simulation experiments. These analyses establish the theoretical foundation of HNS in optimizing Top-K recommendation performance for the first time. On these bases, we offer two insightful guidelines for effective usage of HNS: 1) the sampling hardness should be controllable, e.g., via pre-defined hyper-parameters, to adapt to different Top-K metrics and datasets; 2) the smaller the $K$ we emphasize in Top-K evaluation metrics, the harder the negative samples we should draw. Extensive experiments on three real-world benchmarks verify the two guidelines.

IROct 25, 2023
Model-enhanced Contrastive Reinforcement Learning for Sequential Recommendation

Chengpeng Li, Zhengyi Yang, Jizhi Zhang et al.

Reinforcement learning (RL) has been widely applied in recommendation systems due to its potential in optimizing the long-term engagement of users. From the perspective of RL, recommendation can be formulated as a Markov decision process (MDP), where recommendation system (agent) can interact with users (environment) and acquire feedback (reward signals).However, it is impractical to conduct online interactions with the concern on user experience and implementation complexity, and we can only train RL recommenders with offline datasets containing limited reward signals and state transitions. Therefore, the data sparsity issue of reward signals and state transitions is very severe, while it has long been overlooked by existing RL recommenders.Worse still, RL methods learn through the trial-and-error mode, but negative feedback cannot be obtained in implicit feedback recommendation tasks, which aggravates the overestimation problem of offline RL recommender. To address these challenges, we propose a novel RL recommender named model-enhanced contrastive reinforcement learning (MCRL). On the one hand, we learn a value function to estimate the long-term engagement of users, together with a conservative value learning mechanism to alleviate the overestimation problem.On the other hand, we construct some positive and negative state-action pairs to model the reward function and state transition function with contrastive learning to exploit the internal structure information of MDP. Experiments demonstrate that the proposed method significantly outperforms existing offline RL and self-supervised RL methods with different representative backbone networks on two real-world datasets.

CVMay 22
SimInsert: Seamless Video Object Insertion via Regional Sparse Attention Fusion

Xinyu Chen, Yuyi Qian, Jiang Lin et al.

Video object insertion requires ensuring spatio-temporal coherence and interactive realism, extending far beyond simple content placement. However, current approaches are often hindered by a reliance on explicit motion engineering or resource-intensive retraining, restricting their flexibility and generalization. To bridge this gap, we present \textit{SimInsert}, a training-free paradigm that efficiently decouples the task into intuitive single-frame editing and semantic motion description. By harnessing the robust generative priors of image-to-video diffusion models, SimInsert propagates edits temporally, strictly preserving background invariance while enabling plausible, text-driven interactions between the inserted object and the dynamic environment. Our approach hinges on non-invasive guidance mechanisms that enforce structural consistency, facilitate seamless boundary fusion, and counteract the fidelity drift that typically accumulates during the denoising trajectory. Extensive quantitative experiments validate our efficacy: SimInsert surpasses state-of-the-art methods with an 18.8\% gain in PSNR, 20.1\% in SSIM, and a 44.1\% decrease in LPIPS, offering a streamlined solution for high-fidelity video editing.

IRFeb 29, 2024Code
Large Language Models are Learnable Planners for Long-Term Recommendation

Wentao Shi, Xiangnan He, Yang Zhang et al.

Planning for both immediate and long-term benefits becomes increasingly important in recommendation. Existing methods apply Reinforcement Learning (RL) to learn planning capacity by maximizing cumulative reward for long-term recommendation. However, the scarcity of recommendation data presents challenges such as instability and susceptibility to overfitting when training RL models from scratch, resulting in sub-optimal performance. In this light, we propose to leverage the remarkable planning capabilities over sparse data of Large Language Models (LLMs) for long-term recommendation. The key to achieving the target lies in formulating a guidance plan following principles of enhancing long-term engagement and grounding the plan to effective and executable actions in a personalized manner. To this end, we propose a Bi-level Learnable LLM Planner framework, which consists of a set of LLM instances and breaks down the learning process into macro-learning and micro-learning to learn macro-level guidance and micro-level personalized recommendation policies, respectively. Extensive experiments validate that the framework facilitates the planning ability of LLMs for long-term recommendation. Our code and data can be found at https://github.com/jizhi-zhang/BiLLP.

IROct 30, 2024Code
Causality-Enhanced Behavior Sequence Modeling in LLMs for Personalized Recommendation

Yang Zhang, Juntao You, Yimeng Bai et al.

Recent advancements in recommender systems have focused on leveraging Large Language Models (LLMs) to improve user preference modeling, yielding promising outcomes. However, current LLM-based approaches struggle to fully leverage user behavior sequences, resulting in suboptimal preference modeling for personalized recommendations. In this study, we propose a novel Counterfactual Fine-Tuning (CFT) method to address this issue by explicitly emphasizing the role of behavior sequences when generating recommendations. Specifically, we employ counterfactual reasoning to identify the causal effects of behavior sequences on model output and introduce a task that directly fits the ground-truth labels based on these effects, achieving the goal of explicit emphasis. Additionally, we develop a token-level weighting mechanism to adjust the emphasis strength for different item tokens, reflecting the diminishing influence of behavior sequences from earlier to later tokens during predicting an item. Extensive experiments on real-world datasets demonstrate that CFT effectively improves behavior sequence modeling. Our codes are available at https://github.com/itsmeyjt/CFT.

IROct 26, 2024Code
Agentic Feedback Loop Modeling Improves Recommendation and User Simulation

Shihao Cai, Jizhi Zhang, Keqin Bao et al.

Large language model-based agents are increasingly applied in the recommendation field due to their extensive knowledge and strong planning capabilities. While prior research has primarily focused on enhancing either the recommendation agent or the user agent individually, the collaborative interaction between the two has often been overlooked. Towards this research gap, we propose a novel framework that emphasizes the feedback loop process to facilitate the collaboration between the recommendation agent and the user agent. Specifically, the recommendation agent refines its understanding of user preferences by analyzing the feedback from the user agent on the item recommendation. Conversely, the user agent further identifies potential user interests based on the items and recommendation reasons provided by the recommendation agent. This iterative process enhances the ability of both agents to infer user behaviors, enabling more effective item recommendations and more accurate user simulations. Extensive experiments on three datasets demonstrate the effectiveness of the agentic feedback loop: the agentic feedback loop yields an average improvement of 11.52% over the single recommendation agent and 21.12% over the single user agent. Furthermore, the results show that the agentic feedback loop does not exacerbate popularity or position bias, which are typically amplified by the real-world feedback loop, highlighting its robustness. The source code is available at https://github.com/Lanyu0303/AFL.

LGDec 20, 2023Code
BSL: Understanding and Improving Softmax Loss for Recommendation

Junkang Wu, Jiawei Chen, Jiancan Wu et al.

Loss functions steer the optimization direction of recommendation models and are critical to model performance, but have received relatively little attention in recent recommendation research. Among various losses, we find Softmax loss (SL) stands out for not only achieving remarkable accuracy but also better robustness and fairness. Nevertheless, the current literature lacks a comprehensive explanation for the efficacy of SL. Toward addressing this research gap, we conduct theoretical analyses on SL and uncover three insights: 1) Optimizing SL is equivalent to performing Distributionally Robust Optimization (DRO) on the negative data, thereby learning against perturbations on the negative distribution and yielding robustness to noisy negatives. 2) Comparing with other loss functions, SL implicitly penalizes the prediction variance, resulting in a smaller gap between predicted values and and thus producing fairer results. Building on these insights, we further propose a novel loss function Bilateral SoftMax Loss (BSL) that extends the advantage of SL to both positive and negative sides. BSL augments SL by applying the same Log-Expectation-Exp structure to positive examples as is used for negatives, making the model robust to the noisy positives as well. Remarkably, BSL is simple and easy-to-implement -- requiring just one additional line of code compared to SL. Experiments on four real-world datasets and three representative backbones demonstrate the effectiveness of our proposal. The code is available at https://github.com/junkangwu/BSL

CVJun 17, 2024Code
GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation

Shihao Cai, Keqin Bao, Hangyu Guo et al.

Large language models have seen widespread adoption in math problem-solving. However, in geometry problems that usually require visual aids for better understanding, even the most advanced multi-modal models currently still face challenges in effectively using image information. High-quality data is crucial for enhancing the geometric capabilities of multi-modal models, yet existing open-source datasets and related efforts are either too challenging for direct model learning or suffer from misalignment between text and images. To overcome this issue, we introduce a novel pipeline that leverages GPT-4 and GPT-4V to generate relatively basic geometry problems with aligned text and images, facilitating model learning. We have produced a dataset of 4.9K geometry problems and combined it with 19K open-source data to form our GeoGPT4V dataset. Experimental results demonstrate that the GeoGPT4V dataset significantly improves the geometry performance of various models on the MathVista and MathVision benchmarks. The code is available at https://github.com/Lanyu0303/GeoGPT4V_Project

IRMay 12, 2023Code
Is ChatGPT Fair for Recommendation? Evaluating Fairness in Large Language Model Recommendation

Jizhi Zhang, Keqin Bao, Yang Zhang et al.

The remarkable achievements of Large Language Models (LLMs) have led to the emergence of a novel recommendation paradigm -- Recommendation via LLM (RecLLM). Nevertheless, it is important to note that LLMs may contain social prejudices, and therefore, the fairness of recommendations made by RecLLM requires further investigation. To avoid the potential risks of RecLLM, it is imperative to evaluate the fairness of RecLLM with respect to various sensitive attributes on the user side. Due to the differences between the RecLLM paradigm and the traditional recommendation paradigm, it is problematic to directly use the fairness benchmark of traditional recommendation. To address the dilemma, we propose a novel benchmark called Fairness of Recommendation via LLM (FaiRLLM). This benchmark comprises carefully crafted metrics and a dataset that accounts for eight sensitive attributes1 in two recommendation scenarios: music and movies. By utilizing our FaiRLLM benchmark, we conducted an evaluation of ChatGPT and discovered that it still exhibits unfairness to some sensitive attributes when generating recommendations. Our code and dataset can be found at https://github.com/jizhi-zhang/FaiRLLM.

CVNov 7, 2025
FreeControl: Efficient, Training-Free Structural Control via One-Step Attention Extraction

Jiang Lin, Xinyu Chen, Song Wu et al.

Controlling the spatial and semantic structure of diffusion-generated images remains a challenge. Existing methods like ControlNet rely on handcrafted condition maps and retraining, limiting flexibility and generalization. Inversion-based approaches offer stronger alignment but incur high inference cost due to dual-path denoising. We present FreeControl, a training-free framework for semantic structural control in diffusion models. Unlike prior methods that extract attention across multiple timesteps, FreeControl performs one-step attention extraction from a single, optimally chosen key timestep and reuses it throughout denoising. This enables efficient structural guidance without inversion or retraining. To further improve quality and stability, we introduce Latent-Condition Decoupling (LCD): a principled separation of the key timestep and the noised latent used in attention extraction. LCD provides finer control over attention quality and eliminates structural artifacts. FreeControl also supports compositional control via reference images assembled from multiple sources - enabling intuitive scene layout design and stronger prompt alignment. FreeControl introduces a new paradigm for test-time control, enabling structurally and semantically aligned, visually coherent generation directly from raw images, with the flexibility for intuitive compositional design and compatibility with modern diffusion models at approximately 5 percent additional cost.

IRFeb 28, 2024
Prospect Personalized Recommendation on Large Language Model-based Agent Platform

Jizhi Zhang, Keqin Bao, Wenjie Wang et al.

The new kind of Agent-oriented information system, exemplified by GPTs, urges us to inspect the information system infrastructure to support Agent-level information processing and to adapt to the characteristics of Large Language Model (LLM)-based Agents, such as interactivity. In this work, we envisage the prospect of the recommender system on LLM-based Agent platforms and introduce a novel recommendation paradigm called Rec4Agentverse, comprised of Agent Items and Agent Recommender. Rec4Agentverse emphasizes the collaboration between Agent Items and Agent Recommender, thereby promoting personalized information services and enhancing the exchange of information beyond the traditional user-recommender feedback loop. Additionally, we prospect the evolution of Rec4Agentverse and conceptualize it into three stages based on the enhancement of the interaction and information exchange among Agent Items, Agent Recommender, and the user. A preliminary study involving several cases of Rec4Agentverse validates its significant potential for application. Lastly, we discuss potential issues and promising directions for future research.

IRMay 2, 2024
Fair Recommendations with Limited Sensitive Attributes: A Distributionally Robust Optimization Approach

Tianhao Shi, Yang Zhang, Jizhi Zhang et al.

As recommender systems are indispensable in various domains such as job searching and e-commerce, providing equitable recommendations to users with different sensitive attributes becomes an imperative requirement. Prior approaches for enhancing fairness in recommender systems presume the availability of all sensitive attributes, which can be difficult to obtain due to privacy concerns or inadequate means of capturing these attributes. In practice, the efficacy of these approaches is limited, pushing us to investigate ways of promoting fairness with limited sensitive attribute information. Toward this goal, it is important to reconstruct missing sensitive attributes. Nevertheless, reconstruction errors are inevitable due to the complexity of real-world sensitive attribute reconstruction problems and legal regulations. Thus, we pursue fair learning methods that are robust to reconstruction errors. To this end, we propose Distributionally Robust Fair Optimization (DRFO), which minimizes the worst-case unfairness over all potential probability distributions of missing sensitive attributes instead of the reconstructed one to account for the impact of the reconstruction errors. We provide theoretical and empirical evidence to demonstrate that our method can effectively ensure fairness in recommender systems when only limited sensitive attributes are accessible.

CLMay 23, 2023
Robust Prompt Optimization for Large Language Models Against Distribution Shifts

Moxin Li, Wenjie Wang, Fuli Feng et al.

Large Language Model (LLM) has demonstrated significant ability in various Natural Language Processing tasks. However, their effectiveness is highly dependent on the phrasing of the task prompt, leading to research on automatic prompt optimization using labeled task data. We reveal that these prompt optimization techniques are vulnerable to distribution shifts such as subpopulation shifts, which are common for LLMs in real-world scenarios such as customer reviews analysis. In this light, we propose a new problem of robust prompt optimization for LLMs against distribution shifts, which requires the prompt optimized over the labeled source group can simultaneously generalize to an unlabeled target group. To solve this problem, we propose Generalized Prompt Optimization framework, which incorporates the unlabeled data from the target group into prompt optimization. Extensive experimental results demonstrate the effectiveness of the proposed framework with significant performance improvement on the target group and comparable performance on the source group.

CLJun 6, 2021
Empowering Language Understanding with Counterfactual Reasoning

Fuli Feng, Jizhi Zhang, Xiangnan He et al.

Present language understanding methods have demonstrated extraordinary ability of recognizing patterns in texts via machine learning. However, existing methods indiscriminately use the recognized patterns in the testing phase that is inherently different from us humans who have counterfactual thinking, e.g., to scrutinize for the hard testing samples. Inspired by this, we propose a Counterfactual Reasoning Model, which mimics the counterfactual thinking by learning from few counterfactual samples. In particular, we devise a generation module to generate representative counterfactual samples for each factual sample, and a retrospective module to retrospect the model prediction by comparing the counterfactual and factual samples. Extensive experiments on sentiment analysis (SA) and natural language inference (NLI) validate the effectiveness of our method.