Chengrui Zhou

LG
h-index: 4
7 papers
74 citations
Novelty: 53%
AI Score: 49

7 Papers

LG Mar 24, 2025
Solving Situation Puzzles with Large Language Model and External Reformulation

Kun Li, Xinwei Chen, Tianyou Song et al.

In recent years, large language models (LLMs) have shown an impressive ability to perform arithmetic and symbolic reasoning tasks. However, we found that LLMs (e.g., ChatGPT) perform poorly on reasoning that requires multiple rounds of dialogue, especially when solving situation puzzles. Specifically, after several rounds of Q&A, LLMs tend to ask overly detailed questions that focus on a single aspect, or to repeat the same or similar questions. To help LLMs escape this dilemma, we propose a novel external reformulation methodology, in which the situation puzzle is reformulated after several rounds of Q&A or when the LLM makes an incorrect guess. Experiments show that our method outperforms directly using LLMs for solving situation puzzles (e.g., in win rate and number of question/guess attempts), highlighting the potential of strategic problem reformulation to enhance the reasoning capabilities of LLMs in complex interactive scenarios.
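The reformulation loop described in this abstract might be sketched as follows. This is only an illustration under assumptions: the callables `ask`, `answer`, `reformulate`, `is_guess`, and `is_correct` are hypothetical stand-ins for the LLM player, the puzzle host, and the external reformulator, and the `reformulate_every` interval is an invented parameter, not a value from the paper.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (player utterance, host reply)

def solve_with_reformulation(
    puzzle: str,
    ask: Callable[[str, History], str],          # hypothetical LLM player
    answer: Callable[[str], str],                # hypothetical puzzle host
    reformulate: Callable[[str, History], str],  # hypothetical reformulator
    is_guess: Callable[[str], bool],
    is_correct: Callable[[str], bool],
    reformulate_every: int = 3,                  # assumed interval
    max_rounds: int = 20,
) -> Tuple[bool, int]:
    """Return (solved, rounds_used)."""
    history: History = []
    prompt = puzzle
    for round_no in range(1, max_rounds + 1):
        utterance = ask(prompt, history)
        if is_guess(utterance):
            if is_correct(utterance):
                return True, round_no
            # An incorrect guess triggers a reformulation of the puzzle.
            history.append((utterance, "incorrect"))
            prompt = reformulate(puzzle, history)
            continue
        history.append((utterance, answer(utterance)))
        # Periodic reformulation after several rounds of Q&A.
        if round_no % reformulate_every == 0:
            prompt = reformulate(puzzle, history)
    return False, max_rounds
```

The two triggers in the loop mirror the two conditions named in the abstract: a fixed number of Q&A rounds, or an incorrect guess.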

DC Jun 21, 2025
Research on Model Parallelism and Data Parallelism Optimization Methods in Large Language Model-Based Recommendation Systems

Haowei Yang, Yu Tian, Zhongheng Yang et al.

With the rapid adoption of large language models (LLMs) in recommendation systems, the computational and communication bottlenecks caused by their massive parameter sizes and large data volumes have become increasingly prominent. This paper systematically investigates two classes of optimization methods, model parallelism and data parallelism, for distributed training of LLMs in recommendation scenarios. For model parallelism, we implement both tensor parallelism and pipeline parallelism, and introduce an adaptive load-balancing mechanism to reduce cross-device communication overhead. For data parallelism, we compare synchronous and asynchronous modes, combining gradient compression and sparsification techniques with an efficient aggregation communication framework to significantly improve bandwidth utilization. Experiments conducted on a real-world recommendation dataset in a simulated service environment demonstrate that our proposed hybrid parallelism scheme increases training throughput by over 30% and improves resource utilization by approximately 20% compared to traditional single-mode parallelism, while maintaining strong scalability and robustness. Finally, we discuss trade-offs among different parallel strategies in online deployment and outline future directions involving heterogeneous hardware integration and automated scheduling technologies.
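To make the gradient sparsification idea concrete, here is a minimal sketch of one common variant, top-k sparsification with a local residual, followed by averaging across workers. The abstract does not say which sparsification scheme the authors use, so top-k selection, the residual, and the function names are all assumptions.

```python
from typing import Dict, List, Tuple

def topk_sparsify(grad: List[float], k: int) -> Tuple[Dict[int, float], List[float]]:
    """Keep the k largest-magnitude entries (assumed scheme, not the paper's).

    Returns the sparse update to transmit and the residual kept locally.
    """
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(order[:k])
    sparse = {i: grad[i] for i in keep}
    residual = [0.0 if i in keep else g for i, g in enumerate(grad)]
    return sparse, residual

def aggregate(sparse_grads: List[Dict[int, float]], size: int) -> List[float]:
    """Average the sparse updates from all workers into one dense gradient."""
    dense = [0.0] * size
    for sg in sparse_grads:
        for i, v in sg.items():
            dense[i] += v
    return [v / len(sparse_grads) for v in dense]
```

Each worker sends only `k` index-value pairs instead of the full gradient, which is where the bandwidth saving would come from in a scheme of this kind.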

LG Jul 22, 2025
Meta-Learning for Cold-Start Personalization in Prompt-Tuned LLMs

Yushang Zhao, Huijie Shen, Dannier Li et al.

Generative, explainable, and flexible recommender systems built on Large Language Models (LLMs) are promising but poorly adapted to the cold-start user situation, where there is little to no interaction history. Current solutions, i.e., supervised fine-tuning and collaborative filtering, focus on dense user-item interactions and are expensive to maintain and update. This paper introduces a meta-learning framework that performs parameter-efficient prompt-tuning to personalize LLM-based recommender systems quickly at cold-start. The model learns soft prompt embeddings with first-order (Reptile) and second-order (MAML) optimization, treating each user as a task. Added as augmentations to the input tokens, these learnable vectors act as differentiable control variables that represent user behavioral priors. The prompts are meta-optimized through episodic sampling, inner-loop adaptation, and outer-loop generalization. On MovieLens-1M, Amazon Reviews, and Recbole, our adaptive model outperforms strong baselines in NDCG@10, HR@10, and MRR, and it runs in real time (i.e., below 300 ms) on consumer GPUs. This scalable solution also supports zero-history personalization. Its 275 ms adaptation time can enable real-time risk profiling for financial institutions, significantly reducing systemic vulnerability detection latency versus traditional compliance checks and improving payment network stability. By preventing contagion in payment networks (e.g., Fedwire), the framework strengthens national financial infrastructure resilience.
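The first-order (Reptile) variant mentioned in this abstract can be illustrated on a toy problem. In this sketch the "soft prompt" is a plain vector and each user's task loss is an assumed quadratic, 0.5 * ||prompt - target||^2, standing in for the real LLM loss; users are visited round-robin rather than sampled episodically. None of these choices come from the paper.

```python
from typing import List

def inner_adapt(prompt: List[float], target: List[float],
                lr: float = 0.1, steps: int = 3) -> List[float]:
    """Inner loop: a few SGD steps on one user's (toy) loss."""
    p = list(prompt)
    for _ in range(steps):
        grad = [pi - ti for pi, ti in zip(p, target)]  # d/dp of 0.5*||p - t||^2
        p = [pi - lr * g for pi, g in zip(p, grad)]
    return p

def reptile(user_targets: List[List[float]], dim: int,
            meta_lr: float = 0.1, epochs: int = 250) -> List[float]:
    """Outer loop: move the shared prompt toward each adapted prompt."""
    prompt = [0.0] * dim
    for _ in range(epochs):
        for target in user_targets:  # each user is treated as one task
            adapted = inner_adapt(prompt, target)
            prompt = [p + meta_lr * (a - p) for p, a in zip(prompt, adapted)]
    return prompt
```

On this toy loss the meta-learned prompt settles near the average of the user targets, so a new user starts a short inner-loop adaptation from a reasonable prior instead of from scratch.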

IR Jun 13, 2025
Deep Learning Model Acceleration and Optimization Strategies for Real-Time Recommendation Systems

Junli Shao, Jing Dong, Dingzhou Wang et al.

With the rapid growth of Internet services, recommendation systems play a central role in delivering personalized content. Faced with massive user requests and complex model architectures, the key challenge for real-time recommendation systems is how to reduce inference latency and increase system throughput without sacrificing recommendation quality. This paper addresses the high computational cost and resource bottlenecks of deep learning models in real-time settings by proposing a combined set of modeling- and system-level acceleration and optimization strategies. At the model level, we dramatically reduce parameter counts and compute requirements through lightweight network design, structured pruning, and weight quantization. At the system level, we integrate multiple heterogeneous compute platforms and high-performance inference libraries, and we design elastic inference scheduling and load-balancing mechanisms based on real-time load characteristics. Experiments show that, while maintaining the original recommendation accuracy, our methods cut latency to less than 30% of the baseline and more than double system throughput, offering a practical solution for deploying large-scale online recommendation services.
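As an illustration of the weight quantization step named in this abstract, the sketch below shows symmetric per-tensor int8 quantization. The abstract does not specify the bit width or scheme, so both are assumptions here.

```python
from typing import List, Tuple

def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization (assumed scheme)."""
    # Fall back to scale 1.0 when every weight is zero.
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float weights."""
    return [v * scale for v in q]
```

Each weight is then stored in one byte instead of four, and the round-trip error per weight is bounded by half the scale.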

16.5 HC Apr 1
Cognitive Alignment Deciphered: A Self-Developed Scenario-Based Prompt Scale Coupled with Representational Similarity Analysis and Social Network Analysis for Unraveling Bias Mechanisms Across Humans and LLMs

Chengrui Zhou

Traditional cognitive bias measurement tools are limited by narrow bias coverage, low ecological validity, and reliance on abstract self-reports, constraining scenario-based and human-AI comparisons. We introduce the context-based Cognitive Bias Assessment Scale (CBAS), a scenario-driven prompt template covering 58 cognitive biases across five hot-cold dual-system dimensions: Calculation, Belief, Information, Social, and Memory. Psychometric testing with 330 participants shows satisfactory reliability (Cronbach's alpha = 0.714) and good model fit (chi-squared/df = 1.83, RMSEA = 0.057, CFI = 0.908, TLI = 0.903). We then combine Representational Similarity Analysis (RSA) and Social Network Analysis (SNA) to compare human age groups and three large language models (Baidu ERNIE 3.5 8K, DeepSeek V3, DeepSeek R1). Humans show coherent hot-cold integration with high inter-individual variability, whereas LLMs display fragmented, inflexible response patterns and lower variability. Human cognitive networks exhibit strong inter-module connectivity, while LLMs show fixed core biases and isolated information-processing components. Prompt interventions integrating role-playing and bias mitigation instructions effectively improve LLM response accuracy, reaching 84.86% for DeepSeek R1 and 78.24% for DeepSeek V3, and partially reshape their internal representations. Our work establishes a replicable assessment and analysis pipeline for cognitive alignment research, bridging empirical psychological evaluation and interpretable artificial intelligence.
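For readers unfamiliar with the reliability statistic reported in this abstract, Cronbach's alpha can be computed from a participants-by-items score matrix as sketched below. This uses the standard textbook formula; the function name and data layout are assumptions, and it is not the authors' analysis code.

```python
from statistics import variance
from typing import List

def cronbach_alpha(scores: List[List[float]]) -> float:
    """scores[p][i] is participant p's score on item i.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Values closer to 1 indicate that items vary together across participants.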

IR Sep 16, 2025
Efficient Cold-Start Recommendation via BPE Token-Level Embedding Initialization with LLM

Yushang Zhao, Xinyue Han, Qian Leng et al.

The cold-start problem is a central challenge in recommender systems, especially when no past interaction data exist for new users or new items. Conventional solutions rely on content-based features or hybrid approaches, but they are limited when metadata is sparse and capture only shallow patterns. This paper presents an efficient cold-start recommendation strategy based on subword-level representations, applying Byte Pair Encoding (BPE) tokenization and pre-trained Large Language Model (LLM) embeddings in the initialization procedure. Rather than using coarse-grained sentence embeddings, we obtain fine-grained token-level vectors aligned with the BPE vocabulary. Together, these token embeddings serve as dense semantic priors for unseen entities, enabling immediate recommendations without user-item interaction history. We compare our mechanism against collaborative filtering systems on benchmark datasets under stringent cold-start assumptions. Experimental findings show that the proposed BPE-LLM method achieves higher Recall@k, NDCG@k, and Hit Rate than the standard baselines while maintaining adequate computational performance. Furthermore, we demonstrate that subword-aware embeddings generalize better and are more interpretable, especially in multilingual and sparse-input settings. This work indicates that token-level semantic initialization is a lightweight yet effective extension to modern recommender systems in the zero-shot setting.
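One simple way token-level embeddings could seed an unseen item's vector is mean pooling over its BPE tokens, sketched below. The abstract does not state how the token vectors are combined, so mean pooling, the `token_table` lookup, and the zero-vector fallback are all assumptions made for illustration.

```python
from typing import Dict, List

def init_item_embedding(tokens: List[str],
                        token_table: Dict[str, List[float]],
                        dim: int) -> List[float]:
    """Mean-pool the embeddings of known tokens (assumed aggregation).

    `token_table` is a hypothetical map from BPE token to its
    pre-trained LLM embedding; unknown tokens are skipped.
    """
    vecs = [token_table[t] for t in tokens if t in token_table]
    if not vecs:
        return [0.0] * dim  # no known tokens: fall back to a zero vector
    return [sum(col) / len(vecs) for col in zip(*vecs)]
```

Because the vector depends only on the item's text, a new item can be scored immediately, before any interaction data exists.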

LG Aug 7, 2025
RLHF Fine-Tuning of LLMs for Alignment with Implicit User Feedback in Conversational Recommenders

Zhongheng Yang, Aijia Sun, Yushang Zhao et al.

Conversational recommender systems (CRS) based on Large Language Models (LLMs) need to be continually aligned with user preferences to provide satisfying and context-relevant item recommendations. Traditional supervised fine-tuning cannot capture implicit feedback signals, e.g., dwell time, sentiment polarity, or engagement patterns. In this paper, we present a fine-tuning solution that uses reinforcement learning from human feedback (RLHF) to maximize implicit user feedback (IUF) in a multi-turn recommendation context. We specify a reward model $R_\phi$ learned on weakly labelled engagement information and maximize user-centric utility by optimizing the foundational LLM $M_\theta$ through proximal policy optimization (PPO). The architecture models conversational state transitions $s_t \to a_t \to s_{t+1}$, where the action $a_t$ corresponds to LLM-generated item suggestions conditioned only on the past conversation history. Evaluation across synthetic and real-world datasets (e.g., REDIAL, OpenDialKG) demonstrates that our RLHF-fine-tuned models improve top-$k$ recommendation accuracy, coherence, and user satisfaction over the compared baselines. This paper shows that implicit signal alignment can be efficient in achieving scalable and user-adaptive design of CRS.
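The PPO step named in this abstract is usually built on the clipped surrogate objective, which can be sketched as below. This is the standard PPO-clip formula, not the authors' training code; the clip range `eps = 0.2` and the assumption that advantages are already derived from reward-model scores are illustrative choices.

```python
import math
from typing import List

def ppo_clip_objective(logp_new: List[float], logp_old: List[float],
                       advantages: List[float], eps: float = 0.2) -> float:
    """Mean clipped surrogate objective (to be maximized).

    logp_new / logp_old are log-probabilities of the sampled actions
    under the current and the behavior policy.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                      # pi_new / pi_old
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))
        total += min(ratio * adv, clipped * adv)       # pessimistic bound
    return total / len(advantages)
```

Clipping the probability ratio limits how far a single update can move the policy away from the one that generated the conversations.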