IR | AI | Jan 14

On-Device Large Language Models for Sequential Recommendation

arXiv:2601.09306v1 | h-index: 1 | WSDM

Originality: Incremental advance

AI Analysis

The paper addresses the need for efficient, private, and robust on-device recommendation systems, though the advance is incremental because it builds on existing compression and LLM methods.

The paper tackles the problem of deploying large language models (LLMs) for sequential recommendation on resource-constrained devices by proposing OD-LLM, a task-adaptive compression framework that halves model size without loss in effectiveness, as shown in empirical evaluations on benchmarks.

On-device recommendation is critical for a number of real-world applications, especially in scenarios with strict requirements on execution latency, user privacy, and robust functionality when internet connectivity is unstable or unavailable. While large language models (LLMs) now offer exceptional capability in modeling user behavior for sequential recommendation tasks, their substantial memory footprint and computational overhead make deployment on resource-constrained devices a high-risk proposition. In this paper, we propose OD-LLM, the first task-adaptive compression framework explicitly designed to provide efficient and accurate on-device deployment of LLMs for sequential recommendation tasks. OD-LLM integrates two complementary compression strategies: a low-rank structural compression algorithm that uses Singular Value Decomposition (SVD) to significantly reduce parameter redundancy in the model, and a novel tokenization normalization technique that complements the low-rank decomposition. Additionally, to minimize potential performance degradation at higher compression ratios, a novel progressive alignment algorithm iteratively refines the target model's parameters layer by layer. Empirical evaluations on sequential recommendation benchmarks show that OD-LLM matches the effectiveness of the original recommendation model even when the deployed model size is halved. These promising results demonstrate the efficacy and scalability of OD-LLM, making it a practical alternative for real-time, on-device deployments seeking to replace expensive, remotely executed LLMs.
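The low-rank idea mentioned in the abstract can be illustrated with a minimal sketch. This is plain truncated SVD on a single synthetic weight matrix, not the OD-LLM algorithm itself: the tokenization normalization and progressive alignment steps are not reproduced here, and the function names (`rank_for_ratio`, `low_rank_factors`), the matrix sizes, and the synthetic data are all assumptions made for illustration. The sketch shows how a rank can be chosen so that the two factors together store roughly half the parameters of the original matrix.

```python
import numpy as np


def rank_for_ratio(rows: int, cols: int, keep_ratio: float) -> int:
    """Largest rank r with r * (rows + cols) <= keep_ratio * rows * cols.

    A rank-r factorization stores r * (rows + cols) numbers instead of
    rows * cols, so this picks the rank that fits the parameter budget.
    """
    return max(1, int(keep_ratio * rows * cols / (rows + cols)))


def low_rank_factors(weight: np.ndarray, rank: int):
    """Truncated SVD: weight is approximated by a @ b.

    a has shape (rows, rank) and b has shape (rank, cols).
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # fold singular values into the left factor
    b = vt[:rank, :]
    return a, b


# Hypothetical 256 x 256 weight: a rank-32 signal plus small noise,
# standing in for a redundant layer (an assumption, not data from the paper).
rng = np.random.default_rng(0)
weight = rng.standard_normal((256, 32)) @ rng.standard_normal((32, 256))
weight += 0.01 * rng.standard_normal((256, 256))

rank = rank_for_ratio(256, 256, keep_ratio=0.5)
a, b = low_rank_factors(weight, rank)

compressed_params = a.size + b.size
rel_error = np.linalg.norm(weight - a @ b) / np.linalg.norm(weight)

print(f"rank={rank}, params {weight.size} -> {compressed_params}")
print(f"relative reconstruction error: {rel_error:.2e}")
```

In this toy setting the reconstruction error is small because the synthetic matrix was built to be nearly low-rank. Real LLM weights are usually less redundant, which is presumably why the abstract pairs the decomposition with a separate alignment step to recover accuracy.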
