IR CL LG
Aug 9, 2023

TBIN: Modeling Long Textual Behavior Data for CTR Prediction

Shuwei Chen, Xiang Li, Jian Dong, Jin Zhang, Yongkang Wang, Xingxing Wang

arXiv:2308.08483v1
8.36 citations
h-index: 9

Originality: Incremental advance

AI Analysis

This work addresses the challenge of handling long textual behavior data in CTR prediction for recommendation systems, representing an incremental improvement over existing LM-based methods.

The paper proposes TBIN for modeling long user behavior data in CTR prediction. TBIN combines efficient locality-sensitive hashing with shifted chunk-based self-attention, which avoids truncating the behavior text and lets diverse user interests be activated dynamically. The authors report that it is effective in both offline and online experiments on a real-world food recommendation platform.

Click-through rate (CTR) prediction plays a pivotal role in the success of recommendations. Inspired by the recent success of language models (LMs), a surge of works improves prediction by organizing user behavior data in a textual format and using LMs to understand user interest at a semantic level. While promising, these works have to truncate the textual data to reduce the quadratic computational overhead of self-attention in LMs. However, prior studies have shown that long user behavior data can significantly benefit CTR prediction. In addition, these works typically condense a user's diverse interests into a single feature vector, which limits the expressive capability of the model. In this paper, we propose a Textual Behavior-based Interest Chunking Network (TBIN), which tackles the above limitations by combining an efficient locality-sensitive hashing algorithm and a shifted chunk-based self-attention. The resulting diverse user interests are dynamically activated, producing a user interest representation with respect to the target item. Finally, the results of both offline and online experiments on a real-world food recommendation platform demonstrate the effectiveness of TBIN.
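
The abstract names two ingredients, locality-sensitive hashing and shifted chunk-based self-attention, without giving details. The sketch below is one generic way to combine them and is not taken from the paper. It hashes behavior vectors with random hyperplanes so that similar ones become neighbors, splits the sorted sequence into fixed-size chunks, and runs plain scaled dot-product attention inside each chunk. Every function name and parameter (`sort_by_hash`, `chunks`, `chunked_attention`, `n_planes`, `size`, `shift`) is hypothetical, and learned projections are omitted.

```python
import math
import random


def lsh_signature(vec, planes):
    """Sign-random-projection hash: one bit per hyperplane."""
    bits = 0
    for plane in planes:
        dot = sum(v * p for v, p in zip(vec, plane))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


def sort_by_hash(vectors, n_planes=4, seed=0):
    """Return indices ordered so vectors with the same hash are adjacent."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]
    return sorted(range(len(vectors)),
                  key=lambda i: lsh_signature(vectors[i], planes))


def chunks(order, size, shift=0):
    """Cyclically rotate the index list by `shift`, then cut it into chunks."""
    rotated = order[shift:] + order[:shift]
    return [rotated[i:i + size] for i in range(0, len(rotated), size)]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def chunk_self_attention(vectors, chunk):
    """Scaled dot-product self-attention restricted to one chunk
    (assumption: queries, keys and values are the raw vectors)."""
    dim = len(vectors[0])
    scale = math.sqrt(dim)
    out = {}
    for i in chunk:
        scores = [sum(a * b for a, b in zip(vectors[i], vectors[j])) / scale
                  for j in chunk]
        weights = softmax(scores)
        out[i] = [sum(w * vectors[j][d] for w, j in zip(weights, chunk))
                  for d in range(dim)]
    return out


def chunked_attention(vectors, size, shift=0):
    """Attend within hash-sorted chunks; results keep the input order."""
    order = sort_by_hash(vectors)
    result = [None] * len(vectors)
    for chunk in chunks(order, size, shift):
        for i, v in chunk_self_attention(vectors, chunk).items():
            result[i] = v
    return result


if __name__ == "__main__":
    rng = random.Random(1)
    behaviors = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(12)]
    first_pass = chunked_attention(behaviors, size=4)
    # A second pass with shifted boundaries mixes neighbors across chunks.
    second_pass = chunked_attention(first_pass, size=4, shift=2)
    print(len(second_pass), len(second_pass[0]))
```

With n vectors and chunk size c, each pass computes n/c blocks of c × c scores, so the cost grows as n·c rather than n². That is the motivation for chunking instead of truncating. Shifting the boundaries between passes lets information move between neighboring chunks.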
