TokenFormer: Unify the Multi-Field and Sequential Recommendation Worlds
For recommender system researchers and practitioners, TokenFormer provides a unified architecture that mitigates dimensional collapse, enabling better performance in both multi-field and sequential recommendation.
Recommender systems follow two paradigms, feature interaction models and sequential models. Naively unifying them triggers Sequential Collapse Propagation (SCP), in which the sequence features suffer dimensional collapse. TokenFormer introduces BFTS attention and NLIR to overcome this, achieving state-of-the-art results on public benchmarks and on Tencent's advertising platform.
Recommender systems have historically developed along two largely independent paradigms: feature interaction models, which model correlations among multi-field categorical features, and sequential models, which capture user behavior dynamics from historical interaction sequences. Although recent work attempts to bridge these paradigms within shared backbones, we empirically show that naively unifying the two branches can lead to a failure mode we call Sequential Collapse Propagation (SCP): interacting with dimensionally ill-conditioned non-sequence fields causes the sequence features to undergo dimensional collapse. To overcome this challenge, we propose TokenFormer, a unified recommendation architecture with two innovations. First, we introduce a Bottom-Full-Top-Sliding (BFTS) attention scheme, which applies full self-attention in the lower layers and sliding-window attention with shrinking windows in the upper layers. Second, we introduce a Non-Linear Interaction Representation (NLIR) that applies one-sided non-linear multiplicative transformations to the hidden states. Extensive experiments on public benchmarks and Tencent's advertising platform demonstrate state-of-the-art performance, and detailed analyses confirm that TokenFormer significantly improves dimensional robustness and representation discriminability under unified modeling.
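The two mechanisms can be illustrated with a minimal NumPy sketch. The abstract does not specify window sizes, the shrink schedule, token ordering, the attention parameterization, or the exact form of NLIR, so everything below is an assumption. The names `bfts_masks`, `nlir`, `init_window` and `shrink` are hypothetical. The symmetric window and the SiLU gate are illustrative choices, and attention uses the raw hidden states as queries, keys and values without learned projections. The `effective_rank` function is a common diagnostic for dimensional collapse (the exponential of the entropy of the normalized singular values). It is not necessarily the measure the authors use.

```python
import numpy as np

def bfts_masks(seq_len, num_layers, num_full_layers, init_window, shrink):
    """Hypothetical Bottom-Full-Top-Sliding schedule: one boolean mask per layer.

    mask[i, j] == True means token i may attend to token j.
    """
    idx = np.arange(seq_len)
    dist = np.abs(idx[:, None] - idx[None, :])  # token distance (assumed symmetric window)
    masks = []
    for layer in range(num_layers):
        if layer < num_full_layers:
            # Bottom layers: full self-attention over all tokens.
            masks.append(np.ones((seq_len, seq_len), dtype=bool))
        else:
            # Top layers: sliding window whose half-width shrinks with depth (never below 0).
            half_width = max(init_window - (layer - num_full_layers) * shrink, 0)
            masks.append(dist <= half_width)
    return masks

def masked_attention(h, mask):
    """Single-head softmax attention restricted by `mask` (no learned projections)."""
    scores = h @ h.T / np.sqrt(h.shape[-1])
    scores = np.where(mask, scores, -np.inf)  # the diagonal is always allowed, so no row is all -inf
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ h

def nlir(h, w):
    """Hypothetical one-sided multiplicative interaction: h * SiLU(h @ w).

    Only the gate branch passes through a non-linearity; the other operand is h itself.
    """
    gate = h @ w
    sigmoid = 0.5 * (1.0 + np.tanh(0.5 * gate))  # numerically stable sigmoid
    return h * (gate * sigmoid)

def layer_norm(h, eps=1e-5):
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

def effective_rank(z):
    """exp(entropy) of normalized singular values; low values indicate dimensional collapse."""
    s = np.linalg.svd(z - z.mean(axis=0), compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))

# Toy forward pass over 8 tokens (e.g. field tokens followed by behavior tokens).
rng = np.random.default_rng(0)
L, D = 8, 16
h = rng.normal(size=(L, D))
masks = bfts_masks(L, num_layers=4, num_full_layers=2, init_window=3, shrink=1)
W = rng.normal(scale=D ** -0.5, size=(D, D))
for m in masks:
    h = layer_norm(h + masked_attention(h, m))  # attention sub-layer with residual
    h = h + nlir(h, W)                          # NLIR sub-layer with residual

window_sizes = [int(m.sum(axis=1).max()) for m in masks]
print(window_sizes)  # [8, 8, 7, 5]: full, full, then shrinking windows
print(effective_rank(h))
```

Tracking `effective_rank` of the sequence-token representations while toggling the mask schedule or the NLIR branch is one way to probe, on real data, the collapse behavior the abstract describes.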