IR · AI · Feb 3

Distribution-Aware End-to-End Embedding for Streaming Numerical Features in Click-Through Rate Prediction

arXiv:2602.03223v1 · 1 citation · h-index: 11
Originality: Highly original
AI Analysis

The paper addresses a critical bottleneck in streaming CTR prediction on large-scale platforms, namely how to embed numerical features whose distributions shift over time, and offers a new approach aimed at improving accuracy and deployment efficiency.

This paper tackles numerical feature embedding for Click-Through Rate (CTR) prediction in streaming environments, where conventional methods suffer from semantic drift and neglect distributional information. It proposes DAES, an end-to-end framework that integrates distributional information with adaptive modulation. Offline and online experiments on a platform with hundreds of millions of users show significant performance improvements.

This paper explores effective numerical feature embedding for Click-Through Rate prediction in streaming environments. Conventional static binning methods rely on offline statistics of numerical distributions; however, this inherently two-stage process often triggers semantic drift during bin boundary updates. While neural embedding methods enable end-to-end learning, they often discard explicit distributional information. Integrating such information end-to-end is challenging because streaming features often violate the i.i.d. assumption, precluding unbiased estimation of the population distribution via the expectation of order statistics. Furthermore, the critical context dependency of numerical distributions is often neglected. To this end, we propose DAES, an end-to-end framework designed to tackle numerical feature embedding in streaming training scenarios by integrating distributional information with an adaptive modulation mechanism. Specifically, we introduce an efficient reservoir-sampling-based distribution estimation method and two field-aware distribution modulation strategies to capture streaming distributions and field-dependent semantics. DAES significantly outperforms existing approaches as demonstrated by extensive offline and online experiments and has been fully deployed on a leading short-video platform with hundreds of millions of daily active users.
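To make the idea of reservoir-sampling-based distribution estimation concrete, here is a minimal sketch of the classic technique (Algorithm R) paired with an empirical CDF lookup. It is not the DAES estimator, whose details the abstract does not give. The class name `ReservoirCDF`, its methods, the capacity, and the synthetic exponential stream are all assumptions made for illustration.

```python
import bisect
import random


class ReservoirCDF:
    """Hypothetical helper: fixed-size uniform sample of a stream (Algorithm R)
    plus an empirical CDF lookup. Illustrative only, not the paper's method."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.samples = []          # current reservoir contents
        self.seen = 0              # number of stream values observed so far
        self._rng = random.Random(seed)

    def update(self, value):
        """Observe one value from the stream."""
        self.seen += 1
        if len(self.samples) < self.capacity:
            # Fill phase: keep everything until the reservoir is full.
            self.samples.append(value)
        else:
            # Replace a random slot with probability capacity / seen.
            j = self._rng.randrange(self.seen)
            if j < self.capacity:
                self.samples[j] = value

    def cdf(self, value):
        """Estimated fraction of observed values that are <= value."""
        if not self.samples:
            return 0.0
        ordered = sorted(self.samples)
        return bisect.bisect_right(ordered, value) / len(ordered)


# Usage on a synthetic stream (assumed Exp(1) data, whose median is ln 2 ≈ 0.693).
est = ReservoirCDF(capacity=1000, seed=42)
stream_rng = random.Random(7)
for _ in range(50_000):
    est.update(stream_rng.expovariate(1.0))

# The estimated CDF position could serve as a distribution-aware input feature.
print(est.cdf(0.693))
```

Plain Algorithm R keeps a uniform sample over everything seen so far, so it is only a sensible estimator when the stream is roughly stationary. The abstract stresses that streaming features often violate the i.i.d. assumption, so a practical estimator would likely need something beyond this sketch, such as favouring recent values.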

