IR | AI | Dec 1, 2022

CL4CTR: A Contrastive Learning Framework for CTR Prediction

Fangye Wang, Yingxu Wang, Dongsheng Li, Hansu Gu, Tun Lu, Peng Zhang, Ning Gu

arXiv:2212.00522v1 | 18.373 citations | h-index: 88 | Has Code

Originality: Highly original

AI Analysis

This work addresses a domain-specific issue in CTR prediction for recommendation systems, offering an incremental improvement through a novel method for a known bottleneck.

The paper tackles the problem of sub-optimal feature representation learning in CTR prediction, which often neglects low-frequency features, by proposing a contrastive learning framework that improves performance, achieving state-of-the-art results on four datasets.

Many Click-Through Rate (CTR) prediction works focused on designing advanced architectures to model complex feature interactions but neglected the importance of feature representation learning, e.g., adopting a plain embedding layer for each feature, which results in sub-optimal feature representations and thus inferior CTR prediction performance. For instance, low-frequency features, which account for the majority of features in many CTR tasks, receive less attention in standard supervised learning settings, leading to sub-optimal feature representations. In this paper, we introduce self-supervised learning to produce high-quality feature representations directly and propose a model-agnostic Contrastive Learning for CTR (CL4CTR) framework consisting of three self-supervised learning signals to regularize the feature representation learning: contrastive loss, feature alignment, and field uniformity. The contrastive module first constructs positive feature pairs by data augmentation and then minimizes the distance between the representations of each positive feature pair by the contrastive loss. The feature alignment constraint forces the representations of features from the same field to be close, and the field uniformity constraint forces the representations of features from different fields to be distant. Extensive experiments verify that CL4CTR achieves the best performance on four datasets and has excellent effectiveness and compatibility with various representative baselines.
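
The three regularization signals described in the abstract can be illustrated with a minimal pure-Python sketch. The abstract does not give the exact formulas, so everything concrete here is an assumption made for illustration: squared Euclidean distance for the contrastive and alignment terms, cosine similarity for the uniformity term, random masking as the data augmentation, and the weights `alpha` and `beta`. All function names are hypothetical and are not taken from the authors' code.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def random_mask(embeddings, p, rng):
    """Assumed augmentation: zero out each feature embedding with probability p."""
    return [[0.0] * len(e) if rng.random() < p else list(e) for e in embeddings]

def contrastive_loss(view_a, view_b):
    """Mean squared distance between the representations of positive pairs.

    view_a[i] and view_b[i] are representations of two augmented views
    of the same sample (assumed pairing).
    """
    return sum(sq_dist(a, b) for a, b in zip(view_a, view_b)) / len(view_a)

def feature_alignment(fields):
    """Mean squared distance over pairs of features within the same field.

    fields maps a field name to the list of its feature embeddings.
    Minimizing this pulls same-field features together.
    """
    total, count = 0.0, 0
    for embs in fields.values():
        for i in range(len(embs)):
            for j in range(i + 1, len(embs)):
                total += sq_dist(embs[i], embs[j])
                count += 1
    return total / count if count else 0.0

def field_uniformity(fields):
    """Mean cosine similarity over pairs of features from different fields.

    Minimizing this pushes features of different fields apart.
    """
    names = list(fields)
    total, count = 0.0, 0
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            for a in fields[names[i]]:
                for b in fields[names[j]]:
                    total += cosine(a, b)
                    count += 1
    return total / count if count else 0.0

def total_loss(ctr_loss, l_cl, l_align, l_uniform, alpha=1.0, beta=0.01):
    """Supervised CTR loss plus the three weighted regularizers (assumed weighting)."""
    return ctr_loss + alpha * l_cl + beta * (l_align + l_uniform)
```

In a training loop, one would typically apply `random_mask` twice to the same batch of embeddings, encode both views, and add the resulting terms to the usual CTR loss through `total_loss`. Because the regularizers act only on the embeddings, this kind of setup stays independent of the downstream CTR model, which matches the model-agnostic claim in the abstract.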
