Time Aggregation Features for XGBoost Models
This work addresses incremental improvements in click-through rate prediction for online advertising, focusing on practical feature engineering under strict out-of-time evaluation and a no-lookahead constraint.
The paper asks whether time aggregation features, such as trailing and event count windows, improve XGBoost click-through rate prediction beyond a baseline of time-aware target encoding. On the Avazu dataset, trailing windows improve ROC AUC by about 0.0066 to 0.0082 and PR AUC by about 0.0084 to 0.0094 over that baseline, and event count windows add only small further gains.
This paper studies time aggregation features for XGBoost models in click-through rate prediction. The setting is the Avazu click-through rate prediction dataset with strict out-of-time splits and a no-lookahead feature constraint: features for hour H use only impressions from hours strictly before H. We compare a strong time-aware target encoding baseline to models augmented with entity history time aggregation under several window designs. Across two rolling-tail folds on a deterministic ten percent sample, a trailing window specification improves ROC AUC by about 0.0066 to 0.0082 and PR AUC by about 0.0084 to 0.0094 relative to target encoding alone. Within the time aggregation design grid, event count windows are the only design that consistently improves on trailing windows, and the gain is small. Gap windows and bucketized windows underperform simple trailing windows on this dataset and under this protocol. These results support a practical default of trailing windows, with an optional event count window when marginal ROC AUC gains matter.
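To make the no-lookahead constraint concrete, the following is a minimal sketch of a trailing window aggregation, not the paper's implementation. It assumes impressions arrive as `(hour, entity, click)` tuples with an integer hour index, and the function name `trailing_window_features` and the parameter `window_hours` are hypothetical. For each entity and hour H it counts impressions and clicks over hours H - window_hours through H - 1, so nothing from hour H itself or later enters the feature.

```python
from collections import defaultdict


def trailing_window_features(events, window_hours):
    """Illustrative trailing-window history counts (hypothetical helper).

    events: iterable of (hour, entity, click) tuples, hour as an integer index.
    Returns {(entity, hour): (impressions, clicks)} computed over the hours
    [hour - window_hours, hour - 1], i.e. strictly before `hour`.
    """
    # Collapse raw impressions into per-(entity, hour) totals.
    hourly = defaultdict(lambda: [0, 0])
    for hour, entity, click in events:
        cell = hourly[(entity, hour)]
        cell[0] += 1
        cell[1] += click

    features = {}
    for entity, hour in list(hourly):
        impressions = clicks = 0
        # range() excludes `hour`, which enforces the no-lookahead rule.
        for past_hour in range(hour - window_hours, hour):
            if (entity, past_hour) in hourly:
                past_impressions, past_clicks = hourly[(entity, past_hour)]
                impressions += past_impressions
                clicks += past_clicks
        features[(entity, hour)] = (impressions, clicks)
    return features


# Toy data (made up for illustration): entity "a" is seen in hours 0 to 3.
toy_events = [
    (0, "a", 1), (0, "a", 0),
    (1, "a", 0),
    (2, "a", 1), (2, "b", 0),
    (3, "a", 0),
]
toy_features = trailing_window_features(toy_events, window_hours=2)
print(toy_features[("a", 2)])  # history from hours 0 and 1 only
```

Every impression in a given hour shares the same history features, so the resulting counts (or a smoothed click rate derived from them) can be joined back onto the rows by entity and hour before training. An entity with no impressions inside the window gets zero counts, which the model can treat as a cold-start signal.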