IR · AI · DB · Apr 27

Versioned Late Materialization for Ultra-Long Sequence Training in Recommendation Systems at Scale

Liang Guo, Ge Song, Litao Deng, Jianhui Sun, Chufeng Hu, Lu Zhang, Zhen Ma, Shouwei Chen, Weiran Liu, Sarang Masti Sreeshylan, Xiaoxuan Meng

arXiv:2604.24806

Predicted impact: top 68% in IR (last 90 days) · Originality: highly original

AI Analysis

For large-scale recommendation systems, this work addresses a critical infrastructure bottleneck that limits sequence length scaling, enabling significant model quality gains with reduced resource usage.

The paper tackles the storage and I/O bottleneck in training deep learning recommendation models with ultra-long user interaction sequences, where the standard 'Fat Row' paradigm causes excessive data redundancy. The proposed versioned late materialization system reduces data infrastructure resource usage while enabling sequence length scaling that improves model quality, serving as the foundation for architectures like HSTU and ULTRA-HSTU.

Modern Deep Learning Recommendation Models (DLRMs) follow scaling laws with sequence length, driving the frontier toward ultra-long User Interaction History (UIH). However, the industry-standard "Fat Row" paradigm, which pre-materializes these sequences into every training example, creates a storage and I/O wall where data infrastructure usage exceeds GPU training capacity. The cause is data redundancy, which is amplified in multi-tenant environments where models with vastly different sequence length requirements share a union dataset. We present a versioned late materialization paradigm that eliminates this redundancy by storing UIH once in a normalized, immutable tier and reconstructing sequences just-in-time during training via lightweight versioned pointers. The system ensures Online-to-Offline (O2O) consistency through a bifurcated protocol that prevents future leakage across both streaming and batch training, while a read-optimized immutable storage layer provides multi-dimensional projection pushdown for heterogeneous model tenants. Disaggregated data preprocessing with pipelined I/O prefetching and data-affinity optimizations masks the latency of training-time sequence reconstruction, keeping training throughput compute-bound by GPUs. Deployed on production DLRMs, the system reduces training data infrastructure resource usage while enabling aggressive sequence length scaling that delivers significant model quality gains, serving as the foundational data infrastructure for modern recommendation model architectures, including HSTU and ULTRA-HSTU.
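The idea of storing UIH once and reconstructing sequences through versioned pointers can be illustrated with a toy sketch. This is not the paper's implementation. The names `UIHStore`, `Example`, and `uih_version` are hypothetical, and the pointer is assumed here to be a simple per-user event count, since the abstract does not specify the pointer format. The sketch only shows why a pointer into an append-only log avoids copying the sequence into every example and keeps later events out of the reconstructed history.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    timestamp: int  # event time (arbitrary units in this toy example)
    item_id: int


@dataclass(frozen=True)
class Example:
    user_id: int
    label: int
    uih_version: int  # assumed pointer: number of events visible at request time


class UIHStore:
    """Append-only per-user event log, a toy stand-in for an immutable tier."""

    def __init__(self) -> None:
        self._log: Dict[int, List[Event]] = {}

    def append(self, user_id: int, event: Event) -> int:
        events = self._log.setdefault(user_id, [])
        events.append(event)
        return len(events)  # version after this append

    def version(self, user_id: int) -> int:
        return len(self._log.get(user_id, []))

    def materialize(self, user_id: int, version: int, max_len: int) -> List[int]:
        if max_len <= 0:
            return []
        # Only events up to the pointer are visible, so interactions that
        # arrived after the example was logged cannot leak into it.
        visible = self._log.get(user_id, [])[:version]
        # Each model tenant keeps only its own most recent max_len events.
        return [e.item_id for e in visible[-max_len:]]


store = UIHStore()
for t, item in enumerate([10, 11, 12, 13]):
    store.append(7, Event(timestamp=t, item_id=item))

# The training example stores a pointer, not a copy of the sequence.
example = Example(user_id=7, label=1, uih_version=store.version(7))

# A later interaction arrives after the example was logged.
store.append(7, Event(timestamp=99, item_id=14))

short_seq = store.materialize(example.user_id, example.uih_version, max_len=2)
long_seq = store.materialize(example.user_id, example.uih_version, max_len=100)
print(short_seq, long_seq)
```

In this sketch two tenants with different sequence length budgets read from the same stored history, and neither sees item 14, which was appended after the example's version was recorded.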

