LG · IR · Oct 24, 2022

Towards Out-of-Distribution Sequential Event Prediction: A Causal Treatment

Chenxiao Yang, Qitian Wu, Qingsong Wen, Zhiqiang Zhou, Liang Sun, Junchi Yan

arXiv:2210.13005v2

Originality: Highly original

AI Analysis

This paper addresses out-of-distribution generalization in sequential event prediction, with applications such as recommendation systems and user behavior analysis, and offers a novel causal approach to mitigating temporal distribution shift.

The paper tackles sequential event prediction under temporal distribution shift. It shows that existing methods trained with maximum likelihood estimation fail under such shift because of a latent context confounder, and it proposes a new learning objective based on backdoor adjustment, made tractable with variational inference, which improves performance on tasks such as sequential recommendation.
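To make the confounding argument concrete, here is a small numerical sketch. It is a hypothetical toy, not the paper's setup: a binary latent context C, a binary summary S of the history, and a binary next event Y, with made-up probabilities. A model trained by maximum likelihood targets P(Y | S = s) = Σ_c P(Y | s, c) P(c | s), which inherits the correlation between S and C; the standard backdoor adjustment instead computes P(Y | do(S = s)) = Σ_c P(Y | s, c) P(c).

```python
import numpy as np

# Hypothetical toy distribution (not from the paper).
p_c = np.array([0.7, 0.3])                  # P(C = c)
p_s_given_c = np.array([[0.9, 0.1],         # P(S = s | C = 0)
                        [0.2, 0.8]])        # P(S = s | C = 1)
p_y_given_sc = np.array([[0.1, 0.6],        # P(Y = 1 | S = 0, C = c)
                         [0.3, 0.9]])       # P(Y = 1 | S = 1, C = c); indexed [s, c]

def observational(p_c, p_s_given_c, p_y_given_sc):
    """P(Y = 1 | S = s): the conditional a maximum-likelihood model fits."""
    joint_cs = p_c[:, None] * p_s_given_c           # P(C = c, S = s), shape (c, s)
    p_c_given_s = joint_cs / joint_cs.sum(axis=0)   # P(C = c | S = s); each column sums to 1
    # Sum over c of P(Y = 1 | s, c) * P(c | s).
    return np.einsum("sc,cs->s", p_y_given_sc, p_c_given_s)

def backdoor(p_c, p_y_given_sc):
    """P(Y = 1 | do(S = s)) = sum over c of P(Y = 1 | s, c) * P(c)."""
    return p_y_given_sc @ p_c

print(observational(p_c, p_s_given_c, p_y_given_sc))
print(backdoor(p_c, p_y_given_sc))

# If S is made independent of C, P(c | s) = P(c) and the two quantities coincide.
p_s_indep = np.full((2, 2), 0.5)
print(observational(p_c, p_s_indep, p_y_given_sc))
```

With these numbers the observational estimate is about 0.14 for s = 0 and 0.76 for s = 1, while the backdoor-adjusted one is 0.25 and 0.48. Removing the dependence of S on C collapses the two, so in this toy the gap comes entirely from that dependence, which is exactly the part that can change between training and testing.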

The goal of sequential event prediction is to estimate the next event based on a sequence of historical events, with applications to sequential recommendation, user behavior analysis and clinical treatment. In practice, next-event prediction models are trained on sequential data collected at one time and must generalize to newly arriving sequences in the distant future, which requires them to handle temporal distribution shift from training to testing. In this paper, we first take a data-generating perspective to reveal a negative result: existing approaches based on maximum likelihood estimation fail under distribution shift because of the latent context confounder, i.e., the common cause of the historical events and the next event. We then devise a new learning objective based on backdoor adjustment and further harness variational inference to make it tractable for sequence learning problems. On top of that, we propose a framework with hierarchical branching structures for learning context-specific representations. Comprehensive experiments on diverse tasks (e.g., sequential recommendation) demonstrate the effectiveness, applicability and scalability of our method with various off-the-shelf models as backbones.
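The abstract names two further ingredients: variational inference to make the backdoor objective tractable, and context-specific branches. The listing does not give the exact objective or architecture, so the following is only a generic sketch under stated assumptions: K discrete contexts with a uniform prior p(c), one hypothetical softmax output head per context on top of a shared history encoding h (a flat stand-in, not the paper's hierarchical branching), and a variational distribution q(c) over contexts. Jensen's inequality gives log Σ_c p(c) p(y | s, c) ≥ E_q[log p(y | s, c)] - KL(q ‖ p(c)), with equality when q(c) ∝ p(c) p(y | s, c).

```python
import numpy as np

def log_softmax(x):
    """Numerically stable log-softmax along the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def log_backdoor(log_prior, log_lik):
    """Exact log of sum over c of p(c) * p(y | s, c), via log-sum-exp."""
    z = log_prior + log_lik
    m = z.max()
    return m + np.log(np.exp(z - m).sum())

def elbo(q, log_prior, log_lik):
    """Lower bound E_q[log p(y | s, c)] - KL(q || p(c)) for a discrete q."""
    kl = np.sum(q * (np.log(q) - log_prior))
    return np.sum(q * log_lik) - kl

rng = np.random.default_rng(0)
K, D, V = 4, 8, 10                  # hypothetical: contexts, hidden size, candidate events
h = rng.normal(size=D)              # stand-in for a backbone's encoding of the history s
W = rng.normal(size=(K, V, D))      # hypothetical: one output head ("branch") per context
y = 3                               # index of the observed next event

log_lik = log_softmax(W @ h)[:, y]           # log p(y | s, c = k) for each branch k, shape (K,)
log_prior = np.log(np.full(K, 1.0 / K))      # assumed uniform prior over contexts

q_random = np.exp(log_softmax(rng.normal(size=K)))   # an arbitrary q(c)
q_star = np.exp(log_softmax(log_prior + log_lik))    # exact posterior over contexts

exact = log_backdoor(log_prior, log_lik)
print(exact, elbo(q_random, log_prior, log_lik), elbo(q_star, log_prior, log_lik))
```

The bound evaluated at an arbitrary q sits below the exact value, and at the exact posterior q_star it matches it. In a learned model one would presumably parameterize q with an inference network and maximize the bound by gradient descent; that training loop is an assumption and is omitted here.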
