LG · Feb 14, 2022

What is Next when Sequential Prediction Meets Implicitly Hard Interaction?

Kaixi Hu, Lin Li, Qing Xie, Jianquan Liu, Xiaohui Tao

arXiv:2202.06620v15.823 citations

Originality: Incremental advance

AI Analysis

This paper addresses generalization issues in sequential prediction, with applications in cyber and physical spaces, but the advance is incremental because it builds on existing base networks.

The paper tackles implicitly hard interactions in sequential prediction tasks, where a model may answer correctly after learning only a subset of the patterns, which weakens generalization. The proposed HAIL framework, which applies mutual exclusivity distillation (MED) between two base networks, outperforms several state-of-the-art methods on four datasets on top-k metrics.

Learning hard interactions between source sequences and their next targets is a challenge that arises in a myriad of sequential prediction tasks. During training, most existing methods focus on explicitly hard interactions, which are caused by wrong responses. However, a model might produce correct responses by capturing only a subset of the learnable patterns, which leaves implicitly hard interactions involving the unlearned patterns. As a result, its generalization performance is weakened. The problem is more serious in sequential prediction because of interference from the many similar candidate targets. To this end, we propose a Hardness Aware Interaction Learning framework (HAIL) that mainly consists of two base sequential learning networks and mutual exclusivity distillation (MED). The base networks are initialized differently so that they learn distinct views of the patterns and thus gain different training experiences. MED lets each network draw on the other's experience, expressed as the unlikelihood of correct responses, which provides mutual exclusivity knowledge for identifying implicitly hard interactions. Moreover, we deduce that the unlikelihood essentially introduces additional gradients that push the pattern learning of correct responses. Our framework can be easily extended to more peer base networks. Evaluation is conducted on four datasets covering cyber and physical spaces. The experimental results demonstrate that our framework outperforms several state-of-the-art methods in terms of top-k based metrics.
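
To make the idea of two peer networks exchanging unlikelihood signals more concrete, here is a minimal sketch in plain Python. It is not the paper's loss: the abstract does not give the MED formulation, so the peer-weighted unlikelihood term, the function names (`peer_unlikelihood`, `hail_style_losses`), and the weight `alpha` are all assumptions chosen for illustration. Each network's loss is the usual cross-entropy on the correct target plus a term that penalizes probability mass on the non-target candidates that its peer finds plausible.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of candidate-target scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Standard negative log-likelihood of the correct target."""
    return -math.log(probs[target])


def peer_unlikelihood(probs, peer_probs, target, eps=1e-12):
    """Assumed form: unlikelihood over non-target candidates, weighted by
    how much probability the peer network assigns to each of them."""
    return -sum(
        q * math.log(max(1.0 - p, eps))
        for j, (p, q) in enumerate(zip(probs, peer_probs))
        if j != target
    )


def hail_style_losses(logits_a, logits_b, target, alpha=0.5):
    """Return (loss_a, loss_b) for two peer networks on one example.

    alpha is a hypothetical weight on the peer-derived term; alpha=0
    reduces each loss to plain cross-entropy.
    """
    p_a, p_b = softmax(logits_a), softmax(logits_b)
    loss_a = cross_entropy(p_a, target) + alpha * peer_unlikelihood(p_a, p_b, target)
    loss_b = cross_entropy(p_b, target) + alpha * peer_unlikelihood(p_b, p_a, target)
    return loss_a, loss_b


# Both networks rank target 0 first, yet each still receives an extra
# training signal from the candidates its peer considers confusable.
loss_a, loss_b = hail_style_losses([2.0, 1.5, 0.1], [2.0, 0.2, 1.4], target=0)
```

In this sketch both networks already answer correctly, so a method driven only by wrong responses would treat the example as easy. The extra term stays positive and contributes gradients, which is in the spirit of the abstract's remark that the unlikelihood introduces additional gradients for correct responses.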
