Generating Multi-type Temporal Sequences to Mitigate Class-imbalanced Problem
The paper addresses class imbalance in ad network tasks such as fraud detection and CTR prediction, but the contribution is incremental: it builds on existing GAN methods with specific training modifications.
The study tackled the class imbalance problem in ad network user activity sequences by proposing two multi-type training approaches for GANs to generate synthetic sequences, with experiments on synthetic data showing the generator can produce sequences meeting desired criteria.
From the ad network standpoint, a user's activity is a multi-type sequence of temporal events consisting of event types and time intervals. Understanding user patterns in ad networks has received increasing attention from the machine learning community. In particular, the problems of fraud detection, Conversion Rate (CVR) prediction, and Click-Through Rate (CTR) prediction are of interest. However, the class imbalance between major and minor classes in these tasks can bias a machine learning model, leading to poor performance. This study proposes two multi-type (continuous and discrete) training approaches for GANs that address the difficulty traditional GANs have in passing gradient updates through discrete tokens. The first is a Reinforcement Learning (RL)-based training approach; the second uses an approximation of the multinomial distribution parameterized in terms of the softmax function (Gumbel-Softmax). Our extensive experiments on synthetic data show that the trained generator can generate sequences with the desired properties, as measured by multiple criteria.
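To make the second approach concrete, the following is a minimal NumPy sketch of a Gumbel-Softmax sample over discrete event types. It is an illustration of the general relaxation, not the authors' implementation: the function name `gumbel_softmax_sample`, the three event types, their probabilities, and the temperature value are all assumptions chosen for the example.

```python
import numpy as np

def gumbel_softmax_sample(logits, temperature, rng):
    """Draw a relaxed (soft) one-hot sample over event types.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Gumbel(0, 1) noise via inverse transform: g = -log(-log(u)), u ~ Uniform(0, 1).
    # Clipping keeps u strictly inside (0, 1) so both logarithms stay finite.
    u = np.clip(rng.random(logits.shape), 1e-12, 1.0 - 1e-12)
    gumbel_noise = -np.log(-np.log(u))
    # Perturb the logits, then apply a temperature-scaled softmax.
    y = (logits + gumbel_noise) / temperature
    y = y - y.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    exp_y = np.exp(y)
    return exp_y / exp_y.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
# Assumed example: three event types with probabilities 0.7, 0.2, 0.1.
event_logits = np.log(np.array([0.7, 0.2, 0.1]))
soft_sample = gumbel_softmax_sample(event_logits, temperature=0.5, rng=rng)
```

The output is a probability vector rather than a hard token, so in an automatic-differentiation framework the same computation would let gradients flow from the discriminator back to the generator's logits. Lower temperatures push the sample closer to one-hot, and the argmax of the sample is distributed according to the softmax of the logits.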