LG, AI · Oct 29, 2021

Improving the quality of generative models through Smirnov transformation

Ángel González-Prieto, Alberto Mozo, Sandra Gómez-Canaval, Edgar Talavera

arXiv:2110.15914v13.110 citations

Originality: Incremental advance

AI Analysis

This work addresses data quality and privacy problems for machine learning practitioners, though it is incremental because it builds on existing GAN methods.

The paper tackles convergence issues in Generative Adversarial Networks (GANs) by proposing a novel activation function based on the Smirnov transformation, which improves generated data quality and allows synthetic data to fully substitute real data for training a classifier without accuracy loss.

Solving the convergence issues of Generative Adversarial Networks (GANs) is one of the most prominent open problems in generative models. In this work, we propose a novel activation function to be used as the output of the generator agent. This activation function is based on the Smirnov probabilistic transformation and is specifically designed to improve the quality of the generated data. In sharp contrast with previous works, our activation function provides a more general approach that deals not only with the replication of categorical variables but with any type of data distribution (continuous or discrete). Moreover, our activation function is differentiable and can therefore be seamlessly integrated into the backpropagation computations during GAN training. To validate this approach, we evaluate our proposal on two different data sets: a) an artificially rendered data set containing a mixture of discrete and continuous variables, and b) a real data set of flow-based network traffic data containing both normal connections and cryptomining attacks. To evaluate the fidelity of the generated data, we analyze the results both in terms of statistical quality measures and in terms of the use of these synthetic data to feed a nested machine learning-based classifier. The experimental results show that the GAN tuned with this new activation function clearly outperforms both a naïve mean-based generator and a standard GAN. The quality of the data is so high that the generated data can fully substitute real data for training the nested classifier without a drop in accuracy. This result encourages the use of GANs to produce high-quality synthetic data applicable in scenarios in which data privacy must be guaranteed.
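For readers unfamiliar with the Smirnov transformation (also known as inverse transform sampling), the following is a minimal sketch of the underlying idea, not the authors' activation function. It assumes a piecewise-linear approximation of the empirical inverse CDF, which is differentiable almost everywhere; the helper names `fit_quantile_knots` and `smirnov_transform` are hypothetical, and the paper's actual construction inside the generator may differ.

```python
import random
from bisect import bisect_right

def fit_quantile_knots(samples, n_knots=101):
    """Approximate the inverse CDF of `samples` by evenly spaced quantile knots."""
    xs = sorted(samples)
    probs = [i / (n_knots - 1) for i in range(n_knots)]
    # Pick the order statistic closest (from below) to each probability level.
    values = [xs[int(p * (len(xs) - 1))] for p in probs]
    return probs, values

def smirnov_transform(u, probs, values):
    """Map u in [0, 1] through the piecewise-linear inverse CDF."""
    u = min(max(u, 0.0), 1.0)  # clamp to the valid range
    # Index of the right-hand knot of the segment containing u.
    i = max(1, min(bisect_right(probs, u), len(probs) - 1))
    p0, p1 = probs[i - 1], probs[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (u - p0) / (p1 - p0)

# Illustrative data only: an exponential distribution with mean 2.
rng = random.Random(0)
real = [rng.expovariate(0.5) for _ in range(10_000)]
probs, values = fit_quantile_knots(real)

# Uniform noise pushed through the inverse CDF follows the target distribution.
synthetic = [smirnov_transform(rng.random(), probs, values) for _ in range(10_000)]
```

In a GAN setting, the uniform input would plausibly be replaced by a bounded generator output (for example, a sigmoid), so that gradients flow through the linear segments during backpropagation. That wiring is an assumption here, not something the abstract specifies.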
