LG · Sep 8, 2022

Stochastic gradient descent with gradient estimator for categorical features

arXiv:2209.03771v2 · h-index: 26
AI Analysis

This paper addresses a bottleneck in applying machine learning to categorical data in domains such as health or supply chain, and offers incremental improvements in optimization methods.

The paper tackles the problem that standard gradient estimators are ill-suited to the sparse data produced by one-hot encoding of categorical features. It introduces a novel gradient estimator that performs better than common estimators across several datasets and model architectures.

Categorical data are present in key areas such as health or supply chain, and these data require specific treatment. To apply recent machine learning models to such data, encoding is needed. For building interpretable models, one-hot encoding is still a very good solution, but it creates sparse data. Gradient estimators are not suited to sparse data: the gradient is mostly treated as zero, whereas it simply does not always exist; a novel gradient estimator is therefore introduced. We show what this estimator minimizes in theory and demonstrate its efficiency on different datasets with multiple model architectures. This new estimator performs better than common estimators under similar settings. A real-world retail dataset is also released after anonymization. Overall, the aim of this paper is to thoroughly consider categorical data and adapt models and optimizers to these key features.
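
The sparsity issue raised in the abstract can be illustrated with a minimal sketch. It is not the paper's estimator: it only computes the ordinary mini-batch gradient of a squared-error loss for a linear model on one-hot inputs, to show that weights of categories absent from the batch receive a gradient of exactly zero. The function names, the toy batch, and the choice of loss are assumptions made for illustration.

```python
def one_hot(index, n_categories):
    """Return a one-hot list for a category index."""
    return [1.0 if j == index else 0.0 for j in range(n_categories)]


def batch_gradient(weights, batch):
    """Mean squared-error gradient of a linear model over a mini-batch.

    `batch` is a list of (category_index, target) pairs (hypothetical format).
    """
    n = len(weights)
    grad = [0.0] * n
    for index, target in batch:
        x = one_hot(index, n)
        pred = sum(w * xj for w, xj in zip(weights, x))
        err = pred - target
        for j in range(n):
            # x[j] is zero for every category other than `index`,
            # so only one coordinate is updated per sample.
            grad[j] += 2.0 * err * x[j] / len(batch)
    return grad


weights = [0.0, 0.0, 0.0, 0.0]
batch = [(0, 1.0), (0, 3.0), (2, 5.0)]  # categories 1 and 3 never appear
grad = batch_gradient(weights, batch)

# Categories that do not occur in the batch.
absent = [j for j in range(len(weights)) if all(i != j for i, _ in batch)]
print("gradient:", grad)
print("absent categories:", absent)
```

In this toy batch the gradient coordinates for categories 1 and 3 are zero only because no sample carries information about them, which is the situation the abstract describes as the gradient being "considered as zero" although it is really undefined.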

Code Implementations: 1 repo