LG · Feb 7, 2021

Additive Feature Hashing

arXiv:2102.03943v1

Originality: Synthesis-oriented

AI Analysis

This work offers an alternative method for feature encoding, potentially benefiting practitioners working with categorical data, though it appears to be an incremental improvement.

This paper proposes additive feature hashing, a new method for encoding categorical features into fixed-length numerical vectors by directly adding hash values. The authors demonstrate that its performance is comparable to the traditional hashing trick across synthetic, language recognition, and SMS spam detection datasets.

The hashing trick is a machine learning technique used to encode categorical features into a numerical vector representation of predefined fixed length. It works by using the hash values of the categories as vector indices and updating the vector values at those indices. Here we discuss a different approach based on additive hashing and the "almost orthogonal" property of high-dimensional random vectors. That is, we show that additive feature hashing can be performed directly by adding the hash values and converting them into high-dimensional numerical vectors. We show that the performance of additive feature hashing is similar to that of the hashing trick, and we illustrate the results numerically using synthetic, language recognition, and SMS spam detection data.
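To make the contrast in the abstract concrete, here is a minimal Python sketch of both encodings. It is one possible reading of the abstract, not the paper's exact construction. The function names (`hashing_trick`, `token_vector`, `additive_hashing`, `cosine`), the choice of BLAKE2b and SHA-256 as hash functions, and the expansion of each hash into a +1/-1 vector are all assumptions made for illustration.

```python
import hashlib


def hashing_trick(tokens, dim=64):
    """Hashing trick: each token's hash selects an index, and that slot is incremented."""
    vec = [0.0] * dim
    for token in tokens:
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        index = int.from_bytes(digest, "big") % dim
        vec[index] += 1.0
    return vec


def token_vector(token, dim=64):
    """Expand a token's hash bits into a pseudo-random +1/-1 vector of length dim.

    Assumption: this bit expansion stands in for "converting hash values into
    high-dimensional numerical vectors"; the paper may use another scheme.
    """
    values = []
    counter = 0
    while len(values) < dim:
        block = hashlib.sha256(f"{counter}:{token}".encode("utf-8")).digest()
        for byte in block:
            for bit in range(8):
                values.append(1.0 if (byte >> bit) & 1 else -1.0)
        counter += 1
    return values[:dim]


def additive_hashing(tokens, dim=64):
    """Additive encoding: sum the per-token vectors element-wise."""
    vec = [0.0] * dim
    for token in tokens:
        for i, value in enumerate(token_vector(token, dim)):
            vec[i] += value
    return vec


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    dim = 1024
    # Vectors of two distinct tokens should be close to orthogonal in high dimension.
    print(cosine(token_vector("red", dim), token_vector("blue", dim)))
    # Bags that share most of their tokens should stay similar after summation.
    print(cosine(additive_hashing(["a", "b", "c"], dim),
                 additive_hashing(["a", "b", "d"], dim)))
```

In this sketch the "almost orthogonal" property is what keeps the sum informative: the vectors of distinct tokens have a cosine similarity near zero when `dim` is large, so the summed vector of a bag of tokens remains strongly correlated with each of its members and only weakly correlated with unrelated tokens.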
