Self-optimizing Feature Generation via Categorical Hashing Representation and Hierarchical Reinforcement Crossing
This work addresses a recurring challenge for data scientists, feature generation, by automating the identification of useful feature interactions. It appears somewhat incremental, however, since it builds on existing representation learning and reinforcement learning techniques.
The paper tackles the problem of automatically generating meaningful features from feature interactions. It proposes a framework that combines categorical hashing representation with hierarchical reinforcement crossing to achieve self-optimizing feature generation, and reports experimental results demonstrating its effectiveness and efficiency.
Feature generation aims to generate new and meaningful features that create a discriminative representation space. A generated feature is meaningful when it is derived from a feature pair with an inherent feature interaction. In the real world, experienced data scientists can identify potentially useful feature-feature interactions and generate meaningful dimensions from an exponentially large search space, using an optimal crossing form over an optimal generation path. Machines, however, have only limited human-like abilities of this kind. We generalize such learning tasks as self-optimizing feature generation. Self-optimizing feature generation poses several under-addressed challenges for existing systems: the generation must be meaningful, robust, and efficient. To tackle these challenges, we propose a principled and generic representation-crossing framework for self-optimizing feature generation. To achieve hashing representation, we propose a three-step approach: feature discretization, feature hashing, and descriptive summarization. To achieve reinforcement crossing, we develop a hierarchical reinforcement feature crossing approach. We present extensive experimental results to demonstrate the effectiveness and efficiency of the proposed method. The code is available at https://github.com/yingwangyang/HRC_feature_cross.git.
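The abstract names the three steps of the hashing representation (discretization, hashing, summarization) but not how each is done. The following is a minimal sketch of how such a pipeline could fit together, under assumptions made for illustration only: equal-width binning for discretization, an MD5-based hash of each bin index into a fixed number of buckets, and a normalized bucket histogram as the descriptive summary. The function names (`discretize`, `hash_bin`, `summarize`, `hashing_representation`) and the defaults `n_bins=10`, `n_buckets=8` are hypothetical and not taken from the paper or its repository.

```python
import hashlib
from typing import List, Sequence


def discretize(values: Sequence[float], n_bins: int = 10) -> List[int]:
    """Step 1 (assumed equal-width binning): map each value to a bin id in [0, n_bins)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: every value falls into bin 0
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # min(..., n_bins - 1) keeps the maximum value inside the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def hash_bin(bin_id: int, n_buckets: int) -> int:
    """Step 2 (assumed): deterministically hash a bin id into one of n_buckets slots."""
    # MD5 is used only because it is stable across runs, unlike Python's salted str hash
    digest = hashlib.md5(str(bin_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def summarize(buckets: Sequence[int], n_buckets: int) -> List[float]:
    """Step 3 (assumed): normalized bucket histogram as a fixed-length descriptor."""
    counts = [0] * n_buckets
    for b in buckets:
        counts[b] += 1
    total = len(buckets)
    return [c / total for c in counts]


def hashing_representation(
    values: Sequence[float], n_bins: int = 10, n_buckets: int = 8
) -> List[float]:
    """Hypothetical end-to-end pipeline: discretize -> hash -> summarize (values must be non-empty)."""
    bins = discretize(values, n_bins)
    buckets = [hash_bin(b, n_buckets) for b in bins]
    return summarize(buckets, n_buckets)


if __name__ == "__main__":
    feature = [0.5, 1.2, 3.3, 3.4, 9.9]
    vec = hashing_representation(feature)
    print(len(vec), round(sum(vec), 6))
```

Because the summary always has length `n_buckets` regardless of how many rows a feature has, features of different sizes land in a common fixed-size space. A representation of that kind is something a crossing agent could plausibly consume as its state, though how the paper's hierarchical agents actually use it is not specified here.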