LG, IR · Jun 30, 2022

Using Person Embedding to Enrich Features and Data Augmentation for Classification

arXiv:2206.15162v11.8 · h-index: 3
Originality: Synthesis-oriented
AI Analysis

This paper addresses fraud detection for businesses by enhancing classification models with embedding-based features and label augmentation. The contribution is incremental: it applies existing NLP methods to a new domain.

The study tackles fraud detection classification on an imbalanced dataset. It uses word embedding to build customer vectors that serve as features, and it re-labels rows similar to known positives as positive for data augmentation. The authors report that this improved the success of the classification models.

Machine learning is now applied in almost every field. Among its many methods, classification is one of the most basic and crucial, and it can solve a wide range of problems. Feature selection is extremely important when setting up a model, and producing new features through feature engineering is also vital to the model's success. In our study, fraud detection classification models are built on a labeled, imbalanced dataset as a case study. Word embedding is a natural language processing method, but it has also been used in other areas, especially recommender systems; here it is used to create a customer space. The customer vectors in this space are fed to the classification model as features. Moreover, to increase the number of positive labels, rows with similar characteristics are re-labeled as positive, using customer similarity determined by the embedding. The model that incorporates the embedding methods, which provide a better representation of customers, is compared with other models. The results show that the customer embedding method had a positive effect on the success of the classification models.
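The two uses of customer embeddings described in the abstract, as extra features and as a basis for re-labeling similar rows, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the similarity measure, the threshold, or how the vectors are trained. Cosine similarity, the 0.95 cutoff, the toy vectors, and the helper names `augment_labels` and `add_embedding_features` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def augment_labels(embeddings, labels, threshold=0.95):
    """Return a copy of `labels` in which every negative customer whose vector
    is within `threshold` cosine similarity of an ORIGINAL positive becomes 1.
    Only original positives are used as anchors, so re-labeling does not cascade.
    The threshold value is an assumption, not taken from the paper."""
    positives = [cust for cust, y in labels.items() if y == 1]
    new_labels = dict(labels)
    for cust, y in labels.items():
        if y == 1:
            continue
        if any(cosine(embeddings[cust], embeddings[p]) >= threshold for p in positives):
            new_labels[cust] = 1
    return new_labels


def add_embedding_features(rows, embeddings):
    """Append each row's customer vector to its existing feature list."""
    return [(cust, feats + list(embeddings[cust])) for cust, feats in rows]


# Made-up toy data: 2-dimensional customer vectors (assumed already trained),
# one known fraud case (c1), and a single tabular feature per row.
embeddings = {
    "c1": [1.0, 0.0],
    "c2": [0.9, 0.1],
    "c3": [0.0, 1.0],
}
labels = {"c1": 1, "c2": 0, "c3": 0}
rows = [("c1", [120.0]), ("c2", [80.0]), ("c3", [15.0])]

augmented = augment_labels(embeddings, labels, threshold=0.95)
features = add_embedding_features(rows, embeddings)
```

In this toy data, c2 lies close to the known positive c1 in the embedding space and is re-labeled, while c3 is orthogonal to c1 and keeps its negative label. In practice the threshold trades more positive examples against label noise, so it would need to be validated on held-out data.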

