Sampling Techniques in Bayesian Target Encoding
This work addresses a specific problem in tabular data processing that matters to machine learning practitioners, but the contribution is incremental, since it builds on existing Bayesian encoding methods.
The paper tackles target leakage and poor generalization in Bayesian target encoding by incorporating sampling techniques that better capture the intra-category distribution of the target, and it reports improved encoding performance as a result.
Target encoding is an effective technique for encoding categorical variables and is often used in machine learning systems that process tabular data sets with mixed numeric and categorical variables. Recently, an enhanced version of this technique was proposed that uses conjugate Bayesian modeling. This paper develops the Bayesian encoding method further by using sampling techniques, which help extract information from the intra-category distribution of the target variable, improve generalization and reduce target leakage.
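To make the idea concrete, here is a minimal sketch of sampling-based Bayesian target encoding, assuming a binary target and a conjugate Beta-Bernoulli model per category. The function names and the Beta(1, 1) prior are illustrative assumptions, not the paper's method: each category's posterior is updated from its success/failure counts, and each row is then encoded with a random draw from its category's posterior rather than with the posterior mean.

```python
import random
from collections import defaultdict

def beta_posteriors(categories, targets, alpha0=1.0, beta0=1.0):
    """Conjugate Beta-Bernoulli update per category (assumes a binary 0/1 target)."""
    counts = defaultdict(lambda: [0, 0])  # category -> [successes, failures]
    for c, y in zip(categories, targets):
        counts[c][0 if y == 1 else 1] += 1
    # Posterior is Beta(alpha0 + successes, beta0 + failures)
    return {c: (alpha0 + s, beta0 + f) for c, (s, f) in counts.items()}

def posterior_mean(posteriors):
    """Classic point-estimate encoding: the posterior mean of each category."""
    return {c: a / (a + b) for c, (a, b) in posteriors.items()}

def sampled_encoding(categories, posteriors, prior=(1.0, 1.0), seed=0):
    """Encode each row by sampling from its category's posterior (hypothetical helper)."""
    rng = random.Random(seed)
    encoded = []
    for c in categories:
        a, b = posteriors.get(c, prior)  # unseen categories fall back to the prior
        encoded.append(rng.betavariate(a, b))
    return encoded

cats = ["a", "a", "a", "b", "b", "c"]
ys = [1, 1, 0, 0, 0, 1]
post = beta_posteriors(cats, ys)
print(posterior_mean(post))           # point estimates per category
print(sampled_encoding(cats, post))   # one stochastic draw per row
```

Compared with a single posterior mean, the per-row draws vary more for categories with few observations. This injected noise is one commonly used way to discourage a downstream model from memorizing per-category target statistics; the paper's own sampling scheme may differ from this sketch.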