DB IR · Jan 3, 2019

A new approach to completing missing values in databases (Une nouvelle approche de complétion des valeurs manquantes dans les bases de données)

arXiv:1901.00671v1

Originality: Incremental advance

AI Analysis

This work addresses data quality issues for data scientists and analysts, but it appears incremental as it builds on existing association rule methods.

The paper tackles the problem of missing values in datasets by proposing a new approach that uses generic association rules to complete missing data, reducing conflicts between rules and achieving a high percentage of correct completions.

Real-life datasets commonly contain scattered missing values. Such values are treated as 'dirty data' and are usually removed during a pre-processing step. Starting from the premise that 'making up this missing data is better than throwing it away', we present a new approach for completing missing data. Its main distinguishing feature is that it exploits a synergy between generic bases of association rules and the handling of missing values. Beyond their compactness, generic association rules considerably reduce the number of conflicts during the completion step. We also introduce a new metric, called 'Robustness', which selects the robust association rule for completing a missing value whenever a conflict arises. Experiments on benchmark datasets confirm the soundness of our approach: it reduces conflicts during the completion step while achieving a high percentage of correct completions.
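To make the completion step concrete, the following is a minimal Python sketch of rule-based completion with conflict resolution. It is not the paper's algorithm: the abstract does not define the 'Robustness' metric or how the generic basis is mined, so the rule format, the function names `applicable` and `complete_row`, and the use of rule confidence as the conflict-resolution score are all assumptions made for illustration.

```python
# Hypothetical rule format (not from the paper):
#   premise:    dict of attribute -> required value
#   conclusion: (attribute, value) pair the rule predicts
#   confidence: score used here as a stand-in for the paper's 'Robustness'

def applicable(rule, row, target):
    """True if the rule concludes on `target` and the row matches its premise."""
    attr, _ = rule["conclusion"]
    return attr == target and all(
        row.get(a) == v for a, v in rule["premise"].items()
    )

def complete_row(row, rules, score=lambda r: r["confidence"]):
    """Fill each missing (None) attribute from the best-scoring applicable rule.

    When several rules propose a value for the same attribute (a conflict),
    the rule with the highest score wins. Attributes with no applicable
    rule are left as None.
    """
    completed = dict(row)
    for attr, value in row.items():
        if value is not None:
            continue
        candidates = [r for r in rules if applicable(r, row, attr)]
        if candidates:
            best = max(candidates, key=score)
            completed[attr] = best["conclusion"][1]
    return completed

# Toy data, invented for this example.
rules = [
    {"premise": {"outlook": "sunny"},
     "conclusion": ("play", "no"), "confidence": 0.6},
    {"premise": {"outlook": "sunny", "humidity": "normal"},
     "conclusion": ("play", "yes"), "confidence": 0.9},
]
row = {"outlook": "sunny", "humidity": "normal", "play": None}
result = complete_row(row, rules)
```

Both rules apply to the toy row and disagree on `play`, which is the kind of conflict the abstract refers to. Passing a different `score` function is where a metric such as the paper's Robustness would plug in, once its definition is known.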
