ptype: Probabilistic Type Inference
It addresses a data quality problem faced by data scientists and analysts, though the contribution appears incremental because it builds on existing type inference methods.
The paper tackles type inference for data columns that contain missing data and anomalies, proposing ptype, a robust probabilistic method that outperforms existing approaches.
Type inference refers to the task of inferring the data type of a given column of data. Current approaches often fail when the data contain missing entries and anomalies, both of which are common in real-world data sets. In this paper, we propose ptype, a robust probabilistic type inference method that allows us to detect such entries and infer data types. We further show that the proposed method outperforms existing methods.
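To make the idea of robust probabilistic type inference concrete, here is a minimal Python sketch of one way such a method can work. It is an illustration under stated assumptions, not ptype's implementation. Each candidate column type, a missing-value model, and an anomaly model assign probabilities to strings. Each entry is explained by a weighted mixture of the three, and the column type with the highest total log-likelihood wins. The regular expressions, alphabet sizes, missing-value tokens, and mixture weights (`w_type`, `w_missing`, `w_anomaly`) are all hypothetical choices made for this example.

```python
import math
import re


def char_model(pattern, alphabet_size, stop=0.5):
    """Sub-probability over strings fully matching `pattern`:
    p(x) = stop * ((1 - stop) / alphabet_size) ** len(x), else 0."""
    regex = re.compile(pattern, re.DOTALL)

    def prob(x):
        if regex.fullmatch(x) is None:
            return 0.0
        return stop * ((1 - stop) / alphabet_size) ** len(x)

    return prob


def vocab_model(words):
    """Uniform probability over a fixed, case-insensitive vocabulary."""
    vocab = {w.lower() for w in words}
    return lambda x: 1.0 / len(vocab) if x.lower() in vocab else 0.0


# Hypothetical per-type models (simplified stand-ins, not ptype's own).
# Narrower types use smaller alphabets, so they score higher when they fit.
TYPE_MODELS = {
    "integer": char_model(r"[+-]?\d+", 12),
    "float": char_model(r"[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?", 15),
    "boolean": vocab_model(["true", "false", "yes", "no"]),
    "string": char_model(r".+", 100),
}
# Hypothetical missing-value tokens and a broad catch-all anomaly model.
MISSING_MODEL = vocab_model(["", "na", "n/a", "nan", "null", "none", "-", "?"])
ANOMALY_MODEL = char_model(r".+", 100)


def infer_column(values, w_type=0.9, w_missing=0.05, w_anomaly=0.05):
    """Return (most probable column type, per-entry labels), uniform type prior."""

    def components(x, t):
        # Weighted probability of x under the type, missing, and anomaly models.
        return (w_type * TYPE_MODELS[t](x),
                w_missing * MISSING_MODEL(x),
                w_anomaly * ANOMALY_MODEL(x))

    def column_log_lik(t):
        # Entries are treated as independent given the column type.
        total = 0.0
        for x in values:
            p = sum(components(x, t))
            total += math.log(p) if p > 0 else -math.inf
        return total

    best = max(TYPE_MODELS, key=column_log_lik)

    # Label each entry by the mixture component that explains it best.
    names = ("normal", "missing", "anomaly")
    labels = []
    for x in values:
        comps = components(x, best)
        labels.append(names[comps.index(max(comps))])
    return best, labels
```

For example, `infer_column(["1", "2", "3", "NA", "abc", "4"])` returns `("integer", ["normal", "normal", "normal", "missing", "anomaly", "normal"])`. "NA" is absorbed by the missing-value model and "abc" by the anomaly model, so neither entry forces the column to be typed as a string, which is the failure mode the abstract attributes to non-robust approaches.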