LG · Sep 20, 2022

Comparing Shape-Constrained Regression Algorithms for Data Validation

arXiv:2209.09602v1 · 2 citations · h-index: 22
Originality: Synthesis-oriented
AI Analysis

This work addresses data validation challenges in industrial and scientific applications. Its contribution is incremental: it compares existing algorithms rather than introducing new methods.

The paper addresses automated data validation for large datasets by comparing shape-constrained regression algorithms on classification accuracy and runtime performance. The shape constraints encode domain-expert rules such as monotonicity and convexity.

Industrial and scientific applications handle large volumes of data that render manual validation by humans infeasible. Therefore, we require automated data validation approaches that are able to consider the prior knowledge of domain experts to produce dependable, trustworthy assessments of data quality. Prior knowledge is often available as rules that describe interactions of inputs with regard to the target, e.g., the target must be monotonically decreasing and convex over increasing input values. Domain experts are able to validate multiple such interactions at a glance. However, existing rule-based data validation approaches are unable to consider these constraints. In this work, we compare different shape-constrained regression algorithms for the purpose of data validation based on their classification accuracy and runtime performance.
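As a rough illustration of the idea in the abstract, the sketch below validates a dataset against a single rule ("the target is monotonically decreasing in the input") by fitting a shape-constrained model and flagging the data when the constrained fit cannot explain it. This is not one of the algorithms compared in the paper: it substitutes the classic pool-adjacent-violators algorithm (PAVA) for monotone regression, ignores the convexity constraint, and the function names and the error threshold are hypothetical choices made only for this example.

```python
import numpy as np


def monotone_decreasing_fit(y):
    """Least-squares non-increasing fit via pool-adjacent-violators (PAVA).

    Assumes y is already ordered by increasing input value.
    """
    # Fit a non-decreasing sequence to -y, then flip the sign back.
    blocks = []  # each block stores [sum of values, number of values]
    for v in -np.asarray(y, dtype=float):
        blocks.append([v, 1])
        # Merge blocks while the previous block mean exceeds the current one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = np.concatenate([np.full(c, s / c) for s, c in blocks])
    return -fitted


def constraint_violation_rmse(x, y):
    """RMSE between the data and its best monotonically decreasing fit."""
    order = np.argsort(x)
    y_sorted = np.asarray(y, dtype=float)[order]
    fit = monotone_decreasing_fit(y_sorted)
    return float(np.sqrt(np.mean((y_sorted - fit) ** 2)))


def is_valid(x, y, threshold):
    """Classify a dataset as valid if the constrained fit error is small.

    The threshold is a hypothetical tuning parameter for this sketch.
    """
    return constraint_violation_rmse(x, y) <= threshold


if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 50)
    good = np.exp(-3.0 * x)   # decreasing and convex, as in the abstract's rule
    bad = good.copy()
    bad[20:30] += 0.5         # simulated sensor offset that breaks the rule
    print(is_valid(x, good, threshold=0.05))
    print(is_valid(x, bad, threshold=0.05))
```

The design choice here is to turn a regression error into a classification decision: data that agrees with the expert rule is reproduced almost exactly by the constrained model, while data that violates it leaves a large residual. On real, noisy measurements the threshold would need to be calibrated against the expected noise level.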

