LG, AI · Feb 3

QuAIL: Quality-Aware Inertial Learning for Robust Training under Data Corruption

Mattia Sabella, Alberto Archetti, Pietro Pinoli, Matteo Matteucci, Cinzia Cappiello

arXiv:2602.03686v1 · 1.4 · h-index: 31

Originality: Incremental advance

AI Analysis

This work addresses robust tabular learning for practitioners dealing with corrupted data. The advance is incremental: it adds a novel training mechanism on top of existing models.

The paper tackles the problem of training tabular machine learning models on data with non-uniform corruption, such as noisy measurements and missing entries, by introducing QuAIL, a quality-aware training mechanism that incorporates feature reliability priors into the learning process. The results show that QuAIL consistently improves average performance over neural baselines across 50 classification and regression datasets under both random and value-dependent corruption, with robust behavior in low-data and biased settings.

Tabular machine learning systems are frequently trained on data affected by non-uniform corruption, including noisy measurements, missing entries, and feature-specific biases. In practice, these defects are often documented only through column-level reliability indicators rather than instance-wise quality annotations, limiting the applicability of many robustness and cleaning techniques. We present QuAIL, a quality-informed training mechanism that incorporates feature reliability priors directly into the learning process. QuAIL augments existing models with a learnable feature-modulation layer whose updates are selectively constrained by a quality-dependent proximal regularizer, thereby inducing controlled adaptation across features of varying trustworthiness. This stabilizes optimization under structured corruption without explicit data repair or sample-level reweighting. Empirical evaluation across 50 classification and regression datasets demonstrates that QuAIL consistently improves average performance over neural baselines under both random and value-dependent corruption, with especially robust behavior in low-data and systematically biased settings. These results suggest that incorporating feature reliability information directly into optimization dynamics is a practical and effective approach for resilient tabular learning.
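The abstract's idea of a feature-modulation layer held in check by a quality-dependent proximal regularizer can be illustrated with a small sketch. This is not the authors' implementation: the per-feature gates, the quadratic penalty, the closed-form proximal step, and every function name below are assumptions made for illustration. In particular, the abstract does not say how reliability maps to regularization strength, so the hypothetical `strengths_from_quality` helper simply assumes that strength grows linearly with the reliability score.

```python
from typing import List


def modulate(x: List[float], gates: List[float]) -> List[float]:
    """Feature modulation: scale each input feature by its learnable gate."""
    return [xi * gi for xi, gi in zip(x, gates)]


def strengths_from_quality(quality: List[float], lam: float) -> List[float]:
    """Hypothetical mapping from column-level reliability to penalty strength.

    ASSUMPTION: strength = lam * quality. The source does not specify the
    direction or shape of this mapping; swap it out as needed.
    """
    return [lam * q for q in quality]


def proximal_penalty(gates: List[float], anchors: List[float],
                     strengths: List[float]) -> float:
    """Quadratic proximal term: sum_j s_j * (g_j - a_j)^2 (assumed form)."""
    return sum(s * (g - a) ** 2 for g, a, s in zip(gates, anchors, strengths))


def proximal_gate_step(gates: List[float], task_grads: List[float],
                       anchors: List[float], strengths: List[float],
                       lr: float) -> List[float]:
    """One gate update: gradient step on the task loss, then the exact
    proximal step for the quadratic penalty above.

    For each feature, the proximal step solves
        argmin_u  s * (u - a)^2 + (u - g_tmp)^2 / (2 * lr)
    whose closed form is (g_tmp + 2*lr*s*a) / (1 + 2*lr*s).
    """
    updated = []
    for g, grad, a, s in zip(gates, task_grads, anchors, strengths):
        g_tmp = g - lr * grad  # plain gradient step on the task loss
        updated.append((g_tmp + 2 * lr * s * a) / (1 + 2 * lr * s))
    return updated


# Two features receiving the same task gradient but different strengths.
strengths = strengths_from_quality([0.0, 1.0], lam=10.0)
new_gates = proximal_gate_step(
    gates=[1.0, 1.0], task_grads=[0.5, 0.5],
    anchors=[1.0, 1.0], strengths=strengths, lr=0.1,
)
```

In this sketch, a feature with zero strength takes the full gradient step, while a feature with a large strength is pulled back toward its anchor, which is one way to read "controlled adaptation across features of varying trustworthiness". A real model would learn the gates jointly with the downstream network in an autodiff framework.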
