LG ML Jun 4, 2020

Inject Machine Learning into Significance Test for Misspecified Linear Models

arXiv:2006.03167v1
Originality: Incremental advance
AI Analysis

The paper addresses an issue that matters to researchers in social science and statistics who rely on linear models for interpretability but get inaccurate significance tests when the linearity assumption is violated. The contribution appears incremental, since it builds on existing linear approximation techniques.

The paper tackles the problem that linear regression fails to report correct significance levels under model misspecification, especially when the ground truth function is non-linear. It proposes an assumption-free method that fits the function with a machine learning method and then linearly approximates the fit. Experimental results show that the method significantly outperforms linear regression in non-linear scenarios.

Due to its strong interpretability, linear regression is widely used in social science, where significance tests provide the significance level of models or coefficients in traditional statistical inference. However, linear regression methods rely on the assumption that the ground truth function is linear, which does not necessarily hold in practice. As a result, even for simple non-linear cases, linear regression may fail to report the correct significance level. In this paper, we present a simple and effective assumption-free method for linear approximation in both linear and non-linear scenarios. First, we apply a machine learning method to fit the ground truth function on the training set and calculate its linear approximation. Afterward, we obtain the estimator by adding adjustments based on the validation set. We prove concentration inequalities and asymptotic properties of our estimator, which lead to the corresponding significance test. Experimental results show that our estimator significantly outperforms linear regression for non-linear ground truth functions, indicating that our estimator might be a better tool for significance testing.
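The two-step procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the estimator, so everything below is an assumption made for illustration: a k-nearest-neighbour regressor stands in for the machine learning method, the linear approximation is an ordinary least-squares fit to the fitted function on the training inputs, and the adjustment is a least-squares fit to the validation residuals. The names `knn_fit`, `ols_1d`, and `adjusted_linear_estimate` are hypothetical and are not taken from the paper.

```python
import random


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def ols_1d(xs, ys):
    """Closed-form least squares for y ~ a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def knn_fit(train_x, train_y, k=15):
    """Stand-in ML method (an assumption): k-nearest-neighbour regression."""
    pairs = list(zip(train_x, train_y))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)

    return predict


def adjusted_linear_estimate(train_x, train_y, val_x, val_y, k=15):
    """Hypothetical reading of the abstract's two steps; not the paper's exact estimator."""
    # Step 1: fit the ground truth on the training set, then linearly
    # approximate the fitted function on the training inputs.
    f_hat = knn_fit(train_x, train_y, k)
    a_fit, b_fit = ols_1d(train_x, [f_hat(x) for x in train_x])
    # Step 2: adjust using the validation set by regressing the
    # validation residuals y - f_hat(x) on x and adding the coefficients.
    residuals = [y - f_hat(x) for x, y in zip(val_x, val_y)]
    a_adj, b_adj = ols_1d(val_x, residuals)
    return a_fit + a_adj, b_fit + b_adj


def simulate(n, rng):
    """Non-linear ground truth f(x) = x^2 with Gaussian noise, x uniform on [0, 1]."""
    xs = [rng.random() for _ in range(n)]
    ys = [x * x + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


rng = random.Random(0)
train_x, train_y = simulate(400, rng)
val_x, val_y = simulate(400, rng)
intercept, slope = adjusted_linear_estimate(train_x, train_y, val_x, val_y)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

For x uniform on [0, 1] and f(x) = x^2, the best linear approximation in the least-squares sense has slope 1 and intercept -1/6, so the printed estimates should land near those values. The sketch covers only the point estimate; the significance test itself depends on the concentration and asymptotic results proved in the paper, which are not reproduced here.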

