OCLGMEMLJul 12, 2023

Outlier detection in regression: conic quadratic formulations

arXiv:2307.05975v17 citationsh-index: 5
Originality Incremental advance
AI Analysis

This work addresses the issue of weak relaxations and poor performance in outlier detection for regression, which is important for applications with corrupted data, though it appears incremental as it builds on existing optimization frameworks.

The paper tackles the problem of outlier detection in linear regression by developing stronger second-order conic relaxations that avoid big-M constraints, resulting in formulations that are several orders-of-magnitude faster than existing methods.

In many applications, when building linear regression models, it is important to account for the presence of outliers, i.e., corrupted input data points. Such problems can be formulated as mixed-integer optimization problems involving cubic terms, each given by the product of a binary variable and a quadratic term of the continuous variables. Existing approaches in the literature, typically relying on the linearization of the cubic terms using big-M constraints, suffer from weak relaxation and poor performance in practice. In this work we derive stronger second-order conic relaxations that do not involve big-M constraints. Our computational experiments indicate that the proposed formulations are several orders-of-magnitude faster than existing big-M formulations in the literature for this problem.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes