AI · LG · Oct 9, 2023

Cost-Sensitive Best Subset Selection for Logistic Regression: A Mixed-Integer Conic Optimization Perspective

arXiv:2310.05464v1 · 5 citations · h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses the need for transparent, cost-effective predictive models in clinical domains. It is an incremental advance that builds on existing optimization and feature selection techniques.

The authors address the problem of selecting interpretable feature subsets for logistic regression, particularly in clinical settings. They propose a certifiably optimal method based on mixed-integer conic optimization that accounts for the cost of acquiring each feature. They evaluate it on synthetic clinical datasets, which exposes limitations in low-data and noisy-label scenarios.

A key challenge in machine learning is to design interpretable models that can reduce their inputs to the best subset for making transparent predictions, especially in the clinical domain. In this work, we propose a certifiably optimal feature selection procedure for logistic regression from a mixed-integer conic optimization perspective that can take an auxiliary cost to obtain features into account. Based on an extensive review of the literature, we carefully create a synthetic dataset generator for clinical prognostic model research. This allows us to systematically evaluate different heuristic and optimal cardinality- and budget-constrained feature selection procedures. The analysis shows key limitations of the methods for the low-data regime and when confronted with label noise. Our paper not only provides empirical recommendations for suitable methods and dataset designs, but also paves the way for future research in the area of meta-learning.
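To make the terms "cardinality-constrained" and "budget-constrained" concrete, one common textbook way to write cost-sensitive best subset selection for logistic regression is sketched below. This is an illustrative formulation, not necessarily the one used in the paper. All symbols are assumptions introduced here: labels $y_i \in \{-1, +1\}$, feature vectors $x_i \in \mathbb{R}^p$, binary selection indicators $z_j$, per-feature costs $c_j$, a total budget $B$, a ridge weight $\lambda$, and a big-M bound $M$ on the coefficients.

```latex
\begin{align}
\min_{\beta,\, \beta_0,\, z} \quad
  & \sum_{i=1}^{n} \log\!\bigl(1 + \exp\bigl(-y_i (x_i^\top \beta + \beta_0)\bigr)\bigr)
    + \lambda \lVert \beta \rVert_2^2 \\
\text{s.t.} \quad
  & -M z_j \le \beta_j \le M z_j, \qquad j = 1, \dots, p, \\
  & \sum_{j=1}^{p} c_j z_j \le B, \\
  & z \in \{0, 1\}^p .
\end{align}
```

Setting $c_j = 1$ for all $j$ and $B = k$ recovers the cardinality-constrained case, in which at most $k$ features are selected. The problem becomes conic because each logistic loss term can be modeled with exponential cones: the constraint $t_i \ge \log(1 + \exp(u_i))$ is equivalent to $\exp(u_i - t_i) + \exp(-t_i) \le 1$, which can be expressed with two exponential-cone constraints and one linear inequality.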

