NEFeb 28

GeNeX: Genetic Network eXperts framework for addressing Validation Overfitting

arXiv:2603.11056h-index: 29

AI Analysis

This addresses reliability issues in machine learning deployment, especially in low-data or shifting scenarios, though it is an incremental improvement over existing ensemble methods.

The authors tackled validation overfitting (VO) in model selection and ensemble construction by proposing GeNeX, a framework that uses genetic evolution and clustering to generate robust models and build ensembles without relying on validation scores, achieving improved generalization under distribution shifts.

Excessive reliance on validation performance during model selection can lead to validation overfitting (VO), where models appear effective during development but fail at test time. This issue is further amplified in low-data regimes and under distribution shifts, where validation signals become unreliable. Although ensemble learning is widely used to improve robustness and generalization, most ensemble construction pipelines depend heavily on validation scores, leaving them vulnerable to VO and limiting their reliability in real-world deployment scenarios. To address this, we propose GeNeX (Genetic Network eXperts), a framework that mitigates validation overfitting at both model generation and ensemble construction stages. In the generation phase, GeNeX employs a dual-path strategy: gradient-based training is coupled with genetic model evolution. Offspring networks are created via crossover of trained parents, promoting structural diversity and weight-level regeneration without relying on validation feedback. This results in a candidate pool of robust, non-overfitted models. During ensemble construction, the candidate networks are clustered by prediction behavior to identify complementary model spaces. Within each cluster, multiple diverse experts are selected using criteria such as robustness and representativeness, and fused at the weight level to form compact prototype networks. The final ensemble aggregates these prototypes, with predictions optimized via Sequential Quadratic Programming for output-level synergy. To rigorously evaluate VO resilience, we introduce a VO-aware evaluation protocol that simulates realistic deployment scenarios by enforcing distributional divergence between training and test subsets.

View on arXiv PDF

Similar