LG AI CV · Dec 14, 2025

PerNodeDrop: A Method Balancing Specialized Subnets and Regularization in Deep Neural Networks

arXiv:2512.12663v1

Originality: Incremental advance

AI Analysis

The paper addresses overfitting for practitioners who train deep neural networks, but the advance is incremental because it builds on existing noise-based regularizers such as Dropout and DropConnect.

The paper tackles overfitting in deep neural networks by introducing PerNodeDrop, a stochastic regularization method that applies per-sample, per-node perturbations to preserve useful co-adaptation while reducing spurious patterns, resulting in improved generalization on vision, text, and audio benchmarks.

Deep neural networks possess strong representational capacity yet remain vulnerable to overfitting, primarily because neurons tend to co-adapt. Co-adaptation captures complex, fine-grained feature interactions, but it also reinforces spurious, non-generalizable patterns that inflate training performance and reduce reliability on unseen data. Noise-based regularizers such as Dropout and DropConnect address this issue by injecting stochastic perturbations during training, but the noise they apply is typically uniform across a layer or across a batch of samples, which can suppress both harmful and beneficial co-adaptation. This work introduces PerNodeDrop, a lightweight stochastic regularization method that applies per-sample, per-node perturbations, breaking the uniformity of the noise injected by existing techniques so that each node experiences input-specific variability. PerNodeDrop thereby preserves useful co-adaptation while still regularizing, which narrows the gap between training and validation performance and improves reliability on unseen data in the reported experiments. Although superficially similar to DropConnect, PerNodeDrop drops weights at the sample level rather than the batch level. An expected-loss analysis formalizes how its perturbations attenuate excessive co-adaptation while retaining predictive interactions. Empirical evaluations on vision, text, and audio benchmarks indicate improved generalization relative to standard noise-based regularizers.
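
The distinction between batch-level and sample-level weight dropping can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the noise distribution, the scaling, or the exact mask granularity, so the Bernoulli mask, the inverted scaling by 1/(1 - drop_prob), and all function names below are assumptions made only for illustration.

```python
import random


def sample_mask(n_out, n_in, drop_prob, rng):
    """Draw a 0/1 keep-mask with one entry per weight (assumed Bernoulli noise)."""
    return [[0 if rng.random() < drop_prob else 1 for _ in range(n_in)]
            for _ in range(n_out)]


def masked_matvec(x, weights, mask, drop_prob):
    """Compute one sample's layer output using only the weights kept by the mask."""
    keep = 1.0 - drop_prob  # inverted scaling keeps the expected output unchanged
    return [sum(xi * w * m for xi, w, m in zip(x, w_row, m_row)) / keep
            for w_row, m_row in zip(weights, mask)]


def dense_shared_mask(batch, weights, drop_prob, rng):
    """Batch-level dropping: one mask is drawn and reused for every sample."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    mask = sample_mask(len(weights), len(weights[0]), drop_prob, rng)
    return [masked_matvec(x, weights, mask, drop_prob) for x in batch]


def dense_per_sample_mask(batch, weights, drop_prob, rng):
    """Sample-level dropping: every sample draws its own independent mask."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    n_out, n_in = len(weights), len(weights[0])
    return [masked_matvec(x, weights,
                          sample_mask(n_out, n_in, drop_prob, rng), drop_prob)
            for x in batch]


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [[0.1 * (i + j + 1) for i in range(8)] for j in range(3)]
    batch = [[1.0] * 8, [1.0] * 8]  # two identical inputs
    print("shared mask:    ", dense_shared_mask(batch, weights, 0.5, rng))
    print("per-sample mask:", dense_per_sample_mask(batch, weights, 0.5, rng))
```

With a shared mask, identical inputs in a batch always produce identical outputs, because every sample sees the same perturbation. With per-sample masks, each input is perturbed independently, which is the input-specific variability the abstract describes.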
