Dist Loss: Enhancing Regression in Few-Shot Region through Distribution Distance Constraint
The work addresses a critical issue in sectors such as healthcare, where few-shot regions often hold the greatest clinical relevance. It offers an incremental improvement by integrating distribution information into regression, an approach that has rarely been explored compared with classification.
The paper tackles imbalanced data distributions in regression tasks, where models overfit in many-shot regions and underperform in few-shot regions. It introduces a novel loss function, Dist Loss, that minimizes the distribution distance between predictions and labels, and it reports state-of-the-art results in sparse data regions on the IMDB-WIKI-DIR, AgeDB-DIR, and ECG-Ka-DIR datasets.
Imbalanced data distributions are prevalent in real-world scenarios, posing significant challenges in both classification and regression tasks. They often cause deep learning models to overfit in areas of high sample density (many-shot regions) while underperforming in areas of low sample density (few-shot regions). This characteristic restricts the utility of deep learning models in various sectors, notably healthcare, where areas with few-shot data hold greater clinical relevance. While recent studies have shown the benefits of incorporating distribution information in imbalanced classification tasks, such strategies are rarely explored in imbalanced regression. In this paper, we address this issue by introducing a novel loss function, termed Dist Loss, designed to minimize the distribution distance between the model's predictions and the target labels in a differentiable manner, thereby integrating distribution information into model training. Dist Loss enables deep learning models to regularize their output distribution during training, which strengthens their focus on few-shot regions. We have conducted extensive experiments across three datasets spanning computer vision and healthcare: IMDB-WIKI-DIR, AgeDB-DIR, and ECG-Ka-DIR. The results demonstrate that Dist Loss mitigates the negative impact of imbalanced data distribution on model performance, achieving state-of-the-art results in sparse data regions. Furthermore, Dist Loss is easy to integrate and complements existing methods.
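To make the idea of a distribution distance between predictions and labels concrete, the following is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' formulation: the abstract does not say which distance Dist Loss uses, so the sketch substitutes the mean absolute gap between sorted predictions and sorted labels, which for two equal-sized 1-D samples equals the Wasserstein-1 distance between their empirical distributions. The function names `sorted_distribution_distance` and `combined_loss`, the `dist_weight` parameter, and the toy age values are all hypothetical.

```python
def sorted_distribution_distance(predictions, labels):
    """Mean absolute gap between sorted predictions and sorted labels.

    Assumption for illustration: this sorted-sample distance stands in for
    whatever distribution distance the paper actually uses.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(predictions) == 0:
        raise ValueError("inputs must be non-empty")
    # Sorting discards which sample each value belongs to, so only the
    # shape of the two empirical distributions is compared.
    pairs = zip(sorted(predictions), sorted(labels))
    return sum(abs(p - y) for p, y in pairs) / len(predictions)


def combined_loss(predictions, labels, dist_weight=1.0):
    """Per-sample mean absolute error plus a weighted distribution term.

    dist_weight is a hypothetical knob, not a parameter from the paper.
    """
    mae = sum(abs(p - y) for p, y in zip(predictions, labels)) / len(labels)
    return mae + dist_weight * sorted_distribution_distance(predictions, labels)


# Imbalanced toy labels: four many-shot samples at 30, two few-shot at 80.
labels = [30.0, 30.0, 30.0, 30.0, 80.0, 80.0]

# A model that collapses onto the many-shot region.
collapsed = [30.0] * 6

# A model whose outputs cover the few-shot region, but on the wrong samples.
spread = [80.0, 30.0, 30.0, 30.0, 30.0, 80.0]

collapsed_distance = sorted_distribution_distance(collapsed, labels)
spread_distance = sorted_distribution_distance(spread, labels)
```

In this toy case both prediction sets have the same per-sample error, but only the collapsed one incurs a distribution penalty, which is the kind of pressure toward few-shot regions that the abstract describes. In actual training the same computation would be written in an automatic-differentiation framework, where gradients flow through the sorted values.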