LG · DB · ME · Oct 13, 2023

A Survey of Methods for Handling Disk Data Imbalance

arXiv:2310.08867v1 · h-index: 7
Originality: Synthesis-oriented
AI Analysis

The paper gives researchers working on imbalanced data classification a comprehensive overview, but its contribution is incremental: it surveys existing methods without introducing new ones.

This paper surveys methods for handling class imbalance in classification problems, using the Backblaze hard disk dataset as an example of severe imbalance, and organizes the discussion into data-level, algorithmic-level, and hybrid approaches to help researchers select appropriate techniques.

Class imbalance exists in many classification problems. Because classifiers are typically designed to maximize overall accuracy, imbalance between classes makes classification harder, and the few minority classes often carry the higher misclassification costs. The Backblaze dataset, a widely used hard disk dataset, contains a small amount of failure data and a large amount of healthy-drive data, and so exhibits serious class imbalance. This paper provides a comprehensive overview of research on imbalanced data classification. The discussion is organized into three main categories: data-level methods, algorithm-level methods, and hybrid methods. For each category, we summarize and analyze the existing problems, algorithmic ideas, strengths, and weaknesses. We also discuss the challenges of imbalanced data classification and strategies to address them. The overview is intended to help researchers choose an appropriate method for their needs.
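
To make the data-level versus algorithm-level distinction concrete, here is a minimal sketch in plain Python. It is not taken from the paper: the function names, the toy 98-to-2 label split, and the choice of random oversampling and inverse-frequency class weights as representatives of each category are all assumptions made for illustration. The weight formula is the common "balanced" heuristic, n_samples / (n_classes * class_count).

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Data-level sketch: duplicate minority-class rows at random
    until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(features), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [x for x, y in zip(features, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


def inverse_frequency_weights(labels):
    """Algorithm-level sketch: leave the data untouched and instead give
    each class a loss weight of n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


# Hypothetical toy data (not Backblaze): 98 healthy drives (0), 2 failed (1).
labels = [0] * 98 + [1] * 2
features = [[float(i)] for i in range(100)]

balanced_x, balanced_y = random_oversample(features, labels)
weights = inverse_frequency_weights(labels)

print(Counter(balanced_y))  # class counts after oversampling
print(weights)              # per-class weights for a cost-sensitive learner
```

A hybrid method, in the sense used above, would combine the two ideas, for example by resampling the data and then training a cost-sensitive or ensemble learner on the result.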

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.