$\text{O}^2$PF: Oversampling via Optimum-Path Forest for Breast Cancer Detection
This work addresses imbalanced data in medical diagnosis, specifically for breast cancer detection, but appears incremental as it builds on existing oversampling techniques.
The paper tackles imbalanced medical datasets in breast cancer detection by proposing $\text{O}^2$PF, an oversampling method based on the unsupervised Optimum-Path Forest algorithm. The authors report robust performance in a comparison against three established oversampling methods across six datasets.
Breast cancer is among the deadliest diseases, affecting mostly women worldwide. Although traditional detection methods have proven valid for the task, they still commonly yield low accuracy and demand considerable time and effort from professionals. Therefore, a computer-aided diagnosis (CAD) system capable of early detection is highly desirable. In the last decade, machine learning-based techniques have been of paramount importance in this context, since they can extract essential information from data and reason about it. However, such approaches still suffer from imbalanced data, particularly in medical applications, where the number of samples from healthy people is, in general, considerably higher than the number from patients. Therefore, this paper proposes $\text{O}^2$PF, a data oversampling method based on the unsupervised Optimum-Path Forest algorithm. Experiments conducted under the full oversampling scenario attest to the robustness of the model, which is compared against three well-established oversampling methods on three breast cancer datasets and three general-purpose medical datasets.
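To make the idea of oversampling concrete, the following is a minimal sketch, not the $\text{O}^2$PF method itself. It assumes that "full oversampling" means adding synthetic minority samples until both classes have the same size, and it generates those samples by simple linear interpolation between two random minority points, a generic stand-in for the Optimum-Path Forest-based generator. The function name `oversample_to_balance` and the toy data are hypothetical.

```python
import random
from collections import Counter


def oversample_to_balance(samples, labels, minority_label, seed=0):
    """Add synthetic minority samples until the minority matches the majority.

    Assumption: synthetic points come from linear interpolation between two
    randomly chosen minority samples. This is a generic stand-in and NOT the
    OPF-based generator proposed in the paper.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    majority_count = max(counts.values())
    minority = [x for x, y in zip(samples, labels) if y == minority_label]
    n_needed = majority_count - len(minority)

    new_samples, new_labels = list(samples), list(labels)
    for _ in range(n_needed):
        a, b = rng.choice(minority), rng.choice(minority)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic = [ai + t * (bi - ai) for ai, bi in zip(a, b)]
        new_samples.append(synthetic)
        new_labels.append(minority_label)
    return new_samples, new_labels


# Hypothetical toy data: 6 "healthy" (0) samples versus 2 "patient" (1) samples.
X = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.3, 0.0], [0.2, 0.2], [0.1, 0.1],
     [0.9, 0.8], [0.8, 0.9]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = oversample_to_balance(X, y, minority_label=1)
print(Counter(y_bal))  # class counts after balancing
```

Interpolation keeps every synthetic point inside the range spanned by the existing minority samples. A cluster-based method such as the one the paper proposes would instead be expected to take the structure of the minority class into account when generating points.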