CV · Apr 28, 2021

A Deep Learning Object Detection Method for an Efficient Clusters Initialization

arXiv:2104.13634v3 · 29 citations
AI Analysis

This work addresses the initialization bottleneck in clustering for applications such as banking and e-commerce, but it is incremental because it builds on existing deep learning and clustering methods.

The paper tackles the instability of clustering algorithms with respect to their initialization parameters. It applies the YOLO-v5 object detection model to detect those parameters, such as the number of clusters and their centroids. The reported result is near-optimal initialization with low computational overhead compared to existing solutions.

Clustering is an unsupervised machine learning method that groups data samples into clusters of similar objects. In practice, clustering has been used in numerous applications such as banking customer profiling, document retrieval, image segmentation, and e-commerce recommendation engines. However, existing clustering techniques present significant limitations, one of which is the dependence of their stability on the initialization parameters (e.g. number of clusters, centroids). Different solutions have been presented in the literature to overcome this limitation (i.e. internal and external validation metrics). However, these solutions incur high computational complexity and memory consumption, especially when dealing with big data. In this paper, we apply a recent object detection Deep Learning (DL) model, named YOLO-v5, to detect the initial clustering parameters, such as the number of clusters with their sizes and centroids. The proposed solution consists mainly of adding a DL-based initialization phase, so that the clustering algorithms no longer need user-supplied initialization parameters. Two models are provided in this work, one for isolated clusters and the other for overlapping clusters. The features of the incoming dataset determine which model to use. The results show that the proposed solution can provide near-optimal cluster initialization parameters with low computational and resource overhead compared to existing solutions.
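The initialization phase described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes (the abstract does not say so) that the 2-D data has been rendered as a scatter-plot image and that a detector has already returned one pixel-space bounding box per cluster. The box format, the helper names `boxes_to_init` and `kmeans`, and the axis ranges are all hypothetical. The sketch maps each box centre back to data coordinates and uses the result as the starting centroids of a plain Lloyd-style k-means.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x_min, y_min, x_max, y_max) in pixels
Point = Tuple[float, float]


def boxes_to_init(boxes: List[Box], img_w: int, img_h: int,
                  x_range: Tuple[float, float],
                  y_range: Tuple[float, float]) -> Tuple[int, List[Point]]:
    """Hypothetical helper: turn detected boxes into (k, initial centroids)."""
    centroids = []
    for x0, y0, x1, y1 in boxes:
        cx_px = (x0 + x1) / 2.0
        cy_px = (y0 + y1) / 2.0
        # Map pixel coordinates back to data coordinates.
        # Image y grows downward, so the y axis is flipped.
        cx = x_range[0] + cx_px / img_w * (x_range[1] - x_range[0])
        cy = y_range[1] - cy_px / img_h * (y_range[1] - y_range[0])
        centroids.append((cx, cy))
    return len(centroids), centroids


def kmeans(points: List[Point], init: List[Point], n_iter: int = 20):
    """Plain Lloyd iterations starting from the supplied centroids."""
    centroids = list(init)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda j: (p[0] - centroids[j][0]) ** 2
                            + (p[1] - centroids[j][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return centroids, labels


# Two well-separated toy clusters, plotted on a 100x100 image covering [0, 10] x [0, 10].
points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
detected = [(10, 80, 20, 90), (80, 10, 90, 20)]  # made-up detector output
k, init = boxes_to_init(detected, 100, 100, (0.0, 10.0), (0.0, 10.0))
final_centroids, labels = kmeans(points, init)
```

Under these assumptions the number of clusters is simply the number of detected boxes, so no sweep over candidate values of k is needed before clustering starts.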

