Greedy Algorithms for Approximating the Diameter of Machine Learning Datasets in Multidimensional Euclidean Space
This work addresses a computational bottleneck for machine learning practitioners who need fast diameter approximations, but it is incremental, as it applies known greedy methods to a specific problem.
The paper tackles the problem of efficiently approximating the diameter of datasets in multidimensional Euclidean space, a task for which existing algorithms scale poorly with dimension, by implementing four greedy algorithms that run in near-linear time and prove efficient in experiments on machine learning datasets.
Finding the diameter of a dataset in multidimensional Euclidean space is a well-established problem with well-known algorithms. However, most algorithms in the literature do not scale well with the data dimension: in most cases their time complexity grows exponentially with it, which makes them impractical. We therefore implemented four simple greedy algorithms for approximating the diameter of a multidimensional dataset, based on minimum/maximum l2 norms, hill climbing search, Tabu search, and Beam search, respectively. The time complexity of the implemented algorithms is near-linear in both the data size and the number of dimensions. The results of experiments conducted on different machine learning datasets demonstrate the efficiency of the implemented algorithms, which can therefore be recommended for finding the diameter in machine learning applications that need it.
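To make the hill climbing idea concrete, the following is a minimal sketch of one common greedy heuristic for the diameter: start at some point, jump to the point farthest from it, and repeat until the distance stops growing. It is not taken from the paper, whose exact procedures are not described in this abstract. The function names (`farthest_from`, `hill_climb_diameter`, `exact_diameter`) and the parameters `start` and `max_iters` are hypothetical, chosen only for illustration.

```python
import math


def farthest_from(points, idx):
    """Return (index, distance) of the point farthest from points[idx]."""
    best_j, best_d = idx, 0.0
    for j, q in enumerate(points):
        d = math.dist(points[idx], q)  # Euclidean (l2) distance
        if d > best_d:
            best_j, best_d = j, d
    return best_j, best_d


def hill_climb_diameter(points, start=0, max_iters=100):
    """Greedy lower bound on the diameter (illustrative sketch, not the paper's code).

    Repeatedly moves to the point farthest from the current one and stops
    as soon as the farthest distance no longer strictly increases.
    """
    current = start
    best = 0.0
    best_pair = (start, start)
    for _ in range(max_iters):
        nxt, dist = farthest_from(points, current)
        if dist <= best:  # no improvement: local optimum reached
            break
        best, best_pair = dist, (current, nxt)
        current = nxt
    return best, best_pair


def exact_diameter(points):
    """Brute-force O(n^2 d) reference, only practical for small inputs."""
    return max(
        (math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]),
        default=0.0,
    )


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (3, 4)]
    print(hill_climb_diameter(pts))  # (5.0, (0, 3))
    print(exact_diameter(pts))       # 5.0
```

Each sweep in this sketch costs O(nd) for n points in d dimensions, so a small, bounded number of sweeps stays linear in both. The returned value never exceeds the true diameter and, by the triangle inequality, is at least half of it; how close it gets in practice depends on the data and the starting point.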