Generalizing k-means for an arbitrary distance matrix
This work addresses a limitation of k-means clustering for researchers and practitioners who work with non-Euclidean or abstract data, though it appears incremental, since it builds on existing relational and fuzzy clustering ideas.
The paper tackles the problem of applying k-means clustering when only a distance matrix is available and data points lack a Euclidean representation, proposing a generalization called relational k-means to handle such scenarios.
The original k-means clustering method works only if the exact vectors representing the data points are known: computing the distances from the centroids requires vector operations, because the average of abstract data points is undefined. Existing algorithms can be extended to the case where the sole input is the distance matrix and the exact representing vectors are unknown. This extension may be named relational k-means, after the term used for a similar algorithm in fuzzy clustering. A method is then proposed for generalizing k-means to scenarios in which the data points have no connection at all with a Euclidean space.
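
To illustrate how clustering can proceed from a distance matrix alone, here is a minimal sketch in Python. It is not necessarily the paper's exact algorithm. It relies on a standard identity that holds for Euclidean data: the squared distance from point i to the centroid of a cluster C equals the mean of the squared distances from i to the members of C, minus the sum of squared distances over all ordered pairs in C divided by 2|C|^2. The function name relational_kmeans, its parameters, and the seeding by an initial label list are assumptions made for this example. For genuinely non-Euclidean input the computed quantity is no longer a true squared distance and may even be negative, so the sketch should be read as a heuristic in that setting.

```python
def relational_kmeans(sq_dist, labels, max_iter=100):
    """Cluster n objects given only an n x n matrix of squared distances.

    sq_dist: symmetric list of lists, sq_dist[i][j] = squared distance i to j.
    labels:  initial cluster index per object (hypothetical seeding scheme).
    """
    n = len(sq_dist)
    labels = list(labels)
    k = max(labels) + 1
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Per-cluster correction term: sum over ordered pairs / (2 * |C|^2).
        within = []
        for m in members:
            if m:
                total = sum(sq_dist[a][b] for a in m for b in m)
                within.append(total / (2 * len(m) ** 2))
            else:
                within.append(None)  # empty cluster, skipped below
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], float("inf")
            for c, m in enumerate(members):
                if not m:
                    continue
                # Squared distance to the implicit centroid, no vectors needed.
                d = sum(sq_dist[i][j] for j in m) / len(m) - within[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
    return labels


# Usage: four points on a line, seen only through their squared distances.
xs = [0, 1, 10, 11]
sq = [[(a - b) ** 2 for b in xs] for a in xs]
result = relational_kmeans(sq, [0, 1, 0, 1])
print(result)  # [0, 0, 1, 1]
```

The centroid is never constructed here; only the distance matrix is read, which is what lets the same loop run on data without a known vector representation.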