Randolph Wiredu-Aidoo

2papers

2 Papers

8.5MLMar 24
The Mass Agreement Score: A Point-centric Measure of Cluster Size Consistency

Randolph Wiredu-Aidoo

In clustering, strong dominance in the size of a particular cluster is often undesirable, motivating a measure of cluster size uniformity that can be used to filter such partitions. A basic requirement of such a measure is stability: partitions that differ only slightly in their point assignments should receive similar uniformity scores. A difficulty arises because cluster labels are not fixed objects; algorithms may produce different numbers of labels even when the underlying point distribution changes very little. Measures defined directly over labels can therefore become unstable under label-count perturbations. I introduce the Mass Agreement Score (MAS), a point-centric metric bounded in [0, 1] that evaluates the consistency of expected cluster size as measured from the perspective of points in each cluster. Its construction yields fragment robustness by design, assigning similar scores to partitions with similar bulk structure while remaining sensitive to genuine redistribution of cluster mass.

LGOct 29, 2025
EVINGCA: Adaptive Graph Clustering with Evolving Neighborhood Statistics

Randolph Wiredu-Aidoo

Clustering algorithms often rely on restrictive assumptions: K-Means and Gaussian Mixtures presuppose convex, Gaussian-like clusters, while DBSCAN and HDBSCAN capture non-convexity but can be highly sensitive. I introduce EVINGCA (Evolving Variance-Informed Nonparametric Graph Construction Algorithm), a density-variance based clustering algorithm that treats cluster formation as an adaptive, evolving process on a nearest-neighbor graph. EVINGCA expands rooted graphs via breadth-first search, guided by continuously updated local distance and shape statistics, replacing fixed density thresholds with local statistical feedback. With spatial indexing, EVINGCA features log-linear complexity in the average case and exhibits competitive performance against baselines across a variety of synthetic, real-world, low-d, and high-d datasets.