NNK-Means: Data summarization using dictionary learning with non-negative kernel regression
This addresses the challenge of data summarization in complex, high-parameter systems, offering a scalable solution with runtime similar to kMeans, though it appears incremental as it builds on existing dictionary learning and kernel methods.
The paper tackles data summarization for large-scale systems by proposing NNK-Means, a dictionary learning method that uses non-negative kernel regression to learn geometric dictionaries representative of the input data, resulting in better class separation compared to linear and kernel versions of kMeans and kSVD.
An increasing number of systems are being designed by gathering significant amounts of data and then optimizing the system parameters directly using the obtained data. Often this is done without analyzing the dataset structure. As task complexity, data size, and parameters all increase to millions or even billions, data summarization is becoming a major challenge. In this work, we investigate data summarization via dictionary learning~(DL), leveraging the properties of recently introduced non-negative kernel regression (NNK) graphs. Our proposed NNK-Means, unlike previous DL techniques, such as kSVD, learns geometric dictionaries with atoms that are representative of the input data space. Experiments show that summarization using NNK-Means can provide better class separation compared to linear and kernel versions of kMeans and kSVD. Moreover, NNK-Means is scalable, with runtime complexity similar to that of kMeans.