DC · MS · NE · May 7, 2013

Somoclu: An Efficient Parallel Library for Self-Organizing Maps

arXiv:1305.1422v4 · 60 citations
Originality: Synthesis-oriented
AI Analysis

It provides an efficient tool for researchers and practitioners working with large-scale data, for example in text mining, but the contribution is incremental, since it builds on existing parallelization techniques.

The authors tackle the problem of training self-organizing maps on large datasets by developing Somoclu, a massively parallel C++ library that supports multicore, cluster, and GPU execution. The result is highly optimized memory use and fast training, even on a single computer.
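The summary does not spell out the training loop. To illustrate why SOM training maps well onto multicore hardware, here is a minimal C++ sketch of one batch-SOM epoch parallelized with OpenMP. It assumes a rectangular grid, a Gaussian neighbourhood, squared Euclidean distance, and dense row-major data; the function names `find_bmu` and `batch_epoch` are hypothetical and are not Somoclu's API, and the code is not presented as Somoclu's implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: index of the best-matching unit (BMU) for one data row,
// using squared Euclidean distance against a dense, row-major codebook.
std::size_t find_bmu(const float* x, const std::vector<float>& codebook,
                     std::size_t n_nodes, std::size_t dim) {
    std::size_t best = 0;
    float best_d = std::numeric_limits<float>::max();
    for (std::size_t j = 0; j < n_nodes; ++j) {
        float d = 0.0f;
        for (std::size_t k = 0; k < dim; ++k) {
            const float diff = x[k] - codebook[j * dim + k];
            d += diff * diff;
        }
        if (d < best_d) {
            best_d = d;
            best = j;
        }
    }
    return best;
}

// Hypothetical batch-SOM epoch on a grid_rows x grid_cols rectangular grid
// with a Gaussian neighbourhood of width `radius`. Compile with -fopenmp to
// parallelise; without it the pragmas are ignored and the code runs serially.
void batch_epoch(const std::vector<float>& data, std::size_t n_samples,
                 std::vector<float>& codebook, std::size_t grid_rows,
                 std::size_t grid_cols, std::size_t dim, float radius) {
    const std::size_t n_nodes = grid_rows * grid_cols;
    assert(data.size() == n_samples * dim);
    assert(codebook.size() == n_nodes * dim);
    std::vector<std::size_t> bmu(n_samples);

    // Phase 1: BMU search. Samples are independent, so split them across
    // threads; each iteration writes only its own bmu[i].
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(n_samples); ++i)
        bmu[i] = find_bmu(&data[i * dim], codebook, n_nodes, dim);

    // Phase 2: each node's new weight is the neighbourhood-weighted mean of
    // all samples. Nodes are independent, so split them across threads;
    // each iteration writes only its own codebook row.
    const float two_r2 = 2.0f * radius * radius;
    #pragma omp parallel for
    for (long j = 0; j < static_cast<long>(n_nodes); ++j) {
        const float jr = static_cast<float>(j / grid_cols);
        const float jc = static_cast<float>(j % grid_cols);
        std::vector<float> num(dim, 0.0f);
        float den = 0.0f;
        for (std::size_t i = 0; i < n_samples; ++i) {
            const float br = static_cast<float>(bmu[i] / grid_cols);
            const float bc = static_cast<float>(bmu[i] % grid_cols);
            const float g2 = (jr - br) * (jr - br) + (jc - bc) * (jc - bc);
            const float h = std::exp(-g2 / two_r2);  // Gaussian neighbourhood
            den += h;
            for (std::size_t k = 0; k < dim; ++k)
                num[k] += h * data[i * dim + k];
        }
        if (den > 0.0f)  // keep the old weight if every h underflowed to 0
            for (std::size_t k = 0; k < dim; ++k)
                codebook[j * dim + k] = num[k] / den;
    }
}
```

Neither loop has cross-iteration writes, so no locks are needed. Distributing samples across MPI ranks would follow the same split, with each rank accumulating partial numerators and denominators that are then summed across ranks; that extension is not shown here.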

Somoclu is a massively parallel tool, written in C++, for training self-organizing maps on large data sets. It builds on OpenMP for multicore execution and on MPI for distributing the workload across the nodes of a cluster. It can also accelerate training with CUDA when graphics processing units are available. A sparse kernel is included, which is useful for high-dimensional but sparse data, such as the vector spaces common in text mining workflows. Python, R, and MATLAB interfaces facilitate interactive use. Apart from fast execution, memory use is highly optimized, enabling the training of large emergent maps even on a single computer.
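One common way a sparse kernel can avoid iterating over every dimension is to expand the squared distance as ||x - w||^2 = ||x||^2 - 2 x.w + ||w||^2, precompute ||w||^2 once per epoch, and evaluate the dot product over the nonzeros of x only. The minimal C++ sketch below assumes a CSR-style data row (the `SparseRow` struct and all function names are hypothetical) and a dense codebook; it illustrates the idea and is not claimed to be Somoclu's actual sparse kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical CSR-style sparse row: parallel arrays of column indices
// and the corresponding nonzero values.
struct SparseRow {
    std::vector<std::size_t> idx;
    std::vector<float> val;
};

// ||w_j||^2 for every node, computed once per epoch (the codebook is dense).
std::vector<float> node_sq_norms(const std::vector<float>& codebook,
                                 std::size_t n_nodes, std::size_t dim) {
    std::vector<float> out(n_nodes, 0.0f);
    for (std::size_t j = 0; j < n_nodes; ++j)
        for (std::size_t k = 0; k < dim; ++k)
            out[j] += codebook[j * dim + k] * codebook[j * dim + k];
    return out;
}

// ||x - w_j||^2 = ||x||^2 - 2 x.w_j + ||w_j||^2, touching only the nonzeros
// of x. `x_sq` is ||x||^2, passed in so it is computed once per row.
float sparse_sq_dist(const SparseRow& x, float x_sq,
                     const std::vector<float>& codebook,
                     const std::vector<float>& w_sq, std::size_t j,
                     std::size_t dim) {
    float dot = 0.0f;
    for (std::size_t t = 0; t < x.idx.size(); ++t)
        dot += x.val[t] * codebook[j * dim + x.idx[t]];
    return x_sq - 2.0f * dot + w_sq[j];
}

// BMU search for one sparse row: O(nnz(x)) work per node instead of O(dim).
std::size_t sparse_bmu(const SparseRow& x, const std::vector<float>& codebook,
                       const std::vector<float>& w_sq, std::size_t dim) {
    assert(x.idx.size() == x.val.size());
    float x_sq = 0.0f;
    for (float v : x.val) x_sq += v * v;
    std::size_t best = 0;
    float best_d = std::numeric_limits<float>::max();
    for (std::size_t j = 0; j < w_sq.size(); ++j) {
        const float d = sparse_sq_dist(x, x_sq, codebook, w_sq, j, dim);
        if (d < best_d) {
            best_d = d;
            best = j;
        }
    }
    return best;
}
```

For text-mining vectors with many dimensions but few nonzeros, the per-node saving can be large. Rounding can make the expanded form slightly negative when x is very close to w, which does not change which node is the minimum.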

Code Implementations · 4 repos
