CL · Sep 1, 2025

Parallel Needleman-Wunsch on CUDA to measure word similarity based on phonetic transcriptions

arXiv:2509.01654v1
Originality: Synthesis-oriented
AI Analysis

This is an incremental improvement for computational linguistics, enabling faster phonetic analysis of languages.

The paper tackles the problem of measuring word similarity from phonetic transcriptions by implementing a parallelized Needleman-Wunsch algorithm on both CPU and GPU, reporting significant performance improvements on large datasets.

We present a method to calculate the similarity between words based on their phonetic transcription (their pronunciation) using the Needleman-Wunsch algorithm. We implement this algorithm in Rust and parallelize it on both CPU and GPU to handle large datasets efficiently. The GPU implementation leverages CUDA and the cudarc Rust library to achieve significant performance improvements. We validate our approach by constructing a fully-connected graph in which nodes represent words and edge weights reflect the similarity between the words they connect. This graph is then analyzed using clustering algorithms to identify groups of phonetically similar words. Our results demonstrate the feasibility and effectiveness of the proposed method in analyzing the phonetic structure of languages. The method could easily be extended to other languages.
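To make the abstract's pipeline concrete, here is a minimal sequential sketch of the Needleman-Wunsch global alignment score and of the pairwise edge weights for a fully connected word graph. It is not the paper's implementation. The scoring constants (`MATCH`, `MISMATCH`, `GAP`), the function names `nw_score` and `pairwise_scores`, and the choice to treat each `char` as one phonetic symbol are all assumptions made for illustration; the paper's CPU and CUDA versions parallelize this work and may use a different scoring scheme.

```rust
// Illustrative scoring values; assumed here, not taken from the paper.
const MATCH: i32 = 1;
const MISMATCH: i32 = -1;
const GAP: i32 = -1;

/// Needleman-Wunsch global alignment score between two symbol sequences.
/// Uses two rows of the dynamic-programming table instead of the full matrix,
/// which is enough when only the final score is needed.
fn nw_score(a: &[char], b: &[char]) -> i32 {
    // Row 0: aligning an empty prefix of `a` against j symbols of `b` costs j gaps.
    let mut prev: Vec<i32> = (0..=b.len() as i32).map(|j| j * GAP).collect();
    let mut curr = vec![0; b.len() + 1];
    for i in 1..=a.len() {
        // Column 0: aligning i symbols of `a` against an empty prefix of `b`.
        curr[0] = i as i32 * GAP;
        for j in 1..=b.len() {
            let sub = if a[i - 1] == b[j - 1] { MATCH } else { MISMATCH };
            let diag = prev[j - 1] + sub; // match or substitution
            let up = prev[j] + GAP; // gap in `b`
            let left = curr[j - 1] + GAP; // gap in `a`
            curr[j] = diag.max(up).max(left);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Edge weights of the fully connected similarity graph,
/// returned as (i, j, score) for every pair with i < j.
fn pairwise_scores(words: &[Vec<char>]) -> Vec<(usize, usize, i32)> {
    let mut edges = Vec::new();
    for i in 0..words.len() {
        for j in (i + 1)..words.len() {
            edges.push((i, j, nw_score(&words[i], &words[j])));
        }
    }
    edges
}
```

Each pair is scored independently, so the outer loops in `pairwise_scores` are the natural place to distribute work across CPU threads or GPU threads. A raw alignment score would typically be normalized (for example by sequence length) before being used as a graph weight; whether and how the paper does so is not stated in the abstract.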

