CL | AI | Aug 21, 2023

Unsupervised Dialogue Topic Segmentation in Hyperdimensional Space

arXiv:2308.10464v1 | 4 citations | h-index: 8 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses efficient and accurate topic segmentation, a pre-processing step for downstream language tasks such as summarization. It is an incremental advance because it applies an existing HDC method to a new domain.

The paper introduces HyperSeg, a hyperdimensional computing approach to unsupervised dialogue topic segmentation that outperforms the state of the art in 4 out of 5 benchmarks and is 10 times faster on average.

We present HyperSeg, a hyperdimensional computing (HDC) approach to unsupervised dialogue topic segmentation. HDC is a class of vector symbolic architectures that leverages the probabilistic orthogonality of randomly drawn vectors at extremely high dimensions (typically over 10,000). HDC generates rich token representations through its low-cost initialization of many unrelated vectors. This is especially beneficial in topic segmentation, which often operates as a resource-constrained pre-processing step for downstream transcript understanding tasks. HyperSeg outperforms the current state-of-the-art in 4 out of 5 segmentation benchmarks -- even when baselines are given partial access to the ground truth -- and is 10 times faster on average. We show that HyperSeg also improves downstream summarization accuracy. With HyperSeg, we demonstrate the viability of HDC in a major language task. We open-source HyperSeg to provide a strong baseline for unsupervised topic segmentation.
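The "probabilistic orthogonality" the abstract relies on can be illustrated with a minimal sketch. This is not HyperSeg's implementation: the bipolar {-1, +1} encoding, the majority-vote bundling, and every name below (`random_hypervector`, `bundle`, the example words) are assumptions chosen for illustration, following common HDC conventions. The sketch shows that independently drawn 10,000-dimensional vectors are nearly orthogonal, and that a bundled "utterance" vector stays similar to the tokens it contains while remaining nearly orthogonal to tokens it does not.

```python
import random

DIM = 10_000  # illustrative dimensionality, on the scale the abstract mentions


def random_hypervector(rng, dim=DIM):
    """Draw a random bipolar hypervector with entries in {-1, +1} (assumed encoding)."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def cosine(a, b):
    """Cosine similarity; each bipolar vector has norm sqrt(dim), so divide by dim."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / len(a)


def bundle(vectors):
    """Superpose hypervectors by element-wise majority vote (ties resolve to +1)."""
    return [1 if sum(column) >= 0 else -1 for column in zip(*vectors)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
words = ("budget", "deadline", "lunch", "pizza")  # hypothetical vocabulary
tokens = {word: random_hypervector(rng) for word in words}

# Two independently drawn hypervectors: similarity concentrates near 0.
sim_unrelated = cosine(tokens["budget"], tokens["pizza"])

# A hypothetical utterance representation built from three token vectors.
utterance = bundle([tokens["budget"], tokens["deadline"], tokens["lunch"]])

# A bundled member stays clearly similar (0.5 in expectation for three vectors).
sim_member = cosine(utterance, tokens["budget"])

# A token that was not bundled stays near 0.
sim_non_member = cosine(utterance, tokens["pizza"])

print(f"unrelated tokens:     {sim_unrelated:+.3f}")
print(f"utterance vs member:  {sim_member:+.3f}")
print(f"utterance vs outside: {sim_non_member:+.3f}")
```

Because initialization is just random sampling, adding a new token costs one vector draw and no training, which is the low-cost property the abstract highlights for resource-constrained pre-processing.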

Code Implementations: 2 repos
