LG · DATA-AN · Sep 18, 2025

Stochastic Clock Attention for Aligning Continuous and Ordered Sequences

arXiv:2509.14678v1 · h-index: 1
Originality: Highly original
AI Analysis

This addresses the need for better alignment models in sequence-to-sequence tasks such as text-to-speech, offering a drop-in replacement for scaled dot-product attention that stabilizes alignments for frame-synchronous targets.

The paper tackles the problem of aligning continuous and ordered sequences with attention, where standard methods fail to enforce continuity or monotonicity. Its result is a novel attention framework based on learned clocks that improves alignment stability and robustness to time-scaling while matching or improving accuracy in a Transformer text-to-speech testbed.

We formulate an attention mechanism for continuous and ordered sequences that explicitly functions as an alignment model, the core of many sequence-to-sequence tasks. Standard scaled dot-product attention relies on positional encodings and masks but does not enforce continuity or monotonicity, which are crucial for frame-synchronous targets. We propose assigning learned nonnegative clocks to the source and target and modeling attention as the meeting probability of these clocks; a path-integral derivation yields a closed-form, Gaussian-like scoring rule with an intrinsic bias toward causal, smooth, near-diagonal alignments, without external positional regularizers. The framework supports two complementary regimes: normalized clocks for parallel decoding when a global length is available, and unnormalized clocks for autoregressive decoding. Both are nearly parameter-free, drop-in replacements. In a Transformer text-to-speech testbed, this construction produces more stable alignments and improved robustness to global time-scaling while matching or improving accuracy over scaled dot-product baselines. We hypothesize applicability to other continuous targets, including video and temporal signal modeling.
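The abstract does not spell out the closed-form score, so the following is only a minimal sketch under stated assumptions, meant to make the two clock regimes concrete. It assumes each clock is a cumulative sum of softplus increments (nonnegative, hence monotone, by construction) and that attention is a softmax over a squared clock-difference penalty with a hypothetical width `sigma`. This Gaussian kernel stands in for the paper's path-integral scoring rule and should not be read as the authors' formula; the helpers `clock`, `clock_attention`, and `argmax` are hypothetical names.

```python
import math
from typing import List


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x)); always strictly positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def clock(raw_increments: List[float], normalize: bool) -> List[float]:
    """Map unconstrained per-step values to a strictly increasing clock.

    Assumption (not from the paper): each step advances the clock by
    softplus(raw) > 0, so monotonicity holds by construction.
    """
    times: List[float] = []
    t = 0.0
    for r in raw_increments:
        t += softplus(r)
        times.append(t)
    if normalize and times:
        # Normalized regime: rescale so the clock ends at 1.0. This needs the
        # whole sequence (a known global length), as in parallel decoding.
        total = times[-1]
        times = [x / total for x in times]
    # Unnormalized regime: each value depends only on earlier steps, so the
    # clock can be extended one step at a time (autoregressive decoding).
    return times


def clock_attention(src_clock: List[float], tgt_clock: List[float],
                    sigma: float) -> List[List[float]]:
    """Row-stochastic attention weights from a Gaussian-like clock kernel.

    Assumption: score(tau, s) = -(tau - s)^2 / (2 * sigma^2), a stand-in for
    the paper's closed-form rule; sigma is a hypothetical width parameter.
    """
    weights: List[List[float]] = []
    for tau in tgt_clock:
        scores = [-(tau - s) ** 2 / (2.0 * sigma ** 2) for s in src_clock]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - m) for v in scores]
        z = sum(exps)
        weights.append([e / z for e in exps])
    return weights


def argmax(row: List[float]) -> int:
    """Index of the largest weight (first one on ties)."""
    return max(range(len(row)), key=row.__getitem__)


if __name__ == "__main__":
    src = clock([0.0] * 3, normalize=True)  # 3 source tokens
    tgt = clock([0.0] * 7, normalize=True)  # 7 target frames
    attn = clock_attention(src, tgt, sigma=0.1)
    print([argmax(row) for row in attn])  # → [0, 0, 0, 1, 1, 2, 2]
```

In this sketch, stretching the target to 14 frames leaves both normalized clocks ending at 1.0, so the first frame still attends to the first token and the last frame to the last token, which is one way to picture robustness to global time-scaling. The unnormalized branch reads only past increments, which is what lets it run step by step.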
