Nadav Har-Tuv

35.5 · LG · May 9

Contextual Plackett-Luce: An Efficient Neural Model for Probabilistic Sequence Selection under Ambiguity

Noam Mizrachi, Nadav Har-Tuv, Shai Shalev-Shwartz

Selecting a coherent sequence or subset of elements is a fundamental problem in structured prediction, arising in tasks such as detection, trajectory forecasting, and representative subset selection. In many such settings, the target is inherently ambiguous: each input admits multiple valid outputs, while supervision provides only a single sampled instance. This induces a mismatch between the underlying multi-modal target distribution and the observed training signal. We propose Contextual Plackett-Luce (CPL), a structured probabilistic model for sequence selection that extends the classical Plackett-Luce model to a context-dependent setting following an Ising-style parameterization with unary and pairwise interaction terms. CPL can be viewed as a hybrid between fully autoregressive prediction and parallel sequence selection: autoregressive models effectively capture uncertainty but are computationally expensive on modern parallel hardware such as GPUs, while parallel methods are efficient but struggle to represent multi-modal dependencies. CPL combines the strengths of both by constructing the parameters of a probabilistic selection model in a fully parallel manner, followed by a lightweight autoregressive selection process in which each step applies incremental updates to contextual logits. This decoupling of parallel scoring and sequential selection enables efficient computation without sacrificing expressivity. We evaluate CPL on two structured selection tasks: multi-modal path prediction and representative subset selection. CPL achieves improved structural consistency and robustness under ambiguous supervision compared to strong parallel baselines.
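The abstract describes the selection step only at a high level: parameters are built in parallel, then a lightweight autoregressive loop applies incremental updates to contextual logits. The sketch below is one plausible reading of that description, not the authors' implementation. It assumes the network has already produced a unary score per item and a pairwise interaction matrix, and that choosing item j adds row j of that matrix to the logits of the still-unchosen items. The function names (`sample_sequence`, `sequence_log_prob`, `total_probability`) and this exact update rule are hypothetical. With every pairwise term set to zero, it reduces to the classical Plackett-Luce model.

```python
import itertools
import math
import random
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def _log_softmax_at(logits: Sequence[float], candidates: List[int], pick: int) -> float:
    """log P(pick) under a softmax restricted to the remaining candidates."""
    m = max(logits[i] for i in candidates)
    log_z = m + math.log(sum(math.exp(logits[i] - m) for i in candidates))
    return logits[pick] - log_z


def _update(logits: List[float], pairwise: Matrix, pick: int, remaining: set) -> None:
    """Assumed Ising-style step: add the chosen item's pairwise row to the logits."""
    for i in remaining:
        logits[i] += pairwise[pick][i]


def sequence_log_prob(unary: Sequence[float], pairwise: Matrix, order: Sequence[int]) -> float:
    """Log-likelihood of one ordered selection, e.g. the single observed target."""
    logits = list(unary)  # unary terms, assumed to be computed in parallel upstream
    remaining = set(range(len(unary)))
    total = 0.0
    for pick in order:
        total += _log_softmax_at(logits, sorted(remaining), pick)
        remaining.remove(pick)
        _update(logits, pairwise, pick, remaining)
    return total


def sample_sequence(
    unary: Sequence[float], pairwise: Matrix, k: int, rng: random.Random
) -> Tuple[List[int], float]:
    """Draw k distinct items autoregressively; return them with their log-probability."""
    if k > len(unary):
        raise ValueError("k cannot exceed the number of items")
    logits = list(unary)
    remaining = set(range(len(unary)))
    chosen: List[int] = []
    total = 0.0
    for _ in range(k):
        candidates = sorted(remaining)
        m = max(logits[i] for i in candidates)
        weights = [math.exp(logits[i] - m) for i in candidates]
        pick = rng.choices(candidates, weights=weights, k=1)[0]
        total += _log_softmax_at(logits, candidates, pick)
        chosen.append(pick)
        remaining.remove(pick)
        _update(logits, pairwise, pick, remaining)
    return chosen, total


def total_probability(unary: Sequence[float], pairwise: Matrix, k: int) -> float:
    """Sanity check: probabilities of all ordered k-selections should sum to 1."""
    return sum(
        math.exp(sequence_log_prob(unary, pairwise, order))
        for order in itertools.permutations(range(len(unary)), k)
    )


if __name__ == "__main__":
    unary = [2.0, 1.9, 0.5, 0.0]
    # Items 0 and 1 are near-duplicates: choosing one strongly suppresses the other.
    pairwise = [
        [0.0, -5.0, 0.0, 0.0],
        [-5.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    print(sample_sequence(unary, pairwise, k=2, rng=random.Random(0)))
    print(total_probability(unary, pairwise, k=2))
```

In this reading, each sequential step costs only a masked softmax and a vector addition over the n items, with no further network evaluations. That is the kind of decoupling between parallel scoring and cheap sequential selection the abstract describes. `sequence_log_prob` is what one would maximize on the single observed target during training.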

SD · May 20, 2025

PAST: Phonetic-Acoustic Speech Tokenizer

Nadav Har-Tuv, Or Tal, Yossi Adi · meta-ai

We present PAST, a novel end-to-end framework that jointly models phonetic information alongside signal reconstruction, eliminating the need for external pretrained models. Unlike previous approaches that rely on pretrained self-supervised models, PAST employs supervised phonetic data, directly integrating domain knowledge into the tokenization process via auxiliary tasks. Additionally, we introduce a streamable, causal variant of PAST, enabling real-time speech applications. Results demonstrate that PAST surpasses the evaluated baseline tokenizers across common evaluation metrics, including phonetic representation and speech reconstruction. Notably, PAST also achieves superior performance when serving as a speech representation for speech language models, further highlighting its effectiveness as a foundation for spoken language generation. To foster further research, we release the full implementation. For code, model checkpoints, and samples, see: https://pages.cs.huji.ac.il/adiyoss-lab/PAST
