CL CV | Sep 5, 2025

Phonological Representation Learning for Isolated Signs Improves Out-of-Vocabulary Generalization

arXiv:2509.04745v1 | 4.91 citations | h-index: 10

Originality: Incremental advance

AI Analysis

This work addresses the need for models that generalize to unseen signs, since sign language datasets often cover only an unrepresentative part of the vocabulary. It does so by incorporating linguistically motivated biases into representation learning.

The paper tackles out-of-vocabulary generalization in isolated sign language recognition by introducing phonological inductive biases into a vector-quantized autoencoder. Compared to a controlled baseline, the resulting representations are more effective for one-shot reconstruction of unseen signs and more discriminative for sign identification.

Sign language datasets are often not representative in terms of vocabulary, underscoring the need for models that generalize to unseen signs. Vector quantization is a promising approach for learning discrete, token-like representations, but whether the learned units capture spurious correlations that hinder out-of-vocabulary performance has not been evaluated. This work investigates two phonological inductive biases, Parameter Disentanglement (an architectural bias) and Phonological Semi-Supervision (a regularization technique), to improve isolated sign recognition of known signs and reconstruction quality of unseen signs with a vector-quantized autoencoder. The primary finding is that the learned representations from the proposed model are more effective for one-shot reconstruction of unseen signs and more discriminative for sign identification compared to a controlled baseline. This work provides a quantitative analysis of how explicit, linguistically motivated biases can improve the generalization of learned representations of sign language.
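To make the idea of vector quantization with separate codebooks more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the abstract does not specify the architecture, so the parameter names (handshape, location, movement), the slice width, the codebook size, and the function names `nearest_code` and `disentangled_quantize` are all illustrative assumptions. The sketch only shows the general mechanism of splitting a latent vector into slices and snapping each slice to the nearest entry of its own codebook.

```python
import random


def nearest_code(vec, codebook):
    """Return the index of the codebook entry closest to vec (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(vec, codebook[k]))


def disentangled_quantize(latent, codebooks, slice_dim):
    """Quantize each contiguous slice of the latent against its own codebook.

    Returns the concatenated quantized vector and a dict mapping each
    parameter name to the chosen discrete code index.
    """
    codes, quantized = {}, []
    for i, (name, codebook) in enumerate(codebooks.items()):
        part = latent[i * slice_dim:(i + 1) * slice_dim]
        k = nearest_code(part, codebook)
        codes[name] = k
        quantized.extend(codebook[k])
    return quantized, codes


random.seed(0)

# Hypothetical settings, chosen only for illustration.
PARAMS = ("handshape", "location", "movement")
SLICE_DIM, CODEBOOK_SIZE = 4, 8

# One randomly initialized codebook per assumed phonological parameter.
codebooks = {
    p: [[random.gauss(0, 1) for _ in range(SLICE_DIM)] for _ in range(CODEBOOK_SIZE)]
    for p in PARAMS
}

# Stand-in for an encoder output; a real model would produce this from video or pose input.
latent = [random.gauss(0, 1) for _ in range(SLICE_DIM * len(PARAMS))]

quantized, codes = disentangled_quantize(latent, codebooks, SLICE_DIM)
print(codes)  # one discrete code index per assumed parameter
```

In this sketch each sign is described by one discrete code per parameter, so an unseen sign could in principle be expressed as a new combination of codes already in the codebooks. Whether and how the paper's model realizes this is not stated in the abstract.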
