LG CL · Feb 12

A$^{2}$V-SLP: Alignment-Aware Variational Modeling for Disentangled Sign Language Production

Sümeyye Meryem Taşyürek, Enis Mücahid İskender, Hacer Yalim Keles

arXiv:2602.11861v1 · 1.4 · h-index: 11

Originality: Incremental advance

AI Analysis

This work addresses sign language production for accessibility applications, offering incremental improvements over existing disentanglement frameworks.

The paper tackles the generation of realistic sign language sequences from text by proposing A$^{2}$V-SLP, an alignment-aware variational framework that learns articulator-wise disentangled latent distributions. It reports state-of-the-art back-translation performance and improved motion realism in a gloss-free setting.

Building upon recent structural disentanglement frameworks for sign language production, we propose A$^{2}$V-SLP, an alignment-aware variational framework that learns articulator-wise disentangled latent distributions rather than deterministic embeddings. A disentangled Variational Autoencoder (VAE) encodes ground-truth sign pose sequences and extracts articulator-specific mean and variance vectors, which are used as distributional supervision for training a non-autoregressive Transformer. Given text embeddings, the Transformer predicts both latent means and log-variances, while the VAE decoder reconstructs the final sign pose sequences through stochastic sampling at the decoding stage. This formulation maintains articulator-level representations by avoiding deterministic latent collapse through distributional latent modeling. In addition, we integrate a gloss attention mechanism to strengthen alignment between linguistic input and articulated motion. Experimental results show consistent gains over deterministic latent regression, achieving state-of-the-art back-translation performance and improved motion realism in a fully gloss-free setting.
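The distributional supervision described in the abstract can be illustrated with a minimal sketch. The abstract does not state the loss, so the KL divergence between diagonal Gaussians used here is an assumption, as are the articulator names (`body`, `hands`), the toy 2-D latents, and the helper names `gaussian_kl` and `sample_latent`. It only shows how a predicted mean and log-variance could be compared against VAE-encoded targets and then sampled with the reparameterization trick before decoding.

```python
import math
import random


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between two diagonal Gaussians given as lists of floats."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total


def sample_latent(mu, logvar, rng):
    """Reparameterized sample: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


# Hypothetical per-articulator (mean, log-variance) pairs.
# "target" stands in for the VAE encoder output on ground-truth poses,
# "predicted" for the Transformer output given text embeddings.
target = {
    "body": ([0.0, 1.0], [0.0, -1.0]),
    "hands": ([0.5, -0.5], [-0.5, -0.5]),
}
predicted = {
    "body": ([0.0, 1.0], [0.0, -1.0]),
    "hands": ([0.0, 0.0], [0.0, 0.0]),
}

# One loss term per articulator keeps the latent spaces separate (assumed design).
losses = {
    name: gaussian_kl(*target[name], *predicted[name])
    for name in target
}

# Stochastic sampling at the decoding stage; a VAE decoder would consume these.
rng = random.Random(0)
latents = {name: sample_latent(mu, logvar, rng) for name, (mu, logvar) in predicted.items()}

for name in target:
    print(name, "KL:", round(losses[name], 4), "sample dims:", len(latents[name]))
```

In this toy setup the `body` prediction matches its target exactly, so its KL term is zero, while the mismatched `hands` prediction yields a positive term.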
