Bringing Emerging Architectures to Sequence Labeling in NLP
This work addresses the gap in applying novel architectures to sequence labeling in NLP. Its results mainly expose limits in how well these architectures generalize, which makes the contribution incremental.
The paper investigates whether emerging architectures such as xLSTMs and structured state-space models, which show promise in language modeling, adapt effectively to sequence labeling tasks that differ in language and structural complexity. It finds that their strong performance in simpler settings often fails to generalize.
Pretrained Transformer encoders are the dominant approach to sequence labeling. Some alternative architectures (such as xLSTMs, structured state-space models, diffusion models, and adversarial learning) have shown promise in language modeling, but few have been applied to sequence labeling, and those mostly to flat or simplified tasks. We study how these architectures adapt across tagging tasks that vary in structural complexity, label space, and token dependencies, with evaluation spanning multiple languages. We find that the strong performance previously observed in simpler settings does not always generalize across languages or datasets, nor does it extend to more complex structured tasks.