CL | May 22, 2023

Exploring Energy-based Language Models with Different Architectures and Training Methods for Speech Recognition

arXiv:2305.12676v3 | 1 citation
Originality: Synthesis-oriented
AI Analysis

This work addresses speech recognition accuracy by applying energy-based language models (ELMs) with updated techniques. It is incremental: it builds on existing ELM concepts and swaps in newer models.

The paper tackled the problem of improving speech recognition by exploring energy-based language models (ELMs) with modern architectures and training methods, using large pretrained models as backbones, and found that these approaches enhanced rescoring capabilities.

Energy-based language models (ELMs) parameterize an unnormalized distribution for natural sentences and are radically different from popular autoregressive language models (ALMs). As an important application, ELMs have been successfully used as a means for calculating sentence scores in speech recognition, but they all use less-modern CNN or LSTM networks. The recent progress in Transformer networks and large pretrained models such as BERT and GPT2 opens new possibilities for further advancing ELMs. In this paper, we explore different architectures of energy functions and different training methods to investigate the capabilities of ELMs in rescoring for speech recognition, all using large pretrained models as backbones.
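To make the rescoring idea concrete, here is a minimal sketch and not the paper's implementation. It assumes the common convention p(x) ∝ exp(-E(x)), so a lower energy means a more plausible sentence. Because the normalizing constant is the same for every hypothesis, it can be ignored when ranking an n-best list. The names `rescore_nbest`, `toy_energy`, and `lm_weight` are hypothetical, and the toy energy function only stands in for a neural energy function built on a pretrained backbone.

```python
from typing import Callable, List, Tuple


def rescore_nbest(
    hypotheses: List[Tuple[str, float]],
    energy_fn: Callable[[str], float],
    lm_weight: float = 1.0,
) -> List[Tuple[str, float]]:
    """Re-rank (text, first_pass_log_score) pairs with an energy-based score.

    The unnormalized log-probability of a sentence is taken to be -E(x);
    the log-partition term is constant across hypotheses and is dropped.
    """
    rescored = [
        (text, first_pass + lm_weight * (-energy_fn(text)))
        for text, first_pass in hypotheses
    ]
    # Highest combined score first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def toy_energy(text: str) -> float:
    # Hypothetical stand-in for a learned energy function:
    # adds one unit of energy for each immediately repeated word.
    words = text.split()
    return float(sum(a == b for a, b in zip(words, words[1:])))


# Made-up first-pass scores, for illustration only.
nbest = [("the the cat sat", -1.0), ("the cat sat", -1.2)]
best_text, best_score = rescore_nbest(nbest, toy_energy)[0]
```

In this toy run the hypothesis with the stuttered word receives a higher energy, so the second hypothesis overtakes it after rescoring even though its first-pass score was lower.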

Code Implementations: 2 repos