CL, HC, LG, SD, AS | Feb 26, 2023

Efficient Ensemble for Multimodal Punctuation Restoration using Time-Delay Neural Network

arXiv:2302.13376v2 | 1 citation | h-index: 18 | Has Code
Originality: Highly original
AI Analysis

This work addresses the need for efficient punctuation restoration models in speech recognition post-processing, pairing a higher F1 score with a much smaller inference network.

The paper tackles the problem of punctuation restoration in automatic speech recognition by proposing EfficientPunct, an ensemble method with a multimodal time-delay neural network that achieves a 1.0 F1 point improvement over the current best model while using less than a tenth of its inference network parameters.

Punctuation restoration plays an essential role in the post-processing procedure of automatic speech recognition, but model efficiency is a key requirement for this task. To that end, we present EfficientPunct, an ensemble method with a multimodal time-delay neural network that outperforms the current best model by 1.0 F1 points, using less than a tenth of its inference network parameters. We streamline a speech recognizer to efficiently output hidden layer acoustic embeddings for punctuation restoration, as well as BERT to extract meaningful text embeddings. By using forced alignment and temporal convolutions, we eliminate the need for attention-based fusion, greatly increasing computational efficiency and raising performance. EfficientPunct sets a new state of the art with an ensemble that weights BERT's purely language-based predictions slightly more than the multimodal network's predictions. Our code is available at https://github.com/lxy-peter/EfficientPunct.
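The ensemble step described in the abstract can be illustrated with a minimal sketch: a weighted average of the per-token class probabilities from the text-only BERT model and from the multimodal network. This is not taken from the authors' repository. The label set `LABELS`, the function names, and the weight `alpha = 0.6` are assumptions chosen only to reflect "slightly more" weight on BERT; the abstract does not state the actual weight or classes.

```python
from typing import List

# Hypothetical label set for illustration; the abstract does not list the classes.
LABELS = ["none", "comma", "period", "question"]


def ensemble(p_bert: List[float], p_multimodal: List[float], alpha: float = 0.6) -> List[float]:
    """Weighted average of two probability distributions over the same labels.

    alpha is the weight on the BERT (text-only) prediction; 0.6 is an assumed
    value standing in for "slightly more" than the multimodal network's weight.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(p_bert) != len(p_multimodal):
        raise ValueError("both distributions must cover the same labels")
    return [alpha * b + (1.0 - alpha) * m for b, m in zip(p_bert, p_multimodal)]


def predict(p_bert: List[float], p_multimodal: List[float], alpha: float = 0.6) -> str:
    """Return the label with the highest combined probability."""
    combined = ensemble(p_bert, p_multimodal, alpha)
    best = max(range(len(combined)), key=combined.__getitem__)
    return LABELS[best]


if __name__ == "__main__":
    # Made-up probabilities for a single token position.
    p_bert = [0.10, 0.60, 0.25, 0.05]
    p_multimodal = [0.15, 0.30, 0.50, 0.05]
    print(predict(p_bert, p_multimodal))             # BERT-leaning weight
    print(predict(p_bert, p_multimodal, alpha=0.3))  # multimodal-leaning weight
```

With these made-up inputs the two models disagree, so the choice of `alpha` decides the output label. In practice such a weight would be tuned on held-out data.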

Code Implementations: 1 repo