CLAI Jan 23, 2023

Efficient Encoders for Streaming Sequence Tagging

arXiv:2301.09244v2268 citations, h-index: 37
Originality: Incremental advance
AI Analysis

This work addresses computational and performance issues in real-time applications such as speech transcription, though it is incremental because it builds on existing encoder methods.

The paper tackles the inefficiency of applying bidirectional encoders to streaming sequence tagging by proposing HEAR (Hybrid Encoder with Adaptive Restart), which reduces FLOPs by up to 71.1% and improves streaming exact match by up to +10%.

A naive application of state-of-the-art bidirectional encoders to streaming sequence tagging would require re-encoding every token from scratch each time a new token arrives in an incremental streaming input (such as transcribed speech). Because previous computation cannot be reused, this leads to a higher number of floating-point operations (FLOPs) and a higher number of unnecessary label flips. The increased FLOPs lead to higher wall-clock time, and the increased label flipping leads to poorer streaming performance. In this work, we present a Hybrid Encoder with Adaptive Restart (HEAR) that addresses these issues, maintaining the performance of bidirectional encoders on offline (complete) inputs while improving performance on streaming (incomplete) inputs. HEAR uses a hybrid unidirectional-bidirectional encoder architecture to perform sequence tagging, along with an Adaptive Restart Module (ARM) that selectively guides the restart of the bidirectional portion of the encoder. Across four sequence tagging tasks, HEAR offers FLOP savings of up to 71.1% in streaming settings and outperforms bidirectional encoders on streaming predictions by up to +10% streaming exact match.
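The two costs named in the abstract, repeated re-encoding and label flips, can be illustrated with a toy Python sketch. This is not the paper's FLOP accounting or its ARM: the cost model simply counts how many token positions the bidirectional encoder processes, the `restarts` flags are a hypothetical stand-in for whatever restart decision is used, and all function names are invented for illustration.

```python
from typing import List, Sequence


def naive_token_encodings(n_tokens: int) -> int:
    """Token positions encoded when every prefix is re-encoded from scratch."""
    # Prefix lengths are 1, 2, ..., n, so the total is n * (n + 1) / 2.
    return sum(range(1, n_tokens + 1))


def restart_token_encodings(restarts: Sequence[bool]) -> int:
    """Token positions the bidirectional encoder processes under selective restarts.

    Assumption (toy model): restarts[t] is True when a bidirectional pass is
    run after token t + 1 arrives, and that pass re-encodes the whole prefix.
    Work done by any unidirectional layers is not counted here.
    """
    return sum(t + 1 for t, restart in enumerate(restarts) if restart)


def count_label_flips(prefix_predictions: List[List[str]]) -> int:
    """Count labels of already-seen tokens that change between consecutive prefixes."""
    flips = 0
    for prev, curr in zip(prefix_predictions, prefix_predictions[1:]):
        # zip stops at the shorter list, so only previously seen tokens are compared.
        flips += sum(p != c for p, c in zip(prev, curr))
    return flips


if __name__ == "__main__":
    n = 8
    every_step = [True] * n                     # naive: restart on every token
    sparse = [False] * (n - 1) + [True]         # hypothetical: restart once, at the end
    print(naive_token_encodings(n))
    print(restart_token_encodings(every_step))
    print(restart_token_encodings(sparse))
    print(count_label_flips([["O"], ["O", "B"], ["B", "I", "O"]]))
```

Under this simplified model, restarting on every token reproduces the quadratic naive cost, while fewer restarts cut the bidirectional work; real FLOP savings depend on layer counts and attention cost, which the sketch ignores.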
