Contrastive Feedback Mechanism for Simultaneous Speech Translation
This work addresses a specific bottleneck in simultaneous speech translation for real-time applications, offering an incremental improvement over existing decision policies.
The paper tackles unstable predictions in simultaneous speech translation by introducing a contrastive feedback mechanism that uses these predictions as feedback to improve translation quality, with performance gains across 8 languages in the MuST-C v1.0 dataset.
Recent advances in simultaneous speech translation (SST) focus on decision policies that enable offline-trained ST models to be used for simultaneous inference. These policies not only control the quality-latency trade-off in SST but also mitigate the impact of unstable predictions on translation quality, either by delaying translation until more context is available or by discarding these predictions through stable hypothesis detection. However, they often overlook the potential benefits of using unstable predictions. We introduce the contrastive feedback mechanism (CFM) for SST, a novel method that leverages these unstable predictions as feedback to improve translation quality. CFM guides the system to eliminate undesired model behaviors from these predictions through a contrastive objective. Experiments with 3 state-of-the-art decision policies across 8 languages in the MuST-C v1.0 dataset show that CFM effectively improves the performance of SST.
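To make the two ideas in the abstract concrete, the following is a minimal sketch and not the paper's implementation. It assumes that stable hypothesis detection is done by taking the longest common prefix of two consecutive partial hypotheses (one common choice; the abstract does not say which detector is used), and it assumes a generic contrastive-decoding-style score in which the log-probability a token receives when conditioned on the unstable prediction is subtracted from its ordinary log-probability. The function names, the `weight` parameter, and the toy numbers are all hypothetical.

```python
def split_stable_unstable(prev_hyp, curr_hyp):
    """Split the current hypothesis into a stable prefix and an unstable suffix.

    Assumption: the stable part is the longest common prefix of two
    consecutive partial hypotheses; the rest of the current hypothesis
    is treated as the unstable prediction.
    """
    n = 0
    for prev_tok, curr_tok in zip(prev_hyp, curr_hyp):
        if prev_tok != curr_tok:
            break
        n += 1
    return curr_hyp[:n], curr_hyp[n:]


def contrastive_scores(logp_main, logp_unstable, weight=0.5):
    """Down-weight tokens that are likely under the unstable-prediction context.

    Assumption: a generic contrastive form, score = logp_main - weight * logp_unstable.
    Both dictionaries must cover the same candidate tokens.
    """
    return {tok: lp - weight * logp_unstable[tok] for tok, lp in logp_main.items()}


# Toy usage with made-up hypotheses and log-probabilities.
stable, unstable = split_stable_unstable(
    ["the", "cat", "sat"], ["the", "cat", "is", "here"]
)
scores = contrastive_scores({"a": -1.0, "b": -2.0}, {"a": -0.5, "b": -3.0})
best_token = max(scores, key=scores.get)
```

In this toy case the stable prefix is `["the", "cat"]` and the unstable suffix is `["is", "here"]`. Token `"b"` ends up preferred because `"a"`, although more likely overall, is also the token most favored by the unstable context. How CFM actually builds and applies its contrastive objective is specified in the paper body, not in this abstract.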