A Comparative Study on Vocabulary Reduction for Phrase Table Smoothing
This work provides incremental insights into phrase table smoothing for machine translation, showing that the choice of reduced vocabulary has little effect on smoothing performance and that vocabulary reduction is more effective for large-scale phrase tables.
The study analyzes how vocabulary reduction affects the smoothing of phrase translation models. It finds that the choice of vocabulary does not significantly affect smoothing performance, which indicates that the standard phrase translation model is extremely sparse. It also finds that vocabulary reduction is more effective for smoothing large-scale phrase tables.
This work systematically analyzes the smoothing effect of vocabulary reduction for phrase translation models. We extensively compare various word-level vocabularies to show that the performance of smoothing is not significantly affected by the choice of vocabulary. This result provides empirical evidence that the standard phrase translation model is extremely sparse. Our experiments also reveal that vocabulary reduction is more effective for smoothing large-scale phrase tables.
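To make the idea of smoothing through vocabulary reduction concrete, the following is a minimal sketch, not the authors' implementation. It assumes a plain relative-frequency estimate of the phrase translation probability and a hypothetical word-to-class mapping; the toy phrase pairs, the class labels, and the function names are all invented for illustration. Mapping words to classes pools the counts of phrase pairs that differ only in rare words, so a pair seen once shares statistics with similar pairs.

```python
from collections import Counter


def relative_frequencies(pairs):
    """Estimate p(target | source) by relative frequency over phrase pairs."""
    joint = Counter(pairs)
    marginal = Counter(src for src, _ in pairs)
    return {(src, tgt): n / marginal[src] for (src, tgt), n in joint.items()}


def reduce_phrase(phrase, word_to_class):
    """Map each word to its class label; unmapped words are kept unchanged."""
    return tuple(word_to_class.get(word, word) for word in phrase)


# Hypothetical word classes (assumption: any word-level mapping could be used).
word_to_class = {
    "cat": "C1", "dog": "C1",
    "die": "D1", "der": "D1",
    "Katze": "K1", "Hund": "K1",
}

# Hypothetical extracted phrase pairs (source phrase, target phrase).
pairs = [
    (("the", "cat"), ("die", "Katze")),
    (("the", "cat"), ("die", "Katze")),
    (("the", "cat"), ("Katze",)),
    (("the", "dog"), ("der", "Hund")),  # seen once: its estimate is unreliable
]

# Word-level model: each distinct phrase pair keeps its own sparse counts.
word_level = relative_frequencies(pairs)

# Class-level model: counts are pooled over pairs with the same class sequence.
reduced_pairs = [
    (reduce_phrase(src, word_to_class), reduce_phrase(tgt, word_to_class))
    for src, tgt in pairs
]
class_level = relative_frequencies(reduced_pairs)
```

In this toy data the pair seen only once gets a word-level probability of 1.0 from a single observation, while its class-level counterpart is estimated from four pooled observations. How such a class-level score is combined with the word-level model, and which vocabulary is used, is left open here.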