Coordinate System Selection for Minimum Error Rate Training in Statistical Machine Translation
This is an incremental improvement for statistical machine translation researchers and practitioners, addressing a known bottleneck in training methods.
The paper addresses two problems in Minimum Error Rate Training (MERT) for statistical machine translation: convergence to local optima, and feature weights that do not match the real distribution of the feature functions. It introduces coordinate system selection (RSS), which modifies the search directions and improves translation performance without additional language knowledge.
Minimum error rate training (MERT) is a widely used training procedure for statistical machine translation. A general problem with this approach is that the search easily converges to a local optimum, and the resulting weight set does not accord with the real distribution of the feature functions. This paper introduces coordinate system selection (RSS) into the search algorithm for MERT. In contrast to previous approaches, in which every dimension corresponds to exactly one independent feature function, we create several coordinate systems by moving one of the dimensions to a new direction. The basic idea is simple but critical: the MERT training procedure should be based on a coordinate system formed by search directions, not directly on the feature functions. Experiments show that selecting among coordinate systems according to tuning set results yields better results without any other language knowledge.
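The idea of searching along a selected coordinate system, not only along the feature axes, can be illustrated with a minimal sketch. It is not the paper's implementation. It makes several assumptions: the tuning data are random toy n-best lists, a coarse grid of step sizes stands in for the exact line search normally used in MERT, and each alternative coordinate system is built by rotating one axis toward another, which is only one possible way of "moving a dimension to a new direction". All function and variable names are hypothetical.

```python
import itertools
import numpy as np


def corpus_error(weights, feats, errors):
    """Sum the error of the top-scoring hypothesis in each n-best list."""
    total = 0.0
    for f, e in zip(feats, errors):
        total += e[np.argmax(f @ weights)]
    return total


def direction_search(start, directions, feats, errors, steps, sweeps=3):
    """Greedy search that moves along one direction at a time.

    Assumption: a fixed grid of step sizes replaces an exact line search.
    """
    w = start.copy()
    best = corpus_error(w, feats, errors)
    for _ in range(sweeps):
        for d in directions:
            cands = [w + s * d for s in steps]
            errs = [corpus_error(c, feats, errors) for c in cands]
            i = int(np.argmin(errs))
            if errs[i] < best:  # accept only strict improvements
                best, w = errs[i], cands[i]
    return w, best


def coordinate_systems(dim):
    """Yield the feature-axis basis, then variants with one axis moved.

    Assumption: the moved axis i is replaced by the normalized sum of
    axes i and j; the remaining axes are left unchanged.
    """
    basis = np.eye(dim)
    yield basis
    for i, j in itertools.permutations(range(dim), 2):
        system = basis.copy()
        system[i] = (basis[i] + basis[j]) / np.sqrt(2.0)
        yield system


def select_coordinate_system(start, feats, errors, steps):
    """Run the search in every coordinate system; keep the lowest tuning error."""
    results = [
        direction_search(start, system, feats, errors, steps)
        for system in coordinate_systems(len(start))
    ]
    return min(results, key=lambda r: r[1])


# Toy tuning set: 20 n-best lists, 5 hypotheses each, 3 features (all made up).
rng = np.random.default_rng(0)
feats = [rng.normal(size=(5, 3)) for _ in range(20)]
errors = [rng.random(5) for _ in range(20)]
start = np.ones(3)
steps = np.linspace(-2.0, 2.0, 21)

start_err = corpus_error(start, feats, errors)
base_w, base_err = direction_search(start, np.eye(3), feats, errors, steps)
best_w, best_err = select_coordinate_system(start, feats, errors, steps)
print(start_err, base_err, best_err)
```

Because the feature-axis basis is one of the candidate systems, the selected system can never score worse on the tuning set than the axis-aligned search in this sketch. Whether that gain carries over to held-out data is an empirical question that the toy example does not address.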