PyMarian: Fast Neural Machine Translation and Evaluation in Python
PyMarian provides a practical tool for machine translation researchers and practitioners by bridging speed and flexibility, though the contribution is incremental because it builds on existing software.
The authors tackled the problem of integrating the fast C++-based Marian NMT toolkit with Python's extensive ecosystem, resulting in a Python interface that enables state-of-the-art COMET metric computation with a speedup of up to 7.8× over existing implementations.
The deep learning language of choice these days is Python; measured by factors such as available libraries and technical support, it is hard to beat. At the same time, software written in lower-level programming languages like C++ retains advantages in speed. We describe a Python interface to Marian NMT, a C++-based training and inference toolkit for sequence-to-sequence models, focusing on machine translation. This interface enables models trained with Marian to be connected to the rich, wide range of tools available in Python. A highlight of the interface is the ability to compute state-of-the-art COMET metrics from Python while using Marian's inference engine, with a speedup of up to 7.8$\times$ over existing implementations. We also briefly spotlight a number of other integrations, including Jupyter notebooks, connection with prebuilt models, and a web app interface provided with the package. PyMarian is available on PyPI via $\texttt{pip install pymarian}$.
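A speedup factor such as the one quoted above is conventionally the ratio of the baseline's wall-clock time to the faster implementation's wall-clock time on the same inputs. The following is a minimal sketch of that arithmetic, not the benchmarking procedure used here; `speedup_factor` and `time_call` are hypothetical helper names introduced for illustration and are not part of PyMarian or COMET.

```python
import time


def speedup_factor(baseline_seconds: float, candidate_seconds: float) -> float:
    """Return baseline time divided by candidate time (values > 1 mean faster)."""
    if baseline_seconds < 0 or candidate_seconds <= 0:
        raise ValueError("times must be non-negative, and candidate time must be positive")
    return baseline_seconds / candidate_seconds


def time_call(fn, *args, repeats: int = 3) -> float:
    """Return the best-of-`repeats` wall-clock time of fn(*args), in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)  # hypothetical workload, e.g. scoring a fixed test set
        best = min(best, time.perf_counter() - start)
    return best
```

In use, one would pass two callables that score the same segments, one per implementation, to `time_call` and feed the two results to `speedup_factor`; taking the best of several runs is one common way to reduce timing noise, assumed here for simplicity.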