CL, Sep 19, 2014

Using crowdsourcing system for creating site-specific statistical machine translation engine

arXiv:1409.5502v1, 1 citation

Originality: Synthesis-oriented

AI Analysis

The paper addresses the need for domain-specific machine translation in site globalization, but its contribution is incremental: it applies existing methods to a new data source.

The paper tackles the creation of site-specific statistical machine translation engines from crowdsourced translation data. Its result is a method for training and estimating such engines on a sentence-aligned corpus drawn from a site's narrow-domain content.

A crowdsourcing translation approach is an effective tool for globalizing site content, and it is also an important source of parallel linguistic data. For a given site processed with a crowdsourcing system, a sentence-aligned corpus can be fetched that covers a very narrow domain of terminology and language patterns: a site-specific domain. These data can be used for training and estimation of a site-specific statistical machine translation engine.
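As a rough illustration of the data-preparation step the abstract implies, the sketch below cleans a list of sentence-aligned pairs and splits it into a training set and a held-out set. It is not the paper's procedure: the function name `split_parallel_corpus`, the `held_out_fraction` parameter, and the sample English-German pairs are all hypothetical, and the resulting sets would still need to be passed to an SMT toolkit of your choice.

```python
import random


def split_parallel_corpus(pairs, held_out_fraction=0.1, seed=0):
    """Clean aligned (source, target) pairs and split them into train/held-out.

    Hypothetical helper: the paper does not specify cleaning or split rules.
    """
    seen = set()
    unique = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        # Skip pairs where either side is empty (an untranslated segment).
        if not src or not tgt:
            continue
        # Site content repeats often (menus, buttons), so drop exact duplicates.
        if (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        unique.append((src, tgt))

    # Shuffle deterministically so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(unique)

    # Keep at least one held-out pair when there is more than one pair overall.
    n_held = max(1, int(len(unique) * held_out_fraction)) if len(unique) > 1 else 0
    return unique[n_held:], unique[:n_held]


# Made-up sample pairs standing in for data fetched from a crowdsourcing system.
sample_pairs = [
    ("Add to cart", "In den Warenkorb"),
    ("Add to cart", "In den Warenkorb"),
    ("", "Unbenutzt"),
    ("Checkout", "Zur Kasse"),
    ("Sign in", "Anmelden"),
]
train_pairs, held_out_pairs = split_parallel_corpus(sample_pairs)
```

Deduplicating before the split keeps identical segments from appearing in both sets, which would otherwise inflate any quality estimate made on the held-out data.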
