Using crowdsourcing system for creating site-specific statistical machine translation engine
This work addresses the need for domain-specific machine translation in site globalization, but its contribution is incremental: it applies existing methods to a new data source.
The paper tackles the problem of creating site-specific statistical machine translation engines by leveraging crowdsourced translation data. The result is a method for training and estimating such engines on a sentence-aligned corpus drawn from a site's narrow-domain content.
A crowdsourcing translation approach is an effective tool for globalizing site content, and it is also an important source of parallel linguistic data. For a given site processed with a crowdsourcing system, a sentence-aligned corpus can be extracted that covers a very narrow domain of terminology and language patterns: a site-specific domain. These data can be used for training and estimation of a site-specific statistical machine translation engine.
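
The corpus-preparation step described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record fields `source`, `target`, and `votes`, the `min_votes` threshold, and the held-out fraction are all hypothetical, and "estimation" is read here as evaluation on held-out sentence pairs. The sketch keeps the highest-voted translation for each source sentence, then splits the resulting pairs into a training set and a held-out set.

```python
import random


def build_parallel_corpus(records, min_votes=1):
    """Collapse crowdsourced records into one (source, target) pair per source.

    Each record is assumed (hypothetically) to be a dict with the keys
    "source", "target" and an optional integer "votes".
    """
    best = {}
    for rec in records:
        # Normalize whitespace so trivially different copies are merged.
        src = " ".join(rec["source"].split())
        tgt = " ".join(rec["target"].split())
        votes = rec.get("votes", 0)
        if not src or not tgt or votes < min_votes:
            continue
        # Keep the translation with the most votes for each source sentence.
        if src not in best or votes > best[src][1]:
            best[src] = (tgt, votes)
    return [(src, tgt) for src, (tgt, _) in best.items()]


def split_corpus(pairs, held_out_fraction=0.1, seed=0):
    """Shuffle deterministically and split into (train, held_out) lists."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n_held_out = int(len(pairs) * held_out_fraction)
    if len(pairs) > 1:
        # Always reserve at least one pair when there is something to split.
        n_held_out = max(1, n_held_out)
    return pairs[n_held_out:], pairs[:n_held_out]
```

The training pairs could then be written out as two line-aligned text files, which is the input layout that common statistical machine translation toolkits expect, while the held-out pairs remain unseen during training and are used only to score the engine.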