CL, Jul 6, 2018

A Concept Specification and Abstraction-based Semantic Representation: Addressing the Barriers to Rule-based Machine Translation

arXiv:1807.02226v30.2

Originality: Incremental advance

AI Analysis

This paper addresses the problem of data efficiency in machine translation for minority languages, though it appears incremental because it builds on existing rule-based approaches.

The authors tackle the labor-intensive nature of rule-based machine translation by proposing a semantic representation that treats each unit of meaning as a concept in a network, aiming to reduce labor and training time and to handle the variety in language. The resulting framework is amenable to probabilistic modeling, so the necessary rules could be learned from example data, which suits languages with few bilingual corpus resources.

Rule-based machine translation is more data efficient than the big data-based machine translation approaches, making it appropriate for languages with low bilingual corpus resources -- i.e., minority languages. However, the rule-based approach has declined in popularity relative to its big data cousins primarily because of the extensive training and labour required to define the language rules. To address this, we present a semantic representation that 1) treats all bits of meaning as individual concepts that 2) modify or further specify one another to build a network that relates entities in space and time. Also, the representation can 3) encapsulate propositions and thereby define concepts in terms of other concepts, supporting the abstraction of underlying linguistic and ontological details. These features afford an exact, yet intuitive semantic representation aimed at handling the great variety in language and reducing labour and training time. The proposed natural language generation, parsing, and translation strategies are also amenable to probabilistic modeling and thus to learning the necessary rules from example data.
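
The three features listed in the abstract can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions, not the authors' notation or implementation: the `Concept` class, its `specify`, `render`, and `expand` methods, and the "puppy" example are all hypothetical names introduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """Hypothetical node: one bit of meaning (feature 1 in the abstract)."""

    label: str
    # Concepts that modify or further specify this one (feature 2).
    modifiers: list[Concept] = field(default_factory=list)
    # Optional definition in terms of other concepts (feature 3).
    definition: Optional[Concept] = None

    def specify(self, other: Concept) -> Concept:
        """Attach a modifying concept and return self for chaining."""
        self.modifiers.append(other)
        return self

    def render(self) -> str:
        """Write the network as a nested string, e.g. 'dog(young)'."""
        if not self.modifiers:
            return self.label
        inner = ", ".join(m.render() for m in self.modifiers)
        return f"{self.label}({inner})"

    def expand(self) -> Concept:
        """Return a copy with every defined concept replaced by its definition."""
        base = self.definition.expand() if self.definition else Concept(self.label)
        base.modifiers.extend(m.expand() for m in self.modifiers)
        return base


# Assumed example: "puppy" is defined as a dog specified by "young".
puppy = Concept("puppy", definition=Concept("dog").specify(Concept("young")))
phrase = puppy.specify(Concept("big"))

print(phrase.render())           # abstract form
print(phrase.expand().render())  # form with the definition unfolded
```

Keeping the definition separate from the modifiers lets a generator or parser work at the abstract level ("puppy") and unfold the underlying detail only when a target language lacks a matching concept, which is one plausible reading of the abstraction the abstract describes.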
