Christian Boitet

2papers

2 Papers

AIJan 13, 2022
Transforming UNL graphs in OWL representations

David Rouquet, Valérie Bellynck, Christian Boitet et al.

Extracting formal knowledge (ontologies) from natural language is a challenge that can benefit from a (semi-) formal linguistic representation of texts, at the semantic level. We propose to achieve such a representation by implementing the Universal Networking Language (UNL) specifications on top of RDF. Thus, the meaning of a statement in any language will be soundly expressed as a RDF-UNL graph that constitutes a middle ground between natural language and formal knowledge. In particular, we show that RDF-UNL graphs can support content extraction using generic SHACL rules and that reasoning on the extracted facts allows detecting incoherence in the original texts. This approach is experimented in the UNseL project that aims at extracting ontological representations from system requirements/specifications in order to check that they are consistent, complete and unambiguous. Our RDF-UNL implementation and all code for the working examples of this paper are publicly available under the CeCILL-B license at https://gitlab.tetras-libre.fr/unl/rdf-unl

CLFeb 21, 2019
Development of a classifiers/quantifiers dictionary towards French-Japanese MT

Mutsuko Tomokiyo, Mathieu Mangeot, Christian Boitet

Although classifiers/quantifiers (CQs) expressions appear frequently in everyday communications or written documents, they are described neither in classical bilingual paper dictionaries , nor in machine-readable dictionaries. The paper describes a CQs dictionary, edited from the corpus we have annotated, and its usage in the framework of French-Japanese machine translation (MT). CQs treatment in MT often causes problems of lexical ambiguity, polylexical phrase recognition difficulties in analysis and doubtful output in transfer-generation, in particular for distant languages pairs like French and Japanese. Our basic treatment of CQs is to annotate the corpus by UNL-UWs (Universal Networking Language-Universal words) 1 , and then to produce a bilingual or multilingual dictionary of CQs, based on synonymy through identity of UWs.