CL, Aug 2, 2018

OntoSenseNet: A Verb-Centric Ontological Resource for Indian Languages

arXiv:1808.00694v1, 4 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses semantic understanding for Indian languages, but it is incremental as it extends existing ontological approaches to new languages and data.

The paper developed a formal ontology-based framework to capture intrinsic and extrinsic meanings of words for Hindi and Telugu, resulting in a manually annotated gold-standard lexical resource that was used to analyze verb sense distributions and enrich word embeddings.

Following approaches to understanding lexical meaning developed by Yaska, Patanjali and Bhartrihari in the Indian linguistic traditions, and extending approaches developed by Leibniz and Brentano in modern times, a framework of formal ontology of language was developed. This framework proposes that the meanings of words are informed by intrinsic and extrinsic ontological structures. The paper aims to capture such intrinsic and extrinsic meanings of words for two major Indian languages, namely Hindi and Telugu. Parts of speech have been rendered into sense-types and sense-classes. Using them, we have developed a gold-standard annotated lexical resource to support semantic understanding of a language. The resource comprises Hindi and Telugu lexicons, which have been manually annotated by native speakers of the languages following our annotation guidelines. Further, the resource was utilised to derive the adverbial sense-class distribution of verbs and the karaka-verb sense-type distribution. Different corpora (news, novels) were compared using their verb sense-type distributions. Word embeddings were used as an aid for enriching the resource. This is a work in progress that aims at extensive lexical coverage of the language.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
