CL, IR · Jan 9, 2025

Harmonizing Metadata of Language Resources for Enhanced Querying and Accessibility

arXiv:2501.05606v1 · 5 citations · h-index: 1 · 2024 5th International Conference on Computers and Artificial Intelligence Technology (CAIT)

Originality: Synthesis-oriented

AI Analysis

This work addresses metadata harmonization for users of language resources, but it is incremental, as it builds on existing standards and techniques.

The paper tackles the problem of harmonizing metadata from diverse language resource repositories by integrating the data into a unified model with linked data and RDF techniques. The result is Linghub, a portal that successfully answered many real user queries, although some limitations remain.

This paper addresses the harmonization of metadata from diverse repositories of language resources (LRs). Leveraging linked data and RDF techniques, we integrate data from multiple sources into a unified model based on DCAT and the META-SHARE OWL ontology. Our methodology supports text-based search, faceted browsing, and advanced SPARQL queries through Linghub, a newly developed portal. Real user queries from the Corpora Mailing List (CML) were evaluated to assess how well Linghub satisfies actual user needs. Results indicate that, while some limitations persist, many user requests can be successfully addressed. The study highlights significant metadata issues and advocates adherence to open vocabularies and standards to improve metadata harmonization. This initial research underscores the importance of API-based access to LRs, which promotes machine usability and the extraction of data subsets for specific purposes, paving the way for more efficient and standardized use of LRs.
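The core idea of mapping records from heterogeneous repositories into one shared model, then browsing them by facet, can be illustrated with a minimal, stdlib-only Python sketch. It is not Linghub's implementation. The repository names (`repo_a`, `repo_b`), their field names, the `FIELD_MAP` and `LANG_NORMALIZE` tables, the `urn:lr:` identifier scheme, and the example records are all hypothetical; only the RDF, Dublin Core Terms, and DCAT namespace URIs are real vocabularies, and META-SHARE properties are omitted. Each record becomes a set of (subject, property, object) triples, after which facet counts and conjunctive facet filters (a simplified stand-in for a SPARQL basic graph pattern) work uniformly across sources.

```python
from collections import Counter

# Real vocabulary namespaces (W3C RDF, Dublin Core Terms, DCAT).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"

# Hypothetical per-repository field mappings into one DCAT-based model.
FIELD_MAP = {
    "repo_a": {"name": DCT + "title", "lang": DCT + "language", "type": DCT + "type"},
    "repo_b": {
        "resourceName": DCT + "title",
        "languages": DCT + "language",
        "resourceType": DCT + "type",
    },
}

# Hypothetical normalization of free-text language labels to ISO 639-1 codes.
LANG_NORMALIZE = {"english": "en", "eng": "en", "german": "de", "deu": "de"}

Triple = tuple[str, str, str]


def harmonize(source: str, record_id: str, record: dict) -> list[Triple]:
    """Map one source-specific record to (subject, property, object) triples."""
    subject = f"urn:lr:{source}:{record_id}"  # hypothetical identifier scheme
    triples = [(subject, RDF_TYPE, DCAT + "Dataset")]
    for field, value in record.items():
        prop = FIELD_MAP[source].get(field)
        if prop is None:
            continue  # fields without a mapping are dropped in this sketch
        for v in value if isinstance(value, list) else [value]:
            if prop == DCT + "language":
                # Unknown labels are kept as-is rather than guessed.
                v = LANG_NORMALIZE.get(v.strip().lower(), v)
            triples.append((subject, prop, v))
    return triples


def facet_counts(triples: list[Triple], prop: str) -> Counter:
    """Count triples per value of one property, as a faceted-browsing sidebar would."""
    return Counter(o for _, p, o in triples if p == prop)


def filter_by(triples: list[Triple], facets: dict[str, str]) -> set[str]:
    """Return subjects matching every (property, value) pair: a conjunctive
    pattern match, loosely analogous to a SPARQL basic graph pattern."""
    subjects = {s for s, _, _ in triples}
    for prop, value in facets.items():
        subjects &= {s for s, p, o in triples if p == prop and o == value}
    return subjects


# Two invented records whose field names and language labels differ by source.
store: list[Triple] = []
store += harmonize("repo_a", "1", {"name": "Example Corpus", "lang": ["English", "German"], "type": "corpus"})
store += harmonize("repo_b", "x7", {"resourceName": "Example Lexicon", "languages": "eng", "resourceType": "lexicalResource"})

print(facet_counts(store, DCT + "language"))  # Counter({'en': 2, 'de': 1})
print(filter_by(store, {DCT + "language": "en", DCT + "type": "corpus"}))  # {'urn:lr:repo_a:1'}
```

Plain tuples keep the sketch dependency-free; in practice the triples would sit in an RDF store and be queried with SPARQL. Note that "English" and "eng" only become comparable after normalization, which is the kind of metadata inconsistency that adherence to shared vocabularies is meant to prevent.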

