Adaptive and Multi-Source Entity Matching for Name Standardization of Astronomical Observation Facilities
This work addresses the need for consistent naming in astronomy data integration, though it appears incremental as it builds on existing methods for entity matching.
The paper tackles the problem of standardizing names for astronomical observation facilities. It develops a multi-source entity matching methodology that uses adaptable criteria, NLP techniques, and LLM validation to generate synonym sets; the resulting mapping is to be integrated into IVOA Vocabularies and OntoPortal-Astro and used by a Name Resolver API.
This ongoing work develops a methodology for generating a multi-source mapping of astronomical observation facilities. To compare two entities, we compute scores using adaptable criteria and Natural Language Processing (NLP) techniques (Bag-of-Words, sequential, and surface approaches), mapping entities extracted from eight semantic artifacts, including Wikidata and astronomy-oriented resources. We use every available property, such as labels, definitions, descriptions, and external identifiers, as well as more domain-specific properties such as observation wavebands, spacecraft launch dates, and funding agencies. Finally, we use a Large Language Model (LLM) to accept or reject each mapping suggestion and provide a justification, ensuring the plausibility and FAIRness of the validated synonym pairs. The resulting mapping is composed of multi-source synonym sets, each providing a single standardized label per entity. This mapping will feed our Name Resolver API and will be integrated into the International Virtual Observatory Alliance (IVOA) Vocabularies and the OntoPortal-Astro platform.
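As an illustration only, the sketch below shows one way the three families of NLP scores named above could be computed for a pair of labels, and how validated pairs could be merged into synonym sets. It is not the authors' implementation. The function names, the weights in `combined_score`, the specific measures (token cosine similarity and `difflib` ratios), and the example identifiers are all assumptions. The LLM validation step is not shown.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def tokens(text):
    """Lowercase whitespace tokenization (a deliberately simple assumption)."""
    return text.lower().split()


def bag_of_words_score(a, b):
    """Order-insensitive score: cosine similarity of token-count vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def sequential_score(a, b):
    """Order-sensitive score: matching ratio over token sequences."""
    return SequenceMatcher(None, tokens(a), tokens(b)).ratio()


def surface_score(a, b):
    """Character-level score: matching ratio over lowercased strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def combined_score(a, b, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of the three scores; the weights are hypothetical
    stand-ins for the adaptable criteria."""
    scores = (
        bag_of_words_score(a, b),
        sequential_score(a, b),
        surface_score(a, b),
    )
    return sum(w * s for w, s in zip(weights, scores))


def synonym_sets(validated_pairs):
    """Merge validated synonym pairs into sets with a union-find structure."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in validated_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())
```

With this sketch, `bag_of_words_score("Hubble Space Telescope", "Space Telescope Hubble")` is close to 1 because token order is ignored, while `sequential_score` on the same pair is lower because it depends on token order. Pairs accepted by the validation step, for example `[("src1:A", "src2:A"), ("src2:A", "src3:A")]` with hypothetical identifiers, would be merged by `synonym_sets` into a single set spanning the three sources.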