Automated Metadata Harmonization Using Entity Resolution & Contextual Embedding
This work addresses inefficiencies in the ML-Ops workflow of data engineers and scientists, although it appears incremental because it builds on existing entity resolution and embedding methods.
The paper tackles the problem of manual metadata harmonization in ML data curation by automating the process using entity resolution and contextual embeddings, achieving accurate mapping of source schemas to standardized schemas and inferring ontological structures.
The ML data curation process typically draws on heterogeneous, federated source systems with varied schema structures, so curation must standardize metadata from these different schemas into an interoperable schema. Done manually, this metadata harmonization and cataloging step slows down the ML-Ops lifecycle. We demonstrate how to automate this step with entity resolution methods and with the Cognitive Database's Db2Vec embedding approach. Db2Vec captures hidden inter-column and intra-column relationships, which lets the system detect metadata similarity and then predict how metadata columns from source schemas map to any standardized schema. Beyond schema matching, we demonstrate that the approach can also infer the correct ontological structure of the target data model.
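The column-matching step can be illustrated with a minimal, dependency-free Python sketch. It is not the paper's implementation. A simple bag-of-tokens cosine similarity over column values stands in for Db2Vec embeddings, and `difflib` string similarity over normalized column names stands in for the entity resolution component. The function names (`normalize`, `value_profile`, `cosine`, `score`, `match_columns`), the blend weight `alpha`, the acceptance `threshold`, and the sample columns are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase a column name and split camelCase / snake_case into words."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    return " ".join(re.split(r"[_\W]+", spaced.lower())).strip()


def value_profile(values) -> Counter:
    """Bag of lowercase value tokens (hypothetical stand-in for a Db2Vec vector)."""
    return Counter(tok for v in values for tok in str(v).lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def score(src, tgt, alpha=0.5):
    """Blend name similarity (entity-resolution style) with value similarity."""
    (s_name, s_vals), (t_name, t_vals) = src, tgt
    name_sim = SequenceMatcher(None, normalize(s_name), normalize(t_name)).ratio()
    value_sim = cosine(value_profile(s_vals), value_profile(t_vals))
    return alpha * name_sim + (1 - alpha) * value_sim


def match_columns(source, target, threshold=0.4):
    """Map each source column to its best-scoring target column.

    A source column maps to None when no target scores above the threshold.
    """
    mapping = {}
    for s_name, s_vals in source.items():
        best_name, best = None, threshold
        for t_name, t_vals in target.items():
            sc = score((s_name, s_vals), (t_name, t_vals))
            if sc > best:
                best_name, best = t_name, sc
        mapping[s_name] = best_name
    return mapping


if __name__ == "__main__":
    # Hypothetical source schema with abbreviated column names.
    source = {
        "cust_nm": ["Alice Smith", "Bob Jones"],
        "ctry": ["US", "DE", "US"],
        "load_ts": ["2021-01-01"],
    }
    # Hypothetical standardized target schema.
    target = {
        "customer_name": ["Carol White", "Alice Smith"],
        "country_code": ["US", "FR", "DE"],
        "order_total": ["10.5", "99.0"],
    }
    print(match_columns(source, target))
```

Blending name and value evidence lets an abbreviated name such as `ctry` still match `country_code` because the two columns share values, while a column with no counterpart (here `load_ts`) is left unmapped. A learned embedding such as Db2Vec would additionally capture the inter-column relationships the abstract describes, which this token-overlap sketch does not attempt.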