Features matching using natural language processing
This work addresses the problem of efficient feature matching for data integration tasks, but its contribution is incremental because it combines existing methods.
The paper tackles the problem of feature matching across datasets by proposing a hybrid model combining BERT and Jaccard similarity, which reduces the time required for matching compared to manual methods, though no concrete numbers are provided.
Feature matching is a basic step in matching different datasets. This article proposes a new hybrid model in which a pretrained Natural Language Processing (NLP) model, BERT, is used in parallel with a statistical model based on Jaccard similarity to measure the similarity between lists of features from two different datasets. This reduces the time required to search for correlations or to manually match each feature from one dataset to another.
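To make the idea concrete, the following is a minimal sketch of how a Jaccard score over feature names could be combined with a semantic score. It is not the paper's implementation. The function names (`tokenize`, `jaccard`, `match_features`), the rule of taking the higher of the two scores, and the 0.5 threshold are all assumptions made for illustration, since the summary does not say how the two models are combined. The `semantic_sim` parameter is a hypothetical hook where a BERT-based similarity (for example, cosine similarity between embeddings) could be plugged in; no BERT model is loaded here.

```python
import re
from typing import Callable, Dict, List, Optional, Set, Tuple


def tokenize(name: str) -> Set[str]:
    """Split a feature name into lowercase tokens (snake_case, camelCase, spaces)."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)  # break camelCase
    return {t for t in re.split(r"[^A-Za-z0-9]+", spaced.lower()) if t}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| over the token sets of two names."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def match_features(
    left: List[str],
    right: List[str],
    semantic_sim: Optional[Callable[[str, str], float]] = None,
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Dict[str, Tuple[Optional[str], float]]:
    """For each feature in `left`, return the best-scoring feature in `right`.

    The score is the Jaccard similarity, or the larger of Jaccard and
    `semantic_sim` when the latter is supplied (an assumed combination rule).
    Features whose best score is below `threshold` are left unmatched (None).
    """
    matches: Dict[str, Tuple[Optional[str], float]] = {}
    for f in left:
        best, best_score = None, 0.0
        for g in right:
            score = jaccard(f, g)
            if semantic_sim is not None:
                score = max(score, semantic_sim(f, g))
            if score > best_score:
                best, best_score = g, score
        if best_score < threshold:
            best = None
        matches[f] = (best, best_score)
    return matches


if __name__ == "__main__":
    a = ["customer_name", "birth_date", "zip"]
    b = ["customerName", "date_of_birth", "postal_code"]
    for feature, (match, score) in match_features(a, b).items():
        print(f"{feature} -> {match} ({score:.2f})")
```

In this sketch the token-based score alone pairs `customer_name` with `customerName` and `birth_date` with `date_of_birth`, but leaves `zip` unmatched because it shares no tokens with `postal_code`. That is the kind of case where a semantic model would be expected to help, which is presumably the motivation for running the two in parallel.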