CL, LG · Feb 8, 2017

Character-level Deep Conflation for Business Data Analytics

arXiv:1702.02640v1 · 14 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of merging database tables for business analytics. Its contribution is incremental: it applies existing deep learning architectures to a specific domain problem.

The paper tackles the problem of entity conflation in business data analytics by developing a character-level deep learning model to match text strings with semantic understanding, achieving significant performance improvements over a baseline bag-of-character model on a real-world dataset.

Connecting different text attributes associated with the same entity (conflation) is important in business data analytics because it can help merge two different tables in a database to provide a more comprehensive profile of an entity. However, the conflation task is challenging because two text strings that describe the same entity can differ considerably from each other for reasons such as misspelling. It is therefore critical to develop a conflation model that can understand the semantic meaning of the strings and match them at the semantic level. To this end, we develop a character-level deep conflation model that encodes the input text strings, character by character, into finite-dimensional feature vectors, which are then used to compute the cosine similarity between the text strings. The model is trained end to end using backpropagation and stochastic gradient descent to maximize the likelihood of the correct association. Specifically, we propose two variants of the deep conflation model, based on a long short-term memory (LSTM) recurrent neural network (RNN) and a convolutional neural network (CNN), respectively. Both models perform well on a real-world business analytics dataset and significantly outperform the baseline bag-of-character (BoC) model.
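The matching pipeline described in the abstract (encode each string, compare by cosine similarity, score the correct association by likelihood) can be illustrated with a minimal sketch. It uses the bag-of-character baseline as the encoder rather than the paper's LSTM or CNN encoders. The softmax over scaled cosine similarities, the `gamma` scaling factor, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def boc_vector(text):
    """Bag-of-characters encoding: map a string to its character counts."""
    return Counter(text.lower())


def cosine_similarity(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[ch] for ch, count in u.items() if ch in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_probabilities(query, candidates, gamma=10.0):
    """Softmax over scaled cosine similarities (gamma is an assumed hyperparameter)."""
    q = boc_vector(query)
    sims = [cosine_similarity(q, boc_vector(c)) for c in candidates]
    top = max(sims)  # subtract the maximum for numerical stability
    exps = [math.exp(gamma * (s - top)) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


def negative_log_likelihood(query, candidates, correct_index, gamma=10.0):
    """Loss that a trainable encoder would minimize for one labeled example."""
    probs = match_probabilities(query, candidates, gamma)
    return -math.log(probs[correct_index])


if __name__ == "__main__":
    # Hypothetical example strings; the first candidate contains a misspelling.
    candidates = ["Microsft Corporation", "Apple Inc"]
    print(match_probabilities("Microsoft Corp", candidates))
    print(negative_log_likelihood("Microsoft Corp", candidates, correct_index=0))
```

In a learned model, `boc_vector` would be replaced by a trainable character-level encoder, and the negative log-likelihood would be minimized with stochastic gradient descent, as the abstract describes.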

Code Implementations: 2 repos