Combining Global and Local Merges in Logic-based Entity Resolution
This work addresses entity resolution in databases, a task central to data integration and cleaning. The contribution appears incremental, since it extends the existing Lace framework rather than introducing a new one.
The paper extends the Lace framework for entity resolution to support both global merges (equating all occurrences of an entity reference) and local merges (equating only selected occurrences of a data value), and studies the computational properties of the combined formalism.
In the recently proposed Lace framework for collective entity resolution, logical rules and constraints are used to identify pairs of entity references (e.g. author or paper ids) that denote the same entity. This identification is global: all occurrences of those entity references (possibly across multiple database tuples) are deemed equal and can be merged. By contrast, a local form of merge is often more natural when identifying pairs of data values, e.g. some occurrences of 'J. Smith' may be equated with 'Joe Smith', while others should merge with 'Jane Smith'. This motivates us to extend Lace with local merges of values and explore the computational properties of the resulting formalism.
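To make the distinction between the two kinds of merge concrete, the following minimal Python sketch contrasts them on a toy relation. It is an illustration only, not the Lace semantics: the relation `authors`, the tuple ids, the helper `merged_values`, and the use of a union-find structure are all assumptions introduced here. A global merge is recorded on the entity references themselves, so it applies wherever they occur. A local merge is recorded on individual cells (tuple id, attribute position), so two occurrences of the same string can end up in different classes.

```python
class UnionFind:
    """Minimal union-find over hashable items (illustrative helper)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def same(self, a, b):
        return self.find(a) == self.find(b)


# Toy relation (invented data): tuple id -> (author id, author name).
authors = {
    "t1": ("a1", "J. Smith"),
    "t2": ("a2", "Joe Smith"),
    "t3": ("a3", "J. Smith"),
    "t4": ("a4", "Jane Smith"),
}

# Global merge: keyed by the entity reference, so every occurrence
# of a1 and a2 in the database is treated as the same entity.
global_merges = UnionFind()
global_merges.union("a1", "a2")

# Local merge: keyed by cell (tuple id, attribute position), so the
# 'J. Smith' in t1 and the 'J. Smith' in t3 can be merged differently.
local_merges = UnionFind()
local_merges.union(("t1", 1), ("t2", 1))  # this 'J. Smith' is 'Joe Smith'
local_merges.union(("t3", 1), ("t4", 1))  # this 'J. Smith' is 'Jane Smith'


def merged_values(tid, pos):
    """Return the sorted values of all cells locally merged with (tid, pos)."""
    target = local_merges.find((tid, pos))
    return sorted(
        row[p]
        for t, row in authors.items()
        for p in range(len(row))
        if local_merges.find((t, p)) == target
    )


print(merged_values("t1", 1))  # ['J. Smith', 'Joe Smith']
print(merged_values("t3", 1))  # ['J. Smith', 'Jane Smith']
```

In this sketch the two cells holding 'J. Smith' stay in separate classes, which a value-level global merge could not express: equating the string 'J. Smith' with 'Joe Smith' everywhere would also force it to equal 'Jane Smith' after the second merge.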