IR, Jun 19, 2019
A survey of OpenRefine reconciliation services
Antonin Delpeuch
We review the services implementing the OpenRefine reconciliation API, comparing their design to the state of the art in record linkage. Due to the design of the API, the matching scores returned by the services are of little help in guiding matching decisions. This suggests possible changes to the specifications of the API, which could improve user workflows by giving the client more control over the scoring mechanism.
CL, Apr 19, 2019
OpenTapioca: Lightweight Entity Linking for Wikidata
Antonin Delpeuch
We propose a simple Named Entity Linking system that can be trained from Wikidata only. This demonstrates the strengths and weaknesses of this data source for this task and provides an easily reproducible baseline to compare other systems against. Our model is lightweight to train, to run, and to keep synchronized with Wikidata in real time.
CT, Nov 14, 2014
Autonomization of Monoidal Categories
Antonin Delpeuch
We show that, contrary to common belief in the DisCoCat community, a monoidal category is all that is needed to define a categorical compositional model of natural language. This relies on a construction which freely adds adjoints to a monoidal category. In the case of distributional semantics, this broadens the range of available models to include, for instance, non-linear maps and cartesian products. We illustrate the applications of this principle to various distributional models of meaning.
CL, Apr 13, 2014
Complexity of Grammar Induction for Quantum Types
Antonin Delpeuch
Most categorical models of meaning use a functor from the syntactic category to the semantic category. When semantic information is available, the problem of grammar induction can therefore be defined as finding preimages of the semantic types under this forgetful functor, lifting the information flow from the semantic level to a valid reduction at the syntactic level. We study the complexity of grammar induction, and show that for a variety of type systems, including pivotal and compact closed categories, the grammar induction problem is NP-complete. Our approach could be extended to linguistic type systems such as autonomous or bi-closed categories.