IR, Feb 12, 2018

Towards an Open Science Platform for the Evaluation of Data Fusion

Weinan Huang, Junyi Chen, Lei Meng, David Lillis

arXiv:1802.04068v1

AI Analysis

This work tackles inconsistent evaluation in data fusion research, a problem for researchers in information retrieval and Big Data. Its contribution is incremental, since it builds on existing ideas for standardization.

The paper addresses the lack of a universally accepted evaluation methodology for data fusion techniques, which hinders meaningful comparison between algorithms. It proposes a centralized software platform, with an early prototype, to reduce the burden of re-implementing algorithms and to encourage more comparable results.

Combining the results of different search engines in order to improve upon their performance has been the subject of many research papers. This has become known as the "Data Fusion" task, and it has great promise in dealing with the vast quantity of unstructured textual data that is a feature of many Big Data scenarios. However, no universally accepted evaluation methodology has emerged in the community. This makes it difficult to make meaningful comparisons between the various proposed techniques from reading the literature alone. Variations in the datasets, metrics, and baseline results have all contributed to this difficulty. This paper argues that a more unified approach is required, and that a centralised software platform should be developed to aid researchers in making comparisons between their algorithms and others. The desirable qualities of such a system have been identified and proposed, and an early prototype has been developed. Re-implementing algorithms published by other researchers is a great burden on those proposing new techniques. The prototype system has the potential to greatly reduce this burden and thus make it easier to generate and publish comparable results.
