IR · DC · Nov 11, 2017

A distributed system for SearchOnMath based on the Microsoft BizSpark program

arXiv:1711.04189v1 · 13 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for efficient distributed systems to support large-scale mathematical information retrieval, but its contribution is incremental, since it focuses on optimizing an existing infrastructure.

The study tackles the problem of scaling mathematical formula search by evaluating 38 distributed-system scenarios under the Microsoft BizSpark program, with the goal of identifying the scenario that yields the best response times when searching the SearchOnMath databases for a set of 120 formulas.

Mathematical information retrieval is a relatively new area, so the first search tools capable of retrieving mathematical formulas appeared only a few years ago. The proposals made public so far mostly implement searches on internal university databases, small sets of scientific papers, or the English-language Wikipedia, and therefore require only modest computing power. In this context, SearchOnMath has emerged as a pioneering tool in that it indexes several different databases and is compatible with several mathematical representation languages. Because it handles a significantly greater number of formulas, SearchOnMath requires a distributed system to support it. The present study, based on the Microsoft BizSpark program, evaluates 38 different distributed-system scenarios to pinpoint the one that affords the best response times when searching the SearchOnMath databases for a collection of 120 formulas.
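The evaluation described in the abstract amounts to running the same query set against every configuration and comparing the measured latencies. The following is a minimal sketch under assumptions of our own, not the paper's benchmarking code: each scenario is modeled as a hypothetical Python callable that runs one formula query, and "best response time" is taken to mean the lowest mean latency, with ties broken by the median. The names `time_queries`, `rank_scenarios`, and `best_scenario` are illustrative only.

```python
import statistics
import time
from typing import Callable, Dict, List, Sequence, Tuple

# Assumption: a scenario is any callable that runs one formula query
# against a given distributed-system configuration and returns when done.
SearchFn = Callable[[str], object]


def time_queries(search: SearchFn, formulas: Sequence[str]) -> List[float]:
    """Return the wall-clock response time (in seconds) of each query."""
    times = []
    for formula in formulas:
        start = time.perf_counter()
        search(formula)  # the result is discarded; only latency is measured
        times.append(time.perf_counter() - start)
    return times


def rank_scenarios(timings: Dict[str, List[float]]) -> List[Tuple[str, float, float]]:
    """Rank scenarios fastest first by mean latency (an assumed criterion).

    Ties on the mean are broken by the median, then by scenario name.
    """
    summary = [
        (name, statistics.mean(ts), statistics.median(ts))
        for name, ts in timings.items()
    ]
    return sorted(summary, key=lambda row: (row[1], row[2], row[0]))


def best_scenario(scenarios: Dict[str, SearchFn], formulas: Sequence[str]) -> str:
    """Time every scenario on the same formula set and return the fastest."""
    timings = {name: time_queries(fn, formulas) for name, fn in scenarios.items()}
    return rank_scenarios(timings)[0][0]
```

In the study's setting, `scenarios` would hold 38 entries and `formulas` the 120 query formulas. A real benchmark would also repeat each query and account for cache warm-up, which this sketch omits.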
