Auctus: A Dataset Search Engine for Data Augmentation
This work addresses the difficulty that researchers, analysts, and businesses face in discovering structured data. Its contribution is incremental, however, because it applies established search engine concepts to one specific data type.
The paper tackles the problem of finding relevant structured data for scientific, societal, and business applications by introducing Auctus, a dataset search engine. The system lets users explore datasets through rich queries and supports data augmentation to improve machine learning models and enrich analytics, as demonstrated in case studies.
The large volumes of structured data currently available, from Web tables to open-data portals and enterprise data, open up new opportunities for progress in answering many important scientific, societal, and business questions. However, finding relevant data is difficult. While search engines have addressed this problem for Web documents, there are many new challenges involved in supporting the discovery of structured data. We demonstrate how the Auctus dataset search engine addresses some of these challenges. We describe the system architecture and how users can explore datasets through a rich set of queries. We also present case studies which show how Auctus supports data augmentation to improve machine learning models as well as to enrich analytics.
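As a rough illustration of what data augmentation through a join can look like, the following minimal Python sketch left-joins a discovered dataset onto a base table using a shared key column. It is not Auctus's API or implementation: the function `augment_left_join` and the trip and weather records are hypothetical and exist only to make the idea concrete.

```python
from typing import Dict, List

Row = Dict[str, object]


def augment_left_join(base: List[Row], candidate: List[Row], key: str) -> List[Row]:
    """Left-join `candidate` columns onto `base` rows using `key`.

    Every base row is kept; rows with no match get None for the new columns.
    """
    # Index the candidate dataset by join key (first occurrence wins).
    index: Dict[object, Row] = {}
    for row in candidate:
        index.setdefault(row[key], row)

    # Columns the candidate contributes beyond the join key, in first-seen order.
    new_columns: List[str] = []
    for row in candidate:
        for col in row:
            if col != key and col not in new_columns:
                new_columns.append(col)

    augmented: List[Row] = []
    for row in base:
        match = index.get(row[key])
        extra = {col: (match.get(col) if match else None) for col in new_columns}
        augmented.append({**row, **extra})
    return augmented


# Hypothetical data: daily taxi trips, augmented with daily rainfall.
trips = [
    {"date": "2019-01-01", "trips": 120},
    {"date": "2019-01-02", "trips": 95},
    {"date": "2019-01-03", "trips": 143},
]
weather = [
    {"date": "2019-01-01", "rain_mm": 0.0},
    {"date": "2019-01-02", "rain_mm": 12.5},
]

result = augment_left_join(trips, weather, key="date")
```

Here `result` keeps all three trip rows and gains a `rain_mm` column, with `None` where no weather record matches. The added column could then serve as an extra feature for a model or an extra dimension in an analysis, which is the general sense in which augmentation is used above.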