LG, AI, DB, QM, CP
Sep 10, 2023

A compendium of data sources for data science, machine learning, and artificial intelligence

arXiv:2309.05682v1
h-index: 3
Originality: Synthesis-oriented
AI Analysis

The compendium is a useful resource for data scientists and ML experts, but its contribution is incremental: it compiles existing information without introducing new methods or data.

The paper tackles the growing demand for data in data science, machine learning, and AI by compiling a comprehensive list of data sources across various application areas, such as finance, life sciences, and social media, to benefit practitioners.

Recent advances in data science, machine learning, and artificial intelligence, such as the emergence of large language models, are leading to an increasing demand for data that can be processed by such models. While data sources are application-specific, and it is impossible to produce an exhaustive list of such data sources, it seems that a comprehensive, rather than complete, list would still benefit data scientists and machine learning experts of all levels of seniority. The goal of this publication is to provide just such an (inevitably incomplete) list -- or compendium -- of data sources across multiple application areas, including finance and economics, legal (laws and regulations), life sciences (medicine and drug discovery), news sentiment and social media, retail and ecommerce, satellite imagery, shipping and logistics, and sports.

