NASA Science Mission Directorate Knowledge Graph Discovery
This work addresses the problem of time-consuming data discovery for NASA researchers, but its contribution is incremental: it applies existing NLP methods to a new domain-specific dataset.
The paper tackles the challenge of discovering connections in NASA's growing Science Mission Directorate data. It proposes a pipeline that generates knowledge graphs from textual data using NLP methods. These graphs can serve as a basis for dataset search engines, saving researchers time and helping them find new connections.
The size of the National Aeronautics and Space Administration (NASA) Science Mission Directorate (SMD) data catalog is growing exponentially, allowing researchers to make discoveries. However, making discoveries is challenging and time-consuming because of the size of the data catalogs and because many concepts and data are only indirectly connected. This paper proposes a pipeline to generate knowledge graphs (KGs) representing different NASA SMD domains. These KGs can be used as the basis for dataset search engines, saving researchers time and supporting them in finding new connections. We collected textual data and used several modern natural language processing (NLP) methods to create the nodes and the edges of the KGs. We explore the cross-domain connections, discuss our challenges, and provide future directions to inspire researchers working on similar challenges.
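The idea of turning text into KG nodes and edges can be illustrated with a minimal sketch. The abstract does not name the specific NLP methods used, so this example swaps in a deliberately naive stand-in: stop-word-filtered tokens become nodes, and two terms share an edge when they co-occur in the same document. The sample documents, the stop-word list, and the function names `extract_terms`, `build_graph`, and `cross_domain_terms` are all hypothetical and are not taken from the paper.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical stop-word list; a real pipeline would use a proper NLP keyword extractor.
STOPWORDS = {"the", "of", "and", "in", "a", "to", "on", "for", "from", "with", "by"}


def extract_terms(text):
    """Naive stand-in for term extraction: lowercase word tokens minus stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS and len(t) > 2}


def build_graph(docs):
    """Build a co-occurrence graph from (domain, text) pairs.

    Returns (nodes, edges), where nodes maps each term to the set of domains
    it appears in, and edges maps a sorted term pair to the number of
    documents in which both terms co-occur.
    """
    nodes = defaultdict(set)
    edges = defaultdict(int)
    for domain, text in docs:
        terms = extract_terms(text)
        for term in terms:
            nodes[term].add(domain)
        for a, b in combinations(sorted(terms), 2):
            edges[(a, b)] += 1
    return dict(nodes), dict(edges)


def cross_domain_terms(nodes):
    """Terms that appear in more than one domain, i.e. candidate cross-domain links."""
    return sorted(term for term, domains in nodes.items() if len(domains) > 1)


# Made-up sample descriptions, for illustration only.
docs = [
    ("earth science", "Aerosol measurements of the atmosphere from satellite radiometers"),
    ("planetary science", "Atmosphere composition of Mars from orbiter spectrometers"),
    ("heliophysics", "Solar wind plasma measurements from spacecraft"),
]

nodes, edges = build_graph(docs)
print(cross_domain_terms(nodes))  # ['atmosphere', 'measurements']
```

In this toy setting, a term such as "atmosphere" links an Earth science description to a planetary science one, which is the kind of indirect connection a KG-backed dataset search could surface. A realistic pipeline would replace `extract_terms` with stronger term and relation extraction.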