DB, CL, LG | Aug 24, 2024

GNN: Graph Neural Network and Large Language Model for Data Discovery

arXiv:2408.13609v2 | 1 citation | h-index: 1
Originality: Synthesis-oriented
AI Analysis

This work addresses data discovery challenges for data scientists by incrementally improving on existing methods to include text understanding.

The paper tackles the problem of data discovery by extending previous methods to handle text values using graph neural networks and large language models, resulting in more reliable outcome predictions without requiring predefined utility functions or human input for attribute ranking.

Our algorithm, GNN: Graph Neural Network and Large Language Model for Data Discovery, inherits the benefits of \cite{hoang2024plod} (PLOD: Predictive Learning Optimal Data Discovery) and \cite{Hoang2024BODBO} (BOD: Blindly Optimal Data Discovery): it does not require a predefined utility function or human input for attribute ranking, which avoids a time-consuming iterative loop. Beyond these previous works, GNN leverages graph neural networks and large language models to understand text-type values that PLOD and MOD cannot handle, making outcome prediction more reliable. GNN can be seen as an extension of PLOD that understands text values as well as numerical values, along with the user's preferences, which makes it promising for data science and analytics.
