Humans, Machine Learning, and Language Models in Union: A Cognitive Study on Table Unionability
This work addresses data discovery challenges for data scientists by providing insights into human behavior and proposing a hybrid approach, though it is incremental in building on existing methods.
This research tackles table unionability in data discovery by studying human decision-making and developing a machine learning framework that improves on raw human performance. A preliminary study suggests that combining human and LLM judgments typically yields better results.
Data discovery, and table unionability in particular, have become key tasks in modern data science. However, the human perspective on these tasks remains under-explored. This research therefore investigates human behavior in determining table unionability within data discovery. We designed an experimental survey and conducted a comprehensive analysis in which we assess human decision-making for table unionability. We use the observations from this analysis to develop a machine learning framework that boosts the (raw) performance of humans. Furthermore, we perform a preliminary study of how LLM performance compares to that of humans, which indicates that it is typically better to consider a combination of both. We believe that this work lays the foundations for developing future Human-in-the-Loop systems for efficient data discovery.