CL, SE | Jun 6, 2020

StackOverflow vs Kaggle: A Study of Developer Discussions About Data Science

arXiv:2006.08334v1 | 8 citations
Originality: Synthesis-oriented
AI Analysis

This study helps educators and researchers tailor data science communication for different developer communities, though it is incremental as it applies existing methods to new data.

The paper analyzed 197,836 posts from StackOverflow and Kaggle to study data science discussions, finding that TensorFlow topics were most prevalent on StackOverflow while meta discussions dominated Kaggle, with DS discussions increasing rapidly overall.

Software developers are increasingly required to understand fundamental data science (DS) concepts. Recently, the presence of machine learning (ML) and deep learning (DL) has dramatically increased in the development of user applications, whether they are leveraged through frameworks or implemented from scratch. These topics attract much discussion on online platforms. This paper conducts large-scale qualitative and quantitative experiments to study the characteristics of 197,836 posts from StackOverflow and Kaggle. Latent Dirichlet Allocation topic modelling is used to extract twenty-four DS discussion topics. The main findings include that TensorFlow-related topics were the most prevalent on StackOverflow, while meta discussion topics were the most prevalent on Kaggle. StackOverflow tends to include lower-level troubleshooting, while Kaggle focuses on practicality and optimising leaderboard performance. In addition, across both communities, DS discussion is increasing at a dramatic rate. While TensorFlow discussion on StackOverflow is slowing, interest in Keras is rising. Finally, ensemble algorithms are the most mentioned ML/DL algorithms on Kaggle but are rarely discussed on StackOverflow. These findings can help educators and researchers to more effectively tailor and prioritise efforts in researching and communicating DS concepts towards different developer communities.
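To make the topic-modelling step concrete, here is a minimal sketch of Latent Dirichlet Allocation fitted with a collapsed Gibbs sampler, written in plain Python. It is not the paper's pipeline: the function name `lda_gibbs`, the hyperparameters (`alpha`, `beta`, `iterations`), the choice of Gibbs sampling as the inference method, and the four toy "posts" are all assumptions made for illustration. The study itself extracts twenty-four topics from 197,836 posts; the sketch uses two topics on a handful of invented tokens.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not the paper's code).

    docs is a list of token lists. Returns (doc_topic, topic_word_counts):
    doc_topic[d] holds smoothed topic proportions for document d, and
    topic_word_counts[k] is a Counter of words currently assigned to topic k.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Count tables maintained by the sampler.
    doc_topic_counts = [[0] * num_topics for _ in docs]
    topic_word_counts = [Counter() for _ in range(num_topics)]
    topic_totals = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic_counts[d][k] += 1
            topic_word_counts[k][w] += 1
            topic_totals[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k_old = assignments[d][i]
                doc_topic_counts[d][k_old] -= 1
                topic_word_counts[k_old][w] -= 1
                topic_totals[k_old] -= 1

                # Unnormalised conditional probability of each topic.
                weights = [
                    (doc_topic_counts[d][k] + alpha)
                    * (topic_word_counts[k][w] + beta)
                    / (topic_totals[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                k_new = rng.choices(range(num_topics), weights=weights)[0]

                # Record the newly sampled assignment.
                assignments[d][i] = k_new
                doc_topic_counts[d][k_new] += 1
                topic_word_counts[k_new][w] += 1
                topic_totals[k_new] += 1

    # Smoothed per-document topic proportions (each row sums to 1).
    doc_topic = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        doc_topic.append([(c + alpha) / denom for c in doc_topic_counts[d]])
    return doc_topic, topic_word_counts


# Invented toy posts, already tokenised; not data from the study.
docs = [
    "tensorflow gpu error install tensorflow".split(),
    "keras model layer tensorflow error".split(),
    "leaderboard ensemble submission score kaggle".split(),
    "ensemble xgboost leaderboard score submission".split(),
]

doc_topic, topic_words = lda_gibbs(docs, num_topics=2)
for k, counts in enumerate(topic_words):
    print(k, [w for w, c in counts.most_common(3) if c > 0])
```

Per-platform topic prevalence of the kind reported above could then be estimated by assigning each post its highest-proportion topic from `doc_topic` and counting posts per topic for each platform. A real analysis would more likely use an established library implementation and tune the number of topics.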

