Stephan Diehl

h-index32

7papers

441citations

Novelty14%

AI Score17

Ranked #191,073 of 194,257 authors (top 98%)#2,726 in SE (top 90%)

7 Papers

15.5SESep 8, 2018

SOTorrent: Studying the Origin, Evolution, and Usage of Stack Overflow Code Snippets

Sebastian Baltes, Christoph Treude, Stephan Diehl

Stack Overflow (SO) is the most popular question-and-answer website for software developers, providing a large amount of copyable code snippets. Like other software artifacts, code on SO evolves over time, for example when bugs are fixed or APIs are updated to the most recent version. To be able to analyze how code and the surrounding text on SO evolves, we built SOTorrent, an open dataset based on the official SO data dump. SOTorrent provides access to the version history of SO content at the level of whole posts and individual text and code blocks. It connects code snippets from SO posts to other platforms by aggregating URLs from surrounding text blocks and comments, and by collecting references from GitHub files to SO posts. Our vision is that researchers will use SOTorrent to investigate and understand the evolution and maintenance of code on SO and its relation to other platforms such as GitHub.

17.2HCSep 2, 2018

Exploring the Limits of Complexity: A Survey of Empirical Studies on Graph Visualisation

Vahan Yoghourdjian, Daniel Archambault, Stephan Diehl et al.

For decades, researchers in information visualisation and graph drawing have focused on developing techniques for the layout and display of very large and complex networks. Experiments involving human participants have also explored the readability of different styles of layout and representations for such networks. In both bodies of literature, networks are frequently referred to as being 'large' or 'complex', yet these terms are relative. From a human-centred, experiment point-of-view, what constitutes 'large' (for example) depends on several factors, such as data complexity, visual complexity, and the technology used. In this paper, we survey the literature on human-centred experiments to understand how, in practice, different features and characteristics of node-link diagrams affect visual complexity.

26.0SEFeb 8, 2018

Usage and Attribution of Stack Overflow Code Snippets in GitHub Projects

Sebastian Baltes, Stephan Diehl

Stack Overflow (SO) is the most popular question-and-answer website for software developers, providing a large amount of copyable code snippets. Using those snippets raises maintenance and legal issues. SO's license (CC BY-SA 3.0) requires attribution, i.e., referencing the original question or answer, and requires derived work to adopt a compatible license. While there is a heated debate on SO's license model for code snippets and the required attribution, little is known about the extent to which snippets are copied from SO without proper attribution. We present results of a large-scale empirical study analyzing the usage and attribution of non-trivial Java code snippets from SO answers in public GitHub (GH) projects. We followed three different approaches to triangulate an estimate for the ratio of unattributed usages and conducted two online surveys with software developers to complement our results. For the different sets of projects that we analyzed, the ratio of projects containing files with a reference to SO varied between 3.3% and 11.9%. We found that at most 1.8% of all analyzed repositories containing code from SO used the code in a way compatible with CC BY-SA 3.0. Moreover, we estimate that at most a quarter of the copied code snippets from SO are attributed as required. Of the surveyed developers, almost one half admitted copying code from SO without attribution and about two thirds were not aware of the license of SO code snippets and its implications.

5.2SEAug 5, 2017

Round-Trip Sketches: Supporting the Lifecycle of Software Development Sketches from Analog to Digital and Back

Sebastian Baltes, Fabrice Hollerich, Stephan Diehl

Sketching is an important activity for understanding, designing, and communicating different aspects of software systems such as their requirements or architecture. Often, sketches start on paper or whiteboards, are revised, and may evolve into a digital version. Users may then print a revised sketch, change it on paper, and digitize it again. Existing tools focus on a paperless workflow, i.e., archiving analog documents, or rely on special hardware - they do not focus on integrating digital versions into the analog-focused workflow that many users follow. In this paper, we present the conceptual design and a prototype of LivelySketches, a tool that supports the "round-trip" lifecycle of sketches from analog to digital and back. The proposed workflow includes capturing both analog and digital sketches as well as relevant context information. In addition, users can link sketches to other related sketches or documents. They may access the linked artifacts and captured information using digital as well as augmented analog versions of the sketches. We further present results from a formative user study with four students and outline possible directions for future work.

18.9SEJul 4, 2017

Worse Than Spam: Issues In Sampling Software Developers

Sebastian Baltes, Stephan Diehl

Background: Reaching out to professional software developers is a crucial part of empirical software engineering research. One important method to investigate the state of practice is survey research. As drawing a random sample of professional software developers for a survey is rarely possible, researchers rely on various sampling strategies. Objective: In this paper, we report on our experience with different sampling strategies we employed, highlight ethical issues, and motivate the need to maintain a collection of key demographics about software developers to ease the assessment of the external validity of studies. Method: Our report is based on data from two studies we conducted in the past. Results: Contacting developers over public media proved to be the most effective and efficient sampling strategy. However, we not only describe the perspective of researchers who are interested in reaching goals like a large number of participants or a high response rate, but we also shed light onto ethical implications of different sampling strategies. We present one specific ethical guideline and point to debates in other research communities to start a discussion in the software engineering research community about which sampling strategies should be considered ethical.

10.1SEJun 29, 2017

Linking Sketches and Diagrams to Source Code Artifacts

Sebastian Baltes, Peter Schmitz, Stephan Diehl

Recent studies have shown that sketches and diagrams play an important role in the daily work of software developers. If these visual artifacts are archived, they are often detached from the source code they document, because there is no adequate tool support to assist developers in capturing, archiving, and retrieving sketches related to certain source code artifacts. This paper presents SketchLink, a tool that aims at increasing the value of sketches and diagrams created during software development by supporting developers in these tasks. Our prototype implementation provides a web application that employs the camera of smartphones and tablets to capture analog sketches, but can also be used on desktop computers to upload, for instance, computer-generated diagrams. We also implemented a plugin for a Java IDE that embeds the links in Javadoc comments and visualizes them in situ in the source code editor as graphical icons.

17.6SEJun 28, 2017

Sketches and Diagrams in Practice

Sebastian Baltes, Stephan Diehl

Sketches and diagrams play an important role in the daily work of software developers. In this paper, we investigate the use of sketches and diagrams in software engineering practice. To this end, we used both quantitative and qualitative methods. We present the results of an exploratory study in three companies and an online survey with 394 participants. Our participants included software developers, software architects, project managers, consultants, as well as researchers. They worked in different countries and on projects from a wide range of application areas. Most questions in the survey were related to the last sketch or diagram that the participants had created. Contrary to our expectations and previous work, the majority of sketches and diagrams contained at least some UML elements. However, most of them were informal. The most common purposes for creating sketches and diagrams were designing, explaining, and understanding, but analyzing requirements was also named often. More than half of the sketches and diagrams were created on analog media like paper or whiteboards and have been revised after creation. Most of them were used for more than a week and were archived. We found that the majority of participants related their sketches to methods, classes, or packages, but not to source code artifacts with a lower level of abstraction.