CL, Nov 16, 2023
GenCodeSearchNet: A Benchmark Test Suite for Evaluating Generalization in Programming Language Understanding
Andor Diera, Abdelhalim Dahou, Lukas Galke et al.
Language models can serve as a valuable tool for software developers to increase productivity. Large generative models can be used for code generation and code completion, while smaller encoder-only models are capable of performing code search tasks using natural language queries. These capabilities are heavily influenced by the quality and diversity of the available training data. Source code datasets used for training usually focus on the most popular languages, and testing is mostly conducted on the same distributions, often overlooking low-resource programming languages. Motivated by the NLP generalization taxonomy proposed by Hupkes et al., we propose a new benchmark dataset called GenCodeSearchNet (GeCS), which builds upon existing natural language code search datasets to systematically evaluate the programming language understanding generalization capabilities of language models. As part of the full dataset, we introduce a new, manually curated subset, StatCodeSearch, that focuses on R, a popular but so far underrepresented programming language that is often used by researchers outside the field of computer science. For evaluation and comparison, we collect several baseline results using fine-tuned BERT-style models and GPT-style large language models in a zero-shot setting.
SE, Apr 17
Supporting the Comprehension of Data Analysis Scripts
Florian Sihler, Oliver Gerstl, Lars Pfrenger et al.
A lot of research relies on data analysis scripts to process, clean, and visualize data. However, recent studies show that these scripts are often hard to comprehend and maintain, which hinders reproducibility and reuse, and that tool support for handling such scripts is lacking. In this work, we focus on the R programming language and address this problem by presenting flowR as an extension for the common data analysis IDEs Positron and VS Code. Alongside a previously presented static backward program slicer, flowR provides an overview of data analysis scripts, interactive graph visualizations, linting, and inline value annotations to support data analysts. flowR incrementally analyzes R projects by intertwining interprocedural data- and control-flow analyses to build a comprehensive dataflow graph, incorporating R's dynamic and explorative features. Additionally, flowR offers a plugin system and interfaces, allowing the integration of further analyses, such as new linting rules or custom visualizations. Requiring an average of 576 ms to calculate the full dataflow graph of real-world projects, flowR enables near real-time feedback. The demonstration video is available at https://youtu.be/hJzr-r-NmMg . For the full source code and extensive documentation, refer to https://github.com/flowr-analysis/flowr . To try the Docker image, use `docker run --rm -it eagleoutice/flowr`.
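To make the idea of backward slicing over a dataflow graph concrete, here is a minimal sketch in Python. It is not flowR's implementation or API: the `backward_slice` function, the `DEPENDS_ON` mapping, and the toy four-line R script in the comments are all assumptions made for illustration. The graph maps each statement to the statements it reads from, and the slice is everything reachable backward from the statement of interest.

```python
from collections import deque

# Hypothetical dependency graph for a toy R script (illustration only):
#   1: x <- read.csv("data.csv")
#   2: y <- 42
#   3: z <- mean(x$value)
#   4: print(z)
# Each key is a statement number; each value is the set of statements it reads from.
DEPENDS_ON = {
    3: {1},
    4: {3},
}


def backward_slice(depends_on, criterion):
    """Return every statement the criterion transitively depends on (including itself)."""
    seen = {criterion}
    queue = deque([criterion])
    while queue:
        node = queue.popleft()
        for dep in depends_on.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


if __name__ == "__main__":
    # Statement 2 does not influence print(z), so it is not part of the slice.
    print(sorted(backward_slice(DEPENDS_ON, 4)))
```

A real analysis for R would additionally have to resolve function calls, side effects, and dynamic features before such a graph exists; the traversal above only covers the final reachability step.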
SE, May 7
Exploring the Effectiveness of Abstract Syntax Tree Patterns for Algorithm Recognition
Denis Neumüller, Florian Sihler, Raphael Straub et al.
The automated recognition of algorithm implementations can support many software maintenance and re-engineering activities by providing knowledge about the concerns present in the code base. Moreover, recognizing inefficient algorithms like Bubble Sort and suggesting superior alternatives from a library can help in assessing and improving the quality of a system. Approaches from related work suffer from usability as well as scalability issues, and their accuracy is not evaluated. In this paper, we investigate how well our approach based on the abstract syntax tree of a program performs for automatic algorithm recognition. To this end, we have implemented a prototype consisting of a domain-specific language designed to capture the key features of an algorithm and used to express a search pattern on the abstract syntax tree, a matching algorithm to find these features, and an initial catalog of "ready to use" patterns. To create our search patterns, we performed a web search using the algorithm name and described key features of the found reference implementations with our domain-specific language. We evaluate our prototype on a subset of the BigCloneEval benchmark containing algorithms like Fibonacci, Bubble Sort, and Binary Search. We achieve an average F1-score of 0.74, outperforming the large language model Code Llama, which attains 0.35. Additionally, we use multiple code clone detection tools as a baseline for comparison, achieving a recall of 0.62 while the best-performing tool reaches 0.20.
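As a rough illustration of matching an algorithm's key features on an abstract syntax tree, the sketch below uses Python's standard `ast` module. It does not reproduce the authors' domain-specific language or matching algorithm: the chosen feature (an element swap inside a nested loop, taken here as a Bubble Sort indicator), the function names, and the sample snippets are all assumptions made for illustration.

```python
import ast

LOOPS = (ast.For, ast.While)


def _is_subscript_swap(node):
    """Match a tuple assignment of the form a[i], a[j] = a[j], a[i]."""
    if not (isinstance(node, ast.Assign) and len(node.targets) == 1):
        return False
    target, value = node.targets[0], node.value
    if not (isinstance(target, ast.Tuple) and isinstance(value, ast.Tuple)):
        return False
    if len(target.elts) != 2 or len(value.elts) != 2:
        return False
    if not all(isinstance(e, ast.Subscript) for e in target.elts + value.elts):
        return False
    # Compare source text so that Load/Store contexts do not affect the match.
    left = [ast.unparse(e) for e in target.elts]
    right = [ast.unparse(e) for e in value.elts]
    return left[0] != left[1] and left == right[::-1]


def looks_like_bubble_sort(source):
    """Heuristic: some loop nested inside another loop contains a subscript swap."""
    tree = ast.parse(source)
    for outer in ast.walk(tree):
        if not isinstance(outer, LOOPS):
            continue
        for inner in ast.walk(outer):
            if inner is outer or not isinstance(inner, LOOPS):
                continue
            if any(_is_subscript_swap(n) for n in ast.walk(inner)):
                return True
    return False


# Hypothetical sample inputs for the heuristic.
BUBBLE = """
def sort(a):
    for i in range(len(a)):
        for j in range(len(a) - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a
"""

LINEAR_SEARCH = """
def find(a, x):
    for i, v in enumerate(a):
        if v == x:
            return i
    return -1
"""

if __name__ == "__main__":
    print(looks_like_bubble_sort(BUBBLE))
    print(looks_like_bubble_sort(LINEAR_SEARCH))
```

A single hard-coded feature like this would also match other swap-based sorts and misses implementations that swap through a temporary variable, which hints at why a dedicated pattern language with a catalog of patterns is useful.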
SE, Sep 24, 2021
A Domain-Specific Language for Modeling and Analyzing Solution Spaces for Technology Roadmapping
Alexander Breckel, Jakob Pietron, Katharina Juhnke et al.
The introduction of major innovations in industry requires collaboration across the whole value chain. A common way to organize such a collaboration is the use of technology roadmaps, which act as an industry-wide long-term planning tool. Technology roadmaps are used to identify industry needs, estimate the availability of technological solutions, and identify the need for innovation in the future. Roadmaps are inherently both time-dependent and based on uncertain values, i.e., properties and structural components can change over time. Furthermore, roadmaps have to reason about alternative solutions as well as their key performance indicators. Current approaches for model-based engineering do not inherently support these aspects. We present a novel model-based approach treating those aspects as first-class citizens. To address the problem of missing support for time in the context of roadmap modeling, we introduce the concepts of a common global time, time-dependent properties, and time-dependent availability. This covers requirements, properties, and the structure of the model or its components. Furthermore, we support the specification and analysis of key performance indicators for alternative solutions. These concepts result in a continuous range of valid models over time instead of a single valid model at a certain point in time. We present a graphical user interface to enable the user to efficiently create and analyze those models. We further show the semantics of the resulting model by a translation into a set of global constraints as well as how we solve the resulting constraint system. We report on the evaluation of these concepts and the Iris tool with domain experts from different companies in the automotive value chain based on the industrial case of a smart sensing electrical fuse.