Ralf Lämmel

SE
h-index: 4
12 papers
96 citations
Novelty: 29%
AI Score: 38

12 Papers

CV · Nov 29, 2023
AutArch: An AI-assisted workflow for object detection and automated recording in archaeological catalogues

Kevin Klein, Antoine Muller, Alyssa Wohde et al.

The context of this paper is the creation of large uniform archaeological datasets from heterogeneous published resources, such as find catalogues - with the help of AI and Big Data. The paper is concerned with the challenge of consistent assemblages of archaeological data. We cannot simply combine existing records, as they differ in terms of quality and recording standards. Thus, records have to be recreated from published archaeological illustrations. This is only a viable path with the help of automation. The contribution of this paper is a new workflow for collecting data from archaeological find catalogues available as legacy resources, such as archaeological drawings and photographs in large unsorted PDF files; the workflow relies on custom software (AutArch) supporting image processing, object detection, and interactive means of validating and adjusting automatically retrieved data. We integrate artificial intelligence (AI) in terms of neural networks for object detection and classification into the workflow, thereby speeding up, automating, and standardising data collection. Objects commonly found in archaeological catalogues - such as graves, skeletons, ceramics, ornaments, stone tools and maps - are detected. Those objects are spatially related and analysed to extract real-life attributes, such as the size and orientation of graves based on the north arrow and the scale. We also automate recording of geometric whole-outlines through contour detection, as an alternative to landmark-based geometric morphometrics. Detected objects, contours, and other automatically retrieved data can be manually validated and adjusted. We use third millennium BC Europe (encompassing cultures such as 'Corded Ware' and 'Bell Beaker', and their burial practices) as a 'testing ground' and for evaluation purposes; this includes a user study for the workflow and the AutArch software.
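The step from detected objects to real-life attributes can be illustrated with a little arithmetic. The following is a minimal sketch, not AutArch's actual code: the function names, the pixel-based inputs, and the angle convention (all angles measured the same way in image space) are assumptions made for illustration.

```python
def real_length(pixel_length, scale_bar_pixels, scale_bar_metres):
    """Convert a length in pixels to metres using the drawing's scale bar.

    Hypothetical helper: assumes the scale bar's pixel length and the
    real-world length it denotes have already been detected.
    """
    return pixel_length * scale_bar_metres / scale_bar_pixels


def orientation_from_north(grave_angle_deg, north_arrow_angle_deg):
    """Angle of the grave's long axis relative to north, in [0, 180).

    A long axis has no direction, so the result is taken modulo 180.
    Both inputs are assumed to use the same image-space angle convention.
    """
    return (grave_angle_deg - north_arrow_angle_deg) % 180.0


# A grave drawn 300 px long, next to a 100 px scale bar denoting 1 m.
length_m = real_length(300, 100, 1.0)
# Grave axis at 30 degrees, north arrow at 90 degrees in the same image.
bearing = orientation_from_north(30.0, 90.0)
```

Here `length_m` is 3.0 metres and `bearing` is 120.0 degrees; in practice both values would remain subject to the manual validation the abstract describes.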

SE · May 17
Towards an Ontology for the Foundations of Software Languages

Ralf Lämmel

The notion of software languages subsumes programming languages, modeling languages, and many other types of languages used in software engineering. The emerging ontology 'Foundations of Software Languages' (FSL) organizes the foundations underlying software languages. We are concerned with language categories, language concepts, associated tools and methodological approaches, the formal systems or other formal entities underlying software languages, and the embedding of software languages into software engineering activities. The primary objective of FSL is to serve as a knowledge resource in Computer Science education by connecting several subject areas in a principled manner. The first release of FSL (V1), as discussed in this paper, was built through a relatively standard methodology involving common steps for expectations, reuse, conceptualization, formalization, and validation. We leveraged GenAI to support ontology engineering (discovery, classification, linkage, completion, and transformation).

AI · Jul 31, 2024
eSPARQL: Representing and Reconciling Agnostic and Atheistic Beliefs in RDF-star Knowledge Graphs

Xinyi Pan, Daniel Hernández, Philipp Seifer et al.

Over the past few years, we have seen the emergence of large knowledge graphs combining information from multiple sources. Sometimes, this information is provided in the form of assertions about other assertions, defining contexts where assertions are valid. A recent extension to RDF that admits statements over statements, called RDF-star, is in revision to become a W3C standard. However, there is no proposal for a semantics of these RDF-star statements nor a built-in facility to operate over them. In this paper, we propose a query language for epistemic RDF-star metadata based on a four-valued logic, called eSPARQL. Our proposed query language extends SPARQL-star, the query language for RDF-star, with a new type of FROM clause to facilitate operating with multiple and sometimes conflicting beliefs. We show that the proposed query language can express four use case queries, including the following features: (i) querying the belief of an individual, (ii) aggregating beliefs, (iii) querying who is in conflict with somebody, and (iv) beliefs about beliefs (i.e., nesting of beliefs).
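To make the idea of a four-valued logic over conflicting beliefs concrete, here is a minimal sketch. The abstract does not say which four-valued logic eSPARQL uses; the sketch assumes a Belnap-style reading in which support for and against a statement are tracked independently. The class `Belief` and the functions `merge` and `negate` are hypothetical names and do not reflect eSPARQL's syntax or semantics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Belief:
    """Belnap-style truth value (assumed here, not taken from the paper)."""
    supported: bool  # some source asserts the statement
    denied: bool     # some source asserts its negation

    def label(self) -> str:
        # Map the two independent flags onto the four truth values.
        names = {
            (True, False): "true",
            (False, True): "false",
            (True, True): "conflict",
            (False, False): "unknown",
        }
        return names[(self.supported, self.denied)]


def merge(a: Belief, b: Belief) -> Belief:
    """Aggregate two beliefs by accumulating evidence from both sides."""
    return Belief(a.supported or b.supported, a.denied or b.denied)


def negate(a: Belief) -> Belief:
    """Negation swaps the evidence for and against."""
    return Belief(a.denied, a.supported)


alice = Belief(supported=True, denied=False)   # Alice believes the statement
bob = Belief(supported=False, denied=True)     # Bob believes its negation
combined = merge(alice, bob)                   # the two sources disagree
```

In this sketch, `combined.label()` is `"conflict"`, while merging a belief with one that has no evidence either way leaves it unchanged.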

SE · Apr 3, 2013 · Code
A Literature Survey on Empirical Evidence in Software Engineering

Ekaterina Pek, Ralf Lämmel

Context: Software Engineering research makes use of collections of software artifacts (corpora) to derive empirical evidence from. Goal: To improve quality and reproducibility of research, we need to understand the characteristics of used corpora. Method: For that, we perform a literature survey using grounded theory. We analyze the latest proceedings of seven relevant conferences. Results: While almost all papers use corpora of some kind, with the common case being collections of source code of open-source Java projects, there are no frequently used projects or corpora across all the papers. For some conferences we can detect recurrences. We discover several forms of requirements and applied tunings for corpora which indicate more specific needs of research efforts. Conclusion: Our survey feeds into a quantitative basis for discussing the current state of empirical research in software engineering, thereby ultimately enabling improvement of research quality, specifically in terms of use (and reuse) of empirical evidence.

DB · Feb 13, 2024
From Shapes to Shapes: Inferring SHACL Shapes for Results of SPARQL CONSTRUCT Queries (Extended Version)

Philipp Seifer, Daniel Hernández, Ralf Lämmel et al.

SPARQL CONSTRUCT queries allow for the specification of data processing pipelines that transform given input graphs into new output graphs. It is now common to constrain graphs through SHACL shapes allowing users to understand which data they can expect and which not. However, it becomes challenging to understand what graph data can be expected at the end of a data processing pipeline without knowing the particular input data: Shape constraints on the input graph may affect the output graph, but may no longer apply literally, and new shapes may be imposed by the query template. In this paper, we study the derivation of shape constraints that hold on all possible output graphs of a given SPARQL CONSTRUCT query. We assume that the SPARQL CONSTRUCT query is fixed, e.g., being part of a program, whereas the input graphs adhere to input shape constraints but may otherwise vary over time and, thus, are mostly unknown. We study a fragment of SPARQL CONSTRUCT queries (SCCQ) and a fragment of SHACL (Simple SHACL). We formally define the problem of deriving the most restrictive set of Simple SHACL shapes that constrain the results from evaluating a SCCQ over any input graph restricted by a given set of Simple SHACL shapes. We propose and implement an algorithm that statically analyses input SHACL shapes and CONSTRUCT queries and prove its soundness and complexity.

DB · Jul 12, 2021
ProGS: Property Graph Shapes Language (Extended Version)

Philipp Seifer, Ralf Lämmel, Steffen Staab

Property graphs constitute data models for representing knowledge graphs. They allow for the convenient representation of facts, including facts about facts, represented by triples in subject or object position of other triples. Knowledge graphs such as Wikidata are created by a diversity of contributors and a range of sources leaving them prone to two types of errors. The first type of error, falsity of facts, is addressed by property graphs through the representation of provenance and validity, making triples occur as first-order objects in subject position of metadata triples. The second type of error, violation of domain constraints, has not been addressed with regard to property graphs so far. In RDF representations, this error can be addressed by shape languages such as SHACL or ShEx, which allow for checking whether graphs are valid with respect to a set of domain constraints. Borrowing ideas from the syntax and semantics definitions of SHACL, we design a shape language for property graphs, ProGS, which allows for formulating shape constraints on property graphs including their specific constructs, such as edges with identities and key-value annotations to both nodes and edges. We define a formal semantics of ProGS, investigate the resulting complexity of validating property graphs against sets of ProGS shapes, compare with corresponding results for SHACL, and implement a prototypical validator that utilizes answer set programming.

SE · Feb 28, 2021
Seamless Variability Management With the Virtual Platform

Wardah Mahmood, Daniel Strüber, Thorsten Berger et al.

Customization is a general trend in software engineering, demanding systems that support variable stakeholder requirements. Two opposing strategies are commonly used to create variants: software clone & own and software configuration with an integrated platform. Organizations often start with the former, which is cheap, agile, and supports quick innovation, but does not scale. The latter scales by establishing an integrated platform that shares software assets between variants, but requires high up-front investments or risky migration processes. So, could we have a method that allows an easy transition or even combine the benefits of both strategies? We propose a method and tool that supports a truly incremental development of variant-rich systems, exploiting a spectrum between both opposing strategies. We design, formalize, and prototype the variability-management framework virtual platform. It bridges clone & own and platform-oriented development. Relying on programming-language-independent conceptual structures representing software assets, it offers operators for engineering and evolving a system, comprising: traditional, asset-oriented operators and novel, feature-oriented operators for incrementally adopting concepts of an integrated platform. The operators record meta-data that is exploited by other operators to support the transition. Among others, they eliminate expensive feature-location effort or the need to trace clones. Our evaluation simulates the evolution of a real-world, clone-based system, measuring its costs and benefits.

SE · Apr 15, 2020
Ownership at Large -- Open Problems and Challenges in Ownership Management

John Ahlgren, Maria Eugenia Berezin, Kinga Bojarczuk et al.

Software-intensive organizations rely on large numbers of software assets of different types, e.g., source-code files, tables in the data warehouse, and software configurations. Who is the most suitable owner of a given asset changes over time, e.g., due to reorganization and individual function changes. New forms of automation can help suggest more suitable owners for any given asset at a given point in time. By such efforts on ownership health, accountability of ownership is increased. The problem of finding the most suitable owners for an asset is essentially a program comprehension problem: how do we automatically determine who would be best placed to understand, maintain, evolve (and thereby assume ownership of) a given asset? This paper introduces the Facebook Ownesty system, which uses a combination of ultra-large-scale data mining and machine learning and has been deployed at Facebook as part of the company's ownership management approach. Ownesty processes many millions of software assets (e.g., source-code files) and it takes into account workflow and organizational aspects. The paper sets out open problems and challenges on ownership for the research community with advances expected from the fields of software engineering, programming languages, and machine learning.

SE · Apr 13, 2020
Understanding What Software Engineers Are Working on -- The Work-Item Prediction Challenge

Ralf Lämmel, Alvin Kerber, Liane Praza

Understanding what a software engineer (a developer, an incident responder, a production engineer, etc.) is working on is a challenging problem -- especially when considering the more complex software engineering workflows in software-intensive organizations: i) engineers rely on a multitude (perhaps hundreds) of loosely integrated tools; ii) engineers engage in concurrent and relatively long-running workflows; iii) infrastructure (such as logging) is not fully aware of work items; iv) engineering processes (e.g., for incident response) are not explicitly modeled. In this paper, we explain the corresponding 'work-item prediction challenge' on the grounds of representative scenarios, report on related efforts at Facebook, discuss some lessons learned, and review related work, issuing a call to arms to leverage, advance, and combine techniques from program comprehension, mining software repositories, process mining, and machine learning.

SE · Apr 11, 2020
WES: Agent-based User Interaction Simulation on Real Infrastructure

John Ahlgren, Maria Eugenia Berezin, Kinga Bojarczuk et al.

We introduce the Web-Enabled Simulation (WES) research agenda, and describe FACEBOOK's WW system. We describe the application of WW to reliability, integrity and privacy at FACEBOOK, where it is used to simulate social media interactions on an infrastructure consisting of hundreds of millions of lines of code. The WES agenda draws on research from many areas of study, including Search Based Software Engineering, Machine Learning, Programming Languages, Multi Agent Systems, Graph Theory, Game AI, and AI Assisted Game Play. We conclude with a set of open problems and research challenges to motivate wider investigation.

PL · Jan 27, 2017
Relationship Maintenance in Software Language Repositories

Ralf Lämmel

The context of this research is testing and building software systems and, specifically, software language repositories (SLRs), i.e., repositories with components for language processing (interpreters, translators, analyzers, transformers, pretty printers, etc.). SLRs are typically set up for developing and using metaprogramming systems, language workbenches, language definition frameworks, executable semantic frameworks, and modeling frameworks. This work is an inquiry into testing and building SLRs in a manner that the repository is seen as a collection of language-typed artifacts being related by the applications of language-typed functions or relations which serve language processing. The notion of language is used in a broad sense to include text-, tree-, graph-based languages as well as representations based on interchange formats and also proprietary formats for serialization. The overall approach underlying this research is one of language design driven by a complex case study, i.e., a specific SLR with a significant number of processed languages and language processors as well as a noteworthy heterogeneity in terms of representation types and implementation languages. The knowledge gained by our research is best understood as a declarative language design for regression testing and build management; we introduce a corresponding language Ueber with an executable semantics which maintains relationships between language-typed artifacts in an SLR. The grounding of the reported research is based on the comprehensive, formal, executable (logic programming-based) definition of the Ueber language and its systematic application to the management of the SLR YAS which consists of hundreds of language definition and processing components (such as interpreters and transformations) for more than thirty languages (not counting different representation types) with Prolog, Haskell, Java, and Python being used as implementation languages.
The importance of this work follows from the significant costs implied by regression testing and build management and also from the complexity of SLRs which calls for means to help with understanding.

PL · Jan 27, 2017
Interconnected Linguistic Architecture

Johannes Härtel, Lukas Härtel, Ralf Lämmel et al.

The context of the reported research is the documentation of software technologies such as object/relational mappers, web-application frameworks, or code generators. We assume that documentation should model a macroscopic view on usage scenarios of technologies in terms of involved artifacts, leveraged software languages, data flows, conformance relationships, and others. In previous work, we referred to such documentation also as 'linguistic architecture'. The corresponding models may also be referred to as 'megamodels' while adopting this term from the technological space of modeling/model-driven engineering. This work is an inquiry into making such documentation less abstract and more effective by means of connecting (mega)models, systems, and developer experience in several ways. To this end, we adopt an approach that is primarily based on prototyping (i.e., implementation of a megamodeling infrastructure with all conceivable connections) and experimentation with showcases (i.e., documentation of concrete software technologies). The knowledge gained by this research is a notion of interconnected linguistic architecture on the grounds of connecting primary model elements, inferred model elements, static and runtime system artifacts, traceability links, system contexts, knowledge resources, plugged interpretations of model elements, and IDE views. A corresponding suite of aspects of interconnected linguistic architecture is systematically described. As to the grounding of this research, we describe a literature survey which tracks scattered occurrences and thus demonstrates the relevance of the identified aspects of interconnected linguistic architecture. Further, we describe the MegaL/Xtext+IDE infrastructure which realizes interconnected linguistic architecture.
The importance of this work lies in providing more formal (ontologically rich, navigable, verifiable) documentation of software technologies helping developers to better understand how to use technologies in new systems (prescriptive mode) or how technologies are used in existing systems (descriptive mode).