Timo Sztyler

LG
h-index: 59
6 papers
66 citations
Novelty: 36%
AI Score: 30

6 Papers

CL · Jul 10, 2022
Human-Centric Research for NLP: Towards a Definition and Guiding Questions

Bhushan Kotnis, Kiril Gashteovski, Julia Gastinger et al.

With Human-Centric Research (HCR) we can steer research activities so that the research outcome is beneficial for human stakeholders, such as end users. But what exactly makes research human-centric? We address this question by providing a working definition and describing how a research pipeline can be split into stages to which human-centric components can be added. Additionally, we discuss existing NLP research with HCR components and define a series of guiding questions, which can serve as starting points for researchers interested in exploring human-centric research approaches. We hope that this work will inspire researchers to refine the proposed definition and to pose other questions that might be meaningful for achieving HCR.

LG · Apr 25, 2024
History repeats Itself: A Baseline for Temporal Knowledge Graph Forecasting

Julia Gastinger, Christian Meilicke, Federico Errica et al.

Temporal Knowledge Graph (TKG) Forecasting aims at predicting links in Knowledge Graphs for future timesteps based on a history of Knowledge Graphs. To this day, standardized evaluation protocols and rigorous comparisons across TKG models are available, but the importance of simple baselines is often neglected in the evaluation, which prevents researchers from distinguishing actual from fictitious progress. We propose to close this gap by designing an intuitive baseline for TKG Forecasting based on predicting recurring facts. Compared to most TKG models, it requires little hyperparameter tuning and no iterative training. Further, it can help to identify failure modes in existing approaches. The empirical findings are quite unexpected: compared to 11 methods on five datasets, our baseline ranks first or third in three of them, painting a radically different picture of the predictive quality of the state of the art.
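The idea of predicting recurring facts can be illustrated with a minimal sketch. It is not the authors' scoring function: it simply counts how often each object appeared with a (subject, relation) pair in the past and ranks candidates by that count. The function names, the quadruple layout, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict

def fit_recurrency_counts(history):
    """Count how often each object followed a (subject, relation) pair in the past."""
    counts = defaultdict(lambda: defaultdict(int))
    for subject, relation, obj, _timestep in history:
        counts[(subject, relation)][obj] += 1
    return counts

def predict_objects(counts, subject, relation, top_k=3):
    """Rank candidate objects for (subject, relation, ?) by past frequency."""
    candidates = counts.get((subject, relation), {})
    # Sort by descending count; break ties alphabetically for determinism.
    ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
    return [obj for obj, _count in ranked[:top_k]]

# Toy history of (subject, relation, object, timestep) quadruples (made up).
history = [
    ("Germany", "consults", "France", 0),
    ("Germany", "consults", "France", 1),
    ("Germany", "consults", "Italy", 2),
]
counts = fit_recurrency_counts(history)
print(predict_objects(counts, "Germany", "consults"))  # ['France', 'Italy']
```

Nothing here is trained iteratively, which matches the abstract's point that such a baseline needs no iterative training. A pair never seen in the history yields an empty ranking.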

IR · Feb 21, 2025
On Synthesizing Data for Context Attribution in Question Answering

Gorjan Radevski, Kiril Gashteovski, Shahbaz Syed et al.

Question Answering (QA) accounts for a significant portion of LLM usage "in the wild". However, LLMs sometimes produce false or misleading responses, also known as "hallucinations". Therefore, grounding the generated answers in contextually provided information -- i.e., providing evidence for the generated text -- is paramount for LLMs' trustworthiness. Providing this information is the task of context attribution. In this paper, we systematically study LLM-based approaches for this task, namely we investigate (i) zero-shot inference, (ii) LLM ensembling, and (iii) fine-tuning of small LMs on synthetic data generated by larger LLMs. Our key contribution is SynQA: a novel generative strategy for synthesizing context attribution data. Given selected context sentences, an LLM generates QA pairs that are supported by these sentences. This leverages LLMs' natural strengths in text generation while ensuring clear attribution paths in the synthetic training data. We show that the attribution data synthesized via SynQA is highly effective for fine-tuning small LMs for context attribution in different QA tasks and domains. Finally, with a user study, we validate the usefulness of small LMs (fine-tuned on synthetic data from SynQA) in context attribution for QA.
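The data flow described for SynQA (select context sentences first, then have an LLM write a QA pair supported by them) can be sketched as follows. This is only an illustration under stated assumptions: `AttributionExample`, `synthesize_example`, and the `generate_qa` callable are hypothetical names, random selection is an assumed selection strategy, and the LLM call is replaced by a stub so the sketch runs offline.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class AttributionExample:
    question: str
    answer: str
    context: List[str]          # all context sentences
    supporting_ids: List[int]   # indices of the sentences the QA pair was built from

def synthesize_example(
    context: List[str],
    generate_qa: Callable[[List[str]], Tuple[str, str]],
    n_support: int = 2,
    seed: int = 0,
) -> AttributionExample:
    """Pick supporting sentences first, then ask the generator for a QA pair."""
    rng = random.Random(seed)
    k = min(n_support, len(context))
    supporting_ids = sorted(rng.sample(range(len(context)), k=k))
    selected = [context[i] for i in supporting_ids]
    question, answer = generate_qa(selected)  # an LLM call in practice
    return AttributionExample(question, answer, list(context), supporting_ids)

def stub_generate_qa(sentences: List[str]) -> Tuple[str, str]:
    """Hypothetical stand-in for an LLM; echoes the selected sentences."""
    return "What do the selected sentences state?", " ".join(sentences)

context = ["Paris is in France.", "Berlin is in Germany.", "Rome is in Italy."]
example = synthesize_example(context, stub_generate_qa)
print(example.supporting_ids, example.question)
```

Because the supporting sentences are chosen before generation, the attribution labels come for free, which is the "clear attribution path" the abstract refers to.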

LG · Sep 10, 2021
ProcK: Machine Learning for Knowledge-Intensive Processes

Tobias Jacobs, Jingyi Yu, Julia Gastinger et al.

We present a novel methodology to build powerful predictive process models. Our method, denoted ProcK (Process & Knowledge), relies not only on sequential input data in the form of event logs, but can learn to use a knowledge graph to incorporate information about the attribute values of the events and their mutual relationships. The idea is realized by mapping event attributes to nodes of a knowledge graph and training a sequence model alongside a graph neural network in an end-to-end fashion. This hybrid approach substantially enhances the flexibility and applicability of predictive process monitoring, as both the static and dynamic information residing in the databases of organizations can be directly taken as input data. We demonstrate the potential of ProcK by applying it to a number of predictive process monitoring tasks, including tasks with knowledge graphs available as well as an existing process monitoring benchmark where no such graph is given. The experiments provide evidence that our methodology achieves state-of-the-art performance and improves predictive power when a knowledge graph is available.

LG · Oct 12, 2020
Explaining Neural Matrix Factorization with Gradient Rollback

Carolin Lawrence, Timo Sztyler, Mathias Niepert

Explaining the predictions of neural black-box models is an important problem, especially when such models are used in applications where user trust is crucial. Estimating the influence of training examples on a learned neural model's behavior allows us to identify training examples most responsible for a given prediction and, therefore, to faithfully explain the output of a black-box model. The most generally applicable existing method is based on influence functions, which scale poorly for larger sample sizes and models. We propose gradient rollback, a general approach for influence estimation, applicable to neural models where each parameter update step during gradient descent touches a smaller number of parameters, even if the overall number of parameters is large. Neural matrix factorization models trained with gradient descent are part of this model class. These models are popular and have found a wide range of applications in industry. Especially knowledge graph embedding methods, which belong to this class, are used extensively. We show that gradient rollback is highly efficient at both training and test time. Moreover, we show theoretically that the difference between gradient rollback's influence approximation and the true influence on a model's behavior is smaller than known bounds on the stability of stochastic gradient descent. This establishes that gradient rollback is robustly estimating example influence. We also conduct experiments which show that gradient rollback provides faithful explanations for knowledge base completion and recommender datasets.
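The intuition of tracking per-example parameter updates and undoing them can be sketched on a toy matrix factorization model. This is a simplified illustration, not the paper's algorithm or its theoretical bounds: it assumes plain SGD with squared loss, and every name (`train_with_tracking`, `rollback_influence`, the `influence` store) is hypothetical.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train_with_tracking(examples, n_users, n_items, dim=2, lr=0.05, epochs=200, seed=0):
    """SGD on squared loss; also accumulates the updates each example caused."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_items)]
    # influence[k] holds the summed updates example k applied to its two rows.
    influence = [{"u": [0.0] * dim, "v": [0.0] * dim} for _ in examples]
    for _ in range(epochs):
        for k, (u, i, r) in enumerate(examples):
            err = r - dot(U[u], V[i])
            du = [lr * err * v for v in V[i]]  # step for the user row
            dv = [lr * err * x for x in U[u]]  # step for the item row
            for d in range(dim):
                U[u][d] += du[d]
                V[i][d] += dv[d]
                influence[k]["u"][d] += du[d]
                influence[k]["v"][d] += dv[d]
    return U, V, influence

def rollback_influence(U, V, influence, examples, k, query):
    """Change in the query score when example k's recorded updates are undone."""
    u, i, _rating = examples[k]
    qu, qi = query
    U_rb = [row[:] for row in U]
    V_rb = [row[:] for row in V]
    U_rb[u] = [x - d for x, d in zip(U[u], influence[k]["u"])]
    V_rb[i] = [x - d for x, d in zip(V[i], influence[k]["v"])]
    return dot(U[qu], V[qi]) - dot(U_rb[qu], V_rb[qi])

# Two toy (user, item, rating) examples that share no embedding rows.
examples = [(0, 0, 1.0), (1, 1, 1.0)]
U, V, influence = train_with_tracking(examples, n_users=2, n_items=2)
own = rollback_influence(U, V, influence, examples, k=0, query=(0, 0))
other = rollback_influence(U, V, influence, examples, k=1, query=(0, 0))
print(own, other)
```

Each update touches only two embedding rows, so the bookkeeping stays small even when the full parameter matrix is large. In this toy setup the second example never touches the rows used by the query, so undoing it leaves the query score unchanged.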

CY · Mar 15, 2018
Challenges in Annotation of useR Data for UbiquitOUs Systems: Results from the 1st ARDUOUS Workshop

Kristina Yordanova, Adeline Paiement, Max Schröder et al.

Labelling user data is a central part of the design and evaluation of pervasive systems that aim to support the user through situation-aware reasoning. It is essential both in designing and training the system to recognise and reason about the situation, either through the definition of a suitable situation model in knowledge-driven applications, or through the preparation of training data for learning tasks in data-driven models. Hence, the quality of annotations can have a significant impact on the performance of the derived systems. Labelling is also vital for validating and quantifying the performance of applications. In particular, comparative evaluations require the production of benchmark datasets based on high-quality and consistent annotations. With pervasive systems relying increasingly on large datasets for designing and testing models of users' activities, the process of data labelling is becoming a major concern for the community. In this work we present a qualitative and quantitative analysis of the challenges associated with annotation of user data and possible strategies towards addressing these challenges. The analysis was based on the data gathered during the 1st International Workshop on Annotation of useR Data for UbiquitOUs Systems (ARDUOUS) and consisted of brainstorming as well as annotation and questionnaire data gathered during the talks, poster session, live annotation session, and discussion session.