Stefan Konigorski

h-index15

7papers

29citations

Novelty44%

AI Score47

Ranked #30,609 of 194,257 authors (top 16%)#7,253 in LG (top 18%)

7 Papers

3.8LGSep 25, 2023Code

Designing and evaluating an online reinforcement learning agent for physical exercise recommendations in N-of-1 trials

Dominik Meier, Ipek Ensari, Stefan Konigorski

Personalized adaptive interventions offer the opportunity to increase patient benefits, however, there are challenges in their planning and implementation. Once implemented, it is an important question whether personalized adaptive interventions are indeed clinically more effective compared to a fixed gold standard intervention. In this paper, we present an innovative N-of-1 trial study design testing whether implementing a personalized intervention by an online reinforcement learning agent is feasible and effective. Throughout, we use a new study on physical exercise recommendations to reduce pain in endometriosis for illustration. We describe the design of a contextual bandit recommendation agent and evaluate the agent in simulation studies. The results show that, first, implementing a personalized intervention by an online reinforcement learning agent is feasible. Second, such adaptive interventions have the potential to improve patients' benefits even if only few observations are available. As one challenge, they add complexity to the design and implementation process. In order to quantify the expected benefit, data from previous interventional studies is required. We expect our approach to be transferable to other interventions and clinical interventions.

2.7LGFeb 10Code

Linear-LLM-SCM: Benchmarking LLMs for Coefficient Elicitation in Linear-Gaussian Causal Models

Kanta Yamaoka, Sumantrak Mukherjee, Thomas Gärtner et al.

Large language models (LLMs) have shown potential in identifying qualitative causal relations, but their ability to perform quantitative causal reasoning -- estimating effect sizes that parametrize functional relationships -- remains underexplored in continuous domains. We introduce Linear-LLM-SCM, a plug-and-play benchmarking framework for evaluating LLMs on linear Gaussian structural causal model (SCM) parametrization when the DAG is given. The framework decomposes a DAG into local parent-child sets and prompts an LLM to produce a regression-style structural equation per node, which is aggregated and compared against available ground-truth parameters. Our experiments show several challenges in such benchmarking tasks, namely, strong stochasticity in the results in some of the models and susceptibility to DAG misspecification via spurious edges in the continuous domains. Across models, we observe substantial variability in coefficient estimates for some settings and sensitivity to structural and semantic perturbations, highlighting current limitations of LLMs as quantitative causal parameterizers. We also open-sourced the benchmarking framework so that researchers can utilize their DAGs and any off-the-shelf LLMs plug-and-play for evaluation in their domains effortlessly.

6.4HCJul 31, 2021Code

StudyMe: A New Mobile App for User-Centric N-of-1 Trials

Alexander M. Zenner, Erwin Böttinger, Stefan Konigorski

N-of-1 trials are multi-crossover self-experiments that allow individuals to systematically evaluate the effect of interventions on their personal health goals. Although several tools for N-of-1 trials exist, none support non-experts in conducting their own user-centric trials. In this study we present StudyMe, an open-source mobile application that is freely available from https://play.google.com/store/apps/details?id=health.studyu.me and offers users flexibility and guidance in configuring every component of their trials. We also present research that informed the development of StudyMe. Through an initial survey with 272 participants, we learned that individuals are interested in a variety of personal health aspects and have unique ideas on how to improve them. In an iterative, user-centered development process with intermediate user tests we developed StudyMe that also features an educational part to communicate N-of-1 trial concepts. A final empirical evaluation of StudyMe showed that all participants were able to create their own trials successfully using StudyMe and the app achieved a very good usability rating. Our findings suggest that StudyMe provides a significant step towards enabling individuals to apply a systematic science-oriented approach to personalize health-related interventions and behavior modifications in their everyday lives.

5.8HCDec 28, 2020Code

StudyU: a platform for designing and conducting innovative digital N-of-1 trials

Stefan Konigorski, Sarah Wernicke, Tamara Slosarek et al.

N-of-1 trials are the gold standard study design to evaluate individual treatment effects and derive personalized treatment strategies. Digital tools have the potential to initiate a new era of N-of-1 trials in terms of scale and scope, but fully-functional platforms are not yet available. Here, we present the open source StudyU platform which includes the StudyU designer and StudyU app. With the StudyU designer, scientists are given a collaborative web application to digitally specify, publish, and conduct N-of-1 trials. The StudyU app is a smartphone application with innovative user-centric elements for participants to partake in the published trials and assess the effects of different interventions on their health. Thereby, the StudyU platform allows clinicians and researchers worldwide to easily design and conduct digital N-of-1 trials in a safe manner. We envision that StudyU can change the landscape of personalized treatments both for patients and healthy individuals, democratize and personalize evidence generation for self-optimization and medicine, and can be integrated in clinical practice.

4.4AIJan 7

Personalization of Large Foundation Models for Health Interventions

Stefan Konigorski, Johannes E. Vedder, Babajide Alamu Owoyele et al.

Large foundation models (LFMs) transform healthcare AI in prevention, diagnostics, and treatment. However, whether LFMs can provide truly personalized treatment recommendations remains an open question. Recent research has revealed multiple challenges for personalization, including the fundamental generalizability paradox: models achieving high accuracy in one clinical study perform at chance level in others, demonstrating that personalization and external validity exist in tension. This exemplifies broader contradictions in AI-driven healthcare: the privacy-performance paradox, scale-specificity paradox, and the automation-empathy paradox. As another challenge, the degree of causal understanding required for personalized recommendations, as opposed to mere predictive capacities of LFMs, remains an open question. N-of-1 trials -- crossover self-experiments and the gold standard for individual causal inference in personalized medicine -- resolve these tensions by providing within-person causal evidence while preserving privacy through local experimentation. Despite their impressive capabilities, this paper argues that LFMs cannot replace N-of-1 trials. We argue that LFMs and N-of-1 trials are complementary: LFMs excel at rapid hypothesis generation from population patterns using multimodal data, while N-of-1 trials excel at causal validation for a given individual. We propose a hybrid framework that combines the strengths of both to enable personalization and navigate the identified paradoxes: LFMs generate ranked intervention candidates with uncertainty estimates, which trigger subsequent N-of-1 trials. Clarifying the boundary between prediction and causation and explicitly addressing the paradoxical tensions are essential for responsible AI integration in personalized medicine.

4.1LGDec 14, 2025

Co-Exploration and Co-Exploitation via Shared Structure in Multi-Task Bandits

Sumantrak Mukherjee, Serafima Lebedeva, Valentin Margraf et al.

We propose a novel Bayesian framework for efficient exploration in contextual multi-task multi-armed bandit settings, where the context is only observed partially and dependencies between reward distributions are induced by latent context variables. In order to exploit these structural dependencies, our approach integrates observations across all tasks and learns a global joint distribution, while still allowing personalised inference for new tasks. In this regard, we identify two key sources of epistemic uncertainty, namely structural uncertainty in the latent reward dependencies across arms and tasks, and user-specific uncertainty due to incomplete context and limited interaction history. To put our method into practice, we represent the joint distribution over tasks and rewards using a particle-based approximation of a log-density Gaussian process. This representation enables flexible, data-driven discovery of both inter-arm and inter-task dependencies without prior assumptions on the latent variables. Empirically, we demonstrate that our method outperforms baselines such as hierarchical model bandits, especially in settings with model misspecification or complex latent heterogeneity.

1.0MLDec 2, 2018

Integrating omics and MRI data with kernel-based tests and CNNs to identify rare genetic markers for Alzheimer's disease

Stefan Konigorski, Shahryar Khorasani, Christoph Lippert

For precision medicine and personalized treatment, we need to identify predictive markers of disease. We focus on Alzheimer's disease (AD), where magnetic resonance imaging scans provide information about the disease status. By combining imaging with genome sequencing, we aim at identifying rare genetic markers associated with quantitative traits predicted from convolutional neural networks (CNNs), which traditionally have been derived manually by experts. Kernel-based tests are a powerful tool for associating sets of genetic variants, but how to optimally model rare genetic variants is still an open research question. We propose a generalized set of kernels that incorporate prior information from various annotations and multi-omics data. In the analysis of data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), we evaluate whether (i) CNNs yield precise and reliable brain traits, and (ii) the novel kernel-based tests can help to identify loci associated with AD. The results indicate that CNNs provide a fast, scalable and precise tool to derive quantitative AD traits and that new kernels integrating domain knowledge can yield higher power in association tests of very rare variants.