Stefan M. Herzog

h-index 29

4 papers

5 citations

Novelty 43%

AI Score 42

Ranked #61,502 of 194,257 authors (top 32%); #372 in HC (top 15%)

4 Papers

7.7 | HC | May 27

Fostering human learning is crucial for boosting human-AI synergy

Julian Berger, Jason W. Burton, Ralph Hertwig et al.

The collaboration between humans and artificial intelligence (AI) holds the promise of achieving superior outcomes compared to either acting alone, a phenomenon called human-AI synergy. Nevertheless, our understanding of the conditions that facilitate such human-AI synergy when humans are advised by AI remains limited. A recent meta-analysis showed that, on average, human-AI combinations do not outperform the better individual agent. We argue that this pessimistic conclusion arises from insufficient attention to human learning in the experimental designs. To substantiate this claim, we re-analyzed all 74 studies included in the original meta-analysis, yielding two new findings. First, most previous research overlooked design features that foster human learning, such as providing outcome feedback to participants. Second, our re-analysis demonstrated that studies providing outcome feedback show tentatively higher synergy than those without outcome feedback. Crucially, feedback paired with AI explanations tends to yield positive synergy, while explanations without feedback were linked to negative synergy, indicating that explanations increase synergy only when humans can learn to verify the AI's reliability through feedback. We conclude that the current literature underestimates the potential of human-AI collaboration because it predominantly relies on paradigms that do not facilitate human learning, thus hindering humans from effectively adapting their collaboration strategies. We therefore advocate for a paradigm shift in human-AI interaction research that explicitly addresses human learning and thus enhances our understanding of and support for successful human-AI collaboration.

8.8 | HC | May 29

Boosting metacognition in entangled human-AI interaction to navigate cognitive-behavioral drift

Ezequiel Lopez-Lopez, Christoph M. Abels, Philipp Lorenz-Spreen et al.

People navigate complex environments using cues, heuristics, and other strategies, which are often adaptive in stable settings. However, as AI increasingly permeates society's information environments, those environments themselves become more adaptive and evolving: LLM-based chatbots participate in extended interaction, maintain conversational histories, mirror social cues, and can hypercustomize responses, thereby shaping not only what information is accessed but how questions are framed, how evidence is interpreted, and when action feels warranted. Here we propose a framework for sustained human-AI interaction that rests on invariant features of human cognition and human-AI interaction and centers on three interlinked phenomena: entanglement between users and AI systems, the emergence of cognitive and behavioral drift over repeated interactions, and the role of metacognition in the awareness and regulation of these dynamics. As conversational agents provide cues (e.g., fluency, coherence, responsiveness) that people treat as informative, subjective confidence and action readiness may increase without corresponding gains in epistemic reliability, making drift difficult to detect and correct. We describe these dynamics across micro-, meso-, and macro-levels. The framework identifies four metacognitive intervention points and psychologically informed interventions that provide metacognitive scaffolding (boosting and self-nudging). Finally, we outline a long-horizon research agenda for scientific foresight.

5.8 | AI | Jun 21, 2024 | Code

Human-AI collectives produce the most accurate differential diagnoses

N. Zöller, J. Berger, I. Lin et al.

Artificial intelligence systems, particularly large language models (LLMs), are increasingly being employed in high-stakes decisions that impact both individuals and society at large, often without adequate safeguards to ensure safety, quality, and equity. Yet LLMs hallucinate, lack common sense, and are biased, shortcomings that may reflect LLMs' inherent limitations and thus may not be remedied by more sophisticated architectures, more data, or more human feedback. Relying solely on LLMs for complex, high-stakes decisions is therefore problematic. Here we present a hybrid collective intelligence system that mitigates these risks by leveraging the complementary strengths of human experience and the vast information processed by LLMs. We apply our method to open-ended medical diagnostics, combining 40,762 differential diagnoses made by physicians with the diagnoses of five state-of-the-art LLMs across 2,133 medical cases. We show that hybrid collectives of physicians and LLMs outperform both single physicians and physician collectives, as well as single LLMs and LLM ensembles. This result holds across a range of medical specialties and professional experience, and can be attributed to humans' and LLMs' complementary contributions that lead to different kinds of errors. Our approach highlights the potential for collective human and machine intelligence to improve accuracy in complex, open-ended domains like medical diagnostics.

1.2 | SI | Jun 25, 2017

A preference elicitation interface for collecting dense recommender datasets with rich user information

Pantelis P. Analytis, Tobias Schnabel, Stefan Herzog et al.

We present an interface that can be leveraged to quickly and effortlessly elicit people's preferences for visual stimuli, such as photographs, visual art and screensavers, along with rich side-information about its users. We plan to employ the new interface to collect dense recommender datasets that will complement existing sparse industry-scale datasets. The new interface and the collected datasets are intended to foster integration of research in recommender systems with research in social and behavioral sciences. For instance, we will use the datasets to assess the diversity of human preferences in different domains of visual experience. Further, using the datasets we will be able to measure crucial psychological effects, such as preference consistency, scale acuity and anchoring biases. Last, the datasets will facilitate evaluation in counterfactual learning experiments.