AI MM | Aug 14, 2025

Modeling Human Responses to Multimodal AI Content

Zhiqi Shen, Shaojing Fan, Danni Xu, Terence Sim, Mohan Kankanhalli

arXiv:2508.10769v1 | h-index: 2

Originality: Incremental advance

AI Analysis

This work addresses the risk of AI-driven misinformation by providing tools to predict and align with human responses, though it is incremental in building on existing multimodal and LLM frameworks.

The paper examines how AI-generated content influences human perception and behavior, particularly in contexts such as trading. It introduces the MhAIM Dataset of 154,552 posts and reports that people identify AI content more accurately when posts include both text and visuals, especially when the two are inconsistent. It also proposes three metrics (trustworthiness, impact, and openness) and the T-Lens system, which aims to give LLMs human-awareness and so mitigate misinformation risks.

As AI-generated content becomes widespread, so does the risk of misinformation. While prior research has primarily focused on identifying whether content is authentic, much less is known about how such content influences human perception and behavior. In domains like trading or the stock market, predicting how people react (e.g., whether a news post will go viral) can be more critical than verifying its factual accuracy. To address this, we take a human-centered approach and introduce the MhAIM Dataset, which contains 154,552 online posts (111,153 of them AI-generated), enabling large-scale analysis of how people respond to AI-generated content. Our human study reveals that people are better at identifying AI content when posts include both text and visuals, particularly when inconsistencies exist between the two. We propose three new metrics (trustworthiness, impact, and openness) to quantify how users judge and engage with online content. We present T-Lens, an LLM-based agent system designed to answer user queries by incorporating predicted human responses to multimodal information. At its core is HR-MCP (Human Response Model Context Protocol), built on the standardized Model Context Protocol (MCP), enabling seamless integration with any LLM. This integration allows T-Lens to better align with human reactions, enhancing both interpretability and interaction capabilities. Our work provides empirical insights and practical tools to equip LLMs with human-awareness capabilities. By highlighting the complex interplay among AI, human cognition, and information reception, our findings suggest actionable strategies for mitigating the risks of AI-driven misinformation.
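
The abstract describes an agent that answers user queries by consulting predicted human responses through a tool interface. The following is only a rough illustration of that general pattern, not the authors' implementation: the tool name predict_human_response, the HumanResponse schema, the 0-to-1 score range, the fixed stub scores, and the result dictionary layout are all assumptions made for this sketch. It uses plain Python rather than any MCP SDK.

```python
import json
from dataclasses import dataclass, asdict
from typing import Callable


@dataclass(frozen=True)
class HumanResponse:
    """Hypothetical container for the three metrics named in the abstract.

    The 0..1 range is an assumption; the abstract does not give a scale.
    """
    trustworthiness: float
    impact: float
    openness: float


# A predictor maps (post text, whether the post has an image) to scores.
Predictor = Callable[[str, bool], HumanResponse]


def stub_predictor(text: str, has_image: bool) -> HumanResponse:
    # Placeholder only: returns fixed scores, NOT a trained model.
    return HumanResponse(trustworthiness=0.5, impact=0.5, openness=0.5)


def handle_tool_call(name: str, arguments: dict, predictor: Predictor) -> dict:
    """Dispatch one tool call and return a simple result dictionary.

    The layout (is_error plus a JSON text payload) is an assumed,
    simplified shape for illustration.
    """
    if name != "predict_human_response":
        return {"is_error": True, "text": f"unknown tool: {name}"}
    scores = predictor(arguments["text"], bool(arguments.get("has_image", False)))
    return {"is_error": False, "text": json.dumps(asdict(scores))}


def build_prompt(user_query: str, tool_result: dict) -> str:
    """Attach the predicted human response to the query as LLM context."""
    return (
        f"User query: {user_query}\n"
        f"Predicted human response (JSON): {tool_result['text']}\n"
        "Answer the query, taking the predicted reaction into account."
    )


if __name__ == "__main__":
    result = handle_tool_call(
        "predict_human_response",
        {"text": "Breaking: Company X beats earnings", "has_image": True},
        stub_predictor,
    )
    print(build_prompt("Will this post spread widely?", result))
```

Passing the predictor in as a parameter keeps the tool handler independent of any particular model, so a real human-response predictor could replace the stub without changing the dispatch or prompt-building code.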
