72.4CVMar 30Code
Navigating the Mirage: A Dual-Path Agentic Framework for Robust Misleading Chart Question AnsweringYanjie Zhang, Yafei Li, Rui Sheng et al.
Despite the success of Vision-Language Models (VLMs), misleading charts remain a significant challenge due to their deceptive visual structures and distorted data representations. We present ChartCynics, an agentic dual-path framework designed to unmask visual deception via a "skeptical" reasoning paradigm. Unlike holistic models, ChartCynics decouples perception from verification: a Diagnostic Vision Path captures structural anomalies (e.g., inverted axes) through strategic ROI cropping, while an OCR-Driven Data Path ensures numerical grounding. To resolve cross-modal conflicts, we introduce an Agentic Summarizer optimized via a two-stage protocol: Oracle-Informed SFT for reasoning distillation and Deception-Aware GRPO for adversarial alignment. This pipeline effectively penalizes visual traps and enforces logical consistency. Evaluations on two benchmarks show that ChartCynics achieves 74.43% and 64.55% accuracy, providing an absolute performance boost of ~29% over the Qwen3-VL-8B backbone, outperforming state-of-the-art proprietary models. Our results demonstrate that specialized agentic workflows can grant smaller open-source models superior robustness, establishing a new foundation for trustworthy chart interpretation.
47.2CLApr 25
VeriLLMed: Interactive Visual Debugging of Medical Large Language Models with Knowledge GraphsYurui Xiang, Xingyi Mao, Rui Sheng et al.
Large language models (LLMs) show promise in medical diagnosis, but real-world deployment remains challenging due to high-stakes clinical decisions and imperfect reasoning reliability. As a result, careful inspection of model behavior is essential for assessing whether diagnostic reasoning is reliable and clinically grounded. However, debugging medical LLMs remains difficult. First, developers often lack sufficient medical domain expertise to interpret model errors in clinically meaningful terms. Second, models can fail across a large and diverse set of instances involving different input types, tasks, and reasoning steps, making it challenging for developers to prioritize which errors deserve focused inspection. Third, developers struggle to identify recurring error patterns across cases, as existing debugging practices are largely instance-centric and rely on manual inspection of isolated failures. To address these challenges, we present VeriLLMed, a visual analytics system that integrates external biomedical knowledge to audit and debug medical LLM diagnostic reasoning. VeriLLMed transforms model outputs into comparable reasoning paths, constructs knowledge graph-grounded reference paths, and identifies three recurring classes of diagnosis errors: relation errors, branch errors, and missing errors. Case studies and expert evaluation demonstrate that VeriLLMed helps developers identify clinically implausible reasoning and generate actionable insights that can inform the improvement of medical LLMs.
CLMar 23, 2025
Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question AnsweringZixin Chen, Sicheng Song, Kashun Shum et al.
Misleading visualizations, which manipulate chart representations to support specific claims, can distort perception and lead to incorrect conclusions. Despite decades of research, they remain a widespread issue, posing risks to public understanding and raising safety concerns for AI systems involved in data-driven communication. While recent multimodal large language models (MLLMs) show strong chart comprehension abilities, their capacity to detect and interpret misleading charts remains unexplored. We introduce Misleading ChartQA benchmark, a large-scale multimodal dataset designed to evaluate MLLMs on misleading chart reasoning. It contains 3,026 curated examples spanning 21 misleader types and 10 chart types, each with standardized chart code, CSV data, multiple-choice questions, and labeled explanations, validated through iterative MLLM checks and expert human review. We benchmark 24 state-of-the-art MLLMs, analyze their performance across misleader types and chart formats, and propose a novel region-aware reasoning pipeline that enhances model accuracy. Our work lays the foundation for developing MLLMs that are robust, trustworthy, and aligned with the demands of responsible visual communication.
HCMar 6
Capability at a Glance: Design Guidelines for Intuitive Avatars Communicating Augmented Actions in Virtual RealityYang Lu, Tianyu Zhang, Jiamu Tang et al.
Virtual Reality (VR) enables users to engage with capabilities beyond human limitations, but it is not always obvious how to trigger these capabilities. Taking the lens of Affordance, we believe avatar design is the key to solving this issue, which ideally should communicate its capabilities and how to activate them. To understand the current practice, we selected eight capabilities across four categories and invited twelve professional designers to design avatars that communicate the capabilities and their corresponding interactions. From the resulting designs, we formed 16 guidelines to provide general and category-specific recommendations. Then, we validated these guidelines by letting two groups of twelve participants design avatars with and without guidelines. Participants rated the guidelines' clarity and usefulness highly. External judges confirmed that avatars designed with the guidelines were more intuitive in conveying the capabilities and interaction methods. Finally, we demonstrated the applicability of the guidelines in avatar design for four VR applications.
HCAug 4, 2021
VBridge: Connecting the Dots Between Features and Data to Explain Healthcare ModelsFurui Cheng, Dongyu Liu, Fan Du et al.
Machine learning (ML) is increasingly applied to Electronic Health Records (EHRs) to solve clinical prediction tasks. Although many ML models perform promisingly, issues with model transparency and interpretability limit their adoption in clinical practice. Directly using existing explainable ML techniques in clinical settings can be challenging. Through literature surveys and collaborations with six clinicians with an average of 17 years of clinical experience, we identified three key challenges, including clinicians' unfamiliarity with ML features, lack of contextual information, and the need for cohort-level evidence. Following an iterative design process, we further designed and developed VBridge, a visual analytics tool that seamlessly incorporates ML explanations into clinicians' decision-making workflow. The system includes a novel hierarchical display of contribution-based feature explanations and enriched interactions that connect the dots between ML features, explanations, and data. We demonstrated the effectiveness of VBridge through two case studies and expert interviews with four clinicians, showing that visually associating model explanations with patients' situational records can help clinicians better interpret and use model predictions when making clinician decisions. We further derived a list of design implications for developing future explainable ML tools to support clinical decision-making.
HCJul 30, 2020
Composition and Configuration Patterns in Multiple-View VisualizationsXi Chen, Wei Zeng, Yanna Lin et al.
Multiple-view visualization (MV) is a layout design technique often employed to help users see a large number of data attributes and values in a single cohesive representation. Because of its generalizability, the MV design has been widely adopted by the visualization community to help users examine and interact with large, complex, and high-dimensional data. However, although ubiquitous, there has been little work to categorize and analyze MVs in order to better understand its design space. As a result, there has been little to no guideline in how to use the MV design effectively. In this paper, we present an in-depth study of how MVs are designed in practice. We focus on two fundamental measures of multiple-view patterns: composition, which quantifies what view types and how many are there; and configuration, which characterizes spatial arrangement of view layouts in the display space. We build a new dataset containing 360 images of MVs collected from IEEE VIS, EuroVis, and PacificVis publications 2011 to 2019, and make fine-grained annotations of view types and layouts for these visualization images. From this data we conduct composition and configuration analyses using quantitative metrics of term frequency and layout topology. We identify common practices around MVs, including relationship of view types, popular view layouts, and correlation between view types and layouts. We combine the findings into a MV recommendation system, providing interactive tools to explore the design space, and support example-based design.