87.8CLJun 3
Probing Outcome-Level Resemblance and Mechanism-Level Alignment in LLM Risk Decisions: Evidence from the St. Petersburg GameChensong Huang, Changyu Chen, Chenwei Lin et al.
LLMs can appear cautious in risk decision-making tasks, yet cautious-looking outputs do not necessarily indicate alignment with human decision-making mechanisms. We investigate this distinction using the St. Petersburg game as a controlled testbed, a classical paradox in which the expected payoff is infinite, yet humans typically report low, finite willingness to pay. We evaluate 28 LLMs with a structured prompt suite that includes the original game; controlled decision variants that perturb truncation, repeated play, numeric endowment, and occupational identity; a human-perspective prompt that asks models to reason as human decision makers; and paired comparisons between base models and their instruction-tuned counterparts. In the original game, most models generate finite bids, creating the appearance of human-like risk behavior. However, this outcome-level resemblance masks substantial mechanism-level differences. The controlled variants reveal that rather than maintaining human-like behavior seen in the original game, models often shift to conditionally and computationally rational behavior. Human-cue prompting and instruction tuning often lower bids and reduce some visible pathologies, but most mechanism-level response patterns remain largely unchanged. These findings show that behavioral alignment in risk decision-making can be surface-level: LLMs may produce human-like risk decisions without exhibiting human-consistent mechanisms. High-stakes evaluations of LLM decision-making should therefore move beyond outcome similarity and examine whether the alignment is supported by mechanism-level consistency.
CVJun 13, 2024Code
INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in InsuranceChenwei Lin, Hanjia Lyu, Xian Xu et al.
Large Vision-Language Models (LVLMs) and Multimodal Large Language Models (MLLMs) have demonstrated outstanding performance in various general multimodal applications and have shown increasing promise in specialized domains. However, their potential in the insurance domain-characterized by diverse application scenarios and rich multimodal data-remains largely underexplored. To date, there is no systematic review of multimodal tasks, nor a benchmark specifically designed to assess the capabilities of LVLMs in insurance. This gap hinders the development of LVLMs within the insurance industry. This study systematically reviews and categorizes multimodal tasks for 4 representative types of insurance: auto, property, health, and agricultural. We introduce INS-MMBench, the first hierarchical benchmark tailored for the insurance domain. INS-MMBench encompasses 22 fundamental tasks, 12 meta-tasks and 5 scenario tasks, enabling a comprehensive and progressive assessment from basic capabilities to real-world use cases. We benchmark 11 leading LVLMs, including closed-source models such as GPT-4o and open-source models like LLaVA. Our evaluation validates the effectiveness of INS-MMBench and offers detailed insights into the strengths and limitations of current LVLMs on a variety of insurance-related multimodal tasks. We hope that INS-MMBench will accelerate the integration of LVLMs into the insurance industry and foster interdisciplinary research. Our dataset and evaluation code are available at https://github.com/FDU-INS/INS-MMBench.
CVApr 15, 2024
Harnessing GPT-4V(ision) for Insurance: A Preliminary ExplorationChenwei Lin, Hanjia Lyu, Jiebo Luo et al.
The emergence of Large Multimodal Models (LMMs) marks a significant milestone in the development of artificial intelligence. Insurance, as a vast and complex discipline, involves a wide variety of data forms in its operational processes, including text, images, and videos, thereby giving rise to diverse multimodal tasks. Despite this, there has been limited systematic exploration of multimodal tasks specific to insurance, nor a thorough investigation into how LMMs can address these challenges. In this paper, we explore GPT-4V's capabilities in the insurance domain. We categorize multimodal tasks by focusing primarily on visual aspects based on types of insurance (e.g., auto, household/commercial property, health, and agricultural insurance) and insurance stages (e.g., risk assessment, risk monitoring, and claims processing). Our experiment reveals that GPT-4V exhibits remarkable abilities in insurance-related tasks, demonstrating not only a robust understanding of multimodal content in the insurance domain but also a comprehensive knowledge of insurance scenarios. However, there are notable shortcomings: GPT-4V struggles with detailed risk rating and loss assessment, suffers from hallucination in image understanding, and shows variable support for different languages. Through this work, we aim to bridge the insurance domain with cutting-edge LMM technology, facilitate interdisciplinary exchange and development, and provide a foundation for the continued advancement and evolution of future research endeavors.
ROJun 3, 2025
Multi Layered Autonomy and AI Ecologies in Robotic Art InstallationsBaoyang Chen, Xian Xu, Huamin Qu
This paper presents Symbiosis of Agents, is a large-scale installation by Baoyang Chen (baoyangchen.com), that embeds AI-driven robots in an immersive, mirror-lined arena, probing the tension between machine agency and artistic authorship. Drawing on early cybernetics, rule-based conceptual art, and seminal robotic works, it orchestrates fluid exchanges among robotic arms, quadruped machines, their environment, and the public. A three tier faith system pilots the ecology: micro-level adaptive tactics, meso-level narrative drives, and a macro-level prime directive. This hierarchy lets behaviors evolve organically in response to environmental cues and even a viewer's breath, turning spectators into co-authors of the unfolding drama. Framed by a speculative terraforming scenario that recalls the historical exploitation of marginalized labor, the piece asks who bears responsibility in AI-mediated futures. Choreographed motion, AI-generated scripts, reactive lighting, and drifting fog cast the robots as collaborators rather than tools, forging a living, emergent artwork. Exhibited internationally, Symbiosis of Agents shows how cybernetic feedback, robotic experimentation, and conceptual rule-making can converge to redefine agency, authorship, and ethics in contemporary art.
ROOct 9, 2025
Airy: Reading Robot Intent through Height and SkyBaoyang Chen, Xian Xu, Huamin Qu
As industrial robots move into shared human spaces, their opaque decision making threatens safety, trust, and public oversight. This artwork, Airy, asks whether complex multi agent AI can become intuitively understandable by staging a competition between two reinforcement trained robot arms that snap a bedsheet skyward. Building on three design principles, competition as a clear metric (who lifts higher), embodied familiarity (audiences recognize fabric snapping), and sensor to sense mapping (robot cooperation or rivalry shown through forest and weather projections), the installation gives viewers a visceral way to read machine intent. Observations from five international exhibitions indicate that audiences consistently read the robots' strategies, conflict, and cooperation in real time, with emotional reactions that mirror the system's internal state. The project shows how sensory metaphors can turn a black box into a public interface.
CVAug 2, 2025
OCSplats: Observation Completeness Quantification and Label Noise Separation in 3DGSHan Ling, Xian Xu, Yinghui Sun et al.
3D Gaussian Splatting (3DGS) has become one of the most promising 3D reconstruction technologies. However, label noise in real-world scenarios-such as moving objects, non-Lambertian surfaces, and shadows-often leads to reconstruction errors. Existing 3DGS-Bsed anti-noise reconstruction methods either fail to separate noise effectively or require scene-specific fine-tuning of hyperparameters, making them difficult to apply in practice. This paper re-examines the problem of anti-noise reconstruction from the perspective of epistemic uncertainty, proposing a novel framework, OCSplats. By combining key technologies such as hybrid noise assessment and observation-based cognitive correction, the accuracy of noise classification in areas with cognitive differences has been significantly improved. Moreover, to address the issue of varying noise proportions in different scenarios, we have designed a label noise classification pipeline based on dynamic anchor points. This pipeline enables OCSplats to be applied simultaneously to scenarios with vastly different noise proportions without adjusting parameters. Extensive experiments demonstrate that OCSplats always achieve leading reconstruction performance and precise label noise classification in scenes of different complexity levels.
HCFeb 6, 2022
From `Wow' to `Why': Guidelines for Creating the Opening of a Data Video with Cinematic StylesXian Xu, Leni Yang, David Yip et al.
Data videos are an increasingly popular storytelling form. The opening of a data video critically influences its success as the opening either attracts the audience to continue watching or bores them to abandon watching. However, little is known about how to create an attractive opening. We draw inspiration from the openings of famous films to facilitate designing data video openings. First, by analyzing over 200 films from several sources, we derived six primary cinematic opening styles adaptable to data videos. Then, we consulted eight experts from the film industry to formulate 28 guidelines. To validate the usability and effectiveness of the guidelines, we asked participants to create data video openings with and without the guidelines, which were then evaluated by experts and the general public. Results showed that the openings designed with the guidelines were perceived to be more attractive, and the guidelines were praised for clarity and inspiration.