Caishun Chen

AI
h-index25
7papers
71citations
Novelty62%
AI Score46

7 Papers

71.0MAMay 19
LLM Agents Make Collective Belief Dynamics Programmable: Challenges and Research Directions

Xin He, Junxi Shen, Yuchen Mou et al.

Classical models of opinion dynamics assume human participants with bounded rationality and limited coordination. The rise of LLM-based agents introduces a qualitative shift: agents can now participate in online discussions at scale, maintain consistent persuasion strategies, and coordinate systematically. This paper argues that LLM agents make collective belief dynamics programmable, enabling deliberate steering of population-level beliefs. We term this emerging problem programmable collective belief control. Through controlled multi-agent simulations, we provide proof-of-concept evidence that coordinated AI agents can induce measurable belief shifts that stabilize within a few interaction rounds. We identify four structural properties (indistinguishability, persistence, contextuality, and configurability) that make detection and defense fundamentally difficult. Based on these findings, we outline a research agenda spanning theoretical foundations for adversarial belief dynamics, operational methods for system-level detection and intervention, and simulation infrastructure for scalable experimentation. Our goal is not to present a complete solution, but to articulate why this problem demands urgent attention and to provide a conceptual foundation for future work.

CVNov 23, 2022
Can Machines Imitate Humans? Integrative Turing-like tests for Language and Vision Demonstrate a Narrowing Gap

Mengmi Zhang, Elisa Pavarino, Xiao Liu et al.

As AI becomes increasingly embedded in daily life, ascertaining whether an agent is human is critical. We systematically benchmark AI's ability to imitate humans in three language tasks (image captioning, word association, conversation) and three vision tasks (color estimation, object detection, attention prediction), collecting data from 636 humans and 37 AI agents. Next, we conducted 72,191 Turing-like tests with 1,916 human judges and 10 AI judges. Current AIs are approaching the ability to convincingly impersonate humans and deceive human judges in both language and vision. Even simple AI judges outperformed humans in distinguishing AI from human responses. Imitation ability showed minimal correlation with conventional AI performance metrics, suggesting that passing as human is an important independent evaluation criterion. The large-scale Turing datasets and metrics introduced here offer valuable benchmarks for assessing human-likeness in AI and highlight the importance of rigorous, quantitative imitation tests for AI development.

81.3AIMay 5
Automated Large-scale CVRP Solver Design via LLM-assisted Flexible MCTS

Tong Guo, Caishun Chen, Yew Soon Ong

Solving large-scale CVRP (LSCVRP) with hundreds to thousands of nodes remains difficult for even state-of-the-art solvers. Divide-and-conquer can scale by decomposing the instance into size-reduced subproblems, but designing decomposition logic and configuring sub-solvers is highly expertise- and labor-intensive. Large Language Models (LLMs) have emerged as promising tools for automated algorithm design. However, existing LLM-driven approaches struggle with LSCVRP primarily due to the difficulty in generating sophisticated search strategies within a limited context window. To bridge this gap, we propose the LLM-assisted Flexible Monte Carlo Tree Search (LaF-MCTS), a novel framework that automates the design of high-performance LSCVRP solvers. We develop a three-tier decision hierarchy to enable incremental design of decomposition policies and sub-solvers for LSCVRP. To enable efficient search within the algorithmic hypothesis space, we introduce semantic pruning to eliminate semantically and structurally redundant codes, and branch regrowth to regenerate codes and preserve diversity. Extensive experiments on CVRPLib demonstrate that LaF-MCTS autonomously composes and optimizes decomposition-enhanced solvers that surpasses various state-of-the-art CVRP solvers.

IRApr 2, 2024
Where to Move Next: Zero-shot Generalization of LLMs for Next POI Recommendation

Shanshan Feng, Haoming Lyu, Caishun Chen et al.

Next Point-of-interest (POI) recommendation provides valuable suggestions for users to explore their surrounding environment. Existing studies rely on building recommendation models from large-scale users' check-in data, which is task-specific and needs extensive computational resources. Recently, the pretrained large language models (LLMs) have achieved significant advancements in various NLP tasks and have also been investigated for recommendation scenarios. However, the generalization abilities of LLMs still are unexplored to address the next POI recommendations, where users' geographical movement patterns should be extracted. Although there are studies that leverage LLMs for next-item recommendations, they fail to consider the geographical influence and sequential transitions. Hence, they cannot effectively solve the next POI recommendation task. To this end, we design novel prompting strategies and conduct empirical studies to assess the capability of LLMs, e.g., ChatGPT, for predicting a user's next check-in. Specifically, we consider several essential factors in human movement behaviors, including user geographical preference, spatial distance, and sequential transitions, and formulate the recommendation task as a ranking problem. Through extensive experiments on two widely used real-world datasets, we derive several key findings. Empirical evaluations demonstrate that LLMs have promising zero-shot recommendation abilities and can provide accurate and reasonable predictions. We also reveal that LLMs cannot accurately comprehend geographical context information and are sensitive to the order of presentation of candidate POIs, which shows the limitations of LLMs and necessitates further research on robust human mobility reasoning mechanisms.

AIMay 18, 2025
BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs

Junxiao Yang, Jinzhe Tu, Haoran Liu et al. · tsinghua

Recent advances in Large Reasoning Models (LRMs) have shown impressive capabilities in mathematical and logical reasoning. However, current LRMs rarely admit ignorance or respond with "I don't know". Instead, they often produce incorrect answers while showing undue confidence, raising concerns about their factual reliability. In this work, we identify two pathological reasoning patterns characterized by overthinking that contribute to the overconfident and incorrect answers: last-minute guessing and second-thought spiraling. To address these issues, we propose BARREL-a novel framework that promotes concise and boundary-aware factual reasoning. Our experiments show that BARREL-training increases the reliability of DeepSeek-R1-Distill-Llama-8B from 39.33% to 61.48%, while still achieving accuracy comparable to models finetuned on reasoning data generated by R1. These results demonstrate that our pilot study is inspiring to build more reliable and factual System 2 LRMs.

CVMar 19, 2024
Precise-Physics Driven Text-to-3D Generation

Qingshan Xu, Jiao Liu, Melvin Wong et al.

Text-to-3D generation has shown great promise in generating novel 3D content based on given text prompts. However, existing generative methods mostly focus on geometric or visual plausibility while ignoring precise physics perception for the generated 3D shapes. This greatly hinders the practicality of generated 3D shapes in real-world applications. In this work, we propose Phy3DGen, a precise-physics-driven text-to-3D generation method. By analyzing the solid mechanics of generated 3D shapes, we reveal that the 3D shapes generated by existing text-to-3D generation methods are impractical for real-world applications as the generated 3D shapes do not conform to the laws of physics. To this end, we leverage 3D diffusion models to provide 3D shape priors and design a data-driven differentiable physics layer to optimize 3D shape priors with solid mechanics. This allows us to optimize geometry efficiently and learn precise physics information about 3D shapes at the same time. Experimental results demonstrate that our method can consider both geometric plausibility and precise physics perception, further bridging 3D virtual modeling and precise physical worlds.

LGMay 24, 2023
Prompt Evolution for Generative AI: A Classifier-Guided Approach

Melvin Wong, Yew-Soon Ong, Abhishek Gupta et al.

Synthesis of digital artifacts conditioned on user prompts has become an important paradigm facilitating an explosion of use cases with generative AI. However, such models often fail to connect the generated outputs and desired target concepts/preferences implied by the prompts. Current research addressing this limitation has largely focused on enhancing the prompts before output generation or improving the model's performance up front. In contrast, this paper conceptualizes prompt evolution, imparting evolutionary selection pressure and variation during the generative process to produce multiple outputs that satisfy the target concepts/preferences better. We propose a multi-objective instantiation of this broader idea that uses a multi-label image classifier-guided approach. The predicted labels from the classifiers serve as multiple objectives to optimize, with the aim of producing diversified images that meet user preferences. A novelty of our evolutionary algorithm is that the pre-trained generative model gives us implicit mutation operations, leveraging the model's stochastic generative capability to automate the creation of Pareto-optimized images more faithful to user preferences.