AI Dec 3, 2025
Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia
Chandler Smith, Marwa Abdulhai, Manfred Diaz et al.
Large Language Model (LLM) agents have demonstrated impressive capabilities for social interaction and are increasingly being deployed in situations where they might engage with both human and artificial agents. These interactions represent a critical frontier for LLM-based agents, yet existing evaluation methods fail to measure how well these capabilities generalize to novel social situations. In this paper, we introduce a method for evaluating the ability of LLM-based agents to cooperate in zero-shot, mixed-motive environments using Concordia, a natural language multi-agent simulation environment. Our method measures general cooperative intelligence by testing an agent's ability to identify and exploit opportunities for mutual gain across diverse partners and contexts. We present empirical results from the NeurIPS 2024 Concordia Contest, where agents were evaluated on their ability to achieve mutual gains across a suite of diverse scenarios ranging from negotiation to collective action problems. Our findings reveal significant gaps between current agent capabilities and the robust generalization required for reliable cooperation, particularly in scenarios demanding persuasion and norm enforcement.
AI May 2
DiagramNet: An End-to-End Recognition Framework and Dataset for Non-Standard System-Level Diagrams
Jincheng Lou, Ruohan Xu, Jiapeng Li et al.
System-level diagrams encode the architectural blueprint of chip design, specifying module functions, dataflows, and interface protocols. However, non-standardized symbols and the scarcity of structured training data hinder existing multimodal large language models (MLLMs) from recognizing these diagrams. To address this gap, we introduce DiagramNet, the first multimodal dataset for system-level diagrams, comprising 10,977 connection annotations and 15,515 chain-of-thought QA pairs across four tasks: Listing, Localization, Connection, and Circuit QA. Building on this dataset, we propose a progressive training pipeline together with a decoupled multi-agent workflow that decomposes complex visual reasoning into Perception, Reasoning, and Knowledge stages. On the DiagramNet benchmark, integrating our 3B-parameter model with the proposed workflow surpasses the 2025 EDA Elite Challenge winner and outperforms GPT-5, Claude-Sonnet-4, and Gemini-2.5-Pro by over 2x in end-to-end evaluation. Notably, the workflow generalizes beyond our model, boosting Task 1 performance by 128.7x for Gemini-2.5-Pro and 12.4x for GPT-5. Furthermore, with only 60 images for detector adaptation, the method transfers effectively to AMSBench, achieving zero-shot connectivity reasoning on par with GPT-5 and Claude-Sonnet-4 while surpassing the AMS state-of-the-art method Netlistify.
CY May 17
You Can't Fool Us: Understanding the Resilience of LLM-driven Agent Communities to Misinformation
Chichen Lin, Yijie Jin, Kangbo Hu et al.
Misinformation resilience is a dynamic community process: communities differ not only in whether they initially trust false claims, but also in how they recover through interaction, questioning, correction, and support withdrawal. We study this process with an LLM-based agent simulation that constructs synthetic communities along two theoretically motivated dimensions: Actively Open-minded Thinking (AOT), which captures evidence-seeking and willingness to revise beliefs, and Political Ideology (PI), which captures identity-based interpretation of contested claims. These two traits allow us to examine how evidence-oriented reasoning and ideological alignment jointly shape community responses to credible misinformation shocks. Across systematically varied AOT-PI communities, we find that higher AOT improves both resistance to misinformation uptake and recovery after trust peaks. PI shapes the recovery pathway: ideologically moderate communities recover more reliably, while polarized communities retain more residual support. Stance-level analysis shows that resilience depends on whether agents move from questioning a claim to denying or correcting it and withdrawing prior support. Intervention experiments further show that persuasion and fact checking better support post-peak correction, whereas accuracy prompts mainly induce early caution and source warnings have weaker effects. Together, this work provides a mechanism-level account of community misinformation resilience, showing how psychological composition and intervention design shape whether communities move from misinformation exposure toward correction or persistent support.
LG Apr 19, 2024
KATO: Knowledge Alignment and Transfer for Transistor Sizing of Different Design and Technology
Wei W. Xing, Weijian Fan, Zhuohua Liu et al.
Automatic transistor sizing in circuit design continues to be a formidable challenge. Although Bayesian optimization (BO) has achieved significant success, it is circuit-specific, limiting the accumulation and transfer of design knowledge for broader applications. This paper proposes (1) efficient automatic kernel construction, (2) the first transfer learning across different circuits and technology nodes for BO, and (3) a selective transfer learning scheme to ensure only useful knowledge is utilized. These three novel components are integrated into BO with Multi-objective Acquisition Ensemble (MACE) to form Knowledge Alignment and Transfer Optimization (KATO), which delivers state-of-the-art performance: up to 2x simulation reduction and 1.2x design improvement over the baselines.