Shirong Zeng

h-index: 1

3 papers

6 citations

3 Papers

43.1 | AI | Jul 15

RxBrain: Embodied Cognition Foundation Model with Joint Language-Visual Reasoning and Imagination

Haotian Liang, Mingkang Chen, Yufei Huang et al.

Embodied cognition requires agents to connect high-level task reasoning with the physical states to be achieved. We introduce Hy-Embodied-RxBrain, an embodied cognition foundation model with joint language-visual reasoning and imagination. Unlike vision-language models that emphasize scene understanding and textual decision making, or generative world models that mainly predict future visual states, RxBrain represents embodied plans in a single planning sequence where language and visual imagination play complementary roles. Language provides the abstract structure of a plan, including task decomposition, planning primitives, constraints, temporal order, and decision logic, while visual imagination grounds this structure through world state prediction and joint subgoal planning, associating each planning step with intermediate and final physical states. RxBrain adopts a unified multimodal Mixture-of-Transformers architecture that supports language, image, and video understanding and generation within one model. To train this capability, we build an automatic pipeline that converts embodied videos into joint text-visual planning supervision by decomposing videos into planning steps and aligning them with visual state transitions. We further introduce RxBrain-Bench to evaluate whether models can represent embodied plans through joint textual and visual components rather than separate understanding or generation. Experiments show that RxBrain maintains embodied understanding and generation abilities, and produces plans with coupled textual reasoning, world state prediction, and joint subgoal planning. We also extend RxBrain to continuous robot action generation, where it shows promising real-robot performance without large-scale action-data pretraining. These results provide an initial step toward foundation models for embodied cognition.

5.8 | AI | Mar 27, 2024 | Code

Large Language Models Need Consultants for Reasoning: Becoming an Expert in a Complex Human System Through Behavior Simulation

Chuwen Wang, Shirong Zeng, Cheng Wang

Large language models (LLMs), in conjunction with various reasoning reinforcement methodologies, have demonstrated remarkable capabilities comparable to humans in fields such as mathematics, law, coding, common sense, and world knowledge. In this paper, we delve into the reasoning abilities of LLMs within complex human systems. We propose a novel reasoning framework, termed "Mosaic Expert Observation Wall" (MEOW), which exploits a generative-agents-based simulation technique. In the MEOW framework, simulated data are used to train an expert model that concentrates "experience" about a specific task in each independent simulation run. It is the "experience" accumulated through simulation that produces an expert on a task in a complex human system. We conduct experiments within a communication game that mirrors real-world security scenarios. The results indicate that our proposed methodology can cooperate with existing methodologies to enhance the reasoning abilities of LLMs in complex human systems.

7.3 | AI | Jan 18, 2024

Next-Generation Simulation Illuminates Scientific Problems of Organised Complexity

Cheng Wang, Chuwen Wang, Wang Zhang et al.

As artificial intelligence becomes increasingly prevalent in scientific research, data-driven methodologies appear to overshadow traditional approaches in resolving scientific problems. In this Perspective, we revisit a classic classification of scientific problems and acknowledge that a series of unresolved problems remain. Throughout the history of researching scientific problems, scientists have continuously formed new paradigms facilitated by advances in data, algorithms, and computational power. To better tackle unresolved problems, especially those of organised complexity, a novel paradigm is needed. While recognising that the strengths of new paradigms have expanded the scope of resolvable scientific problems, we are aware that the continued advancement of data, algorithms, and computational power alone is unlikely to bring about a new paradigm. We posit that the integration of paradigms, which capitalises on the strengths of each, represents a promising approach. Specifically, we focus on next-generation simulation (NGS), which can serve as a platform to integrate methods from different paradigms. We propose a methodology, sophisticated behavioural simulation (SBS), to realise it. SBS represents a higher level of paradigm integration based on foundational models to simulate complex systems, such as social systems involving sophisticated human strategies and behaviours. NGS extends beyond the capabilities of traditional mathematical modelling simulations and agent-based modelling simulations, and therefore positions itself as a potential solution to problems of organised complexity in complex systems.