CV | Dec 19, 2024
OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving
Shuo Xing, Chengyuan Qian, Yuping Wang et al.
Since the advent of Multimodal Large Language Models (MLLMs), these models have had a significant impact across a wide range of real-world applications, particularly Autonomous Driving (AD). Their ability to process complex visual data and reason about intricate driving scenarios has paved the way for a new paradigm of end-to-end AD systems. However, progress in developing end-to-end models for AD has been slow, because existing fine-tuning methods demand substantial resources, including extensive computational power, large-scale datasets, and significant funding. Drawing inspiration from recent advances in inference computing, we propose OpenEMMA, an open-source end-to-end framework based on MLLMs. By incorporating a Chain-of-Thought reasoning process, OpenEMMA achieves significant improvements over the baseline across a diverse range of MLLMs. Furthermore, OpenEMMA demonstrates effectiveness, generalizability, and robustness across a variety of challenging driving scenarios, offering a more efficient and effective approach to autonomous driving. We release all of our code at https://github.com/taco-group/OpenEMMA.
4.2 | QM | May 1
A Universal Space of Brain Dynamics for Unveiling Cognitive Transitions and Individual Differences
Ronghua Zheng, Chengyuan Qian, Weiyang Ding
Representing dynamical systems through data-driven universal spaces has proven effective; however, achieving this universality for human brain activity remains a significant challenge, further complicated by variability across cognitive states and individual subjects. Recognizing that spatial properties reflect physical wiring while temporal properties reflect brain function, we develop Universal Brain Dynamics (UBD) to construct a universal space tailored to brain activity and to quantify the corresponding dynamics using a model-derived Jacobian matrix. Crucially, we validate UBD's universality by accurately predicting functional magnetic resonance imaging (fMRI) signals (Pearson's r > 0.9) across eight states and 963 subjects in the Human Connectome Project (HCP). By evaluating resting-state fMRI represented within UBD, we gain insight into how infra-slow fluctuation (ISF) underpins brain activity. Furthermore, we reveal a new perspective on structure-function coupling (SFC) by analyzing the temporal sequence of brain dynamics. Extending UBD to task-evoked states, we derive brain dynamics across various cognitive conditions, elucidating the neural mechanisms that drive cognitive transitions at a finer granularity. For individual differences, we compare brain dynamics across subjects to identify the neural underpinnings of these variations. Our findings suggest that synergistically integrating the spatial and temporal properties of brain activity establishes a universal space for its unfolding, enabling precise numerical analysis of the underlying neural mechanisms across varying conditions.
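The abstract names two quantitative ingredients: a Jacobian matrix derived from a fitted dynamics model, and Pearson's r between predicted and measured fMRI signals. The sketch below illustrates only those two generic computations under stated assumptions. The toy model `x_{t+1} = tanh(W x_t)`, the 3-region coupling matrix `W`, the misspecified copy `W_misspecified`, and the helpers `step`, `numerical_jacobian`, and `pearson_r` are all hypothetical illustrations, not UBD's actual model or code, and no HCP data is involved.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical 3-region coupling matrix; NOT taken from UBD or HCP data.
W: Matrix = [
    [0.5, -0.2, 0.1],
    [0.3, 0.4, -0.1],
    [-0.2, 0.1, 0.6],
]


def step(x: Vector, weights: Matrix = W) -> Vector:
    """Assumed toy dynamics x_{t+1} = tanh(weights @ x_t), a stand-in for a fitted model."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def numerical_jacobian(f: Callable[[Vector], Vector], x: Vector, eps: float = 1e-6) -> Matrix:
    """Central-difference estimate of J[i][j] = d f_i / d x_j evaluated at x."""
    n_out = len(f(x))
    jac = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(n_out):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jac


def pearson_r(a: Vector, b: Vector) -> float:
    """Pearson correlation coefficient between two equal-length signals."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    # Roll out the toy model and a misspecified copy (coupling scaled by 0.9).
    W_misspecified = [[0.9 * w for w in row] for row in W]
    true_state = pred_state = [0.8, -0.5, 0.3]
    observed, predicted = [], []
    for _ in range(10):
        true_state = step(true_state)
        pred_state = step(pred_state, W_misspecified)
        observed.append(true_state[0])
        predicted.append(pred_state[0])

    # Local dynamics at the final state: sensitivity of each region's next value to each region now.
    J = numerical_jacobian(step, true_state)
    for row in J:
        print([round(v, 3) for v in row])

    # Agreement between the misspecified model's signal and the "observed" toy signal for region 0.
    print(round(pearson_r(predicted, observed), 3))
```

At the zero state `tanh'(0) = 1`, so the estimated Jacobian reduces to `W` itself. Away from zero it shrinks row by row, which is one simple way a state-dependent Jacobian can expose how local interactions change along a trajectory.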