Siyuan Zhou

h-index: 14

Papers: 5

Citations: 90

Novelty: 60%

AI Score: 39

Ranked #80,726 of 194,257 authors (top 42%); #17,970 in LG (top 45%)

5 Papers

21.5 | CL | Oct 24, 2024 | Code

Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data

Shuhao Gu, Jialing Zhang, Siyuan Zhou et al.

Recently, Vision-Language Models (VLMs) have achieved remarkable progress in multimodal tasks, and multimodal instruction data serves as the foundation for enhancing VLM capabilities. Despite the availability of several open-source multimodal datasets, limitations in the scale and quality of open-source instruction data hinder the performance of VLMs trained on these datasets, leading to a significant gap compared to models trained on closed-source data. To address this challenge, we introduce Infinity-MM, a large-scale multimodal instruction dataset. We collected the available multimodal instruction datasets and performed unified preprocessing, resulting in a dataset with over 40 million samples that ensures diversity and accuracy. Furthermore, to enable large-scale expansion of instruction data and support the continuous acquisition of high-quality data, we propose a synthetic instruction generation method based on a tagging system and open-source VLMs. By establishing correspondences between different types of images and associated instruction types, this method can provide essential guidance during data synthesis. Leveraging this high-quality data, we have trained a 2-billion-parameter Vision-Language Model, Aquila-VL-2B, which achieves state-of-the-art (SOTA) performance among models of similar scale. The data is available at: https://huggingface.co/datasets/BAAI/Infinity-MM.

14.5 | RO | Jun 23, 2025

MinD: Learning A Dual-System World Model for Real-Time Planning and Implicit Risk Analysis

Xiaowei Chi, Kuangzhi Ge, Jiaming Liu et al.

Video Generation Models (VGMs) have become powerful backbones for Vision-Language-Action (VLA) models, leveraging large-scale pretraining for robust dynamics modeling. However, current methods underutilize their distribution modeling capabilities for predicting future states. Two challenges hinder progress: integrating generative processes into feature learning is both technically and conceptually underdeveloped, and naive frame-by-frame video diffusion is computationally inefficient for real-time robotics. To address these, we propose Manipulate in Dream (MinD), a dual-system world model for real-time, risk-aware planning. MinD uses two asynchronous diffusion processes: a low-frequency visual generator (LoDiff) that predicts future scenes and a high-frequency diffusion policy (HiDiff) that outputs actions. Our key insight is that robotic policies do not require fully denoised frames but can rely on low-resolution latents generated in a single denoising step. To connect early predictions to actions, we introduce DiffMatcher, a video-action alignment module with a novel co-training strategy that synchronizes the two diffusion models. MinD achieves a 63% success rate on RL-Bench, 60% on real-world Franka tasks, and operates at 11.3 FPS, demonstrating the efficiency of single-step latent features for control signals. Furthermore, MinD identifies 74% of potential task failures in advance, providing real-time safety signals for monitoring and intervention. This work establishes a new paradigm for efficient and reliable robotic manipulation using generative world models.
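The dual-rate idea in this abstract (a slow generator feeding a fast policy) can be illustrated with a toy scheduling loop. This is only a sketch under stated assumptions: `run_dual_rate`, `lo_step`, `hi_step`, and `lo_period` are hypothetical names, the loop is synchronous rather than truly asynchronous, and no diffusion model is involved. It shows only how a cached low-frequency latent might be reused by a policy that runs at every step.

```python
def run_dual_rate(observations, lo_step, hi_step, lo_period=4):
    """Toy loop: refresh a latent every `lo_period` steps, act at every step.

    lo_step and hi_step are hypothetical stand-ins for the slow visual
    generator and the fast policy; they are not the paper's models.
    """
    latent = None
    actions = []
    lo_calls = 0
    for t, obs in enumerate(observations):
        # Low-frequency branch: recompute the latent only occasionally.
        if t % lo_period == 0:
            latent = lo_step(obs)
            lo_calls += 1
        # High-frequency branch: act on every step using the cached latent.
        actions.append(hi_step(obs, latent))
    return actions, lo_calls


# Example with trivial stand-in functions.
actions, lo_calls = run_dual_rate(
    range(10),
    lo_step=lambda obs: obs * 10,
    hi_step=lambda obs, z: (obs, z),
)
```

In this toy run the slow branch fires at steps 0, 4, and 8, while the fast branch produces one action per step.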

3.8 | LG | Dec 10, 2023

DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement Learning

Kunyang Lin, Yufeng Wang, Peihao Chen et al.

Learning the optimal behavior policy for each agent in multi-agent systems is an essential yet difficult problem. Despite fruitful progress in multi-agent reinforcement learning, the question of whether two agents should exhibit consistent behaviors remains under-explored. In this paper, we propose a new approach that enables agents to learn whether their behaviors should be consistent with those of other agents by utilizing intrinsic rewards to learn the optimal policy for each agent. We begin by defining behavior consistency as the divergence in output actions between two agents when provided with the same observation. Subsequently, we introduce dynamic consistency intrinsic reward (DCIR) to stimulate agents to be aware of others' behaviors and determine whether to be consistent with them. Lastly, we devise a dynamic scale network (DSN) that provides learnable scale factors for the agent at every time step to dynamically ascertain whether to award consistent behavior and the magnitude of rewards. We evaluate DCIR in multiple environments including Multi-agent Particle, Google Research Football and StarCraft II Micromanagement, demonstrating its efficacy.
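The definition of behavior consistency above (a divergence between two agents' action outputs on the same observation) might be sketched as follows. The abstract does not say which divergence is used, so KL divergence here is an assumption, as are the function names and the convention that the sign of a scale factor decides whether consistency or inconsistency is rewarded. The scale factor is passed in as a plain number; the learned dynamic scale network is not modeled.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete action distributions (assumed choice)."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def behavior_consistency(policy_i, policy_j, observation):
    """Divergence between two agents' action distributions on one observation."""
    return kl_divergence(policy_i(observation), policy_j(observation))


def intrinsic_reward(divergence, scale):
    """Hypothetical convention: scale > 0 rewards inconsistent behavior,
    scale < 0 rewards consistent behavior, |scale| sets the magnitude."""
    return scale * divergence


# Two toy policies that ignore the observation.
agent_a = lambda obs: [0.9, 0.1]
agent_b = lambda obs: [0.1, 0.9]

same = behavior_consistency(agent_a, agent_a, observation=None)
different = behavior_consistency(agent_a, agent_b, observation=None)
```

Identical policies give a divergence of zero, so under this convention they receive no intrinsic reward regardless of the scale factor.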

11.9 | LG | Mar 19, 2021

Learning Task Decomposition with Ordered Memory Policy Network

Yuchen Lu, Yikang Shen, Siyuan Zhou et al.

Many complex real-world tasks are composed of several levels of sub-tasks. Humans leverage these hierarchical structures to accelerate the learning process and achieve better generalization. In this work, we study the inductive bias and propose Ordered Memory Policy Network (OMPN) to discover subtask hierarchy by learning from demonstration. The discovered subtask hierarchy could be used to perform task decomposition, recovering the subtask boundaries in an unstructured demonstration. Experiments on Craft and Dial demonstrate that our model can achieve higher task decomposition performance under both unsupervised and weakly supervised settings, compared with strong baselines. OMPN can also be directly applied to partially observable environments and still achieve higher task decomposition performance. Our visualization further confirms that the subtask hierarchy can emerge in our model.

3.0 | SE | Aug 22, 2020

MLD: An Intelligent Memory Leak Detection Scheme Based on Defect Modes in Smart Grids

Ling Yuan, Siyuan Zhou, Neal Xiong

With the expansion of the software scale and complexity of smart grid systems, the detection of smart grid software defects has become a research hotspot. Because of the large scale of the existing smart grid software code, the efficiency and accuracy of the existing smart grid defect detection algorithms are not high. We propose MLD, an intelligent memory leak detection scheme based on defect modes in smart grids. Based on the analysis of existing memory leak defect modes, we summarize memory operation behaviors (allocation, release and transfer) and present a state machine model. We employ a fuzzy matching algorithm based on regular expressions to determine the memory operation behaviors and then analyze the changes in the state machine to assess the vulnerability in the source code. To improve the efficiency of detection and solve the problem of repeated detection at the function call point, we propose a function summary method for memory operation behaviors. The experimental results demonstrate that the proposed method has high detection speed and accuracy. The proposed algorithm can identify the defects of the smart grid operation software and ensure the safe operation of the grid.
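The state machine over memory operation behaviors (allocation, release, transfer) might look like the sketch below. It is an illustration under loud assumptions, not the paper's scheme. The regular expressions are exact rather than fuzzy, the target language is assumed to be C, the state names and `find_leaks` are hypothetical, and function summaries are not modeled.

```python
import re

# Simplified patterns for the three memory operation behaviors (assumed forms).
ALLOC = re.compile(r"\b(\w+)\s*=\s*(?:\([^)]*\)\s*)?(?:malloc|calloc)\s*\(")
RELEASE = re.compile(r"\bfree\s*\(\s*(\w+)\s*\)")
TRANSFER = re.compile(r"\b(\w+)\s*=\s*(\w+)\s*;")


def find_leaks(lines):
    """Track a per-variable state and report variables left 'allocated'."""
    state = {}
    for line in lines:
        m = ALLOC.search(line)
        if m:
            state[m.group(1)] = "allocated"
            continue
        m = RELEASE.search(line)
        if m:
            state[m.group(1)] = "released"
            continue
        m = TRANSFER.search(line)
        if m and state.get(m.group(2)) == "allocated":
            # Ownership moves from the source pointer to the destination.
            dst, src = m.group(1), m.group(2)
            state[dst] = "allocated"
            state[src] = "transferred"
    # Anything still 'allocated' at the end was never released.
    return sorted(var for var, s in state.items() if s == "allocated")


source = [
    "char *p = (char *)malloc(16);",
    "char *q = malloc(8);",
    "char *r = p;",
    "free(q);",
]
leaks = find_leaks(source)
```

Here `q` is released and `p` hands its block to `r`, so only `r` remains in the allocated state and is flagged as a potential leak.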