Bin Jiang

16.4CVSep 10, 2024

Enhancing Long Video Understanding via Hierarchical Event-Based Memory

Dingxin Cheng, Mingda Li, Jingyu Liu et al.

Recently, integrating visual foundation models into large language models (LLMs) to form video understanding systems has attracted widespread attention. Most of the existing models compress diverse semantic information within the whole video and feed it into LLMs for content comprehension. While this method excels in short video understanding, it may result in a blend of multiple event information in long videos due to coarse compression, which causes information redundancy. Consequently, the semantics of key events might be obscured within the vast information that hinders the model's understanding capabilities. To address this issue, we propose a Hierarchical Event-based Memory-enhanced LLM (HEM-LLM) for better understanding of long videos. Firstly, we design a novel adaptive sequence segmentation scheme to divide multiple events within long videos. In this way, we can perform individual memory modeling for each event to establish intra-event contextual connections, thereby reducing information redundancy. Secondly, while modeling current event, we compress and inject the information of the previous event to enhance the long-term inter-event dependencies in videos. Finally, we perform extensive experiments on various video understanding tasks and the results show that our model achieves state-of-the-art performances.

1.2CHEM-PHMay 10, 2025

Efficient Parallelization of Message Passing Neural Network Potentials for Large-scale Molecular Dynamics

Junfan Xia, Bin Jiang

Machine learning potentials have achieved great success in accelerating atomistic simulations. Many of them relying on atom-centered local descriptors are natural for parallelization. More recent message passing neural network (MPNN) models have demonstrated their superior accuracy and become increasingly popular. However, efficiently parallelizing MPNN models across multiple nodes remains challenging, limiting their practical applications in large-scale simulations. Here, we propose an efficient parallel algorithm for MPNN models, in which additional data communication is minimized among local atoms only in each MP layer without redundant computation, thus scaling linearly with the layer number. Integrated with our recursively embedded atom neural network model, this algorithm demonstrates excellent strong scaling and weak scaling behaviors in several benchmark systems. This approach enables massive molecular dynamics simulations on MPNN models as fast as on strictly local models for over 100 million atoms, vastly extending the applicability of the MPNN potential to an unprecedented scale. This general parallelization framework can empower various MPNN models to efficiently simulate very large and complex systems.

Bin Jiang

2 Papers