CV · Jan 22
Event-VStream: Event-Driven Real-Time Understanding for Long Video Streams
Zhenghui Guo, Yuanbin Man, Junyuan Sheng et al.
Real-time understanding of long video streams remains challenging for multimodal large language models (VLMs) due to redundant frame processing and rapid forgetting of past context. Existing streaming systems rely on fixed-interval decoding or cache pruning, which either produce repetitive outputs or discard crucial temporal information. We introduce Event-VStream, an event-aware framework that represents continuous video as a sequence of discrete, semantically coherent events. Our system detects meaningful state transitions by integrating motion, semantic, and predictive cues, and triggers language generation only at those boundaries. Each event embedding is consolidated into a persistent memory bank, enabling long-horizon reasoning while maintaining low latency. Across OVOBench-Realtime and long-form Ego4D evaluations, Event-VStream achieves competitive performance. It improves over a VideoLLM-Online-8B baseline by +10.4 points on OVOBench-Realtime, achieves performance close to Flash-VStream-7B despite using only a general-purpose LLaMA-3-8B text backbone, and maintains around 70% GPT-5 win rate on 2-hour Ego4D streams.
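The event-triggering idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the motion, semantic, and predictive cues are computed or combined, so the weighted average, the `threshold`, the `min_gap` debounce, and all names below (`CueWeights`, `boundary_score`, `detect_events`) are hypothetical. The sketch only shows the general pattern of firing generation at detected boundaries rather than at fixed intervals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CueWeights:
    """Hypothetical mixing weights; the abstract does not specify any."""
    motion: float = 1.0
    semantic: float = 1.0
    predictive: float = 1.0


def boundary_score(motion, semantic, predictive, weights=CueWeights()):
    """Weighted average of three per-frame cue scores, each assumed in [0, 1]."""
    total = weights.motion + weights.semantic + weights.predictive
    mixed = (weights.motion * motion
             + weights.semantic * semantic
             + weights.predictive * predictive)
    return mixed / total


def detect_events(cues, threshold=0.5, min_gap=1):
    """Return frame indices where a boundary fires.

    cues: iterable of (motion, semantic, predictive) tuples, one per frame.
    A boundary fires when the mixed score reaches `threshold` and at least
    `min_gap` frames have passed since the previous boundary (assumed debounce).
    """
    boundaries = []
    last = None
    for t, (motion, semantic, predictive) in enumerate(cues):
        score = boundary_score(motion, semantic, predictive)
        if score >= threshold and (last is None or t - last >= min_gap):
            boundaries.append(t)  # language generation would be triggered here
            last = t
    return boundaries
```

In this sketch, a downstream caller would run the language model only at the returned indices, which is what separates event-driven decoding from fixed-interval decoding.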
CV · Aug 27, 2025
DNP-Guided Contrastive Reconstruction with a Reverse Distillation Transformer for Medical Anomaly Detection
Luhu Li, Bowen Lin, Mukhtiar Khan et al.
Anomaly detection in medical images is challenging due to limited annotations and a domain gap compared to natural images. Existing reconstruction methods often rely on frozen pre-trained encoders, which limits adaptation to domain-specific features and reduces localization accuracy. Prototype-based learning offers interpretability and clustering benefits but suffers from prototype collapse, where a few prototypes dominate training, harming diversity and generalization. To address this, we propose a unified framework combining a trainable encoder with prototype-guided reconstruction and a novel Diversity-Aware Alignment Loss. The trainable encoder, enhanced by a momentum branch, enables stable domain-adaptive feature learning. A lightweight Prototype Extractor mines informative normal prototypes to guide the decoder via attention for precise reconstruction. Our loss enforces balanced prototype use through diversity constraints and per-prototype normalization, effectively preventing collapse. Experiments on multiple medical imaging benchmarks show significant improvements in representation quality and anomaly localization, outperforming prior methods. Visualizations and prototype assignment analyses further validate the effectiveness of our anti-collapse mechanism and enhanced interpretability.
CV · Mar 23, 2019
Fast LLMMSE filter for low-dose CT imaging
Fengling Wang, Bowen Lin, Shujun Fu et al.
Low-dose X-ray CT technology is one of the important directions of current research and development of medical imaging equipment. A fast algorithm for blockwise sinogram filtering is presented for real-time low-dose CT imaging. A nonstationary Gaussian noise model of low-dose sinogram data is proposed for the low-mA (tube current) CT protocol. Then, according to the linear minimum mean square error principle, an adaptive blockwise algorithm is built to filter sinogram data contaminated by photon starvation. A moving sum technique is used to make the algorithm run in linear time, regardless of the block size and the data range. The proposed fast filtering gives better performance in noise reduction and detail preservation in the reconstructed images, which is verified in experiments on simulated and real data compared with some related filtering methods.
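The moving-sum idea can be sketched on a 1D signal. This is a simplified illustration, not the paper's algorithm: the abstract describes blockwise filtering of 2D sinograms and gives no formulas, so the sketch assumes the textbook local LMMSE estimate (local mean plus a variance-based gain times the deviation from that mean), takes a per-sample noise variance as given input, and uses hypothetical names (`moving_sums`, `llmmse_filter_1d`, `radius`). What it demonstrates is that window sums are updated by adding the entering sample and subtracting the leaving one, so the cost per sample does not depend on the window size.

```python
def moving_sums(values, radius):
    """Sums over the window [i - radius, i + radius], clipped at the borders.

    Runs in O(n): each step adds the sample entering the window and
    subtracts the one leaving it, independent of `radius`.
    """
    n = len(values)
    sums = [0.0] * n
    counts = [0] * n
    running = sum(values[: min(radius, n - 1) + 1])  # window for i = 0
    for i in range(n):
        if i > 0:
            entering = i + radius
            if entering < n:
                running += values[entering]
            leaving = i - radius - 1
            if leaving >= 0:
                running -= values[leaving]
        sums[i] = running
        counts[i] = min(n - 1, i + radius) - max(0, i - radius) + 1
    return sums, counts


def llmmse_filter_1d(signal, noise_var, radius=2):
    """Local LMMSE estimate with per-sample noise variance (assumed known)."""
    s1, counts = moving_sums(signal, radius)
    s2, _ = moving_sums([v * v for v in signal], radius)
    filtered = []
    for i, y in enumerate(signal):
        mean = s1[i] / counts[i]
        var = max(s2[i] / counts[i] - mean * mean, 0.0)  # local variance
        signal_var = max(var - noise_var[i], 0.0)        # noise-free part
        gain = signal_var / var if var > 0 else 0.0
        filtered.append(mean + gain * (y - mean))
    return filtered
```

When the noise variance dominates the local variance, the gain drops to zero and the output falls back to the local mean; when the noise variance is negligible, the gain approaches one and the sample passes through almost unchanged. Running sums over floating-point data can accumulate rounding error on long signals, which a production implementation would need to handle.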
CV · Jan 2, 2019
Optical Fringe Patterns Filtering Based on Multi-Stage Convolution Neural Network
Bowen Lin, Shujun Fu, Caiming Zhang et al.
Optical fringe patterns are often contaminated by speckle noise, making it difficult to accurately and robustly extract their phase fields. To deal with this problem, we propose a filtering method based on deep learning, called optical fringe patterns denoising convolutional neural network (FPD-CNN), for directly removing speckle from the input noisy fringe patterns. Regularization technology is integrated into the design of the deep architecture. Specifically, the FPD-CNN method is divided into multiple stages, each of which consists of a set of convolutional layers along with batch normalization and a leaky rectified linear unit (Leaky ReLU) activation function. The end-to-end joint training is carried out using the Euclidean loss. Extensive experiments on simulated and experimental optical fringe patterns, especially finer ones with high-density regions, show that the proposed method is competitive with some state-of-the-art denoising techniques in spatial or transform domains, efficiently preserving the main features of the fringes at a fairly fast speed.