CVDec 4, 2025Code
EgoLCD: Egocentric Video Generation with Long Context DiffusionLiuzhou Zhang, Jiarui Ye, Yuanlei Wang et al.
Generating long, coherent egocentric videos is difficult, as hand-object interactions and procedural tasks require reliable long-term memory. Existing autoregressive models suffer from content drift, where object identity and scene semantics degrade over time. To address this challenge, we introduce EgoLCD, an end-to-end framework for egocentric long-context video generation that treats long video synthesis as a problem of efficient and stable memory management. EgoLCD combines a Long-Term Sparse KV Cache for stable global context with an attention-based short-term memory, extended by LoRA for local adaptation. A Memory Regulation Loss enforces consistent memory usage, and Structured Narrative Prompting provides explicit temporal guidance. Extensive experiments on the EgoVid-5M benchmark demonstrate that EgoLCD achieves state-of-the-art performance in both perceptual quality and temporal consistency, effectively mitigating generative forgetting and representing a significant step toward building scalable world models for embodied AI. Code: https://github.com/AIGeeksGroup/EgoLCD. Website: https://aigeeksgroup.github.io/EgoLCD.
LGApr 29, 2025
Multimodal Large Language Models for Medicine: A Comprehensive SurveyJiarui Ye, Hao Tang
MLLMs have recently become a focal point in the field of artificial intelligence research. Building on the strong capabilities of LLMs, MLLMs are adept at addressing complex multi-modal tasks. With the release of GPT-4, MLLMs have gained substantial attention from different domains. Researchers have begun to explore the potential of MLLMs in the medical and healthcare domain. In this paper, we first introduce the background and fundamental concepts related to LLMs and MLLMs, while emphasizing the working principles of MLLMs. Subsequently, we summarize three main directions of application within healthcare: medical reporting, medical diagnosis, and medical treatment. Our findings are based on a comprehensive review of 330 recent papers in this area. We illustrate the remarkable capabilities of MLLMs in these domains by providing specific examples. For data, we present six mainstream modes of data along with their corresponding evaluation benchmarks. At the end of the survey, we discuss the challenges faced by MLLMs in the medical and healthcare domain and propose feasible methods to mitigate or overcome these issues.