Xinwei Huang

CV
h-index14
4papers
17citations
Novelty59%
AI Score51

4 Papers

CVFeb 2Code
FSVideo: Fast Speed Video Diffusion Model in a Highly-Compressed Latent Space

FSVideo Team, Qingyu Chen, Zhiyuan Fang et al.

We introduce FSVideo, a fast speed transformer-based image-to-video (I2V) diffusion framework. We build our framework on the following key components: 1.) a new video autoencoder with highly-compressed latent space ($64\times64\times4$ spatial-temporal downsampling ratio), achieving competitive reconstruction quality; 2.) a diffusion transformer (DIT) architecture with a new layer memory design to enhance inter-layer information flow and context reuse within DIT, and 3.) a multi-resolution generation strategy via a few-step DIT upsampler to increase video fidelity. Our final model, which contains a 14B DIT base model and a 14B DIT upsampler, achieves competitive performance against other popular open-source models, while being an order of magnitude faster. We discuss our model design as well as training strategies in this report.

CVMar 4
Helios: Real Real-Time Long Video Generation Model

Shenghai Yuan, Yuanyang Yin, Zongjian Li et al.

We introduce Helios, the first 14B video generation model that runs at 19.5 FPS on a single NVIDIA H100 GPU and supports minute-scale generation while matching the quality of a strong baseline. We make breakthroughs along three key dimensions: (1) robustness to long-video drifting without commonly used anti-drifting heuristics such as self-forcing, error-banks, or keyframe sampling; (2) real-time generation without standard acceleration techniques such as KV-cache, sparse/linear attention, or quantization; and (3) training without parallelism or sharding frameworks, enabling image-diffusion-scale batch sizes while fitting up to four 14B models within 80 GB of GPU memory. Specifically, Helios is a 14B autoregressive diffusion model with a unified input representation that natively supports T2V, I2V, and V2V tasks. To mitigate drifting in long-video generation, we characterize typical failure modes and propose simple yet effective training strategies that explicitly simulate drifting during training, while eliminating repetitive motion at its source. For efficiency, we heavily compress the historical and noisy context and reduce the number of sampling steps, yielding computational costs comparable to -- or lower than -- those of 1.3B video generative models. Moreover, we introduce infrastructure-level optimizations that accelerate both inference and training while reducing memory consumption. Extensive experiments demonstrate that Helios consistently outperforms prior methods on both short- and long-video generation. We plan to release the code, base model, and distilled model to support further development by the community.

LGOct 23, 2025
Understanding Mechanistic Role of Structural and Functional Connectivity in Tau Propagation Through Multi-Layer Modeling

Tingting Dan, Xinwei Huang, Jiaqi Ding et al.

Emerging neuroimaging evidence shows that pathological tau proteins build up along specific brain networks, suggesting that large-scale network architecture plays a key role in the progression of Alzheimer's disease (AD). However, how structural connectivity (SC) and functional connectivity (FC) interact to influence tau propagation remains unclear. Leveraging an unprecedented volume of longitudinal neuroimaging data, we examine SC-FC interactions through a multi-layer graph diffusion model. Beyond showing that connectome architecture constrains tau spread, our model reveals a regionally asymmetric contribution of SC and FC. Specifically, FC predominantly drives tau spread in subcortical areas, the insula, frontal and temporal cortices, whereas SC plays a larger role in occipital, parietal, and limbic regions. The relative dominance of SC versus FC shifts over the course of disease, with FC generally prevailing in early AD and SC becoming primary in later stages. Spatial patterns of SC- and FC-dominant regions strongly align with the regional expression of AD-associated genes involved in inflammation, apoptosis, and lysosomal function, including CHUK (IKK-alpha), TMEM106B, MCL1, NOTCH1, and TH. In parallel, other non-modifiable risk factors (e.g., APOE genotype, sex) and biological mechanisms (e.g., amyloid deposition) selectively reshape tau propagation by shifting dominant routes between anatomical and functional pathways in a region-specific manner. Findings are validated in an independent AD cohort.

CLJun 29, 2025
Pipelined Decoder for Efficient Context-Aware Text Generation

Zixian Huang, Chenxu Niu, Yu Gu et al.

As the basis of generative AI, an autoregressive model requires the generation of a new token depending on all the previously generated tokens, which brings high quality but also restricts the model to generate tokens one by one, forming a bottleneck limiting the generation speed. In this paper, we propose a new decoder architecture that efficiently generates text in parallel for context-aware generation tasks. Our proposed pipelined decoder initiates the generation of multiple subsequences simultaneously, and, at each time-step, it generates a new token for each subsequence to realize parallelism. Experiments on multiple text generation tasks, including question answering, text summarization, and keyphrase generation, show that our pipelined decoder significantly improves the generation speed without a significant loss of generation quality or additional memory consumption.