Mengqi Li

LG
h-index47
5papers
52citations
Novelty62%
AI Score55

5 Papers

IVDec 15, 2022Code
Two-stage Contextual Transformer-based Convolutional Neural Network for Airway Extraction from CT Images

Yanan Wu, Shuiqing Zhao, Shouliang Qi et al.

Accurate airway extraction from computed tomography (CT) images is a critical step for planning navigation bronchoscopy and quantitative assessment of airway-related chronic obstructive pulmonary disease (COPD). The existing methods are challenging to sufficiently segment the airway, especially the high-generation airway, with the constraint of the limited label and cannot meet the clinical use in COPD. We propose a novel two-stage 3D contextual transformer-based U-Net for airway segmentation using CT images. The method consists of two stages, performing initial and refined airway segmentation. The two-stage model shares the same subnetwork with different airway masks as input. Contextual transformer block is performed both in the encoder and decoder path of the subnetwork to finish high-quality airway segmentation effectively. In the first stage, the total airway mask and CT images are provided to the subnetwork, and the intrapulmonary airway mask and corresponding CT scans to the subnetwork in the second stage. Then the predictions of the two-stage method are merged as the final prediction. Extensive experiments were performed on in-house and multiple public datasets. Quantitative and qualitative analysis demonstrate that our proposed method extracted much more branches and lengths of the tree while accomplishing state-of-the-art airway segmentation performance. The code is available at https://github.com/zhaozsq/airway_segmentation.

LGJun 3, 2025Code
StreamBP: Memory-Efficient Exact Backpropagation for Long Sequence Training of LLMs

Qijun Luo, Mengqi Li, Lei Zhao et al.

Training language models on long sequence data is a demanding requirement for enhancing the model's capability on complex tasks, e.g., long-chain reasoning. However, as the sequence length scales up, the memory cost for storing activation values becomes huge during the Backpropagation (BP) process, even with the application of gradient checkpointing technique. To tackle this challenge, we propose a memory-efficient and exact BP method called StreamBP, which performs a linear decomposition of the chain rule along the sequence dimension in a layer-wise manner, significantly reducing the memory cost of activation values and logits. The proposed method is applicable to common objectives such as SFT, GRPO, and DPO. From an implementation perspective, StreamBP achieves less computational FLOPs and faster BP speed by leveraging the causal structure of the language model. Compared to gradient checkpointing, StreamBP scales up the maximum sequence length of BP by 2.8-5.5 times larger, while using comparable or even less BP time. Note that StreamBP's sequence length scaling ability can be directly transferred to batch size scaling for accelerating training. We further develop a communication-efficient distributed StreamBP to effectively support multi-GPU training and broaden its applicability. Our code can be easily integrated into the training pipeline of any transformer models and is available at https://github.com/Ledzy/StreamBP.

LGOct 21, 2025Code
Online SFT for LLM Reasoning: Surprising Effectiveness of Self-Tuning without Rewards

Mengqi Li, Lei Zhao, Anthony Man-Cho So et al.

We present a simple, self-help online supervised finetuning (OSFT) paradigm for LLM reasoning. In this paradigm, the model generates its own responses and is immediately finetuned on this self-generated data. OSFT is a highly efficient training strategy for LLM reasoning, as it is reward-free and uses just one rollout by default. Experiment results show that OSFT achieves downstream performance on challenging mathematical reasoning tasks comparable to strong reinforcement learning with verifiable rewards (RLVR) methods such as GRPO. Our ablation study further demonstrates the efficiency and robustness of OSFT. The major mechanism of OSFT lies in facilitating the model's own existing preference (latent knowledge) learned from pretraining, which leads to reasoning ability improvement. We believe that OSFT offers an efficient and promising alternative to more complex, reward-based training paradigms. Our code is available at https://github.com/ElementQi/OnlineSFT.

32.3AIMay 9
FRACTAL: SSM with Fractional Recurrent Architecture for Computational Temporal Analysis of Long Sequences

Mengqi Li, Wensheng Lin, Jinshuai Yang et al.

Effective sequence modeling fundamentally requires balancing the retention of unbounded history with the high-resolution detection of abrupt short-term variations common in real-world phenomena. However, existing state space models (SSMs) relying on high-order polynomial projection operators (HiPPO) face a critical trade-off where uniform measures dilute recent information to maintain timescale invariance, while exponential measures sacrifice global context to capture local dynamics. This paper proposes a Fractional Recurrent Architecture for Computational Temporal Analysis of Long sequences (FRACTAL), a novel architecture integrating fractional measure theory into recursive memory updates to address this limitation. By deriving projection operators with analytically characterized spectral properties and a tunable singularity index, the proposed method amplifies sensitivity to recent signal perturbations while preserving the spectral structure that encodes scale-invariant memory dynamics. This theoretical innovation is instantiated within a simplified diagonalized state space framework by modulating input projection initialization to enable simultaneous capture of multi-scale temporal features. FRACTAL achieves an average score of 87.11\% on the Long Range Arena benchmark, including 61.85\% on the ListOps task, outperforming the S5 model.

CVDec 13, 2024
GT23D-Bench: A Comprehensive General Text-to-3D Generation Benchmark

Sitong Su, Xiao Cai, Lianli Gao et al.

Recent advances in General Text-to-3D (GT23D) have been significant. However, the lack of a benchmark has hindered systematic evaluation and progress due to issues in datasets and metrics: 1) The largest 3D dataset Objaverse suffers from omitted annotations, disorganization, and low-quality. 2) Existing metrics only evaluate textual-image alignment without considering the 3D-level quality. To this end, we are the first to present a comprehensive benchmark for GT23D called GT23D-Bench consisting of: 1) a 400k high-fidelity and well-organized 3D dataset that curated issues in Objaverse through a systematical annotation-organize-filter pipeline; and 2) comprehensive 3D-aware evaluation metrics which encompass 10 clearly defined metrics thoroughly accounting for multi-dimension of GT23D. Notably, GT23D-Bench features three properties: 1) Multimodal Annotations. Our dataset annotates each 3D object with 64-view depth maps, normal maps, rendered images, and coarse-to-fine captions. 2) Holistic Evaluation Dimensions. Our metrics are dissected into a) Textual-3D Alignment measures textual alignment with multi-granularity visual 3D representations; and b) 3D Visual Quality which considers texture fidelity, multi-view consistency, and geometry correctness. 3) Valuable Insights. We delve into the performance of current GT23D baselines across different evaluation dimensions and provide insightful analysis. Extensive experiments demonstrate that our annotations and metrics are aligned with human preferences.