Yuxiang Luo

CV
h-index25
13papers
6,966citations
Novelty59%
AI Score65

13 Papers

CLJan 22, 2025Code
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek-AI, Daya Guo, Dejian Yang et al. · stanford, tsinghua

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.

CVDec 9, 2025Code
LapFM: A Laparoscopic Segmentation Foundation Model via Hierarchical Concept Evolving Pre-training

Qing Xu, Kun Yuan, Yuxiang Luo et al.

Surgical segmentation is pivotal for scene understanding yet remains hindered by annotation scarcity and semantic inconsistency across diverse procedures. Existing approaches typically fine-tune natural foundation models (e.g., SAM) with limited supervision, functioning merely as domain adapters rather than surgical foundation models. Consequently, they struggle to generalize across the vast variability of surgical targets. To bridge this gap, we present LapFM, a foundation model designed to evolve robust segmentation capabilities from massive unlabeled surgical images. Distinct from medical foundation models relying on inefficient self-supervised proxy tasks, LapFM leverages a Hierarchical Concept Evolving Pre-training paradigm. First, we establish a Laparoscopic Concept Hierarchy (LCH) via a hierarchical mask decoder with parent-child query embeddings, unifying diverse entities (i.e., Anatomy, Tissue, and Instrument) into a scalable knowledge structure with cross-granularity semantic consistency. Second, we propose a Confidence-driven Evolving Labeling that iteratively generates and filters pseudo-labels based on hierarchical consistency, progressively incorporating reliable samples from unlabeled images into training. This process yields LapBench-114K, a large-scale benchmark comprising 114K image-mask pairs. Extensive experiments demonstrate that LapFM significantly outperforms state-of-the-art methods, establishing new standards for granularity-adaptive generalization in universal laparoscopic segmentation. The source code is available at https://github.com/xq141839/LapFM.

CVNov 15, 2025Code
TM-UNet: Token-Memory Enhanced Sequential Modeling for Efficient Medical Image Segmentation

Yaxuan Jiao, Qing Xu, Yuxiang Luo et al.

Medical image segmentation is essential for clinical diagnosis and treatment planning. Although transformer-based methods have achieved remarkable results, their high computational cost hinders clinical deployment. To address this issue, we propose TM-UNet, a novel lightweight framework that integrates token sequence modeling with an efficient memory mechanism for efficient medical segmentation. Specifically, we introduce a multi-scale token-memory (MSTM) block that transforms 2D spatial features into token sequences through strategic spatial scanning, leveraging matrix memory cells to selectively retain and propagate discriminative contextual information across tokens. This novel token-memory mechanism acts as a dynamic knowledge store that captures long-range dependencies with linear complexity, enabling efficient global reasoning without redundant computation. Our MSTM block further incorporates exponential gating to identify token effectiveness and multi-scale contextual extraction via parallel pooling operations, enabling hierarchical representation learning without computational overhead. Extensive experiments demonstrate that TM-UNet outperforms state-of-the-art methods across diverse medical segmentation tasks with substantially reduced computation cost. The code is available at https://github.com/xq141839/TM-UNet.

CLDec 2, 2025
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

DeepSeek-AI, Aixin Liu, Aoxue Mei et al.

We introduce DeepSeek-V3.2, a model that harmonizes high computational efficiency with superior reasoning and agent performance. The key technical breakthroughs of DeepSeek-V3.2 are as follows: (1) DeepSeek Sparse Attention (DSA): We introduce DSA, an efficient attention mechanism that substantially reduces computational complexity while preserving model performance in long-context scenarios. (2) Scalable Reinforcement Learning Framework: By implementing a robust reinforcement learning protocol and scaling post-training compute, DeepSeek-V3.2 performs comparably to GPT-5. Notably, our high-compute variant, DeepSeek-V3.2-Speciale, surpasses GPT-5 and exhibits reasoning proficiency on par with Gemini-3.0-Pro, achieving gold-medal performance in both the 2025 International Mathematical Olympiad (IMO) and the International Olympiad in Informatics (IOI). (3) Large-Scale Agentic Task Synthesis Pipeline: To integrate reasoning into tool-use scenarios, we developed a novel synthesis pipeline that systematically generates training data at scale. This methodology facilitates scalable agentic post-training, yielding substantial improvements in generalization and instruction-following robustness within complex, interactive environments.

CVJun 20, 2025Code
Co-Seg++: Mutual Prompt-Guided Collaborative Learning for Versatile Medical Segmentation

Qing Xu, Yuxiang Luo, Wenting Duan et al.

Medical image analysis is critical yet challenged by the need of jointly segmenting organs or tissues, and numerous instances for anatomical structures and tumor microenvironment analysis. Existing studies typically formulated different segmentation tasks in isolation, which overlooks the fundamental interdependencies between these tasks, leading to suboptimal segmentation performance and insufficient medical image understanding. To address this issue, we propose a Co-Seg++ framework for versatile medical segmentation. Specifically, we introduce a novel co-segmentation paradigm, allowing semantic and instance segmentation tasks to mutually enhance each other. We first devise a spatio-temporal prompt encoder (STP-Encoder) to capture long-range spatial and temporal relationships between segmentation regions and image embeddings as prior spatial constraints. Moreover, we devise a multi-task collaborative decoder (MTC-Decoder) that leverages cross-guidance to strengthen the contextual consistency of both tasks, jointly computing semantic and instance segmentation masks. Extensive experiments on diverse CT and histopathology datasets demonstrate that the proposed Co-Seg++ outperforms state-of-the-arts in the semantic, instance, and panoptic segmentation of dental anatomical structures, histopathology tissues, and nuclei instances. The source code is available at https://github.com/xq141839/Co-Seg-Plus.

CVOct 12, 2025Code
MSM-Seg: A Modality-and-Slice Memory Framework with Category-Agnostic Prompting for Multi-Modal Brain Tumor Segmentation

Yuxiang Luo, Qing Xu, Hai Huang et al.

Multi-modal brain tumor segmentation is critical for clinical diagnosis, and it requires accurate identification of distinct internal anatomical subregions. While the recent prompt-based segmentation paradigms enable interactive experiences for clinicians, existing methods ignore cross-modal correlations and rely on labor-intensive category-specific prompts, limiting their applicability in real-world scenarios. To address these issues, we propose a MSM-Seg framework for multi-modal brain tumor segmentation. The MSM-Seg introduces a novel dual-memory segmentation paradigm that synergistically integrates multi-modal and inter-slice information with the efficient category-agnostic prompt for brain tumor understanding. To this end, we first devise a modality-and-slice memory attention (MSMA) to exploit the cross-modal and inter-slice relationships among the input scans. Then, we propose a multi-scale category-agnostic prompt encoder (MCP-Encoder) to provide tumor region guidance for decoding. Moreover, we devise a modality-adaptive fusion decoder (MF-Decoder) that leverages the complementary decoding information across different modalities to improve segmentation accuracy. Extensive experiments on different MRI datasets demonstrate that our MSM-Seg framework outperforms state-of-the-art methods in multi-modal metastases and glioma tumor segmentation. The code is available at https://github.com/xq141839/MSM-Seg.

CLDec 27, 2024Code
DeepSeek-V3 Technical Report

DeepSeek-AI, Aixin Liu, Bei Feng et al. · stanford, tsinghua

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks. The model checkpoints are available at https://github.com/deepseek-ai/DeepSeek-V3.

84.2LGMay 8
Recovering Physical Dynamics from Discrete Observations via Intrinsic Differential Consistency

Yuxiang Luo, Andrew Perrault

Recovering continuous-time dynamics from discrete observations is difficult because local supervision (e.g., pointwise regression targets, derivative approximations, or equation residuals) loses fidelity as the observation interval grows. We replace local supervision with a global structural constraint: any flow representing autonomous dynamics must satisfy the semi-group property under time translation. We train a time-conditioned secant velocity field whose deviation from this property, which we call Symmetry Rupture, serves two purposes. As a training regularizer, it confines the hypothesis space to flows that compose consistently across temporal scales. As an inference oracle, it lets the solver select the largest step size that preserves internal consistency, replacing the local truncation error that conventional adaptive solvers depend on. On the diffusion-reaction benchmark under time-informed inference, our method reduces rollout RMSE by 87\% while using 5x fewer function evaluations than a Neural ODE baseline. In the more demanding direct auto-regressive setting, where the model must predict distant future frames without intermediate temporal cues, our adaptive solver allocates compute based on local geometric complexity -- maintaining the lowest rollout RMSE on two of three PDE benchmarks while baselines either diverge or require up to an order of magnitude more function evaluations to remain stable.

CVJan 26
Co-PLNet: A Collaborative Point-Line Network for Prompt-Guided Wireframe Parsing

Chao Wang, Xuanying Li, Cheng Dai et al.

Wireframe parsing aims to recover line segments and their junctions to form a structured geometric representation useful for downstream tasks such as Simultaneous Localization and Mapping (SLAM). Existing methods predict lines and junctions separately and reconcile them post-hoc, causing mismatches and reduced robustness. We present Co-PLNet, a point-line collaborative framework that exchanges spatial cues between the two tasks, where early detections are converted into spatial prompts via a Point-Line Prompt Encoder (PLP-Encoder), which encodes geometric attributes into compact and spatially aligned maps. A Cross-Guidance Line Decoder (CGL-Decoder) then refines predictions with sparse attention conditioned on complementary prompts, enforcing point-line consistency and efficiency. Experiments on Wireframe and YorkUrban show consistent improvements in accuracy and robustness, together with favorable real-time efficiency, demonstrating our effectiveness for structured geometry perception.

AINov 27, 2025
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning

Zhihong Shao, Yuxiang Luo, Chengda Lu et al.

Large language models have made significant progress in mathematical reasoning, which serves as an important testbed for AI and could impact scientific research if further advanced. By scaling reasoning with reinforcement learning that rewards correct final answers, LLMs have improved from poor performance to saturating quantitative reasoning competitions like AIME and HMMT in one year. However, this approach faces fundamental limitations. Pursuing higher final answer accuracy doesn't address a key issue: correct answers don't guarantee correct reasoning. Moreover, many mathematical tasks like theorem proving require rigorous step-by-step derivation rather than numerical answers, making final answer rewards inapplicable. To push the limits of deep reasoning, we believe it is necessary to verify the comprehensiveness and rigor of mathematical reasoning. Self-verification is particularly important for scaling test-time compute, especially for open problems without known solutions. Towards self-verifiable mathematical reasoning, we investigate how to train an accurate and faithful LLM-based verifier for theorem proving. We then train a proof generator using the verifier as the reward model, and incentivize the generator to identify and resolve as many issues as possible in their own proofs before finalizing them. To maintain the generation-verification gap as the generator becomes stronger, we propose to scale verification compute to automatically label new hard-to-verify proofs, creating training data to further improve the verifier. Our resulting model, DeepSeekMath-V2, demonstrates strong theorem-proving capabilities, achieving gold-level scores on IMO 2025 and CMO 2024 and a near-perfect 118/120 on Putnam 2024 with scaled test-time compute.

IRJul 10, 2020
GLOW : Global Weighted Self-Attention Network for Web Search

Xuan Shan, Chuanjie Liu, Yiqian Xia et al.

Deep matching models aim to facilitate search engines retrieving more relevant documents by mapping queries and documents into semantic vectors in the first-stage retrieval. When leveraging BERT as the deep matching model, the attention score across two words are solely built upon local contextualized word embeddings. It lacks prior global knowledge to distinguish the importance of different words, which has been proved to play a critical role in information retrieval tasks. In addition to this, BERT only performs attention across sub-words tokens which weakens whole word attention representation. We propose a novel Global Weighted Self-Attention (GLOW) network for web document search. GLOW fuses global corpus statistics into the deep matching model. By adding prior weights into attention generation from global information, like BM25, GLOW successfully learns weighted attention scores jointly with query matrix $Q$ and key matrix $K$. We also present an efficient whole word weight sharing solution to bring prior whole word knowledge into sub-words level attention. It aids Transformer to learn whole word level attention. To make our models applicable to complicated web search scenarios, we introduce combined fields representation to accommodate documents with multiple fields even with variable number of instances. We demonstrate GLOW is more efficient to capture the topical and semantic representation both in queries and documents. Intrinsic evaluation and experiments conducted on public data sets reveal GLOW to be a general framework for document retrieve task. It significantly outperforms BERT and other competitive baselines by a large margin while retaining the same model complexity with BERT.

IRJul 3, 2020
MIRA: Leveraging Multi-Intention Co-click Information in Web-scale Document Retrieval using Deep Neural Networks

Yusi Zhang, Chuanjie Liu, Angen Luo et al.

We study the problem of deep recall model in industrial web search, which is, given a user query, retrieve hundreds of most relevance documents from billions of candidates. The common framework is to train two encoding models based on neural embedding which learn the distributed representations of queries and documents separately and match them in the latent semantic space. However, all the exiting encoding models only leverage the information of the document itself, which is often not sufficient in practice when matching with query terms, especially for the hard tail queries. In this work we aim to leverage the additional information for each document from its co-click neighbour to help document retrieval. The challenges include how to effectively extract information and eliminate noise when involving co-click information in deep model while meet the demands of billion-scale data size for real time online inference. To handle the noise in co-click relations, we firstly propose a web-scale Multi-Intention Co-click document Graph(MICG) which builds the co-click connections between documents on click intention level but not on document level. Then we present an encoding framework MIRA based on Bert and graph attention networks which leverages a two-factor attention mechanism to aggregate neighbours. To meet the online latency requirements, we only involve neighbour information in document side, which can save the time-consuming query neighbor search in real time serving. We conduct extensive offline experiments on both public dataset and private web-scale dataset from two major commercial search engines demonstrating the effectiveness and scalability of the proposed method compared with several baselines. And a further case study reveals that co-click relations mainly help improve web search quality from two aspects: key concept enhancing and query term complementary.

CRJun 23, 2020
ACOUSTIC-TURF: Acoustic-based Privacy-Preserving COVID-19 Contact Tracing

Yuxiang Luo, Cheng Zhang, Yunqi Zhang et al.

In this paper, we propose a new privacy-preserving, automated contact tracing system, ACOUSTIC-TURF, to fight COVID-19 using acoustic signals sent from ubiquitous mobile devices. At a high level, ACOUSTIC-TURF adaptively broadcasts inaudible ultrasonic signals with randomly generated IDs in the vicinity. Simultaneously, the system receives other ultrasonic signals sent from nearby (e.g., 6 feet) users. In such a system, individual user IDs are not disclosed to others and the system can accurately detect encounters in physical proximity with 6-foot granularity. We have implemented a prototype of ACOUSTIC-TURF on Android and evaluated its performance in terms of acoustic-signal-based encounter detection accuracy and power consumption at different ranges and under various occlusion scenarios. Experimental results show that ACOUSTIC-TURF can detect multiple contacts within a 6-foot range for mobile phones placed in pockets and outside pockets. Furthermore, our acoustic-signal-based system achieves greater precision than wireless-signal-based approaches when contact tracing is performed through walls. ACOUSTIC-TURF correctly determines that people on opposite sides of a wall are not in contact with one another, whereas the Bluetooth-based approaches detect nonexistent contacts among them.