IVJul 25, 2023
One for Multiple: Physics-informed Synthetic Data Boosts Generalizable Deep Learning for Fast MRI ReconstructionZi Wang, Xiaotong Yu, Chengyan Wang et al.
Magnetic resonance imaging (MRI) is a widely used radiological modality renowned for its radiation-free, comprehensive insights into the human body, facilitating medical diagnoses. However, the drawback of prolonged scan times hinders its accessibility. The k-space undersampling offers a solution, yet the resultant artifacts necessitate meticulous removal during image reconstruction. Although Deep Learning (DL) has proven effective for fast MRI image reconstruction, its broader applicability across various imaging scenarios has been constrained. Challenges include the high cost and privacy restrictions associated with acquiring large-scale, diverse training data, coupled with the inherent difficulty of addressing mismatches between training and target data in existing DL methodologies. Here, we present a novel Physics-Informed Synthetic data learning framework for Fast MRI, called PISF. PISF marks a breakthrough by enabling generalized DL for multi-scenario MRI reconstruction through a single trained model. Our approach separates the reconstruction of a 2D image into many 1D basic problems, commencing with 1D data synthesis to facilitate generalization. We demonstrate that training DL models on synthetic data, coupled with enhanced learning techniques, yields in vivo MRI reconstructions comparable to or surpassing those of models trained on matched realistic datasets, reducing the reliance on real-world MRI data by up to 96%. Additionally, PISF exhibits remarkable generalizability across multiple vendors and imaging centers. Its adaptability to diverse patient populations has been validated through evaluations by ten experienced medical professionals. PISF presents a feasible and cost-effective way to significantly boost the widespread adoption of DL in various fast MRI applications.
MED-PHJun 16, 2023
Magnetic Resonance Spectroscopy Quantification Aided by Deep Estimations of Imperfection Factors and Macromolecular SignalDicheng Chen, Meijin Lin, Huiting Liu et al.
Objective: Magnetic Resonance Spectroscopy (MRS) is an important technique for biomedical detection. However, it is challenging to accurately quantify metabolites with proton MRS due to serious overlaps of metabolite signals, imperfections because of non-ideal acquisition conditions, and interference with strong background signals mainly from macromolecules. The most popular method, LCModel, adopts complicated non-linear least square to quantify metabolites and addresses these problems by designing empirical priors such as basis-sets, imperfection factors. However, when the signal-to-noise ratio of MRS signal is low, the solution may have large deviation. Methods: Linear Least Squares (LLS) is integrated with deep learning to reduce the complexity of solving this overall quantification. First, a neural network is designed to explicitly predict the imperfection factors and the overall signal from macromolecules. Then, metabolite quantification is solved analytically with the introduced LLS. In our Quantification Network (QNet), LLS takes part in the backpropagation of network training, which allows the feedback of the quantification error into metabolite spectrum estimation. This scheme greatly improves the generalization to metabolite concentrations unseen for training compared to the end-to-end deep learning method. Results: Experiments show that compared with LCModel, the proposed QNet, has smaller quantification errors for simulated data, and presents more stable quantification for 20 healthy in vivo data at a wide range of signal-to-noise ratio. QNet also outperforms other end-to-end deep learning methods. Conclusion: This study provides an intelligent, reliable and robust MRS quantification. Significance: QNet is the first LLS quantification aided by deep learning.
CVSep 8, 2024Code
Unsupervised Multimodal 3D Medical Image Registration with Multilevel Correlation Balanced OptimizationJiazheng Wang, Xiang Chen, Yuxi Zhang et al.
Surgical navigation based on multimodal image registration has played a significant role in providing intraoperative guidance to surgeons by showing the relative position of the target area to critical anatomical structures during surgery. However, due to the differences between multimodal images and intraoperative image deformation caused by tissue displacement and removal during surgery, effective registration of preoperative and intraoperative multimodal images faces significant challenges. To address the multimodal image registration challenges in Learn2Reg 2024, an unsupervised multimodal medical image registration method based on multilevel correlation balanced optimization (MCBO) is designed to solve these problems. First, the features of each modality are extracted based on the modality independent neighborhood descriptor, and the multimodal images are mapped to the feature space. Second, a multilevel pyramidal fusion optimization mechanism is designed to achieve global optimization and local detail complementation of the deformation field through dense correlation analysis and weight-balanced coupled convex optimization for input features at different scales. For preoperative medical images in different modalities, the alignment and stacking of valid information between different modalities is achieved by the maximum fusion between deformation fields. Our method focuses on the ReMIND2Reg task in Learn2Reg 2024, and to verify the generality of the method, we also tested it on the COMULIS3DCLEM task. Based on the results, our method achieved second place in the validation of both two tasks. The code is available at https://github.com/wjiazheng/MCBO.
CVFeb 6Code
Unsupervised MR-US Multimodal Image Registration with Multilevel Correlation Pyramidal OptimizationJiazheng Wang, Zeyu Liu, Min Liu et al.
Surgical navigation based on multimodal image registration has played a significant role in providing intraoperative guidance to surgeons by showing the relative position of the target area to critical anatomical structures during surgery. However, due to the differences between multimodal images and intraoperative image deformation caused by tissue displacement and removal during the surgery, effective registration of preoperative and intraoperative multimodal images faces significant challenges. To address the multimodal image registration challenges in Learn2Reg 2025, an unsupervised multimodal medical image registration method based on Multilevel Correlation Pyramidal Optimization (MCPO) is designed to solve these problems. First, the features of each modality are extracted based on the modality independent neighborhood descriptor, and the multimodal images is mapped to the feature space. Second, a multilevel pyramidal fusion optimization mechanism is designed to achieve global optimization and local detail complementation of the displacement field through dense correlation analysis and weight-balanced coupled convex optimization for input features at different scales. Our method focuses on the ReMIND2Reg task in Learn2Reg 2025. Based on the results, our method achieved the first place in the validation phase and test phase of ReMIND2Reg. The MCPO is also validated on the Resect dataset, achieving an average TRE of 1.798 mm. This demonstrates the broad applicability of our method in preoperative-to-intraoperative image registration. The code is available at https://github.com/wjiazheng/MCPO.
IVOct 20, 2022
Physics-informed Deep Diffusion MRI Reconstruction with Synthetic Data: Break Training Data Bottleneck in Artificial IntelligenceChen Qian, Haoyu Zhang, Yuncheng Gao et al.
Diffusion magnetic resonance imaging (MRI) is the only imaging modality for non-invasive movement detection of in vivo water molecules, with significant clinical and research applications. Diffusion weighted imaging (DWI) MRI acquired by multi-shot techniques can achieve higher resolution, better signal-to-noise ratio, and lower geometric distortion than single-shot, but suffers from inter-shot motion-induced artifacts. These artifacts cannot be removed prospectively, leading to the absence of artifact-free training labels. Thus, the potential of deep learning in multi-shot DWI reconstruction remains largely untapped. To break the training data bottleneck, here, we propose a Physics-Informed Deep DWI reconstruction method (PIDD) to synthesize high-quality paired training data by leveraging the physical diffusion model (magnitude synthesis) and inter-shot motion-induced phase model (motion phase synthesis). The network is trained only once with 100,000 synthetic samples, achieving encouraging results on multiple realistic in vivo data reconstructions. Advantages over conventional methods include: (a) Better motion artifact suppression and reconstruction stability; (b) Outstanding generalization to multi-scenario reconstructions, including multi-resolution, multi-b-value, multi-under-sampling, multi-vendor, and multi-center; (c) Excellent clinical adaptability to patients with verifications by seven experienced doctors (p<0.001). In conclusion, PIDD presents a novel deep learning framework by exploiting the power of MRI physics, providing a cost-effective and explainable way to break the data bottleneck in deep learning medical imaging.
IVJun 24, 2025Code
VoxelOpt: Voxel-Adaptive Message Passing for Discrete Optimization in Deformable Abdominal CT RegistrationHang Zhang, Yuxi Zhang, Jiazheng Wang et al.
Recent developments in neural networks have improved deformable image registration (DIR) by amortizing iterative optimization, enabling fast and accurate DIR results. However, learning-based methods often face challenges with limited training data, large deformations, and tend to underperform compared to iterative approaches when label supervision is unavailable. While iterative methods can achieve higher accuracy in such scenarios, they are considerably slower than learning-based methods. To address these limitations, we propose VoxelOpt, a discrete optimization-based DIR framework that combines the strengths of learning-based and iterative methods to achieve a better balance between registration accuracy and runtime. VoxelOpt uses displacement entropy from local cost volumes to measure displacement signal strength at each voxel, which differs from earlier approaches in three key aspects. First, it introduces voxel-wise adaptive message passing, where voxels with lower entropy receives less influence from their neighbors. Second, it employs a multi-level image pyramid with 27-neighbor cost volumes at each level, avoiding exponential complexity growth. Third, it replaces hand-crafted features or contrastive learning with a pretrained foundational segmentation model for feature extraction. In abdominal CT registration, these changes allow VoxelOpt to outperform leading iterative in both efficiency and accuracy, while matching state-of-the-art learning-based methods trained with label supervision. The source code will be available at https://github.com/tinymilky/VoxelOpt
CVSep 2, 2024
Large Scale Unsupervised Brain MRI Image Registration Solution for Learn2Reg 2024Yuxi Zhang, Xiang Chen, Jiazheng Wang et al.
In this paper, we summarize the methods and experimental results we proposed for Task 2 in the learn2reg 2024 Challenge. This task focuses on unsupervised registration of anatomical structures in brain MRI images between different patients. The difficulty lies in: (1) without segmentation labels, and (2) a large amount of data. To address these challenges, we built an efficient backbone network and explored several schemes to further enhance registration accuracy. Under the guidance of the NCC loss function and smoothness regularization loss function, we obtained a smooth and reasonable deformation field. According to the leaderboard, our method achieved a Dice coefficient of 77.34%, which is 1.4% higher than the TransMorph. Overall, we won second place on the leaderboard for Task 2.
CVJan 23
A Step to Decouple Optimization in 3DGSRenjie Ding, Yaonan Wang, Min Liu et al.
3D Gaussian Splatting (3DGS) has emerged as a powerful technique for real-time novel view synthesis. As an explicit representation optimized through gradient propagation among primitives, optimization widely accepted in deep neural networks (DNNs) is actually adopted in 3DGS, such as synchronous weight updating and Adam with the adaptive gradient. However, considering the physical significance and specific design in 3DGS, there are two overlooked details in the optimization of 3DGS: (i) update step coupling, which induces optimizer state rescaling and costly attribute updates outside the viewpoints, and (ii) gradient coupling in the moment, which may lead to under- or over-effective regularization. Nevertheless, such a complex coupling is under-explored. After revisiting the optimization of 3DGS, we take a step to decouple it and recompose the process into: Sparse Adam, Re-State Regularization and Decoupled Attribute Regularization. Taking a large number of experiments under the 3DGS and 3DGS-MCMC frameworks, our work provides a deeper understanding of these components. Finally, based on the empirical analysis, we re-design the optimization and propose AdamW-GS by re-coupling the beneficial components, under which better optimization efficiency and representation effectiveness are achieved simultaneously.
CVAug 23, 2025Code
Gaussian Primitive Optimized Deformable Retinal Image RegistrationXin Tian, Jiazheng Wang, Yuxi Zhang et al.
Deformable retinal image registration is notoriously difficult due to large homogeneous regions and sparse but critical vascular features, which cause limited gradient signals in standard learning-based frameworks. In this paper, we introduce Gaussian Primitive Optimization (GPO), a novel iterative framework that performs structured message passing to overcome these challenges. After an initial coarse alignment, we extract keypoints at salient anatomical structures (e.g., major vessels) to serve as a minimal set of descriptor-based control nodes (DCN). Each node is modelled as a Gaussian primitive with trainable position, displacement, and radius, thus adapting its spatial influence to local deformation scales. A K-Nearest Neighbors (KNN) Gaussian interpolation then blends and propagates displacement signals from these information-rich nodes to construct a globally coherent displacement field; focusing interpolation on the top (K) neighbors reduces computational overhead while preserving local detail. By strategically anchoring nodes in high-gradient regions, GPO ensures robust gradient flow, mitigating vanishing gradient signal in textureless areas. The framework is optimized end-to-end via a multi-term loss that enforces both keypoint consistency and intensity alignment. Experiments on the FIRE dataset show that GPO reduces the target registration error from 6.2\,px to ~2.4\,px and increases the AUC at 25\,px from 0.770 to 0.938, substantially outperforming existing methods. The source code can be accessed via https://github.com/xintian-99/GPOreg.
AIMay 29, 2021Code
CNTLS: A Benchmark Dataset for Abstractive or Extractive Chinese Timeline SummarizationQianren Mao, Jiazheng Wang, Zheng Wang et al.
Timeline summarization (TLS) involves creating summaries of long-running events using dated summaries from numerous news articles. However, limited data availability has significantly slowed down the development of timeline summarization. In this paper, we introduce the CNTLS dataset, a versatile resource for Chinese timeline summarization. CNTLS encompasses 77 real-life topics, each with 2524 documents and summarizes nearly 60\% days duration compression on average all topics. We meticulously analyze the corpus using well-known metrics, focusing on the style of the summaries and the complexity of the summarization task. Specifically, we evaluate the performance of various extractive and generative summarization systems on the CNTLS corpus to provide benchmarks and support further research. To the best of our knowledge, CNTLS is the first Chinese timeline summarization dataset. The dataset and source code are released\footnote{Code and data available at: \emph{\url{https://github.com/OpenSUM/CNTLS}}.}.
AIApr 30, 2025
ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient ReasoningJingyang Yi, Jiazheng Wang, Sida Li
Recent models such as OpenAI o1 and DeepSeek-R1 have demonstrated strong performance on reasoning-intensive tasks by generating extended Chain-of-Thought (CoT) traces. While longer reasoning helps with thorough exploration of solution paths for complex problems, it also often leads to inefficient and redundant outputs--a phenomenon commonly described as overthinking. In this paper, we propose ShorterBetter, a simple yet effective reinforcement learning method that enables reasoning models to learn their own optimal CoT lengths without manual supervision. We define the Sample Optimal Length (SOL) as the length of the shortest correct response among multiple generations, which serves as a dynamic reward signal to guide the model toward efficient reasoning. Applied to DeepSeek-Distill-Qwen-1.5B/7B as base models, ShorterBetter achieves 50%-80% reduction in output lengths in both in-domain and out-of-domain reasoning tasks while maintaining accuracy. Our reasoning trace analysis shows that ShorterBetter refines the structure of the reasoning traces by reducing unnecessary repetition, excessive self-verification, and over-exploration of alternatives.
CVJan 20, 2025
EndoChat: Grounded Multimodal Large Language Model for Endoscopic SurgeryGuankun Wang, Long Bai, Junyi Wang et al.
Recently, Multimodal Large Language Models (MLLMs) have demonstrated their immense potential in computer-aided diagnosis and decision-making. In the context of robotic-assisted surgery, MLLMs can serve as effective tools for surgical training and guidance. However, there is still a lack of MLLMs specialized for surgical scene understanding in clinical applications. In this work, we introduce EndoChat to address various dialogue paradigms and subtasks in surgical scene understanding that surgeons encounter. To train our EndoChat, we construct the Surg-396K dataset through a novel pipeline that systematically extracts surgical information and generates structured annotations based on collected large-scale endoscopic surgery datasets. Furthermore, we introduce a multi-scale visual token interaction mechanism and a visual contrast-based reasoning mechanism to enhance the model's representation learning and reasoning capabilities. Our model achieves state-of-the-art performance across five dialogue paradigms and eight surgical scene understanding tasks. Additionally, we conduct evaluations with professional surgeons, most of whom provide positive feedback on collaborating with EndoChat. Overall, these results demonstrate that our EndoChat has great potential to significantly advance training and automation in robotic-assisted surgery.
CLJan 1, 2025
Enhancing Transformers for Generalizable First-Order Logical EntailmentTianshi Zheng, Jiazheng Wang, Zihao Wang et al. · tsinghua
Transformers, as the fundamental deep learning architecture, have demonstrated great capability in reasoning. This paper studies the generalizable first-order logical reasoning ability of transformers with their parameterized knowledge and how to improve it. Transformers' capability of first-order reasoning is further captured by whether they can conduct first-order logical entailment, which is quantitatively measured by their performance in answering knowledge graph queries. We establish the connections between (1) two types of distribution shifts studied in out-of-distribution generalization and (2) unseen knowledge and query settings discussed in the task of knowledge graph query answering, which makes it possible to characterize the fine-grained generalizability. Results on our comprehensive dataset showed that transformers \textit{outperform} previous methods designed particularly for this task and provided detailed empirical evidence about the impact of the input query syntax, token embedding, and transformer architectures on their reasoning capability. Interestingly, our results revealed the mismatch of positional encoding and other design choices of transformer architectures in previous practices. Motivated by this, we propose TEGA, a logic-aware architecture that significantly improves the performance in generalizable first-order logical entailment.
CVNov 27, 2025
Can Protective Watermarking Safeguard the Copyright of 3D Gaussian Splatting?Wenkai Huang, Yijia Guo, Gaolei Li et al.
3D Gaussian Splatting (3DGS) has emerged as a powerful representation for 3D scenes, widely adopted due to its exceptional efficiency and high-fidelity visual quality. Given the significant value of 3DGS assets, recent works have introduced specialized watermarking schemes to ensure copyright protection and ownership verification. However, can existing 3D Gaussian watermarking approaches genuinely guarantee robust protection of the 3D assets? In this paper, for the first time, we systematically explore and validate possible vulnerabilities of 3DGS watermarking frameworks. We demonstrate that conventional watermark removal techniques designed for 2D images do not effectively generalize to the 3DGS scenario due to the specialized rendering pipeline and unique attributes of each gaussian primitives. Motivated by this insight, we propose GSPure, the first watermark purification framework specifically for 3DGS watermarking representations. By analyzing view-dependent rendering contributions and exploiting geometrically accurate feature clustering, GSPure precisely isolates and effectively removes watermark-related Gaussian primitives while preserving scene integrity. Extensive experiments demonstrate that our GSPure achieves the best watermark purification performance, reducing watermark PSNR by up to 16.34dB while minimizing degradation to original scene fidelity with less than 1dB PSNR loss. Moreover, it consistently outperforms existing methods in both effectiveness and generalization.
CVSep 17, 2025
SAMIR, an efficient registration framework via robust feature learning from SAMYue He, Min Liu, Qinghao Liu et al.
Image registration is a fundamental task in medical image analysis. Deformations are often closely related to the morphological characteristics of tissues, making accurate feature extraction crucial. Recent weakly supervised methods improve registration by incorporating anatomical priors such as segmentation masks or landmarks, either as inputs or in the loss function. However, such weak labels are often not readily available, limiting their practical use. Motivated by the strong representation learning ability of visual foundation models, this paper introduces SAMIR, an efficient medical image registration framework that utilizes the Segment Anything Model (SAM) to enhance feature extraction. SAM is pretrained on large-scale natural image datasets and can learn robust, general-purpose visual representations. Rather than using raw input images, we design a task-specific adaptation pipeline using SAM's image encoder to extract structure-aware feature embeddings, enabling more accurate modeling of anatomical consistency and deformation patterns. We further design a lightweight 3D head to refine features within the embedding space, adapting to local deformations in medical images. Additionally, we introduce a Hierarchical Feature Consistency Loss to guide coarse-to-fine feature matching and improve anatomical alignment. Extensive experiments demonstrate that SAMIR significantly outperforms state-of-the-art methods on benchmark datasets for both intra-subject cardiac image registration and inter-subject abdomen CT image registration, achieving performance improvements of 2.68% on ACDC and 6.44% on the abdomen dataset. The source code will be publicly available on GitHub following the acceptance of this paper.
CVJul 12, 2025
Geo-RepNet: Geometry-Aware Representation Learning for Surgical Phase Recognition in Endoscopic Submucosal DissectionRui Tang, Haochen Yin, Guankun Wang et al.
Surgical phase recognition plays a critical role in developing intelligent assistance systems for minimally invasive procedures such as Endoscopic Submucosal Dissection (ESD). However, the high visual similarity across different phases and the lack of structural cues in RGB images pose significant challenges. Depth information offers valuable geometric cues that can complement appearance features by providing insights into spatial relationships and anatomical structures. In this paper, we pioneer the use of depth information for surgical phase recognition and propose Geo-RepNet, a geometry-aware convolutional framework that integrates RGB image and depth information to enhance recognition performance in complex surgical scenes. Built upon a re-parameterizable RepVGG backbone, Geo-RepNet incorporates the Depth-Guided Geometric Prior Generation (DGPG) module that extracts geometry priors from raw depth maps, and the Geometry-Enhanced Multi-scale Attention (GEMA) to inject spatial guidance through geometry-aware cross-attention and efficient multi-scale aggregation. To evaluate the effectiveness of our approach, we construct a nine-phase ESD dataset with dense frame-level annotations from real-world ESD videos. Extensive experiments on the proposed dataset demonstrate that Geo-RepNet achieves state-of-the-art performance while maintaining robustness and high computational efficiency under complex and low-texture surgical environments.
MLMar 6, 2025
Reproducibility Assessment of Magnetic Resonance Spectroscopy of Pregenual Anterior Cingulate Cortex across Sessions and Vendors via the Cloud Computing Platform CloudBrain-MRSRunhan Chen, Meijin Lin, Jianshu Chen et al.
Given the need to elucidate the mechanisms underlying illnesses and their treatment, as well as the lack of harmonization of acquisition and post-processing protocols among different magnetic resonance system vendors, this work is to determine if metabolite concentrations obtained from different sessions, machine models and even different vendors of 3 T scanners can be highly reproducible and be pooled for diagnostic analysis, which is very valuable for the research of rare diseases. Participants underwent magnetic resonance imaging (MRI) scanning once on two separate days within one week (one session per day, each session including two proton magnetic resonance spectroscopy (1H-MRS) scans with no more than a 5-minute interval between scans (no off-bed activity)) on each machine. were analyzed for reliability of within- and between- sessions using the coefficient of variation (CV) and intraclass correlation coefficient (ICC), and for reproducibility of across the machines using correlation coefficient. As for within- and between- session, all CV values for a group of all the first or second scans of a session, or for a session were almost below 20%, and most of the ICCs for metabolites range from moderate (0.4-0.59) to excellent (0.75-1), indicating high data reliability. When it comes to the reproducibility across the three scanners, all Pearson correlation coefficients across the three machines approached 1 with most around 0.9, and majority demonstrated statistical significance (P<0.01). Additionally, the intra-vendor reproducibility was greater than the inter-vendor ones.
CVOct 28, 2024
Fidelity-Imposed Displacement Editing for the Learn2Reg 2024 SHG-BF ChallengeJiacheng Wang, Xiang Chen, Renjiu Hu et al.
Co-examination of second-harmonic generation (SHG) and bright-field (BF) microscopy enables the differentiation of tissue components and collagen fibers, aiding the analysis of human breast and pancreatic cancer tissues. However, large discrepancies between SHG and BF images pose challenges for current learning-based registration models in aligning SHG to BF. In this paper, we propose a novel multi-modal registration framework that employs fidelity-imposed displacement editing to address these challenges. The framework integrates batch-wise contrastive learning, feature-based pre-alignment, and instance-level optimization. Experimental results from the Learn2Reg COMULISglobe SHG-BF Challenge validate the effectiveness of our method, securing the 1st place on the online leaderboard.
LGNov 5, 2021
Branch and Bound in Mixed Integer Linear Programming Problems: A Survey of Techniques and TrendsLingying Huang, Xiaomeng Chen, Wei Huo et al.
In this paper, we surveyed the existing literature studying different approaches and algorithms for the four critical components in the general branch and bound (B&B) algorithm, namely, branching variable selection, node selection, node pruning, and cutting-plane selection. However, the complexity of the B&B algorithm always grows exponentially with respect to the increase of the decision variable dimensions. In order to improve the speed of B&B algorithms, learning techniques have been introduced in this algorithm recently. We further surveyed how machine learning can be used to improve the four critical components in B&B algorithms. In general, a supervised learning method helps to generate a policy that mimics an expert but significantly improves the speed. An unsupervised learning method helps choose different methods based on the features. In addition, models trained with reinforcement learning can beat the expert policy, given enough training and a supervised initialization. Detailed comparisons between different algorithms have been summarized in our survey. Finally, we discussed some future research directions to accelerate and improve the algorithms further in the literature.
MED-PHJan 26, 2021
Magnetic Resonance Spectroscopy Deep Learning Denoising Using Few In Vivo DataDicheng Chen, Wanqi Hu, Huiting Liu et al.
Magnetic Resonance Spectroscopy (MRS) is a noninvasive tool to reveal metabolic information. One challenge of 1H-MRS is the low Signal-Noise Ratio (SNR). To improve the SNR, a typical approach is to perform Signal Averaging (SA) with M repeated samples. The data acquisition time, however, is increased by M times accordingly, and a complete clinical MRS scan takes approximately 10 minutes at a common setting M=128. Recently, deep learning has been introduced to improve the SNR but most of them use the simulated data as the training set. This may hinder the MRS applications since some potential differences, such as acquisition system imperfections, and physiological and psychologic conditions may exist between the simulated and in vivo data. Here, we proposed a new scheme that purely used the repeated samples of realistic data. A deep learning model, Refusion Long Short-Term Memory (ReLSTM), was designed to learn the mapping from the low SNR time-domain data (24 SA) to the high SNR one (128 SA). Experiments on the in vivo brain spectra of 7 healthy subjects, 2 brain tumor patients and 1 cerebral infarction patient showed that only using 20% repeated samples, the denoised spectra by ReLSTM could provide comparable estimated concentrations of metabolites to 128 SA. Compared with the state-of-the-art low-rank denoising method, the ReLSTM achieved the lower relative error and the Cramér-Rao lower bounds in quantifying some important biomarkers. In summary, ReLSTM can perform high-fidelity denoising of the spectra under fast acquisition (24 SA), which would be valuable to MRS clinical studies.