Di Guo

RO
h-index35
40papers
1,226citations
Novelty49%
AI Score39

40 Papers

ROFeb 21, 2023
Deep Reinforcement Learning for Robotic Pushing and Picking in Cluttered Environment

Yuhong Deng, Xiaofeng Guo, Yixuan Wei et al.

In this paper, a novel robotic grasping system is established to automatically pick up objects in cluttered scenes. A composite robotic hand composed of a suction cup and a gripper is designed for grasping the object stably. The suction cup is used for lifting the object from the clutter first and the gripper for grasping the object accordingly. We utilize the affordance map to provide pixel-wise lifting point candidates for the suction cup. To obtain a good affordance map, the active exploration mechanism is introduced to the system. An effective metric is designed to calculate the reward for the current affordance map, and a deep Q-Network (DQN) is employed to guide the robotic hand to actively explore the environment until the generated affordance map is suitable for grasping. Experimental results have demonstrated that the proposed robotic grasping system is able to greatly increase the success rate of the robotic grasping in cluttered scenes.

IVJul 25, 2023
One for Multiple: Physics-informed Synthetic Data Boosts Generalizable Deep Learning for Fast MRI Reconstruction

Zi Wang, Xiaotong Yu, Chengyan Wang et al.

Magnetic resonance imaging (MRI) is a widely used radiological modality renowned for its radiation-free, comprehensive insights into the human body, facilitating medical diagnoses. However, the drawback of prolonged scan times hinders its accessibility. The k-space undersampling offers a solution, yet the resultant artifacts necessitate meticulous removal during image reconstruction. Although Deep Learning (DL) has proven effective for fast MRI image reconstruction, its broader applicability across various imaging scenarios has been constrained. Challenges include the high cost and privacy restrictions associated with acquiring large-scale, diverse training data, coupled with the inherent difficulty of addressing mismatches between training and target data in existing DL methodologies. Here, we present a novel Physics-Informed Synthetic data learning framework for Fast MRI, called PISF. PISF marks a breakthrough by enabling generalized DL for multi-scenario MRI reconstruction through a single trained model. Our approach separates the reconstruction of a 2D image into many 1D basic problems, commencing with 1D data synthesis to facilitate generalization. We demonstrate that training DL models on synthetic data, coupled with enhanced learning techniques, yields in vivo MRI reconstructions comparable to or surpassing those of models trained on matched realistic datasets, reducing the reliance on real-world MRI data by up to 96%. Additionally, PISF exhibits remarkable generalizability across multiple vendors and imaging centers. Its adaptability to diverse patient populations has been validated through evaluations by ten experienced medical professionals. PISF presents a feasible and cost-effective way to significantly boost the widespread adoption of DL in various fast MRI applications.

IVOct 23, 2022
A Faithful Deep Sensitivity Estimation for Accelerated Magnetic Resonance Imaging

Zi Wang, Haoming Fang, Chen Qian et al.

Magnetic resonance imaging (MRI) is an essential diagnostic tool that suffers from prolonged scan time. To alleviate this limitation, advanced fast MRI technology attracts extensive research interests. Recent deep learning has shown its great potential in improving image quality and reconstruction speed. Faithful coil sensitivity estimation is vital for MRI reconstruction. However, most deep learning methods still rely on pre-estimated sensitivity maps and ignore their inaccuracy, resulting in the significant quality degradation of reconstructed images. In this work, we propose a Joint Deep Sensitivity estimation and Image reconstruction network, called JDSI. During the image artifacts removal, it gradually provides more faithful sensitivity maps with high-frequency information, leading to improved image reconstructions. To understand the behavior of the network, the mutual promotion of sensitivity estimation and image reconstruction is revealed through the visualization of network intermediate results. Results on in vivo datasets and radiologist reader study demonstrate that, for both calibration-based and calibrationless reconstruction, the proposed JDSI achieves the state-of-the-art performance visually and quantitatively, especially when the acceleration factor is high. Additionally, JDSI owns nice robustness to patients and autocalibration signals.

IVOct 18, 2023
Cloud-Magnetic Resonance Imaging System: In the Era of 6G and Artificial Intelligence

Yirong Zhou, Yanhuang Wu, Yuhan Su et al.

Magnetic Resonance Imaging (MRI) plays an important role in medical diagnosis, generating petabytes of image data annually in large hospitals. This voluminous data stream requires a significant amount of network bandwidth and extensive storage infrastructure. Additionally, local data processing demands substantial manpower and hardware investments. Data isolation across different healthcare institutions hinders cross-institutional collaboration in clinics and research. In this work, we anticipate an innovative MRI system and its four generations that integrate emerging distributed cloud computing, 6G bandwidth, edge computing, federated learning, and blockchain technology. This system is called Cloud-MRI, aiming at solving the problems of MRI data storage security, transmission speed, AI algorithm maintenance, hardware upgrading, and collaborative work. The workflow commences with the transformation of k-space raw data into the standardized Imaging Society for Magnetic Resonance in Medicine Raw Data (ISMRMRD) format. Then, the data are uploaded to the cloud or edge nodes for fast image reconstruction, neural network training, and automatic analysis. Then, the outcomes are seamlessly transmitted to clinics or research institutes for diagnosis and other services. The Cloud-MRI system will save the raw imaging data, reduce the risk of data loss, facilitate inter-institutional medical collaboration, and finally improve diagnostic accuracy and work efficiency.

MED-PHJun 16, 2023
Magnetic Resonance Spectroscopy Quantification Aided by Deep Estimations of Imperfection Factors and Macromolecular Signal

Dicheng Chen, Meijin Lin, Huiting Liu et al.

Objective: Magnetic Resonance Spectroscopy (MRS) is an important technique for biomedical detection. However, it is challenging to accurately quantify metabolites with proton MRS due to serious overlaps of metabolite signals, imperfections because of non-ideal acquisition conditions, and interference with strong background signals mainly from macromolecules. The most popular method, LCModel, adopts complicated non-linear least square to quantify metabolites and addresses these problems by designing empirical priors such as basis-sets, imperfection factors. However, when the signal-to-noise ratio of MRS signal is low, the solution may have large deviation. Methods: Linear Least Squares (LLS) is integrated with deep learning to reduce the complexity of solving this overall quantification. First, a neural network is designed to explicitly predict the imperfection factors and the overall signal from macromolecules. Then, metabolite quantification is solved analytically with the introduced LLS. In our Quantification Network (QNet), LLS takes part in the backpropagation of network training, which allows the feedback of the quantification error into metabolite spectrum estimation. This scheme greatly improves the generalization to metabolite concentrations unseen for training compared to the end-to-end deep learning method. Results: Experiments show that compared with LCModel, the proposed QNet, has smaller quantification errors for simulated data, and presents more stable quantification for 20 healthy in vivo data at a wide range of signal-to-noise ratio. QNet also outperforms other end-to-end deep learning methods. Conclusion: This study provides an intelligent, reliable and robust MRS quantification. Significance: QNet is the first LLS quantification aided by deep learning.

ROSep 23, 2024
Observe Then Act: Asynchronous Active Vision-Action Model for Robotic Manipulation

Guokang Wang, Hang Li, Shuyuan Zhang et al.

In real-world scenarios, many robotic manipulation tasks are hindered by occlusions and limited fields of view, posing significant challenges for passive observation-based models that rely on fixed or wrist-mounted cameras. In this paper, we investigate the problem of robotic manipulation under limited visual observation and propose a task-driven asynchronous active vision-action model.Our model serially connects a camera Next-Best-View (NBV) policy with a gripper Next-Best Pose (NBP) policy, and trains them in a sensor-motor coordination framework using few-shot reinforcement learning. This approach allows the agent to adjust a third-person camera to actively observe the environment based on the task goal, and subsequently infer the appropriate manipulation actions.We trained and evaluated our model on 8 viewpoint-constrained tasks in RLBench. The results demonstrate that our model consistently outperforms baseline algorithms, showcasing its effectiveness in handling visual constraints in manipulation tasks.

CVJul 26, 2023
Heterogeneous Embodied Multi-Agent Collaboration

Xinzhu Liu, Di Guo, Huaping Liu

Multi-agent embodied tasks have recently been studied in complex indoor visual environments. Collaboration among multiple agents can improve work efficiency and has significant practical value. However, most of the existing research focuses on homogeneous multi-agent tasks. Compared with homogeneous agents, heterogeneous agents can leverage their different capabilities to allocate corresponding sub-tasks and cooperate to complete complex tasks. Heterogeneous multi-agent tasks are common in real-world scenarios, and the collaboration strategy among heterogeneous agents is a challenging and important problem to be solved. To study collaboration among heterogeneous agents, we propose the heterogeneous multi-agent tidying-up task, in which multiple heterogeneous agents with different capabilities collaborate with each other to detect misplaced objects and place them in reasonable locations. This is a demanding task since it requires agents to make the best use of their different capabilities to conduct reasonable task planning and complete the whole task. To solve this task, we build a heterogeneous multi-agent tidying-up benchmark dataset in a large number of houses with multiple rooms based on ProcTHOR-10K. We propose the hierarchical decision model based on misplaced object detection, reasonable receptacle prediction, as well as the handshake-based group communication mechanism. Extensive experiments are conducted to demonstrate the effectiveness of the proposed model. The project's website and videos of experiments can be found at https://hetercol.github.io/.

QMSep 12, 2023
CloudBrain-NMR: An Intelligent Cloud Computing Platform for NMR Spectroscopy Processing, Reconstruction and Analysis

Di Guo, Sijin Li, Jun Liu et al.

Nuclear Magnetic Resonance (NMR) spectroscopy has served as a powerful analytical tool for studying molecular structure and dynamics in chemistry and biology. However, the processing of raw data acquired from NMR spectrometers and subsequent quantitative analysis involves various specialized tools, which necessitates comprehensive knowledge in programming and NMR. Particularly, the emerging deep learning tools is hard to be widely used in NMR due to the sophisticated setup of computation. Thus, NMR processing is not an easy task for chemist and biologists. In this work, we present CloudBrain-NMR, an intelligent online cloud computing platform designed for NMR data reading, processing, reconstruction, and quantitative analysis. The platform is conveniently accessed through a web browser, eliminating the need for any program installation on the user side. CloudBrain-NMR uses parallel computing with graphics processing units and central processing units, resulting in significantly shortened computation time. Furthermore, it incorporates state-of-the-art deep learning-based algorithms offering comprehensive functionalities that allow users to complete the entire processing procedure without relying on additional software. This platform has empowered NMR applications with advanced artificial intelligence processing. CloudBrain-NMR is openly accessible for free usage at https://csrc.xmu.edu.cn/CloudBrain.html

LGApr 14, 2023
Convex Dual Theory Analysis of Two-Layer Convolutional Neural Networks with Soft-Thresholding

Chunyan Xiong, Mengli Lu, Xiaotong Yu et al.

Soft-thresholding has been widely used in neural networks. Its basic network structure is a two-layer convolution neural network with soft-thresholding. Due to the network's nature of nonlinearity and nonconvexity, the training process heavily depends on an appropriate initialization of network parameters, resulting in the difficulty of obtaining a globally optimal solution. To address this issue, a convex dual network is designed here. We theoretically analyze the network convexity and numerically confirm that the strong duality holds. This conclusion is further verified in the linear fitting and denoising experiments. This work provides a new way to convexify soft-thresholding neural networks.

LGDec 29, 2022
A Dynamics Theory of Implicit Regularization in Deep Low-Rank Matrix Factorization

Jian Cao, Chen Qian, Yihui Huang et al.

Implicit regularization is an important way to interpret neural networks. Recent theory starts to explain implicit regularization with the model of deep matrix factorization (DMF) and analyze the trajectory of discrete gradient dynamics in the optimization process. These discrete gradient dynamics are relatively small but not infinitesimal, thus fitting well with the practical implementation of neural networks. Currently, discrete gradient dynamics analysis has been successfully applied to shallow networks but encounters the difficulty of complex computation for deep networks. In this work, we introduce another discrete gradient dynamics approach to explain implicit regularization, i.e. landscape analysis. It mainly focuses on gradient regions, such as saddle points and local minima. We theoretically establish the connection between saddle point escaping (SPE) stages and the matrix rank in DMF. We prove that, for a rank-R matrix reconstruction, DMF will converge to a second-order critical point after R stages of SPE. This conclusion is further experimentally verified on a low-rank matrix reconstruction problem. This work provides a new theory to analyze implicit regularization in deep learning.

IVOct 20, 2022
Physics-informed Deep Diffusion MRI Reconstruction with Synthetic Data: Break Training Data Bottleneck in Artificial Intelligence

Chen Qian, Haoyu Zhang, Yuncheng Gao et al.

Diffusion magnetic resonance imaging (MRI) is the only imaging modality for non-invasive movement detection of in vivo water molecules, with significant clinical and research applications. Diffusion weighted imaging (DWI) MRI acquired by multi-shot techniques can achieve higher resolution, better signal-to-noise ratio, and lower geometric distortion than single-shot, but suffers from inter-shot motion-induced artifacts. These artifacts cannot be removed prospectively, leading to the absence of artifact-free training labels. Thus, the potential of deep learning in multi-shot DWI reconstruction remains largely untapped. To break the training data bottleneck, here, we propose a Physics-Informed Deep DWI reconstruction method (PIDD) to synthesize high-quality paired training data by leveraging the physical diffusion model (magnitude synthesis) and inter-shot motion-induced phase model (motion phase synthesis). The network is trained only once with 100,000 synthetic samples, achieving encouraging results on multiple realistic in vivo data reconstructions. Advantages over conventional methods include: (a) Better motion artifact suppression and reconstruction stability; (b) Outstanding generalization to multi-scenario reconstructions, including multi-resolution, multi-b-value, multi-under-sampling, multi-vendor, and multi-center; (c) Excellent clinical adaptability to patients with verifications by seven experienced doctors (p<0.001). In conclusion, PIDD presents a novel deep learning framework by exploiting the power of MRI physics, providing a cost-effective and explainable way to break the data bottleneck in deep learning medical imaging.

ROSep 26, 2024
AssistantX: An LLM-Powered Proactive Assistant in Collaborative Human-Populated Environment

Nan Sun, Bo Mao, Yongchang Li et al.

Current service robots suffer from limited natural language communication abilities, heavy reliance on predefined commands, ongoing human intervention, and, most notably, a lack of proactive collaboration awareness in human-populated environments. This results in narrow applicability and low utility. In this paper, we introduce AssistantX, an LLM-powered proactive assistant designed for autonomous operation in realworld scenarios with high accuracy. AssistantX employs a multi-agent framework consisting of 4 specialized LLM agents, each dedicated to perception, planning, decision-making, and reflective review, facilitating advanced inference capabilities and comprehensive collaboration awareness, much like a human assistant by your side. We built a dataset of 210 real-world tasks to validate AssistantX, which includes instruction content and status information on whether relevant personnel are available. Extensive experiments were conducted in both text-based simulations and a real office environment over the course of a month and a half. Our experiments demonstrate the effectiveness of the proposed framework, showing that AssistantX can reactively respond to user instructions, actively adjust strategies to adapt to contingencies, and proactively seek assistance from humans to ensure successful task completion. More details and videos can be found at https://assistantx-agent.github.io/AssistantX/.

CVJan 23, 2022Code
Rich Action-semantic Consistent Knowledge for Early Action Prediction

Xiaoli Liu, Jianqin Yin, Di Guo et al.

Early action prediction (EAP) aims to recognize human actions from a part of action execution in ongoing videos, which is an important task for many practical applications. Most prior works treat partial or full videos as a whole, ignoring rich action knowledge hidden in videos, i.e., semantic consistencies among different partial videos. In contrast, we partition original partial or full videos to form a new series of partial videos and mine the Action-Semantic Consistent Knowledge (ASCK) among these new partial videos evolving in arbitrary progress levels. Moreover, a novel Rich Action-semantic Consistent Knowledge network (RACK) under the teacher-student framework is proposed for EAP. Firstly, we use a two-stream pre-trained model to extract features of videos. Secondly, we treat the RGB or flow features of the partial videos as nodes and their action semantic consistencies as edges. Next, we build a bi-directional semantic graph for the teacher network and a single-directional semantic graph for the student network to model rich ASCK among partial videos. The MSE and MMD losses are incorporated as our distillation loss to enrich the ASCK of partial videos from the teacher to the student network. Finally, we obtain the final prediction by summering the logits of different subnetworks and applying a softmax layer. Extensive experiments and ablative studies have been conducted, demonstrating the effectiveness of modeling rich ASCK for EAP. With the proposed RACK, we have achieved state-of-the-art performance on three benchmarks. The code is available at https://github.com/lily2lab/RACK.git.

IVFeb 24, 2024
Deep Separable Spatiotemporal Learning for Fast Dynamic Cardiac MRI

Zi Wang, Min Xiao, Yirong Zhou et al.

Dynamic magnetic resonance imaging (MRI) plays an indispensable role in cardiac diagnosis. To enable fast imaging, the k-space data can be undersampled but the image reconstruction poses a great challenge of high-dimensional processing. This challenge necessitates extensive training data in deep learning reconstruction methods. In this work, we propose a novel and efficient approach, leveraging a dimension-reduced separable learning scheme that can perform exceptionally well even with highly limited training data. We design this new approach by incorporating spatiotemporal priors into the development of a Deep Separable Spatiotemporal Learning network (DeepSSL), which unrolls an iteration process of a 2D spatiotemporal reconstruction model with both temporal low-rankness and spatial sparsity. Intermediate outputs can also be visualized to provide insights into the network behavior and enhance interpretability. Extensive results on cardiac cine datasets demonstrate that the proposed DeepSSL surpasses state-of-the-art methods both visually and quantitatively, while reducing the demand for training cases by up to 75%. Additionally, its preliminary adaptability to unseen cardiac patients has been verified through a blind reader study conducted by experienced radiologists and cardiologists. Furthermore, DeepSSL enhances the accuracy of the downstream task of cardiac segmentation and exhibits robustness in prospectively undersampled real-time cardiac MRI.

CVJan 10, 2025
Conditional Diffusion Model for Electrical Impedance Tomography

Duanpeng Shi, Wendong Zheng, Di Guo et al.

Electrical impedance tomography (EIT) is a non-invasive imaging technique, which has been widely used in the fields of industrial inspection, medical monitoring and tactile sensing. However, due to the inherent non-linearity and ill-conditioned nature of the EIT inverse problem, the reconstructed image is highly sensitive to the measured data, and random noise artifacts often appear in the reconstructed image, which greatly limits the application of EIT. To address this issue, a conditional diffusion model with voltage consistency (CDMVC) is proposed in this study. The method consists of a pre-imaging module, a conditional diffusion model for reconstruction, a forward voltage constraint network and a scheme of voltage consistency constraint during sampling process. The pre-imaging module is employed to generate the initial reconstruction. This serves as a condition for training the conditional diffusion model. Finally, based on the forward voltage constraint network, a voltage consistency constraint is implemented in the sampling phase to incorporate forward information of EIT, thereby enhancing imaging quality. A more complete dataset, including both common and complex concave shapes, is generated. The proposed method is validated using both simulation and physical experiments. Experimental results demonstrate that our method can significantly improves the quality of reconstructed images. In addition, experimental results also demonstrate that our method has good robustness and generalization performance.

ROMar 15, 2024
Stimulate the Potential of Robots via Competition

Kangyao Huang, Di Guo, Xinyu Zhang et al.

It is common for us to feel pressure in a competition environment, which arises from the desire to obtain success comparing with other individuals or opponents. Although we might get anxious under the pressure, it could also be a drive for us to stimulate our potentials to the best in order to keep up with others. Inspired by this, we propose a competitive learning framework which is able to help individual robot to acquire knowledge from the competition, fully stimulating its dynamics potential in the race. Specifically, the competition information among competitors is introduced as the additional auxiliary signal to learn advantaged actions. We further build a Multiagent-Race environment, and extensive experiments are conducted, demonstrating that robots trained in competitive environments outperform ones that are trained with SoTA algorithms in single robot environment.

CVOct 17, 2025
Robust High-Resolution Multi-Organ Diffusion MRI Using Synthetic-Data-Tuned Prompt Learning

Chen Qian, Haoyu Zhang, Junnan Ma et al.

Clinical adoption of multi-shot diffusion-weighted magnetic resonance imaging (multi-shot DWI) for body-wide tumor diagnostics is limited by severe motion-induced phase artifacts from respiration, peristalsis, and so on, compounded by multi-organ, multi-slice, multi-direction and multi-b-value complexities. Here, we introduce a reconstruction framework, LoSP-Prompt, that overcomes these challenges through physics-informed modeling and synthetic-data-driven prompt learning. We model inter-shot phase variations as a high-order Locally Smooth Phase (LoSP), integrated into a low-rank Hankel matrix reconstruction. Crucially, the algorithm's rank parameter is automatically set via prompt learning trained exclusively on synthetic abdominal DWI data emulating physiological motion. Validated across 10,000+ clinical images (43 subjects, 4 scanner models, 5 centers), LoSP-Prompt: (1) Achieved twice the spatial resolution of clinical single-shot DWI, enhancing liver lesion conspicuity; (2) Generalized to seven diverse anatomical regions (liver, kidney, sacroiliac, pelvis, knee, spinal cord, brain) with a single model; (3) Outperformed state-of-the-art methods in image quality, artifact suppression, and noise reduction (11 radiologists' evaluations on a 5-point scale, $p<0.05$), achieving 4-5 points (excellent) on kidney DWI, 4 points (good to excellent) on liver, sacroiliac and spinal cord DWI, and 3-4 points (good) on knee and tumor brain. The approach eliminates navigator signals and realistic data supervision, providing an interpretable, robust solution for high-resolution multi-organ multi-shot DWI. Its scanner-agnostic performance signifies transformative potential for precision oncology.

ROMar 12, 2025
Tacchi 2.0: A Low Computational Cost and Comprehensive Dynamic Contact Simulator for Vision-based Tactile Sensors

Yuhao Sun, Shixin Zhang, Wenzhuang Li et al.

With the development of robotics technology, some tactile sensors, such as vision-based sensors, have been applied to contact-rich robotics tasks. However, the durability of vision-based tactile sensors significantly increases the cost of tactile information acquisition. Utilizing simulation to generate tactile data has emerged as a reliable approach to address this issue. While data-driven methods for tactile data generation lack robustness, finite element methods (FEM) based approaches require significant computational costs. To address these issues, we integrated a pinhole camera model into the low computational cost vision-based tactile simulator Tacchi that used the Material Point Method (MPM) as the simulated method, completing the simulation of marker motion images. We upgraded Tacchi and introduced Tacchi 2.0. This simulator can simulate tactile images, marked motion images, and joint images under different motion states like pressing, slipping, and rotating. Experimental results demonstrate the reliability of our method and its robustness across various vision-based tactile sensors.

MLMar 6, 2025
Reproducibility Assessment of Magnetic Resonance Spectroscopy of Pregenual Anterior Cingulate Cortex across Sessions and Vendors via the Cloud Computing Platform CloudBrain-MRS

Runhan Chen, Meijin Lin, Jianshu Chen et al.

Given the need to elucidate the mechanisms underlying illnesses and their treatment, as well as the lack of harmonization of acquisition and post-processing protocols among different magnetic resonance system vendors, this work is to determine if metabolite concentrations obtained from different sessions, machine models and even different vendors of 3 T scanners can be highly reproducible and be pooled for diagnostic analysis, which is very valuable for the research of rare diseases. Participants underwent magnetic resonance imaging (MRI) scanning once on two separate days within one week (one session per day, each session including two proton magnetic resonance spectroscopy (1H-MRS) scans with no more than a 5-minute interval between scans (no off-bed activity)) on each machine. were analyzed for reliability of within- and between- sessions using the coefficient of variation (CV) and intraclass correlation coefficient (ICC), and for reproducibility of across the machines using correlation coefficient. As for within- and between- session, all CV values for a group of all the first or second scans of a session, or for a session were almost below 20%, and most of the ICCs for metabolites range from moderate (0.4-0.59) to excellent (0.75-1), indicating high data reliability. When it comes to the reproducibility across the three scanners, all Pearson correlation coefficients across the three machines approached 1 with most around 0.9, and majority demonstrated statistical significance (P<0.01). Additionally, the intra-vendor reproducibility was greater than the inter-vendor ones.

MED-PHMar 6, 2025
An artificially intelligent magnetic resonance spectroscopy quantification method: Comparison between QNet and LCModel on the cloud computing platform CloudBrain-MRS

Meijin Lin, Lin Guo, Dicheng Chen et al.

Objctives: This work aimed to statistically compare the metabolite quantification of human brain magnetic resonance spectroscopy (MRS) between the deep learning method QNet and the classical method LCModel through an easy-to-use intelligent cloud computing platform CloudBrain-MRS. Materials and Methods: In this retrospective study, two 3 T MRI scanners Philips Ingenia and Achieva collected 61 and 46 in vivo 1H magnetic resonance (MR) spectra of healthy participants, respectively, from the brain region of pregenual anterior cingulate cortex from September to October 2021. The analyses of Bland-Altman, Pearson correlation and reasonability were performed to assess the degree of agreement, linear correlation and reasonability between the two quantification methods. Results: Fifteen healthy volunteers (12 females and 3 males, age range: 21-35 years, mean age/standard deviation = 27.4/3.9 years) were recruited. The analyses of Bland-Altman, Pearson correlation and reasonability showed high to good consistency and very strong to moderate correlation between the two methods for quantification of total N-acetylaspartate (tNAA), total choline (tCho), and inositol (Ins) (relative half interval of limits of agreement = 3.04%, 9.3%, and 18.5%, respectively; Pearson correlation coefficient r = 0.775, 0.927, and 0.469, respectively). In addition, quantification results of QNet are more likely to be closer to the previous reported average values than those of LCModel. Conclusion: There were high or good degrees of consistency between the quantification results of QNet and LCModel for tNAA, tCho, and Ins, and QNet generally has more reasonable quantification than LCModel.

CVJan 26, 2022
Self-supervised 3D Semantic Representation Learning for Vision-and-Language Navigation

Sinan Tan, Mengmeng Ge, Di Guo et al.

In the Vision-and-Language Navigation task, the embodied agent follows linguistic instructions and navigates to a specific goal. It is important in many practical scenarios and has attracted extensive attention from both computer vision and robotics communities. However, most existing works only use RGB images but neglect the 3D semantic information of the scene. To this end, we develop a novel self-supervised training framework to encode the voxel-level 3D semantic reconstruction into a 3D semantic representation. Specifically, a region query task is designed as the pretext task, which predicts the presence or absence of objects of a particular class in a specific 3D region. Then, we construct an LSTM-based navigation model and train it with the proposed 3D semantic representations and BERT language features on vision-language pairs. Experiments show that the proposed approach achieves success rates of 68% and 66% on the validation unseen and test unseen splits of the R2R dataset respectively, which are superior to most of RGB-based methods utilizing vision-language transformers.

IVDec 9, 2021
One-dimensional Deep Low-rank and Sparse Network for Accelerated MRI

Zi Wang, Chen Qian, Di Guo et al.

Deep learning has shown astonishing performance in accelerated magnetic resonance imaging (MRI). Most state-of-the-art deep learning reconstructions adopt the powerful convolutional neural network and perform 2D convolution since many magnetic resonance images or their corresponding k-space are in 2D. In this work, we present a new approach that explores the 1D convolution, making the deep network much easier to be trained and generalized. We further integrate the 1D convolution into the proposed deep network, named as One-dimensional Deep Low-rank and Sparse network (ODLS), which unrolls the iteration procedure of a low-rank and sparse reconstruction model. Extensive results on in vivo knee and brain datasets demonstrate that, the proposed ODLS is very suitable for the case of limited training subjects and provides improved reconstruction performance than state-of-the-art methods both visually and quantitatively. Additionally, ODLS also shows nice robustness to different undersampling scenarios and some mismatches between the training and test data. In summary, our work demonstrates that the 1D deep learning scheme is memory-efficient and robust in fast MRI.

ROSep 22, 2021
Audio-Visual Grounding Referring Expression for Robotic Manipulation

Yefei Wang, Kaili Wang, Yi Wang et al.

Referring expressions are commonly used when referring to a specific target in people's daily dialogue. In this paper, we develop a novel task of audio-visual grounding referring expression for robotic manipulation. The robot leverages both the audio and visual information to understand the referring expression in the given manipulation instruction and the corresponding manipulations are implemented. To solve the proposed task, an audio-visual framework is proposed for visual localization and sound recognition. We have also established a dataset which contains visual data, auditory data and manipulation instructions for evaluation. Finally, extensive experiments are conducted both offline and online to verify the effectiveness of the proposed audio-visual framework. And it is demonstrated that the robot performs better with the audio-visual data than with only the visual data.

AISep 20, 2021
Multi-Agent Embodied Visual Semantic Navigation with Scene Prior Knowledge

Xinzhu Liu, Di Guo, Huaping Liu et al.

In visual semantic navigation, the robot navigates to a target object with egocentric visual observations and the class label of the target is given. It is a meaningful task inspiring a surge of relevant research. However, most of the existing models are only effective for single-agent navigation, and a single agent has low efficiency and poor fault tolerance when completing more complicated tasks. Multi-agent collaboration can improve the efficiency and has strong application potentials. In this paper, we propose the multi-agent visual semantic navigation, in which multiple agents collaborate with others to find multiple target objects. It is a challenging task that requires agents to learn reasonable collaboration strategies to perform efficient exploration under the restrictions of communication bandwidth. We develop a hierarchical decision framework based on semantic mapping, scene prior knowledge, and communication mechanism to solve this task. The results of testing experiments in unseen scenes with both known objects and unknown objects illustrate the higher accuracy and efficiency of the proposed model compared with the single-agent model.

ROSep 16, 2021
Knowledge-based Embodied Question Answering

Sinan Tan, Mengmeng Ge, Di Guo et al.

In this paper, we propose a novel Knowledge-based Embodied Question Answering (K-EQA) task, in which the agent intelligently explores the environment to answer various questions with the knowledge. Different from explicitly specifying the target object in the question as existing EQA work, the agent can resort to external knowledge to understand more complicated question such as "Please tell me what are objects used to cut food in the room?", in which the agent must know the knowledge such as "knife is used for cutting food". To address this K-EQA problem, a novel framework based on neural program synthesis reasoning is proposed, where the joint reasoning of the external knowledge and 3D scene graph is performed to realize navigation and question answering. Especially, the 3D scene graph can provide the memory to store the visual information of visited scenes, which significantly improves the efficiency for the multi-turn question answering. Experimental results have demonstrated that the proposed framework is capable of answering more complicated and realistic questions in the embodied environment. The proposed method is also applicable to multi-agent scenarios.

MED-PHJan 26, 2021
Magnetic Resonance Spectroscopy Deep Learning Denoising Using Few In Vivo Data

Dicheng Chen, Wanqi Hu, Huiting Liu et al.

Magnetic Resonance Spectroscopy (MRS) is a noninvasive tool to reveal metabolic information. One challenge of 1H-MRS is the low Signal-Noise Ratio (SNR). To improve the SNR, a typical approach is to perform Signal Averaging (SA) with M repeated samples. The data acquisition time, however, is increased by M times accordingly, and a complete clinical MRS scan takes approximately 10 minutes at a common setting M=128. Recently, deep learning has been introduced to improve the SNR but most of them use the simulated data as the training set. This may hinder the MRS applications since some potential differences, such as acquisition system imperfections, and physiological and psychologic conditions may exist between the simulated and in vivo data. Here, we proposed a new scheme that purely used the repeated samples of realistic data. A deep learning model, Refusion Long Short-Term Memory (ReLSTM), was designed to learn the mapping from the low SNR time-domain data (24 SA) to the high SNR one (128 SA). Experiments on the in vivo brain spectra of 7 healthy subjects, 2 brain tumor patients and 1 cerebral infarction patient showed that only using 20% repeated samples, the denoised spectra by ReLSTM could provide comparable estimated concentrations of metabolites to 128 SA. Compared with the state-of-the-art low-rank denoising method, the ReLSTM achieved the lower relative error and the Cramér-Rao lower bounds in quantifying some important biomarkers. In summary, ReLSTM can perform high-fidelity denoising of the spectra under fast acquisition (24 SA), which would be valuable to MRS clinical studies.

LGDec 29, 2020
A Sparse Model-inspired Deep Thresholding Network for Exponential Signal Reconstruction -- Application in Fast Biological Spectroscopy

Zi Wang, Di Guo, Zhangren Tu et al.

The non-uniform sampling is a powerful approach to enable fast acquisition but requires sophisticated reconstruction algorithms. Faithful reconstruction from partial sampled exponentials is highly expected in general signal processing and many applications. Deep learning has shown astonishing potential in this field but many existing problems, such as lack of robustness and explainability, greatly limit its applications. In this work, by combining merits of the sparse model-based optimization method and data-driven deep learning, we propose a deep learning architecture for spectra reconstruction from undersampled data, called MoDern. It follows the iterative reconstruction in solving a sparse model to build the neural network and we elaborately design a learnable soft-thresholding to adaptively eliminate the spectrum artifacts introduced by undersampling. Extensive results on both synthetic and biological data show that MoDern enables more robust, high-fidelity, and ultra-fast reconstruction than the state-of-the-art methods. Remarkably, MoDern has a small number of network parameters and is trained on solely synthetic data while generalizing well to biological data in various scenarios. Furthermore, we extend it to an open-access and easy-to-use cloud computing platform (XCloud-MoDern), contributing a promising strategy for further development of biological applications.

RONov 17, 2020
Fault-Aware Robust Control via Adversarial Reinforcement Learning

Fan Yang, Chao Yang, Di Guo et al.

Robots have limited adaptation ability compared to humans and animals in the case of damage. However, robot damages are prevalent in real-world applications, especially for robots deployed in extreme environments. The fragility of robots greatly limits their widespread application. We propose an adversarial reinforcement learning framework, which significantly increases robot robustness over joint damage cases in both manipulation tasks and locomotion tasks. The agent is trained iteratively under the joint damage cases where it has poor performance. We validate our algorithm on a three-fingered robot hand and a quadruped robot. Our algorithm can be trained only in simulation and directly deployed on a real robot without any fine-tuning. It also demonstrates exceeding success rates over arbitrary joint damage cases.

RONov 6, 2020
Adversarial Skill Learning for Robust Manipulation

Pingcheng Jian, Chao Yang, Di Guo et al.

Deep reinforcement learning has made significant progress in robotic manipulation tasks and it works well in the ideal disturbance-free environment. However, in a real-world environment, both internal and external disturbances are inevitable, thus the performance of the trained policy will dramatically drop. To improve the robustness of the policy, we introduce the adversarial training mechanism to the robotic manipulation tasks in this paper, and an adversarial skill learning algorithm based on soft actor-critic (SAC) is proposed for robust manipulation. Extensive experiments are conducted to demonstrate that the learned policy is robust to internal and external disturbances. Additionally, the proposed algorithm is evaluated in both the simulation environment and on the real robotic platform.

CVOct 7, 2020
Unsupervised Representation Learning by InvariancePropagation

Feng Wang, Huaping Liu, Di Guo et al.

Unsupervised learning methods based on contrastive learning have drawn increasing attention and achieved promising results. Most of them aim to learn representations invariant to instance-level variations, which are provided by different views of the same instance. In this paper, we propose Invariance Propagation to focus on learning representations invariant to category-level variations, which are provided by different instances from the same category. Our method recursively discovers semantically consistent samples residing in the same high-density regions in representation space. We demonstrate a hard sampling strategy to concentrate on maximizing the agreement between the anchor sample and its hard positive samples, which provide more intra-class variations to help capture more abstract invariance. As a result, with a ResNet-50 as the backbone, our method achieves 71.3% top-1 accuracy on ImageNet linear classification and 78.2% top-5 accuracy fine-tuning on only 1% labels, surpassing previous results. We also achieve state-of-the-art performance on other downstream tasks, including linear classification on Places205 and Pascal VOC, and transfer learning on small scale datasets.

ROApr 30, 2020
Towards Embodied Scene Description

Sinan Tan, Huaping Liu, Di Guo et al.

Embodiment is an important characteristic for all intelligent agents (creatures and robots), while existing scene description tasks mainly focus on analyzing images passively and the semantic understanding of the scenario is separated from the interaction between the agent and the environment. In this work, we propose the Embodied Scene Description, which exploits the embodiment ability of the agent to find an optimal viewpoint in its environment for scene description tasks. A learning framework with the paradigms of imitation learning and reinforcement learning is established to teach the intelligent agent to generate corresponding sensorimotor activities. The proposed framework is tested on both the AI2Thor dataset and a real world robotic platform demonstrating the effectiveness and extendability of the developed method.

AIMar 10, 2020
MQA: Answering the Question via Robotic Manipulation

Yuhong Deng, Di Guo, Xiaofeng Guo et al.

In this paper, we propose a novel task, Manipulation Question Answering (MQA), where the robot performs manipulation actions to change the environment in order to answer a given question. To solve this problem, a framework consisting of a QA module and a manipulation module is proposed. For the QA module, we adopt the method for the Visual Question Answering (VQA) task. For the manipulation module, a Deep Q Network (DQN) model is designed to generate manipulation actions for the robot to interact with the environment. We consider the situation where the robot continuously manipulating objects inside a bin until the answer to the question is found. Besides, a novel dataset that contains a variety of object models, scenarios and corresponding question-answer pairs is established in a simulation environment. Extensive experiments have been conducted to validate the effectiveness of the proposed framework.

MED-PHJan 13, 2020
Review and Prospect: Deep Learning in Nuclear Magnetic Resonance Spectroscopy

Dicheng Chen, Zi Wang, Di Guo et al.

Since the concept of Deep Learning (DL) was formally proposed in 2006, it had a major impact on academic research and industry. Nowadays, DL provides an unprecedented way to analyze and process data with demonstrated great results in computer vision, medical imaging, natural language processing, etc. In this Minireview, we summarize applications of DL in Nuclear Magnetic Resonance (NMR) spectroscopy and outline a perspective for DL as entirely new approaches that are likely to transform NMR spectroscopy into a much more efficient and powerful technique in chemistry and life science.

IVSep 24, 2019
pISTA-SENSE-ResNet for Parallel MRI Reconstruction

Tieyuan Lu, Xinlin Zhang, Yihui Huang et al.

Magnetic resonance imaging has been widely applied in clinical diagnosis, however, is limited by its long data acquisition time. Although imaging can be accelerated by sparse sampling and parallel imaging, achieving promising reconstruction images with a fast reconstruction speed remains a challenge. Recently, deep learning approaches have attracted a lot of attention for its encouraging reconstruction results but without a proper interpretability. In this letter, to enable high-quality image reconstruction for the parallel magnetic resonance imaging, we design the network structure from the perspective of sparse iterative reconstruction and enhance it with the residual structure. The experimental results of a public knee dataset show that compared with the optimization-based method and the latest deep learning parallel imaging methods, the proposed network has less error in reconstruction and is more stable under different acceleration factors.

IVSep 17, 2019
A Guaranteed Convergence Analysis for the Projected Fast Iterative Soft-Thresholding Algorithm in Parallel MRI

Xinlin Zhang, Hengfa Lu, Di Guo et al.

The boom of non-uniform sampling and compressed sensing techniques dramatically alleviates the lengthy data acquisition problem of magnetic resonance imaging. Sparse reconstruction, thanks to its fast computation and promising performance, has attracted researchers to put numerous efforts on it and has been adopted in commercial scanners. To perform sparse reconstruction, choosing a proper algorithm is essential in providing satisfying results and saving time in tuning parameters. The pFISTA, a simple and efficient algorithm for sparse reconstruction, has been successfully extended to parallel imaging. However, its convergence criterion is still an open question. And the existing convergence criterion of single-coil pFISTA cannot be applied to the parallel imaging pFISTA, which, therefore, imposes confusions and difficulties on users about determining the only parameter - step size. In this work, we provide the guaranteed convergence analysis of the parallel imaging version pFISTA to solve the two well-known parallel imaging reconstruction models, SENSE and SPIRiT. Along with the convergence analysis, we provide recommended step size values for SENSE and SPIRiT reconstructions to obtain fast and promising reconstructions. Experiments on in vivo brain images demonstrate the validity of the convergence criterion. Besides, experimental results show that compared to using backtracking and power iteration to determine the step size, our recommended step size achieves more than five times acceleration in reconstruction time in most tested cases.

MED-PHApr 9, 2019
Accelerated Nuclear Magnetic Resonance Spectroscopy with Deep Learning

Xiaobo Qu, Yihui Huang, Hengfa Lu et al.

Nuclear magnetic resonance (NMR) spectroscopy serves as an indispensable tool in chemistry and biology but often suffers from long experimental time. We present a proof-of-concept of application of deep learning and neural network for high-quality, reliable, and very fast NMR spectra reconstruction from limited experimental data. We show that the neural network training can be achieved using solely synthetic NMR signal, which lifts the prohibiting demand for a large volume of realistic training data usually required in the deep learning approach.

MLApr 6, 2016
Hankel Matrix Nuclear Norm Regularized Tensor Completion for $N$-dimensional Exponential Signals

Jiaxi Ying, Hengfa Lu, Qingtao Wei et al.

Signals are generally modeled as a superposition of exponential functions in spectroscopy of chemistry, biology and medical imaging. For fast data acquisition or other inevitable reasons, however, only a small amount of samples may be acquired and thus how to recover the full signal becomes an active research topic. But existing approaches can not efficiently recover $N$-dimensional exponential signals with $N\geq 3$. In this paper, we study the problem of recovering N-dimensional (particularly $N\geq 3$) exponential signals from partial observations, and formulate this problem as a low-rank tensor completion problem with exponential factor vectors. The full signal is reconstructed by simultaneously exploiting the CANDECOMP/PARAFAC structure and the exponential structure of the associated factor vectors. The latter is promoted by minimizing an objective function involving the nuclear norm of Hankel matrices. Experimental results on simulated and real magnetic resonance spectroscopy data show that the proposed approach can successfully recover full signals from very limited samples and is robust to the estimated tensor rank.

MED-PHApr 29, 2015
Projected Iterative Soft-thresholding Algorithm for Tight Frames in Compressed Sensing Magnetic Resonance Imaging

Yunsong Liu, Zhifang Zhan, Jian-Feng Cai et al.

Compressed sensing has shown great potentials in accelerating magnetic resonance imaging. Fast image reconstruction and high image quality are two main issues faced by this new technology. It has been shown that, redundant image representations, e.g. tight frames, can significantly improve the image quality. But how to efficiently solve the reconstruction problem with these redundant representation systems is still challenging. This paper attempts to address the problem of applying iterative soft-thresholding algorithm (ISTA) to tight frames based magnetic resonance image reconstruction. By introducing the canonical dual frame to construct the orthogonal projection operator on the range of the analysis sparsity operator, we propose a projected iterative soft-thresholding algorithm (pISTA) and further accelerate it by incorporating the strategy proposed by Beck and Teboulle in 2009. We theoretically prove that pISTA converges to the minimum of a function with a balanced tight frame sparsity. Experimental results demonstrate that the proposed algorithm achieves better reconstruction than the widely used synthesis sparse model and the accelerated pISTA converges faster or comparable to the state-of-art smoothing FISTA. One major advantage of pISTA is that only one extra parameter, the step size, is introduced and the numerical solution is stable to it in terms of image reconstruction errors, thus allowing easily setting in many fast magnetic resonance imaging applications.

CVMar 10, 2015
Fast Multi-class Dictionaries Learning with Geometrical Directions in MRI Reconstruction

Zhifang Zhan, Jian-Feng Cai, Di Guo et al.

Objective: Improve the reconstructed image with fast and multi-class dictionaries learning when magnetic resonance imaging is accelerated by undersampling the k-space data. Methods: A fast orthogonal dictionary learning method is introduced into magnetic resonance image reconstruction to providing adaptive sparse representation of images. To enhance the sparsity, image is divided into classified patches according to the same geometrical direction and dictionary is trained within each class. A new sparse reconstruction model with the multi-class dictionaries is proposed and solved using a fast alternating direction method of multipliers. Results: Experiments on phantom and brain imaging data with acceleration factor up to 10 and various undersampling patterns are conducted. The proposed method is compared with state-of-the-art magnetic resonance image reconstruction methods. Conclusion: Artifacts are better suppressed and image edges are better preserved than the compared methods. Besides, the computation of the proposed approach is much faster than the typical K-SVD dictionary learning method in magnetic resonance image reconstruction. Significance: The proposed method can be exploited in undersapmled magnetic resonance imaging to reduce data acquisition time and reconstruct images with better image quality.

CVJan 23, 2013
Spread spectrum compressed sensing MRI using chirp radio frequency pulses

Xiaobo Qu, Ying Chen, Xiaoxing Zhuang et al.

Compressed sensing has shown great potential in reducing data acquisition time in magnetic resonance imaging (MRI). Recently, a spread spectrum compressed sensing MRI method modulates an image with a quadratic phase. It performs better than the conventional compressed sensing MRI with variable density sampling, since the coherence between the sensing and sparsity bases are reduced. However, spread spectrum in that method is implemented via a shim coil which limits its modulation intensity and is not convenient to operate. In this letter, we propose to apply chirp (linear frequency-swept) radio frequency pulses to easily control the spread spectrum. To accelerate the image reconstruction, an alternating direction algorithm is modified by exploiting the complex orthogonality of the quadratic phase encoding. Reconstruction on the acquired data demonstrates that more image features are preserved using the proposed approach than those of conventional CS-MRI.