Jingwei Song

h-index: 14

16 papers

599 citations

Novelty: 48%

AI Score: 35

Ranked #107,896 of 194,257 authors (top 56%)

Ranked #36,083 in CV (top 61%)

16 Papers

5.7 | CV | May 6, 2022 | Code

BDIS: Bayesian Dense Inverse Searching Method for Real-Time Stereo Surgical Image Matching

Jingwei Song, Qiuchen Zhu, Jianyu Lin et al.

In stereoscope-based Minimally Invasive Surgeries (MIS), dense stereo matching plays an indispensable role in 3D shape recovery, AR, VR, and navigation tasks. Although numerous Deep Neural Network (DNN) approaches have been proposed, conventional prior-free approaches remain popular in industry because of the lack of open-source annotated data sets and the limitations of task-specific pre-trained DNNs. Among the prior-free stereo matching algorithms, there is no successful real-time algorithm for MIS in a non-GPU environment. This paper proposes the first CPU-level real-time prior-free stereo matching algorithm for general MIS tasks. We achieve an average of 17 Hz on 640*480 images with a single-core CPU (i5-9400) for surgical images. Meanwhile, it achieves slightly better accuracy than the popular ELAS. The patch-based fast disparity searching algorithm is adopted for the rectified stereo images. A coarse-to-fine Bayesian probability and a spatial Gaussian mixed model are proposed to evaluate the patch probability at different scales. An optional probability density function estimation algorithm is adopted to quantify the prediction variance. Extensive experiments demonstrate the proposed method's capability to handle ambiguities introduced by textureless surfaces and the photometric inconsistency from non-Lambertian reflectance and dark illumination. The estimated probability manages to balance the confidences of the patches for stereo images at different scales. It has similar or higher accuracy and fewer outliers than the baseline ELAS in MIS, while it is 4-5 times faster. The code and the synthetic data sets are available at https://github.com/JingweiSong/BDIS-v2.
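
To make the idea of blending overlapping patch estimates by their probabilities more concrete, here is a minimal 1-D sketch. It is not the BDIS implementation: the function name `fuse_patch_disparities`, the tuple layout, and the `sigma` parameter are all hypothetical, and the weighting rule (patch probability times a spatial Gaussian falloff) is only an assumed stand-in for the paper's Bayesian and Gaussian mixed models.

```python
import math

def fuse_patch_disparities(width, patches, sigma=2.0):
    """Blend overlapping 1-D patch disparities into a per-pixel estimate.

    patches: list of (center, half_size, disparity, probability) tuples
    (a hypothetical layout chosen for this sketch).
    Each patch votes for the pixels it covers with weight
    probability * exp(-(x - center)^2 / (2 * sigma^2)).
    Pixels covered by no patch are returned as None.
    """
    num = [0.0] * width
    den = [0.0] * width
    for center, half_size, disparity, prob in patches:
        lo = max(0, center - half_size)
        hi = min(width - 1, center + half_size)
        for x in range(lo, hi + 1):
            # Spatial Gaussian falloff around the patch center.
            w = prob * math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))
            num[x] += w * disparity
            den[x] += w
    return [n / d if d > 0.0 else None for n, d in zip(num, den)]

# Two overlapping patches: a confident one and a less confident one.
patches = [(2, 2, 10.0, 0.9), (5, 2, 14.0, 0.3)]
fused = fuse_patch_disparities(8, patches)
```

In the overlap region, the fused value stays closer to the disparity of the more confident and spatially nearer patch.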

3.7 | CV | Sep 27, 2024

DynaWeightPnP: Toward global real-time 3D-2D solver in PnP without correspondences

Jingwei Song, Maani Ghaffari

This paper addresses a special Perspective-n-Point (PnP) problem: estimating the optimal pose to align 3D and 2D shapes in real time without correspondences, termed correspondence-free PnP. While several studies have focused on 3D and 2D shape registration, achieving both real-time and accurate performance remains challenging. This study specifically targets 3D-2D geometric shape registration tasks, applying the recently developed Reproducing Kernel Hilbert Space (RKHS) to address the "big-to-small" issue. An iterative reweighted least squares method is employed to solve the RKHS-based formulation efficiently. Moreover, our work identifies a unique and interesting observability issue in correspondence-free PnP: the numerical ambiguity between rotation and translation. To address this, we propose DynaWeightPnP, introducing a dynamic weighting sub-problem and an alternative searching algorithm designed to enhance pose estimation and alignment accuracy. Experiments were conducted on a typical case, that is, a 3D-2D vascular centerline registration task within Endovascular Image-Guided Interventions (EIGIs). Results demonstrate that the proposed algorithm achieves registration processing rates of 60 Hz (without post-refinement) and 31 Hz (with post-refinement) on modern single-core CPUs, with accuracy comparable to existing methods. These results underscore the suitability of DynaWeightPnP for future robot navigation tasks like EIGIs.
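
The abstract mentions iteratively reweighted least squares (IRLS) as the solver. The sketch below shows the general IRLS pattern on a much simpler problem, a robust line fit, and is not the RKHS-based pose formulation from the paper. The function name `irls_line_fit`, the Cauchy-style weight, and the `scale` parameter are assumptions made for illustration.

```python
def irls_line_fit(xs, ys, scale=1.0, iterations=20):
    """Fit y = a*x + b by iteratively reweighted least squares.

    Uses Cauchy-style weights w = 1 / (1 + (r / scale)^2), an assumed
    robust kernel chosen for this sketch.
    """
    weights = [1.0] * len(xs)
    a, b = 0.0, 0.0
    for _ in range(iterations):
        # Solve the weighted normal equations of the 2-parameter model.
        sw = sum(weights)
        sx = sum(w * x for w, x in zip(weights, xs))
        sy = sum(w * y for w, y in zip(weights, ys))
        sxx = sum(w * x * x for w, x in zip(weights, xs))
        sxy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
        det = sw * sxx - sx * sx
        a = (sw * sxy - sx * sy) / det
        b = (sxx * sy - sx * sxy) / det
        # Down-weight points with large residuals for the next pass.
        weights = [1.0 / (1.0 + ((y - (a * x + b)) / scale) ** 2)
                   for x, y in zip(xs, ys)]
    return a, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0, 40.0]  # the last point is a gross outlier
a, b = irls_line_fit(xs, ys)
```

Each pass solves an ordinary weighted least squares problem, so the outlier's influence shrinks as its residual grows, and the fit settles near the line through the five consistent points.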

2.2 | RO | Sep 18, 2024

SLAM assisted 3D tracking system for laparoscopic surgery

Jingwei Song, Ray Zhang, Wenwei Zhang et al.

A major limitation of minimally invasive surgery is the difficulty in accurately locating the internal anatomical structures of the target organ due to the lack of tactile feedback and transparency. Augmented reality (AR) offers a promising solution to overcome this challenge. Numerous studies have shown that combining learning-based and geometric methods can achieve accurate preoperative and intraoperative data registration. This work proposes a real-time monocular 3D tracking algorithm for post-registration tasks. The ORB-SLAM2 framework is adopted and modified for prior-based 3D tracking. The primitive 3D shape is used for fast initialization of the monocular SLAM. A pseudo-segmentation strategy is employed to separate the target organ from the background for tracking purposes, and the geometric prior of the 3D shape is incorporated as an additional constraint in the pose graph. Experiments from in-vivo and ex-vivo tests demonstrate that the proposed 3D tracking system provides robust 3D tracking and effectively handles typical challenges such as fast motion, out-of-field-of-view scenarios, partial visibility, and "organ-background" relative motion.

15.5 | CL | Feb 17, 2025 | Code

Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities

Hanbin Wang, Xiaoxuan Zhou, Zhipeng Xu et al.

This paper introduces Code-Vision, a benchmark designed to evaluate the logical understanding and code generation capabilities of Multimodal Large Language Models (MLLMs). It challenges MLLMs to generate a correct program that fulfills specific functionality requirements based on a given flowchart, which visually represents the desired algorithm or process. Code-Vision comprises three subsets: HumanEval-V, Algorithm, and MATH, which evaluate MLLMs' coding abilities across basic programming, algorithmic, and mathematical problem-solving domains. Our experiments evaluate 12 MLLMs on Code-Vision. Experimental results demonstrate that there is a large performance gap between proprietary and open-source models. On Hard problems, GPT-4o achieves 79.3% pass@1, but the best open-source model achieves only 15%. Further experiments reveal that Code-Vision poses unique challenges compared to the other multimodal reasoning benchmarks MMCode and MathVista. We also explore the reasons for the poor performance of the open-source models. All data and code are available at https://github.com/wanghanbinpanda/CodeVision.
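
The pass@1 figures above are instances of the pass@k metric. The abstract does not say how Code-Vision computes it, so the sketch below assumes the widely used unbiased estimator, 1 - C(n-c, k) / C(n, k), for n generated samples of which c pass the tests. The helper names are hypothetical.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples, c of which are correct."""
    if n - c < k:
        # Fewer than k failing samples: every size-k draw contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def benchmark_pass_at_k(results, k=1):
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)

# Four problems, one sample each: two solved, two failed.
score = benchmark_pass_at_k([(1, 1), (1, 0), (1, 1), (1, 0)], k=1)
```

With a single sample per problem, pass@1 reduces to the fraction of problems whose generated program passes all tests.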

3.7 | IV | Apr 23, 2020

Uncertainty Quantification for Hyperspectral Image Denoising Frameworks based on Low-rank Matrix Approximation

Jingwei Song, Shaobo Xia, Jun Wang et al.

Sliding-window based low-rank matrix approximation (LRMA) is a technique widely used in hyperspectral image (HSI) denoising or completion. However, the uncertainty quantification of the restored HSI has not been addressed to date. Accurate uncertainty quantification of the denoised HSI facilitates applications such as multi-source or multi-scale data fusion, data assimilation, and product uncertainty quantification, since these applications require an accurate description of the statistical distributions of the input data. Therefore, we propose a prior-free closed-form element-wise uncertainty quantification method for LRMA-based HSI restoration. Our closed-form algorithm overcomes the difficulty of the HSI patch mixing problem caused by the sliding-window strategy used in the conventional LRMA process. The proposed approach only requires the uncertainty of the observed HSI and provides the uncertainty result relatively rapidly, with computational complexity similar to that of the LRMA technique. We conduct extensive experiments to validate the estimation accuracy of the proposed closed-form uncertainty approach. The method is robust to at least 10% random impulse noise at the cost of 10-20% additional processing time compared to the LRMA. The experiments indicate that the proposed closed-form uncertainty quantification method is more applicable to real-world applications than the baseline Monte Carlo test, which is computationally expensive. The code is available in the attachment and will be released after the acceptance of this paper.
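
The contrast between a closed-form uncertainty estimate and a Monte Carlo baseline can be illustrated on a toy problem. The sketch below is not the LRMA method: it assumes a simple linear estimator y = sum(a_i * x_i) with independent Gaussian inputs, for which the variance is sum(a_i^2 * var_i). All names and numbers are hypothetical.

```python
import random

def closed_form_variance(coeffs, variances):
    """Variance of y = sum(a_i * x_i) for independent inputs x_i."""
    return sum(a * a * v for a, v in zip(coeffs, variances))

def monte_carlo_variance(coeffs, means, variances, trials=20000, seed=0):
    """Estimate the same variance by repeatedly sampling noisy inputs."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        y = sum(a * rng.gauss(m, v ** 0.5)
                for a, m, v in zip(coeffs, means, variances))
        samples.append(y)
    mean_y = sum(samples) / trials
    return sum((s - mean_y) ** 2 for s in samples) / (trials - 1)

coeffs = [0.25, 0.25, 0.5]     # e.g. averaging weights of overlapping windows
means = [10.0, 12.0, 11.0]
variances = [4.0, 1.0, 0.25]
exact = closed_form_variance(coeffs, variances)
approx = monte_carlo_variance(coeffs, means, variances)
```

The closed form needs a single pass over the inputs, while the sampling estimate needs thousands of evaluations to approach the same value. That is the kind of cost difference the abstract refers to.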

1.9 | RO | Dec 25, 2023 | Code

BDIS-SLAM: A lightweight CPU-based dense stereo SLAM for surgery

Jingwei Song, Ray Zhang, Qiuchen Zhu et al.

Purpose: Common dense stereo Simultaneous Localization and Mapping (SLAM) approaches in Minimally Invasive Surgery (MIS) require high-end parallel computational resources for real-time implementation. Yet this is not always feasible, since the computational resources should be allocated to other tasks like segmentation, detection, and tracking. To solve the problem of limited parallel computational power, this research aims at a lightweight dense stereo SLAM system that works on a single-core CPU and achieves real-time performance (more than 30 Hz in typical scenarios). Methods: A new dense stereo mapping module is integrated with the ORB-SLAM2 system and named BDIS-SLAM. Our new dense stereo mapping module includes stereo matching and 3D dense depth mosaic methods. Stereo matching is achieved with the recently proposed CPU-level real-time matching algorithm Bayesian Dense Inverse Searching (BDIS). A BDIS-based shape recovery and a depth mosaic strategy are integrated as a new thread and coupled with the backbone ORB-SLAM2 system for real-time stereo shape recovery. Results: Experiments on in-vivo data sets show that BDIS-SLAM runs at over 30 Hz on a modern single-core CPU in typical endoscopy/colonoscopy scenarios. BDIS-SLAM consumes only around 12% additional time compared with the backbone ORB-SLAM2. Although our lightweight BDIS-SLAM simplifies the process by ignoring deformation and fusion procedures, it can provide a usable dense mapping for modern MIS on computationally constrained devices. Conclusion: The proposed BDIS-SLAM is a lightweight stereo dense SLAM system for MIS. It achieves 30 Hz on a modern single-core CPU in typical endoscopy/colonoscopy scenarios (image size around 640*480). BDIS-SLAM provides a low-cost solution for dense mapping in MIS and has the potential to be applied in surgical robots and AR systems.

3.7 | CV | Jan 5, 2022

Fusing Convolutional Neural Network and Geometric Constraint for Image-based Indoor Localization

Jingwei Song, Mitesh Patel, Maani Ghaffari

This paper proposes a new image-based localization framework that explicitly localizes the camera/robot by fusing a Convolutional Neural Network (CNN) with the geometric constraints of sequential images. The camera is localized using a single or a few observed images and training images with 6-degree-of-freedom pose labels. A Siamese network structure is adopted to train an image descriptor network, and the visually similar candidate image in the training set is retrieved to localize the testing image geometrically. Meanwhile, a probabilistic motion model predicts the pose based on a constant velocity assumption. The two estimated poses are finally fused using their uncertainties to yield an accurate pose prediction. This method leverages the geometric uncertainty and is applicable in indoor scenarios dominated by diffuse illumination. Experiments on simulated and real data sets demonstrate the efficiency of our proposed method. The results further show that combining the CNN-based framework with geometric constraints achieves better accuracy than CNN-only methods, especially when the training data size is small.
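
Fusing two estimates "using their uncertainties" can be sketched with standard inverse-variance weighting of independent Gaussian estimates. This is only an assumed illustration, not the paper's fusion scheme. It handles position axes independently and ignores orientation, which would need proper handling on the rotation manifold. The function names are hypothetical.

```python
def fuse_estimates(mean_a, var_a, mean_b, var_b):
    """Fuse two independent Gaussian estimates of the same scalar quantity."""
    # The estimate with the smaller variance receives the larger weight.
    weight_a = var_b / (var_a + var_b)
    weight_b = var_a / (var_a + var_b)
    fused_mean = weight_a * mean_a + weight_b * mean_b
    fused_var = (var_a * var_b) / (var_a + var_b)
    return fused_mean, fused_var

def fuse_positions(pos_a, vars_a, pos_b, vars_b):
    """Apply the scalar fusion independently to each position axis."""
    fused = [fuse_estimates(a, va, b, vb)
             for a, va, b, vb in zip(pos_a, vars_a, pos_b, vars_b)]
    return [m for m, _ in fused], [v for _, v in fused]

# Retrieval-based estimate vs. motion-model prediction (made-up numbers).
position, variance = fuse_positions([1.0, 2.0, 0.5], [0.04, 0.04, 0.09],
                                    [1.2, 2.1, 0.4], [0.01, 0.16, 0.09])
```

The fused variance is never larger than either input variance, which is why combining two imperfect pose sources can beat each one alone.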

3.7 | CV | Jun 14, 2021 | Code

Bayesian dense inverse searching algorithm for real-time stereo matching in minimally invasive surgery

Jingwei Song, Qiuchen Zhu, Jianyu Lin et al.

This paper reports a CPU-level real-time stereo matching method for surgical images (10 Hz on 640*480 images with a single core of an i5-9400). The proposed method is built on the fast "dense inverse searching" algorithm, which estimates the disparity of the stereo images. The overlapping image patches (arbitrary square image segments) from the images at different scales are aligned based on the photometric consistency presumption. We propose a Bayesian framework to evaluate the probability of the optimized patch disparity at different scales. Moreover, we introduce a spatial Gaussian mixed probability distribution to address the pixel-wise probability within the patch. In-vivo and synthetic experiments show that our method can handle ambiguities resulting from textureless surfaces and the photometric inconsistency caused by non-Lambertian reflectance. Our Bayesian method correctly balances the probability of the patch for stereo images at different scales. Experiments indicate that the estimated depth has higher accuracy and fewer outliers than the baseline methods in the surgical scenario.
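
For rectified stereo images, a disparity converts to depth through the standard pinhole relation Z = f * B / d. The sketch below shows that conversion only. The focal length and baseline values are made-up numbers rather than parameters from the paper, and the function name is hypothetical.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_mm):
    """Depth in mm of a point in a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0.0:
        return None  # zero or negative disparity carries no finite depth
    return focal_px * baseline_mm / disparity_px

# Assumed example values: 500 px focal length, 4 mm stereo baseline.
depth_mm = disparity_to_depth(20.0, focal_px=500.0, baseline_mm=4.0)
```

Because depth is inversely proportional to disparity, a fixed disparity error produces a larger depth error for distant points than for nearby ones.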

3.3 | CV | May 11, 2020

Combining Deep Learning with Geometric Features for Image based Localization in the Gastrointestinal Tract

Jingwei Song, Mitesh Patel, Andreas Girgensohn et al.

Tracking a monocular colonoscope in the Gastrointestinal (GI) tract is a challenging problem because the images suffer from deformation, blurred textures, and significant changes in appearance. These factors greatly restrict the tracking ability of conventional geometry-based methods. Even though Deep Learning (DL) can overcome these issues, limited labeled data is a roadblock for state-of-the-art DL methods. Considering these, we propose a novel approach that combines a DL method with a traditional feature-based approach to achieve better localization with small training data. Our method exploits the best of both worlds by introducing a Siamese network structure to perform few-shot classification to the closest zone in the segmented training image set. The classified label is further adopted to initialize the pose of the scope. To fully use the training dataset, pre-generated triangulated map points within the zone in the training set are registered with the observation and contribute to estimating the optimal pose of the test image. The proposed hybrid method is extensively tested and compared with existing methods, and the results show significant improvement over traditional geometry-based or DL-based localization. The accuracy is improved by 28.94% (position) and 10.97% (orientation) with respect to the state-of-the-art method.

1.2 | CV | May 10, 2020 | Code

A Closed-Form Uncertainty Propagation in Non-Rigid Structure from Motion

Jingwei Song, Mitesh Patel, Ashkan Jasour et al.

Semi-Definite Programming (SDP) with a low-rank prior has been widely applied in Non-Rigid Structure from Motion (NRSfM). Based on a low-rank constraint, it avoids the inherent ambiguity of basis number selection in conventional base-shape or base-trajectory methods. Despite its efficiency in deformable shape reconstruction, it remains unclear how to assess the uncertainty of the shape recovered by the SDP process. In this paper, we present a statistical inference on the element-wise uncertainty quantification of the estimated deforming 3D shape points in the case of the exact low-rank SDP problem. A closed-form uncertainty quantification method is proposed and tested. Moreover, we extend the exact low-rank uncertainty quantification to the approximate low-rank scenario with a numerical optimal rank selection method, which makes it possible to solve practical applications in SDP-based NRSfM scenarios. The proposed method provides an independent module for the SDP method and only requires the statistical information of the input 2D tracked points. Extensive experiments show that the output 3D points follow the same normal distribution as the 2D tracks, that the proposed method quantifies the uncertainty accurately, and that it has desirable effects on routine SDP low-rank based NRSfM solvers.

16.3 | CV | Mar 22, 2020

Dynamic Reconstruction of Deformable Soft-tissue with Stereo Scope in Minimal Invasive Surgery

Jingwei Song, Jun Wang, Liang Zhao et al.

In minimally invasive surgery, it is important to rebuild and visualize the latest deformed shape of soft-tissue surfaces to mitigate tissue damage. This paper proposes an innovative Simultaneous Localization and Mapping (SLAM) algorithm for deformable dense reconstruction of surfaces using a sequence of images from a stereoscope. We introduce a warping field based on Embedded Deformation (ED) nodes with 3D shapes recovered from consecutive pairs of stereo images. The warping field is estimated by deforming the last updated model to the current live model. Our SLAM system can: (1) incrementally build a live model by progressively fusing new observations with vivid, accurate texture; (2) estimate the deformed shape of unobserved regions with the As-Rigid-As-Possible principle; (3) show the consecutive shapes of models; (4) estimate the current relative pose between the soft tissue and the scope. In-vivo experiments with publicly available datasets demonstrate that the 3D models can be incrementally built for different soft tissues with different deformations from sequences of stereo images obtained by laparoscopes. Results show the potential clinical application of our SLAM system for providing surgeons with useful shape and texture information in minimally invasive surgery.
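
A warping field built on ED nodes is commonly written as v' = sum_j w_j * (A_j (v - g_j) + g_j + t_j), where each node j has a position g_j, a 3x3 matrix A_j, and a translation t_j. The sketch below applies that standard embedded deformation formula to a single point. It assumes the blending weights are given and sum to 1, and it does not reproduce the paper's energy terms or optimization. The function name and data layout are hypothetical.

```python
def ed_warp(point, nodes, weights):
    """Warp a 3-D point with an embedded deformation graph.

    nodes: list of (g, A, t) with node position g, 3x3 matrix A given as a
    list of rows, and translation t (layout assumed for this sketch).
    weights: per-node blending weights, assumed to sum to 1.
    """
    out = [0.0, 0.0, 0.0]
    for (g, A, t), w in zip(nodes, weights):
        d = [point[i] - g[i] for i in range(3)]
        for r in range(3):
            # Local affine motion of the point as seen from node (g, A, t).
            moved = sum(A[r][c] * d[c] for c in range(3)) + g[r] + t[r]
            out[r] += w * moved
    return out

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
nodes = [([0.0, 0.0, 0.0], identity, [1.0, 0.0, 0.0]),
         ([2.0, 0.0, 0.0], identity, [1.0, 0.0, 0.0])]
warped = ed_warp([1.0, 1.0, 1.0], nodes, [0.5, 0.5])
```

With identity matrices and a shared translation, the warp reduces to a rigid shift of the point. Non-identity matrices add local rotation or stretch around each node.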

24.6 | RO | Nov 13, 2019

Are We Ready for Service Robots? The OpenLORIS-Scene Datasets for Lifelong SLAM

Xuesong Shi, Dongjiang Li, Pengpeng Zhao et al.

Service robots should be able to operate autonomously in dynamic and daily changing environments over an extended period of time. While Simultaneous Localization And Mapping (SLAM) is one of the most fundamental problems for robotic autonomy, most existing SLAM works are evaluated with data sequences that are recorded in a short period of time. In real-world deployment, there can be out-of-sight scene changes caused by both natural factors and human activities. For example, in home scenarios, most objects may be movable, replaceable or deformable, and the visual features of the same place may be significantly different on successive days. Such out-of-sight dynamics pose great challenges to the robustness of pose estimation, and hence to a robot's long-term deployment and operation. To differentiate the aforementioned problem from conventional works, which are usually evaluated in a static setting in a single run, the term lifelong SLAM is used here to address SLAM problems in an ever-changing environment over a long period of time. To accelerate lifelong SLAM research, we release the OpenLORIS-Scene datasets. The data are collected in real-world indoor scenes, multiple times in each place, to include scene changes in real life. We also design benchmarking metrics for lifelong SLAM, with which the robustness and accuracy of pose estimation are evaluated separately. The datasets and benchmark are available online at https://lifelong-robotic-vision.github.io/dataset/scene.

4.9 | RO | Jun 20, 2019

An observable time series based SLAM algorithm for deforming environment

Jingwei Song, Liang Zhao, Shoudong Huang et al.

In this paper, we study the back-end of the simultaneous localization and mapping (SLAM) problem in deforming environments, where the robot localizes itself and tracks multiple non-rigid soft surfaces using its onboard sensor measurements. An elaborate analysis is conducted on the conventional deformation modelling method, the Embedded Deformation (ED) graph. We demonstrate and prove that the ED graph widely used in such scenarios is unobservable and leads to multiple solutions unless suitable priors are provided. An example and a theoretical proof are provided to show the ambiguity between the ED graph and the camera pose. In modelling non-rigid scenarios with an ED graph, motion priors of the deforming environment are essential to separate the robot pose from the deforming environment. The conclusion can be extrapolated to any free-form deformation formulation. To address the observability issue, this research proposes a preliminary deformable SLAM approach to estimate the robot pose in complex environments that exhibit regular motion. A strategy that approximates the deformed shape using a linear combination of several previous shapes is proposed to avoid the ambiguity between robot movement and the rigid and non-rigid motions of the environment. Fisher information matrix rank analysis with a base case is discussed to prove the effectiveness. Moreover, the proposed algorithm is validated extensively on Monte Carlo simulations and real experiments. It is demonstrated that the new algorithm significantly outperforms conventional rigid SLAM and ED-based SLAM, especially in scenarios where there is large deformation.

4.9 | RO | Jun 20, 2019

Efficient two step optimization for large embedded deformation graph based SLAM

Jingwei Song, Fang Bai, Liang Zhao et al.

Embedded deformation node based formulations have been widely applied in deformable geometry and graphical problems. Though promising in stereo (or RGBD) sensor based SLAM applications, it remains challenging to keep the speed of deformation node parameter estimation constant as the model grows larger. In practice, the processing time grows rapidly as the map expands. In this paper, we propose an approach to decouple the nodes of the deformation graph in large-scale dense deformable SLAM and keep the estimation time constant. We observe that only a subset of the deformation nodes in the graph are connected to visible points. Based on this fact, the sparsity of the original Hessian matrix is utilized to split parameter estimation into two independent steps. With this new technique, we achieve faster parameter estimation, with amortized computational complexity reduced from O(n^2) to close to O(1). As a result, the computation cost barely increases as the map keeps growing. Based on our strategy, the computational bottleneck in large-scale embedded deformation graph based applications will be greatly mitigated. The effectiveness is validated by experiments featuring large-scale deformation scenarios.
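
The observation that only some deformation nodes are connected to visible points suggests a partition of the graph before optimization. The sketch below shows only that bookkeeping step, under assumed data structures: a mapping from each map point to the node indices that influence it, and a list of currently visible points. It does not implement the paper's Hessian splitting or either optimization step, and all names are hypothetical.

```python
def split_nodes(num_nodes, point_to_nodes, visible_points):
    """Partition node indices into those tied to visible points and the rest.

    point_to_nodes: dict mapping a point id to the node indices that
    influence it (an assumed layout for this sketch).
    """
    active = set()
    for p in visible_points:
        active.update(point_to_nodes[p])
    passive = [n for n in range(num_nodes) if n not in active]
    return sorted(active), passive

point_to_nodes = {0: [0, 1], 1: [1, 2], 2: [3, 4]}
active, passive = split_nodes(5, point_to_nodes, visible_points=[0, 1])
```

If the number of visible points per frame is bounded, the active set stays roughly constant in size even as the total number of nodes grows. This matches the intuition behind the near-constant estimation time described above.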

9.6 | CV | Mar 6, 2018

MIS-SLAM: Real-time Large Scale Dense Deformable SLAM System in Minimal Invasive Surgery Based on Heterogeneous Computing

Jingwei Song, Jun Wang, Liang Zhao et al.

Real-time simultaneous localization and dense mapping is very helpful for providing Virtual Reality and Augmented Reality for surgeons or even surgical robots. In this paper, we propose MIS-SLAM: a complete real-time large-scale dense deformable SLAM system with a stereoscope in Minimal Invasive Surgery, based on heterogeneous computing that makes full use of the CPU and GPU. The idle CPU is used to perform ORB-SLAM to provide a robust global pose. Strategies are taken to integrate modules from the CPU and GPU. We solve the key problem raised in previous work, that is, fast movement of the scope and blurry images make scope tracking fail. Benefiting from improved localization, MIS-SLAM can achieve large-scale scope localization and dense mapping in real time. It transforms and deforms the current model and incrementally fuses new observations while keeping vivid texture. In-vivo experiments conducted on publicly available datasets, presented in the form of videos, demonstrate the feasibility and practicality of MIS-SLAM for potential clinical purposes.

11.5 | RO | Feb 22, 2017 | Code

Convergence and Consistency Analysis for A 3D Invariant-EKF SLAM

Teng Zhang, Kanzhi Wu, Jingwei Song et al.

In this paper, we investigate the convergence and consistency properties of an Invariant-Extended Kalman Filter (RI-EKF) based Simultaneous Localization and Mapping (SLAM) algorithm. Basic convergence properties of this algorithm are proven. These proofs do not require the restrictive assumption that the Jacobians of the motion and observation models need to be evaluated at the ground truth. It is also shown that the output of RI-EKF is invariant under any stochastic rigid body transformation, in contrast to the SO(3) based EKF SLAM algorithm (SO(3)-EKF), which is only invariant under deterministic rigid body transformation. Implications of these invariance properties on the consistency of the estimator are also discussed. Monte Carlo simulation results demonstrate that RI-EKF outperforms SO(3)-EKF, Robocentric-EKF and the "First Estimates Jacobian" EKF for 3D point feature based SLAM.