ASFeb 27, 2023
Improving Medical Speech-to-Text Accuracy with Vision-Language Pre-training ModelJaeyoung Huh, Sangjoon Park, Jeong Eun Lee et al.
Automatic Speech Recognition (ASR) is a technology that converts spoken words into text, facilitating interaction between humans and machines. One of the most common applications of ASR is Speech-To-Text (STT) technology, which simplifies user workflows by transcribing spoken words into text. In the medical field, STT has the potential to significantly reduce the workload of clinicians who rely on typists to transcribe their voice recordings. However, developing an STT model for the medical domain is challenging due to the lack of sufficient speech and text datasets. To address this issue, we propose a medical-domain text correction method that modifies the output text of a general STT system using the Vision Language Pre-training (VLP) method. VLP combines textual and visual information to correct text based on image knowledge. Our extensive experiments demonstrate that the proposed method offers quantitatively and clinically significant improvements in STT performance in the medical field. We further show that multi-modal understanding of image and text information outperforms single-modal understanding using only text information.
IVJan 29
Blind Ultrasound Image Enhancement via Self-Supervised Physics-Guided Degradation ModelingShujaat Khan, Syed Muhammad Atif, Jaeyoung Huh et al.
Ultrasound (US) interpretation is hampered by multiplicative speckle, acquisition blur from the point-spread function (PSF), and scanner- and operator-dependent artifacts. Supervised enhancement methods assume access to clean targets or known degradations; conditions rarely met in practice. We present a blind, self-supervised enhancement framework that jointly deconvolves and denoises B-mode images using a Swin Convolutional U-Net trained with a \emph{physics-guided} degradation model. From each training frame, we extract rotated/cropped patches and synthesize inputs by (i) convolving with a Gaussian PSF surrogate and (ii) injecting noise via either spatial additive Gaussian noise or complex Fourier-domain perturbations that emulate phase/magnitude distortions. For US scans, clean-like targets are obtained via non-local low-rank (NLLR) denoising, removing the need for ground truth; for natural images, the originals serve as targets. Trained and validated on UDIAT~B, JNU-IFM, and XPIE Set-P, and evaluated additionally on a 700-image PSFHS test set, the method achieves the highest PSNR/SSIM across Gaussian and speckle noise levels, with margins that widen under stronger corruption. Relative to MSANN, Restormer, and DnCNN, it typically preserves an extra $\sim$1--4\,dB PSNR and 0.05--0.15 SSIM in heavy Gaussian noise, and $\sim$2--5\,dB PSNR and 0.05--0.20 SSIM under severe speckle. Controlled PSF studies show reduced FWHM and higher peak gradients, evidence of resolution recovery without edge erosion. Used as a plug-and-play preprocessor, it consistently boosts Dice for fetal head and pubic symphysis segmentation. Overall, the approach offers a practical, assumption-light path to robust US enhancement that generalizes across datasets, scanners, and degradation types.
AISep 25, 2024
AI-driven View Guidance System in Intra-cardiac Echocardiography ImagingJaeyoung Huh, Paul Klein, Gareth Funka-Lea et al.
Intra-cardiac echocardiography (ICE) is a crucial imaging modality used in electrophysiology (EP) and structural heart disease (SHD) interventions, providing realtime, high-resolution views from within the heart. Despite its advantages, effective manipulation of the ICE catheter requires significant expertise, which can lead to inconsistent outcomes, especially among less experienced operators. To address this challenge, we propose an AIdriven view guidance system that operates in a continuous closed-loop with human-in-the-loop feedback, designed to assist users in navigating ICE imaging without requiring specialized knowledge. Specifically, our method models the relative position and orientation vectors between arbitrary views and clinically defined ICE views in a spatial coordinate system. It guides users on how to manipulate the ICE catheter to transition from the current view to the desired view over time. By operating in a closedloop configuration, the system continuously predicts and updates the necessary catheter manipulations, ensuring seamless integration into existing clinical workflows. The effectiveness of the proposed system is demonstrated through a simulation-based performance evaluation using real clinical data, achieving an 89% success rate with 6,532 test cases. Additionally, a semi-simulation experiment with human-in-the-loop testing validated the feasibility of continuous yet discrete guidance. These results underscore the potential of the proposed method to enhance the accuracy and efficiency of ICE imaging procedures.
IVDec 5, 2023
Breast Ultrasound Report Generation using LangChainJaeyoung Huh, Hyun Jeong Park, Jong Chul Ye
Breast ultrasound (BUS) is a critical diagnostic tool in the field of breast imaging, aiding in the early detection and characterization of breast abnormalities. Interpreting breast ultrasound images commonly involves creating comprehensive medical reports, containing vital information to promptly assess the patient's condition. However, the ultrasound imaging system necessitates capturing multiple images of various parts to compile a single report, presenting a time-consuming challenge. To address this problem, we propose the integration of multiple image analysis tools through a LangChain using Large Language Models (LLM), into the breast reporting process. Through a combination of designated tools and text generation through LangChain, our method can accurately extract relevant features from ultrasound images, interpret them in a clinical context, and produce comprehensive and standardized reports. This approach not only reduces the burden on radiologists and healthcare professionals but also enhances the consistency and quality of reports. The extensive experiments shows that each tools involved in the proposed method can offer qualitatively and quantitatively significant results. Furthermore, clinical evaluation on the generated reports demonstrates that the proposed method can make report in clinically meaningful way.
IVMay 8, 2025
Guidance for Intra-cardiac Echocardiography Manipulation to Maintain Continuous Therapy Device Tip VisibilityJaeyoung Huh, Ankur Kapoor, Young-Ho Kim
Intra-cardiac Echocardiography (ICE) plays a critical role in Electrophysiology (EP) and Structural Heart Disease (SHD) interventions by providing real-time visualization of intracardiac structures. However, maintaining continuous visibility of the therapy device tip remains a challenge due to frequent adjustments required during manual ICE catheter manipulation. To address this, we propose an AI-driven tracking model that estimates the device tip incident angle and passing point within the ICE imaging plane, ensuring continuous visibility and facilitating robotic ICE catheter control. A key innovation of our approach is the hybrid dataset generation strategy, which combines clinical ICE sequences with synthetic data augmentation to enhance model robustness. We collected ICE images in a water chamber setup, equipping both the ICE catheter and device tip with electromagnetic (EM) sensors to establish precise ground-truth locations. Synthetic sequences were created by overlaying catheter tips onto real ICE images, preserving motion continuity while simulating diverse anatomical scenarios. The final dataset consists of 5,698 ICE-tip image pairs, ensuring comprehensive training coverage. Our model architecture integrates a pretrained ultrasound (US) foundation model, trained on 37.4M echocardiography images, for feature extraction. A transformer-based network processes sequential ICE frames, leveraging historical passing points and incident angles to improve prediction accuracy. Experimental results demonstrate that our method achieves 3.32 degree entry angle error, 12.76 degree rotation angle error. This AI-driven framework lays the foundation for real-time robotic ICE catheter adjustments, minimizing operator workload while ensuring consistent therapy device visibility. Future work will focus on expanding clinical datasets to further enhance model generalization.
IVMay 7, 2025
Pose Estimation for Intra-cardiac Echocardiography Catheter via AI-Based Anatomical UnderstandingJaeyoung Huh, Ankur Kapoor, Young-Ho Kim
Intra-cardiac Echocardiography (ICE) plays a crucial role in Electrophysiology (EP) and Structural Heart Disease (SHD) interventions by providing high-resolution, real-time imaging of cardiac structures. However, existing navigation methods rely on electromagnetic (EM) tracking, which is susceptible to interference and position drift, or require manual adjustments based on operator expertise. To overcome these limitations, we propose a novel anatomy-aware pose estimation system that determines the ICE catheter position and orientation solely from ICE images, eliminating the need for external tracking sensors. Our approach leverages a Vision Transformer (ViT)-based deep learning model, which captures spatial relationships between ICE images and anatomical structures. The model is trained on a clinically acquired dataset of 851 subjects, including ICE images paired with position and orientation labels normalized to the left atrium (LA) mesh. ICE images are patchified into 16x16 embeddings and processed through a transformer network, where a [CLS] token independently predicts position and orientation via separate linear layers. The model is optimized using a Mean Squared Error (MSE) loss function, balancing positional and orientational accuracy. Experimental results demonstrate an average positional error of 9.48 mm and orientation errors of (16.13 deg, 8.98 deg, 10.47 deg) across x, y, and z axes, confirming the model accuracy. Qualitative assessments further validate alignment between predicted and target views within 3D cardiac meshes. This AI-driven system enhances procedural efficiency, reduces operator workload, and enables real-time ICE catheter localization for tracking-free procedures. The proposed method can function independently or complement existing mapping systems like CARTO, offering a transformative approach to ICE-guided interventions.
IVFeb 16, 2022
Phase Aberration Robust Beamformer for Planewave US Using Self-Supervised LearningShujaat Khan, Jaeyoung Huh, Jong Chul Ye
Ultrasound (US) is widely used for clinical imaging applications thanks to its real-time and non-invasive nature. However, its lesion detectability is often limited in many applications due to the phase aberration artefact caused by variations in the speed of sound (SoS) within body parts. To address this, here we propose a novel self-supervised 3D CNN that enables phase aberration robust plane-wave imaging. Instead of aiming at estimating the SoS distribution as in conventional methods, our approach is unique in that the network is trained in a self-supervised manner to robustly generate a high-quality image from various phase aberrated images by modeling the variation in the speed of sound as stochastic. Experimental results using real measurements from tissue-mimicking phantom and \textit{in vivo} scans confirmed that the proposed method can significantly reduce the phase aberration artifacts and improve the visual quality of deep scans.
IVDec 6, 2021
Tunable Image Quality Control of 3-D Ultrasound using Switchable CycleGANJaeyoung Huh, Shujaat Khan, Sungjin Choi et al.
In contrast to 2-D ultrasound (US) for uniaxial plane imaging, a 3-D US imaging system can visualize a volume along three axial planes. This allows for a full view of the anatomy, which is useful for gynecological (GYN) and obstetrical (OB) applications. Unfortunately, the 3-D US has an inherent limitation in resolution compared to the 2-D US. In the case of 3-D US with a 3-D mechanical probe, for example, the image quality is comparable along the beam direction, but significant deterioration in image quality is often observed in the other two axial image planes. To address this, here we propose a novel unsupervised deep learning approach to improve 3-D US image quality. In particular, using {\em unmatched} high-quality 2-D US images as a reference, we trained a recently proposed switchable CycleGAN architecture so that every mapping plane in 3-D US can learn the image quality of 2-D US images. Thanks to the switchable architecture, our network can also provide real-time control of image enhancement level based on user preference, which is ideal for a user-centric scanner setup. Extensive experiments with clinical evaluation confirm that our method offers significantly improved image quality as well user-friendly flexibility.
IVMar 16, 2021
Missing Cone Artifacts Removal in ODT using Unsupervised Deep Learning in Projection DomainHyungjin Chung, Jaeyoung Huh, Geon Kim et al.
Optical diffraction tomography (ODT) produces three dimensional distribution of refractive index (RI) by measuring scattering fields at various angles. Although the distribution of RI index is highly informative, due to the missing cone problem stemming from the limited-angle acquisition of holograms, reconstructions have very poor resolution along axial direction compared to the horizontal imaging plane. To solve this issue, here we present a novel unsupervised deep learning framework, which learns the probability distribution of missing projection views through optimal transport driven cycleGAN. Experimental results show that missing cone artifact in ODT can be significantly resolved by the proposed method.
IVAug 31, 2020
Switchable Deep BeamformerShujaat Khan, Jaeyoung Huh, Jong Chul Ye
Recent proposals of deep beamformers using deep neural networks have attracted significant attention as computational efficient alternatives to adaptive and compressive beamformers. Moreover, deep beamformers are versatile in that image post-processing algorithms can be combined with the beamforming. Unfortunately, in the current technology, a separate beamformer should be trained and stored for each application, demanding significant scanner resources. To address this problem, here we propose a {\em switchable} deep beamformer that can produce various types of output such as DAS, speckle removal, deconvolution, etc., using a single network with a simple switch. In particular, the switch is implemented through Adaptive Instance Normalization (AdaIN) layers, so that various output can be generated by merely changing the AdaIN code. Experimental results using B-mode focused ultrasound confirm the flexibility and efficacy of the proposed methods for various applications.
IVJul 10, 2020
OT-driven Multi-Domain Unsupervised Ultrasound Image Artifact Removal using a Single CNNJaeyoung Huh, Shujaat Khan, Jong Chul Ye
Ultrasound imaging (US) often suffers from distinct image artifacts from various sources. Classic approaches for solving these problems are usually model-based iterative approaches that have been developed specifically for each type of artifact, which are often computationally intensive. Recently, deep learning approaches have been proposed as computationally efficient and high performance alternatives. Unfortunately, in the current deep learning approaches, a dedicated neural network should be trained with matched training data for each specific artifact type. This poses a fundamental limitation in the practical use of deep learning for US, since large number of models should be stored to deal with various US image artifacts. Inspired by the recent success of multi-domain image transfer, here we propose a novel, unsupervised, deep learning approach in which a single neural network can be used to deal with different types of US artifacts simply by changing a mask vector that switches between different target domains. Our algorithm is rigorously derived using an optimal transport (OT) theory for cascaded probability measures. Experimental results using phantom and in vivo data demonstrate that the proposed method can generate high quality image by removing distinct artifacts, which are comparable to those obtained by separately trained multiple neural networks.
CVJun 26, 2020
Pushing the Limit of Unsupervised Learning for Ultrasound Image Artifact RemovalShujaat Khan, Jaeyoung Huh, Jong Chul Ye
Ultrasound (US) imaging is a fast and non-invasive imaging modality which is widely used for real-time clinical imaging applications without concerning about radiation hazard. Unfortunately, it often suffers from poor visual quality from various origins, such as speckle noises, blurring, multi-line acquisition (MLA), limited RF channels, small number of view angles for the case of plane wave imaging, etc. Classical methods to deal with these problems include image-domain signal processing approaches using various adaptive filtering and model-based approaches. Recently, deep learning approaches have been successfully used for ultrasound imaging field. However, one of the limitations of these approaches is that paired high quality images for supervised training are difficult to obtain in many practical applications. In this paper, inspired by the recent theory of unsupervised learning using optimal transport driven cycleGAN (OT-cycleGAN), we investigate applicability of unsupervised deep learning for US artifact removal problems without matched reference data. Experimental results for various tasks such as deconvolution, speckle removal, limited data artifact removal, etc. confirmed that our unsupervised learning method provides comparable results to supervised learning for many practical applications.
IVJul 24, 2019
Adaptive and Compressive Beamforming Using Deep Learning for Medical UltrasoundShujaat Khan, Jaeyoung Huh, Jong Chul Ye
In ultrasound (US) imaging, various types of adaptive beamforming techniques have been investigated to improve the resolution and contrast-to-noise ratio of the delay and sum (DAS) beamformers. Unfortunately, the performance of these adaptive beamforming approaches degrade when the underlying model is not sufficiently accurate and the number of channels decreases. To address this problem, here we propose a deep learning-based beamformer to generate significantly improved images over widely varying measurement conditions and channel subsampling patterns. In particular, our deep neural network is designed to directly process full or sub-sampled radio-frequency (RF) data acquired at various subsampling rates and detector configurations so that it can generate high quality ultrasound images using a single beamformer. The origin of such input-dependent adaptivity is also theoretically analyzed. Experimental results using B-mode focused ultrasound confirm the efficacy of the proposed methods.
IVApr 5, 2019
Deep Learning-based Universal Beamformer for Ultrasound ImagingShujaat Khan, Jaeyoung Huh, Jong Chul Ye
In ultrasound (US) imaging, individual channel RF measurements are back-propagated and accumulated to form an image after applying specific delays. While this time reversal is usually implemented using a hardware- or software-based delay-and-sum (DAS) beamformer, the performance of DAS decreases rapidly in situations where data acquisition is not ideal. Herein, for the first time, we demonstrate that a single data-driven adaptive beamformer designed as a deep neural network can generate high quality images robustly for various detector channel configurations and subsampling rates. The proposed deep beamformer is evaluated for two distinct acquisition schemes: focused ultrasound imaging and planewave imaging. Experimental results showed that the proposed deep beamformer exhibit significant performance gain for both focused and planar imaging schemes, in terms of contrast-to-noise ratio and structural similarity.
CVJan 7, 2019
Universal Deep Beamformer for Variable Rate Ultrasound ImagingShujaat Khan, Jaeyoung Huh, Jong Chul Ye
Ultrasound (US) imaging is based on the time-reversal principle, in which individual channel RF measurements are back-propagated and accumulated to form an image after applying specific delays. While this time reversal is usually implemented as a delay-and-sum (DAS) beamformer, the image quality quickly degrades as the number of measurement channels decreases. To address this problem, various types of adaptive beamforming techniques have been proposed using predefined models of the signals. However, the performance of these adaptive beamforming approaches degrade when the underlying model is not sufficiently accurate. Here, we demonstrate for the first time that a single universal deep beamformer trained using a purely data-driven way can generate significantly improved images over widely varying aperture and channel subsampling patterns. In particular, we design an end-to-end deep learning framework that can directly process sub-sampled RF data acquired at different subsampling rate and detector configuration to generate high quality ultrasound images using a single beamformer. Experimental results using B-mode focused ultrasound confirm the efficacy of the proposed methods.
CVDec 17, 2017
Efficient B-mode Ultrasound Image Reconstruction from Sub-sampled RF Data using Deep LearningYeo Hun Yoon, Shujaat Khan, Jaeyoung Huh et al.
In portable, three dimensional, and ultra-fast ultrasound imaging systems, there is an increasing demand for the reconstruction of high quality images from a limited number of radio-frequency (RF) measurements due to receiver (Rx) or transmit (Xmit) event sub-sampling. However, due to the presence of side lobe artifacts from RF sub-sampling, the standard beamformer often produces blurry images with less contrast, which are unsuitable for diagnostic purposes. Existing compressed sensing approaches often require either hardware changes or computationally expensive algorithms, but their quality improvements are limited. To address this problem, here we propose a novel deep learning approach that directly interpolates the missing RF data by utilizing redundancy in the Rx-Xmit plane. Our extensive experimental results using sub-sampled RF data from a multi-line acquisition B-mode system confirm that the proposed method can effectively reduce the data rate without sacrificing image quality.