54.9LGMay 7
Adaptive Selection of LoRA Components in Privacy-Preserving Federated LearningMyoungjun Kim, Sangwoo Park, Yoseob Han et al.
Differentially private federated fine-tuning of large models with LoRA suffers from aggregation error caused by LoRA's multiplicative structure, which is further amplified by DP noise and degrades both stability and accuracy. Existing remedies apply a single update mode uniformly across all layers and all communication rounds (or alternate them on a fixed schedule), ignoring both the structural asymmetry between the two LoRA factors and the round-wise dynamics of training. We propose AS-LoRA, an adaptive framework defined by three axes (i) layer-wise freedom, in which each layer independently selects its active component, (ii) round-wise adaptivity, in which the selection updates over communication rounds, and (iii) a curvature-aware score derived from a second-order approximation of the loss. Theoretically, AS-LoRA eliminates the reconstruction-error floor of layer-tied schedules, accelerates convergence, implicitly biases solutions toward flatter minima, and incurs no additional privacy cost. Across GLUE, SQuAD, CIFAR-100, and Tiny-ImageNet under strict DP budgets and non-IID partitions, AS-LoRA improves over the federated LoRA baselines by up to $+7.5$ pp on GLUE and $+12.5$ pp on MNLI-mm for example, while matching or exceeding SVD-based aggregation methods at $33\text{--}180 \times$ lower aggregation cost and with negligible communication overhead. Code for the proposed method is available at https://anonymous.4open.science/r/as_lora-F75F/.
ASNov 11, 2022
Semi-supervised learning for continuous emotional intensity controllable speech synthesis with disentangled representationsYoori Oh, Juheon Lee, Yoseob Han et al.
Recent text-to-speech models have reached the level of generating natural speech similar to what humans say. But there still have limitations in terms of expressiveness. The existing emotional speech synthesis models have shown controllability using interpolated features with scaling parameters in emotional latent space. However, the emotional latent space generated from the existing models is difficult to control the continuous emotional intensity because of the entanglement of features like emotions, speakers, etc. In this paper, we propose a novel method to control the continuous intensity of emotions using semi-supervised learning. The model learns emotions of intermediate intensity using pseudo-labels generated from phoneme-level sequences of speech information. An embedding space built from the proposed model satisfies the uniform grid geometry with an emotional basis. The experimental results showed that the proposed method was superior in controllability and naturalness.
15.6CVMay 8Code
Hierarchical Perfusion Graphs for Tumor Heterogeneity Modeling in Glioma Molecular SubtypingHan Jang, Junhyeok Lee, Heeseong Eum et al.
Precise molecular subtyping of gliomas, including isocitrate dehydrogenase (IDH) mutation and 1p/19q codeletion, directly guides surgical and therapeutic decisions, yet currently relies on invasive tissue sampling. Deep learning on structural MRI has emerged as a non-invasive alternative, but anatomy-only approaches cannot capture the hemodynamic signatures that distinguish molecular subtypes. Radiogenomics based on dynamic susceptibility contrast (DSC) MRI holds immense potential for non-invasively characterizing glioma molecular subtypes, yet clinical deployment has been hindered by inter-site variability and the limitations of voxel-wise analysis. We introduce HiPerfGNN, a framework that first learns discrete hemodynamic representations from raw time-intensity curves using a vector-quantized variational autoencoder (VQ-VAE). These quantized perfusion codes define coarse-level graph nodes representing functional tumor habitats, each of which is hierarchically subdivided into fine-level subregions guided by structural MRI. A hierarchical graph neural network then propagates information across scales for molecular prediction. On an internal cohort (n=475), the model achieved AUCs of 0.96 (IDH), 0.89 (1p/19q), and 0.84 (WHO grade), and maintained robust IDH performance (AUC 0.89) on an independent external cohort (n=397) without recalibration. Gradient-based saliency analysis confirms biologically grounded attention patterns aligned with known glioma pathophysiology. Our results demonstrate the added value of integrating perfusion dynamics into radiogenomic pipelines for glioma molecular subtyping. Code is available at https://github.com/janghana/HiPerfGNN.
IVJan 9, 2025
End-to-End Deep Learning for Interior Tomography with Low-Dose X-ray CTYoseob Han, Dufan Wu, Kyungsang Kim et al.
Objective: There exist several X-ray computed tomography (CT) scanning strategies to reduce a radiation dose, such as (1) sparse-view CT, (2) low-dose CT, and (3) region-of-interest (ROI) CT (called interior tomography). To further reduce the dose, the sparse-view and/or low-dose CT settings can be applied together with interior tomography. Interior tomography has various advantages in terms of reducing the number of detectors and decreasing the X-ray radiation dose. However, a large patient or small field-of-view (FOV) detector can cause truncated projections, and then the reconstructed images suffer from severe cupping artifacts. In addition, although the low-dose CT can reduce the radiation exposure dose, analytic reconstruction algorithms produce image noise. Recently, many researchers have utilized image-domain deep learning (DL) approaches to remove each artifact and demonstrated impressive performances, and the theory of deep convolutional framelets supports the reason for the performance improvement. Approach: In this paper, we found that the image-domain convolutional neural network (CNN) is difficult to solve coupled artifacts, based on deep convolutional framelets. Significance: To address the coupled problem, we decouple it into two sub-problems: (i) image domain noise reduction inside truncated projection to solve low-dose CT problem and (ii) extrapolation of projection outside truncated projection to solve the ROI CT problem. The decoupled sub-problems are solved directly with a novel proposed end-to-end learning using dual-domain CNNs. Main results: We demonstrate that the proposed method outperforms the conventional image-domain deep learning methods, and a projection-domain CNN shows better performance than the image-domain CNNs which are commonly used by many researchers.
LGJan 9, 2025
Hierarchical Decomposed Dual-domain Deep Learning for Sparse-View CT ReconstructionYoseob Han
Objective: X-ray computed tomography employing sparse projection views has emerged as a contemporary technique to mitigate radiation dose. However, due to the inadequate number of projection views, an analytic reconstruction method utilizing filtered backprojection results in severe streaking artifacts. Recently, deep learning strategies employing image-domain networks have demonstrated remarkable performance in eliminating the streaking artifact caused by analytic reconstruction methods with sparse projection views. Nevertheless, it is difficult to clarify the theoretical justification for applying deep learning to sparse view CT reconstruction, and it has been understood as restoration by removing image artifacts, not reconstruction. Approach: By leveraging the theory of deep convolutional framelets and the hierarchical decomposition of measurement, this research reveals the constraints of conventional image- and projection-domain deep learning methodologies, subsequently, the research proposes a novel dual-domain deep learning framework utilizing hierarchical decomposed measurements. Specifically, the research elucidates how the performance of the projection-domain network can be enhanced through a low-rank property of deep convolutional framelets and a bowtie support of hierarchical decomposed measurement in the Fourier domain. Main Results: This study demonstrated performance improvement of the proposed framework based on the low-rank property, resulting in superior reconstruction performance compared to conventional analytic and deep learning methods. Significance: By providing a theoretically justified deep learning approach for sparse-view CT reconstruction, this study not only offers a superior alternative to existing methods but also opens new avenues for research in medical imaging.
IRMay 1, 2024
Distance Sampling-based Paraphraser Leveraging ChatGPT for Text Data ManipulationYoori Oh, Yoseob Han, Kyogu Lee
There has been growing interest in audio-language retrieval research, where the objective is to establish the correlation between audio and text modalities. However, most audio-text paired datasets often lack rich expression of the text data compared to the audio samples. One of the significant challenges facing audio-text datasets is the presence of similar or identical captions despite different audio samples. Therefore, under many-to-one mapping conditions, audio-text datasets lead to poor performance of retrieval tasks. In this paper, we propose a novel approach to tackle the data imbalance problem in audio-language retrieval task. To overcome the limitation, we introduce a method that employs a distance sampling-based paraphraser leveraging ChatGPT, utilizing distance function to generate a controllable distribution of manipulated text data. For a set of sentences with the same context, the distance is used to calculate a degree of manipulation for any two sentences, and ChatGPT's few-shot prompting is performed using a text cluster with a similar distance defined by the Jaccard similarity. Therefore, ChatGPT, when applied to few-shot prompting with text clusters, can adjust the diversity of the manipulated text based on the distance. The proposed approach is shown to significantly enhance performance in audio-text retrieval, outperforming conventional text augmentation techniques.
CVSep 25, 2020
AIM 2020 Challenge on Real Image Super-Resolution: Methods and ResultsPengxu Wei, Hannan Lu, Radu Timofte et al.
This paper introduces the real image Super-Resolution (SR) challenge that was part of the Advances in Image Manipulation (AIM) workshop, held in conjunction with ECCV 2020. This challenge involves three tracks to super-resolve an input image for $\times$2, $\times$3 and $\times$4 scaling factors, respectively. The goal is to attract more attention to realistic image degradation for the SR task, which is much more complicated and challenging, and contributes to real-world image super-resolution applications. 452 participants were registered for three tracks in total, and 24 teams submitted their results. They gauge the state-of-the-art approaches for real image SR in terms of PSNR and SSIM.
IVJun 17, 2019
Differentiated Backprojection Domain Deep Learning for Conebeam Artifact RemovalYoseob Han, Junyoung Kim, Jong Chul Ye
Conebeam CT using a circular trajectory is quite often used for various applications due to its relative simple geometry. For conebeam geometry, Feldkamp, Davis and Kress algorithm is regarded as the standard reconstruction method, but this algorithm suffers from so-called conebeam artifacts as the cone angle increases. Various model-based iterative reconstruction methods have been developed to reduce the cone-beam artifacts, but these algorithms usually require multiple applications of computational expensive forward and backprojections. In this paper, we develop a novel deep learning approach for accurate conebeam artifact removal. In particular, our deep network, designed on the differentiated backprojection domain, performs a data-driven inversion of an ill-posed deconvolution problem associated with the Hilbert transform. The reconstruction results along the coronal and sagittal directions are then combined using a spectral blending technique to minimize the spectral leakage. Experimental results show that our method outperforms the existing iterative methods despite significantly reduced runtime complexity.
CVOct 1, 2018
One Network to Solve All ROIs: Deep Learning CT for Any ROI using Differentiated BackprojectionYoseob Han, Jong Chul Ye
Computed tomography for region-of-interest (ROI) reconstruction has advantages of reducing X-ray radiation dose and using a small detector. However, standard analytic reconstruction methods suffer from severe cupping artifacts, and existing model-based iterative reconstruction methods require extensive computations. Recently, we proposed a deep neural network to learn the cupping artifact, but the network is not well generalized for different ROIs due to the singularities in the corrupted images. Therefore, there is an increasing demand for a neural network that works well for any ROI sizes. In this paper, two types of neural networks are designed. The first type learns ROI size-specific cupping artifacts from the analytic reconstruction images, whereas the second type network is to learn to invert the finite Hilbert transform from the truncated differentiated backprojection (DBP) data. Their generalizability for any ROI sizes is then examined. Experimental results show that the new type of neural network significantly outperforms the existing iterative methods for any ROI size in spite of significantly reduced run-time complexity. Since the proposed method consistently surpasses existing methods for any ROIs, it can be used as a general CT reconstruction engine for many practical applications without compromising possible detector truncation.
CVJun 1, 2018
k-Space Deep Learning for Reference-free EPI Ghost CorrectionJuyoung Lee, Yoseob Han, Jae-Kyun Ryu et al.
Nyquist ghost artifacts in EPI are originated from phase mismatch between the even and odd echoes. However, conventional correction methods using reference scans often produce erroneous results especially in high-field MRI due to the non-linear and time-varying local magnetic field changes. Recently, it was shown that the problem of ghost correction can be reformulated as k-space interpolation problem that can be solved using structured low-rank Hankel matrix approaches. Another recent work showed that data driven Hankel matrix decomposition can be reformulated to exhibit similar structures as deep convolutional neural network. By synergistically combining these findings, we propose a k-space deep learning approach that immediately corrects the phase mismatch without a reference scan in both accelerated and non-accelerated EPI acquisitions. To take advantage of the even and odd-phase directional redundancy, the k-space data is divided into two channels configured with even and odd phase encodings. The redundancies between coils are also exploited by stacking the multi-coil k-space data into additional input channels. Then, our k-space ghost correction network is trained to learn the interpolation kernel to estimate the missing virtual k-space data. For the accelerated EPI data, the same neural network is trained to directly estimate the interpolation kernels for missing k-space data from both ghost and subsampling. Reconstruction results using 3T and 7T in-vivo data showed that the proposed method outperformed the image quality compared to the existing methods, and the computing time is much faster.The proposed k-space deep learning for EPI ghost correction is highly robust and fast, and can be combined with acceleration, so that it can be used as a promising correction tool for high-field MRI without changing the current acquisition protocol.
CVMay 10, 2018
k-Space Deep Learning for Accelerated MRIYoseob Han, Leonard Sunwoo, Jong Chul Ye
The annihilating filter-based low-rank Hankel matrix approach (ALOHA) is one of the state-of-the-art compressed sensing approaches that directly interpolates the missing k-space data using low-rank Hankel matrix completion. The success of ALOHA is due to the concise signal representation in the k-space domain thanks to the duality between structured low-rankness in the k-space domain and the image domain sparsity. Inspired by the recent mathematical discovery that links convolutional neural networks to Hankel matrix decomposition using data-driven framelet basis, here we propose a fully data-driven deep learning algorithm for k-space interpolation. Our network can be also easily applied to non-Cartesian k-space trajectories by simply adding an additional regridding layer. Extensive numerical experiments show that the proposed deep learning method consistently outperforms the existing image-domain deep learning approaches.
CVJan 4, 2018
Deep Learning Reconstruction for 9-View Dual Energy CT Baggage ScannerYoseob Han, Jingu Kang, Jong Chul Ye
For homeland and transportation security applications, 2D X-ray explosive detection system (EDS) have been widely used, but they have limitations in recognizing 3D shape of the hidden objects. Among various types of 3D computed tomography (CT) systems to address this issue, this paper is interested in a stationary CT using fixed X-ray sources and detectors. However, due to the limited number of projection views, analytic reconstruction algorithms produce severe streaking artifacts. Inspired by recent success of deep learning approach for sparse view CT reconstruction, here we propose a novel image and sinogram domain deep learning architecture for 3D reconstruction from very sparse view measurement. The algorithm has been tested with the real data from a prototype 9-view dual energy stationary CT EDS carry-on baggage scanner developed by GEMSS Medical Systems, Korea, which confirms the superior reconstruction performance over the existing approaches.
CVDec 29, 2017
Deep Learning Interior Tomography for Region-of-Interest ReconstructionYoseob Han, Jawook Gu, Jong Chul Ye
Interior tomography for the region-of-interest (ROI) imaging has advantages of using a small detector and reducing X-ray radiation dose. However, standard analytic reconstruction suffers from severe cupping artifacts due to existence of null space in the truncated Radon transform. Existing penalized reconstruction methods may address this problem but they require extensive computations due to the iterative reconstruction. Inspired by the recent deep learning approaches to low-dose and sparse view CT, here we propose a deep learning architecture that removes null space signals from the FBP reconstruction. Experimental results have shown that the proposed method provides near-perfect reconstruction with about 7-10 dB improvement in PSNR over existing methods in spite of significantly reduced run-time complexity.
CVAug 28, 2017
Framing U-Net via Deep Convolutional Framelets: Application to Sparse-view CTYoseob Han, Jong Chul Ye
X-ray computed tomography (CT) using sparse projection views is a recent approach to reduce the radiation dose. However, due to the insufficient projection views, an analytic reconstruction approach using the filtered back projection (FBP) produces severe streaking artifacts. Recently, deep learning approaches using large receptive field neural networks such as U-Net have demonstrated impressive performance for sparse- view CT reconstruction. However, theoretical justification is still lacking. Inspired by the recent theory of deep convolutional framelets, the main goal of this paper is, therefore, to reveal the limitation of U-Net and propose new multi-resolution deep learning schemes. In particular, we show that the alternative U- Net variants such as dual frame and the tight frame U-Nets satisfy the so-called frame condition which make them better for effective recovery of high frequency edges in sparse view- CT. Using extensive experiments with real patient data set, we demonstrate that the new network architectures provide better reconstruction performance.
MLJul 3, 2017
Deep Convolutional Framelets: A General Deep Learning Framework for Inverse ProblemsJong Chul Ye, Yoseob Han, Eunju Cha
Recently, deep learning approaches with various network architectures have achieved significant performance improvement over existing iterative reconstruction methods in various imaging problems. However, it is still unclear why these deep learning architectures work for specific inverse problems. To address these issues, here we show that the long-searched-for missing link is the convolution framelets for representing a signal by convolving local and non-local bases. The convolution framelets was originally developed to generalize the theory of low-rank Hankel matrix approaches for inverse problems, and this paper further extends the idea so that we can obtain a deep neural network using multilayer convolution framelets with perfect reconstruction (PR) under rectilinear linear unit nonlinearity (ReLU). Our analysis also shows that the popular deep network components such as residual block, redundant filter channels, and concatenated ReLU (CReLU) do indeed help to achieve the PR, while the pooling and unpooling layers should be augmented with high-pass branches to meet the PR condition. Moreover, by changing the number of filter channels and bias, we can control the shrinkage behaviors of the neural network. This discovery leads us to propose a novel theory for deep convolutional framelets neural network. Using numerical experiments with various inverse problems, we demonstrated that our deep convolution framelets network shows consistent improvement over existing deep architectures.This discovery suggests that the success of deep learning is not from a magical power of a black-box, but rather comes from the power of a novel signal representation using non-local basis combined with data-driven local basis, which is indeed a natural extension of classical signal processing theory.