IV · Sep 13, 2024
DX2CT: Diffusion Model for 3D CT Reconstruction from Bi or Mono-planar 2D X-ray(s)
Yun Su Jeong, Hye Bin Yoo, Il Yong Chun
Computed tomography (CT) provides high-resolution medical imaging, but it can expose patients to high radiation doses. X-ray scanners involve low radiation exposure, but their resolution is low. This paper proposes a new conditional diffusion model, DX2CT, that reconstructs three-dimensional (3D) CT volumes from bi- or mono-planar X-ray image(s). The proposed DX2CT consists of two key components: 1) modulating feature maps extracted from two-dimensional (2D) X-ray(s) with the 3D positions of the CT volume using a new transformer, and 2) effectively using the modulated 3D position-aware feature maps as conditions of DX2CT. In particular, the proposed transformer can provide the conditional diffusion model with rich information about a target CT slice, enabling high-quality CT reconstruction. Our experiments with bi- and mono-planar X-ray benchmark datasets show that the proposed DX2CT outperforms several state-of-the-art methods. Our code and model will be available at: https://www.github.com/intyeger/DX2CT.
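Conditional diffusion models of this kind are typically trained against a fixed forward noising process. The following is a minimal sketch of the standard DDPM forward process q(x_t | x_0), assuming a linear noise schedule from 1e-4 to 0.02 over 1000 steps (common defaults, not values reported for DX2CT). The X-ray-derived conditioning is not modeled, and all function names are illustrative.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed defaults)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s <= t} (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0, t, a_bars, rng):
    """Sample x_t ~ N(sqrt(a_bar_t) * x0, (1 - a_bar_t) * I) for a flat list x0."""
    a = a_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(x0, noise)]
    return xt, noise


# Example: noise a tiny 3-voxel "volume" to step index t = 10.
a_bars = alpha_bars(linear_beta_schedule(1000))
xt, noise = q_sample([1.0, -1.0, 0.5], 10, a_bars, random.Random(0))
```

During training, a denoising network would receive `xt`, `t`, and the conditioning features and learn to predict `noise`; at inference the process is run in reverse.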
IV · Apr 17, 2022
Accelerated MRI With Deep Linear Convolutional Transform Learning
Hongyi Gu, Burhaneddin Yaman, Steen Moeller et al.
Recent studies show that deep learning (DL) based MRI reconstruction outperforms conventional methods, such as parallel imaging and compressed sensing (CS), in multiple applications. Unlike CS, which is typically implemented with pre-determined linear representations for regularization, DL inherently uses a non-linear representation learned from a large database. Another line of work uses transform learning (TL) to bridge the gap between these two approaches by learning linear representations from data. In this work, we combine ideas from CS, TL, and DL reconstructions to learn deep linear convolutional transforms as part of an algorithm unrolling approach. With end-to-end training, the proposed technique reconstructs MR images at a level comparable to DL methods, while supporting uniform undersampling patterns, unlike conventional CS methods. Our proposed method relies on convex sparse image reconstruction with a linear representation at inference time, which may be beneficial for characterizing robustness, stability, and generalizability.
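Algorithm unrolling turns a fixed number of iterations of an optimization method into network layers. As a minimal sketch, assuming the image itself is sparse (i.e., the learned transform is replaced by the identity), the loop below runs ISTA for an l1-regularized least-squares problem. In a learned, unrolled variant the step size, threshold, and transform would be trained per layer; this is not the paper's algorithm.

```python
def soft_threshold(v, thresh):
    """Proximal operator of thresh * |v|: shrink v toward zero by thresh."""
    if v > thresh:
        return v - thresh
    if v < -thresh:
        return v + thresh
    return 0.0


def matvec(A, x):
    """Dense matrix-vector product A @ x with A given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def unrolled_ista(A, y, lam, step, num_iters):
    """Fixed number of ISTA iterations for min_x 0.5*||y - A x||^2 + lam*||x||_1.

    Each loop pass plays the role of one unrolled "layer".
    """
    At = transpose(A)
    x = [0.0] * len(At)
    for _ in range(num_iters):
        residual = [yi - axi for yi, axi in zip(y, matvec(A, x))]  # y - A x
        neg_grad = matvec(At, residual)                           # A^T (y - A x)
        x = [soft_threshold(xi + step * gi, step * lam)
             for xi, gi in zip(x, neg_grad)]
    return x
```

With `A` equal to the identity, one step already returns the soft-thresholded measurements, e.g. `unrolled_ista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], 1.0, 1.0, 5)` gives `[2.0, 0.0]`. For convergence, `step` should not exceed 1/||A||^2.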
IV · May 10, 2022
Self-supervised regression learning using domain knowledge: Applications to improving self-supervised denoising in imaging
Il Yong Chun, Dongwon Park, Xuehang Zheng et al.
Regression, which predicts continuous quantities, is a central part of applications using computational imaging and computer vision technologies. Yet the study and understanding of self-supervised learning for regression tasks, except for one particular regression task (image denoising), have lagged behind. This paper proposes a general self-supervised regression learning (SSRL) framework that enables learning regression neural networks with only input data (i.e., without ground-truth target data), by using a designable pseudo-predictor that encapsulates domain knowledge of a specific application. The paper underlines the importance of using domain knowledge by showing that, under different settings, a better pseudo-predictor can bring the properties of SSRL closer to those of ordinary supervised learning. Numerical experiments for low-dose computed tomography denoising and camera image denoising demonstrate that the proposed SSRL significantly improves denoising quality over several existing self-supervised denoising methods.
CV · Oct 6, 2023
Improving Neural Radiance Field using Near-Surface Sampling with Point Cloud Generation
Hye Bin Yoo, Hyun Min Han, Sung Soo Hwang et al.
Neural radiance field (NeRF) is an emerging view synthesis method that samples points in three-dimensional (3D) space and estimates their volume density and color. The disadvantage of NeRF is that it requires a long training time because it samples many 3D points. In addition, if points are sampled from occluded regions or from space where an object is unlikely to exist, the rendering quality of NeRF can degrade. These issues can be addressed by estimating the geometry of the 3D scene. This paper proposes a near-surface sampling framework to improve the rendering quality of NeRF. To this end, the proposed method estimates the surface of a 3D object using depth images of the training set and samples points only around that surface. To obtain depth information for a novel view, the paper proposes a 3D point cloud generation method and a simple method for refining depth projected from a point cloud. Experimental results show that the proposed near-surface sampling NeRF framework can significantly improve rendering quality compared to the original NeRF and three different state-of-the-art NeRF methods. In addition, the proposed near-surface sampling framework can significantly accelerate the training of a NeRF model.
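The core idea of near-surface sampling is to concentrate ray samples in a narrow band around an estimated surface depth instead of spreading them over the whole [near, far] range. The sketch below assumes the per-ray depth is already available (in the paper it comes from depth images or a projected point cloud) and uses a hypothetical half-width `delta`; it is an illustration, not the authors' sampler.

```python
import random


def near_surface_samples(depth, delta, num_samples, near, far, rng=None):
    """Return sorted ray distances t within [depth - delta, depth + delta].

    The band is clipped to the camera's [near, far] range. With rng=None the
    samples are bin centres; otherwise one jittered sample is drawn per bin
    (stratified sampling, as used in the original NeRF's coarse pass).
    """
    lo = max(near, depth - delta)
    hi = min(far, depth + delta)
    if lo >= hi:
        raise ValueError("empty sampling interval")
    width = (hi - lo) / num_samples
    if rng is None:
        offsets = [0.5] * num_samples
    else:
        offsets = [rng.random() for _ in range(num_samples)]
    return [lo + (i + off) * width for i, off in enumerate(offsets)]


def points_along_ray(origin, direction, ts):
    """3D points origin + t * direction for each distance t."""
    return [tuple(o + t * d for o, d in zip(origin, direction)) for t in ts]


# Example: 8 jittered samples within 0.5 units of a surface estimated at depth 4.
jittered = near_surface_samples(4.0, 0.5, 8, 0.1, 10.0, rng=random.Random(0))
```

Compared with sampling the full [near, far] range, all eight samples here fall where the surface is expected, which is what lets fewer samples suffice.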
RO · Aug 28, 2023
End-to-End Driving via Self-Supervised Imitation Learning Using Camera and LiDAR Data
Jin Bok Park, Jinkyu Lee, Muhyun Back et al.
In autonomous driving, the end-to-end (E2E) driving approach, which predicts vehicle control signals directly from sensor data, is rapidly gaining attention. Learning a safe E2E driving system requires an extensive amount of driving data and human intervention. Vehicle control data is constructed from many hours of human driving, which makes large vehicle control datasets challenging to build. Publicly available driving datasets are often collected in limited driving scenes, and only vehicle manufacturers can collect vehicle control data. To address these challenges, this letter proposes the first fully self-supervised learning framework for E2E driving, self-supervised imitation learning (SSIL), based on the self-supervised regression learning (SSRL) framework. The proposed SSIL framework can learn E2E driving networks without using driving command data or a pre-trained model. To construct pseudo steering angle data, the proposed SSIL predicts a pseudo target from the vehicle's poses at the current and previous time points, which are estimated with light detection and ranging (LiDAR) sensors. In addition, we propose two E2E driving networks that predict driving commands depending on high-level instructions. Our numerical experiments with three different benchmark datasets demonstrate that the proposed SSIL framework achieves E2E driving accuracy very close to that of its supervised learning counterpart. The proposed pseudo-label predictor outperformed an existing one that uses a proportional-integral-derivative (PID) controller.
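The abstract does not specify how the pseudo steering angle is computed from consecutive poses. One simple, hypothetical way, shown only to make the idea concrete, uses the kinematic bicycle model: the yaw change per distance travelled gives the path curvature, and the steering angle is atan(wheelbase * curvature). The wheelbase value and the pose format (x, y, yaw) are assumptions, not details from the paper.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians to the interval (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def pseudo_steering(prev_pose, curr_pose, wheelbase=2.7):
    """Hypothetical pseudo steering angle (rad) from two planar poses (x, y, yaw).

    Kinematic bicycle model: curvature = d(yaw) / ds and
    steering = atan(wheelbase * curvature). Positive means turning left
    (counter-clockwise yaw). wheelbase=2.7 m is an assumed value.
    """
    ds = math.hypot(curr_pose[0] - prev_pose[0], curr_pose[1] - prev_pose[1])
    if ds < 1e-6:  # vehicle (nearly) stationary: curvature is undefined
        return 0.0
    dyaw = wrap_angle(curr_pose[2] - prev_pose[2])
    return math.atan(wheelbase * dyaw / ds)
```

Wrapping the yaw difference keeps a small turn across the +/-pi boundary from being read as a near-full rotation.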
CV · Oct 30, 2025
MoTDiff: High-resolution Motion Trajectory estimation from a single blurred image using Diffusion models
Wontae Choi, Jaelin Lee, Hyung Sup Yun et al.
Accurate estimation of motion information is crucial in diverse computational imaging and computer vision applications. Researchers have investigated various ways to extract motion information from a single blurred image, including blur kernels and optical flow. However, existing motion representations are often of low quality, i.e., coarse-grained and inaccurate. In this paper, we propose the first high-resolution (HR) Motion Trajectory estimation framework using Diffusion models (MoTDiff). Unlike existing motion representations, we aim to estimate a high-quality HR motion trajectory from a single motion-blurred image. The proposed MoTDiff consists of two key components: 1) a new conditional diffusion framework that uses multi-scale feature maps extracted from a single blurred image as a condition, and 2) a new training method that promotes precise identification of a fine-grained motion trajectory, consistent estimation of the overall shape and position of a motion path, and pixel connectivity along a motion trajectory. Our experiments demonstrate that the proposed MoTDiff can outperform state-of-the-art methods in both blind image deblurring and coded exposure photography applications.
CV · Feb 12
SToRM: Supervised Token Reduction for Multi-modal LLMs toward efficient end-to-end autonomous driving
Seo Hyun Kim, Jin Bok Park, Do Yeon Koo et al.
In autonomous driving, end-to-end (E2E) driving systems that predict control commands directly from sensor data have advanced significantly. For safe driving in unexpected scenarios, these systems may additionally rely on human interventions such as natural language instructions. Using a multi-modal large language model (MLLM) facilitates human-vehicle interaction and can improve performance in such scenarios. However, this approach requires substantial computational resources, which are limited in autonomous vehicles, because it relies on an LLM and on numerous visual tokens from sensor inputs. Many MLLM studies have explored reducing visual tokens, but they often suffer end-task performance degradation compared to using all tokens. To enable efficient E2E driving while maintaining performance comparable to using all tokens, this paper proposes the first Supervised Token Reduction framework for multi-modal LLMs (SToRM). The proposed framework consists of three key elements. First, a lightweight importance predictor with short-term sliding windows estimates token importance scores. Second, a supervised training approach uses an auxiliary path to obtain pseudo-supervision signals from an all-token LLM pass. Third, an anchor-context merging module partitions tokens into anchors and context tokens and merges context tokens into relevant anchors to reduce redundancy while minimizing information loss. Experiments on the LangAuto benchmark show that SToRM outperforms state-of-the-art E2E driving MLLMs under the same reduced-token budget, maintaining all-token performance while reducing computational cost by up to 30x.
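The anchor-context merging step can be sketched as follows, assuming the importance scores are already given (in SToRM they come from the learned importance predictor), that cosine similarity decides which anchor a context token joins, and that merged tokens are plain averages. These choices are assumptions for illustration; the paper's merging rule may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def anchor_context_merge(tokens, scores, num_anchors):
    """Keep the num_anchors highest-scoring tokens as anchors and average each
    remaining (context) token into its most cosine-similar anchor."""
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    anchors = sorted(order[:num_anchors])  # preserve original token order
    groups = {a: [tokens[a]] for a in anchors}
    for i in order[num_anchors:]:
        best = max(anchors, key=lambda a: cosine(tokens[i], tokens[a]))
        groups[best].append(tokens[i])
    # Unweighted mean of each anchor group (an assumed merging rule).
    return [[sum(col) / len(groups[a]) for col in zip(*groups[a])] for a in anchors]
```

For example, with tokens `[[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]`, scores `[0.9, 0.1, 0.8, 0.2]`, and two anchors, the four tokens collapse to roughly `[0.95, 0.05]` and `[0.05, 0.95]`.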
CV · Feb 10
Equilibrium contrastive learning for imbalanced image classification
Sumin Roh, Harim Kim, Ho Yun Lee et al.
Contrastive learning (CL) is a predominant technique in image classification, but it shows limited performance on imbalanced datasets. Recently, several supervised CL methods have been proposed to promote an ideal regular simplex geometric configuration in the representation space, characterized by intra-class feature collapse and uniform inter-class mean spacing, especially for imbalanced datasets. In particular, existing prototype-based methods include class prototypes as additional samples so that all classes are considered. However, existing CL methods suffer from two limitations. First, they do not consider the alignment between class means/prototypes and classifiers, which could lead to poor generalization. Second, existing prototype-based methods treat prototypes as only one additional sample per class, making their influence depend on the number of class instances in a batch and causing unbalanced contributions across classes. To address these limitations, we propose Equilibrium Contrastive Learning (ECL), a supervised CL framework designed to promote geometric equilibrium, in which class features, means, and classifiers are harmoniously balanced under data imbalance. The proposed ECL framework has two main components. First, ECL promotes representation geometric equilibrium (i.e., a regular simplex geometry characterized by collapsed class samples and uniformly distributed class means) while balancing the contributions of class-average features and class prototypes. Second, ECL establishes a classifier-class center geometric equilibrium by aligning classifier weights with class prototypes. We ran experiments with three long-tailed datasets (CIFAR-10-LT, CIFAR-100-LT, and ImageNet-LT) and two imbalanced medical datasets (ISIC 2019 and our constructed LCCT dataset). Results show that ECL outperforms existing state-of-the-art (SOTA) supervised CL methods designed for imbalanced classification.
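The ideal regular simplex geometry can be made concrete: for K classes, the class means point to the vertices of a regular simplex, so every pair has cosine similarity -1/(K-1). The sketch below constructs such unit vectors by centering the standard basis vectors and normalizing them; it illustrates the target geometry only, not the ECL loss.

```python
import math


def simplex_etf(num_classes):
    """K unit vectors in R^K with pairwise cosine -1/(K-1) (a regular simplex)."""
    K = num_classes
    vecs = []
    for k in range(K):
        # Centre the basis vector e_k by subtracting the mean (1/K) * ones.
        v = [(1.0 if j == k else 0.0) - 1.0 / K for j in range(K)]
        norm = math.sqrt(sum(x * x for x in v))
        vecs.append([x / norm for x in v])
    return vecs


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))
```

Because the vectors sum to zero and are equiangular, no class mean is closer to any other; this is the "uniform inter-class mean spacing" that imbalanced training tends to distort.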
CV · Jan 30, 2025
MAMS: Model-Agnostic Module Selection Framework for Video Captioning
Sangho Lee, Il Yong Chun, Hogun Park
Multi-modal transformers are rapidly gaining attention in video captioning. Existing multi-modal video captioning methods typically extract a fixed number of frames, which raises critical challenges. When too few frames are extracted, important frames with essential information for caption generation may be missed. Conversely, extracting too many frames includes many consecutive frames, potentially causing redundancy among the visual tokens extracted from them. To extract an appropriate number of frames for each video, this paper proposes the first model-agnostic module selection framework for video captioning, which has two main functions: (1) selecting a caption generation module of appropriate size based on visual tokens extracted from video frames, and (2) constructing subsets of visual tokens for the selected caption generation module. Furthermore, we propose a new adaptive attention masking scheme that enhances attention on important visual tokens. Our experiments on three different benchmark datasets demonstrate that the proposed framework significantly improves the performance of three recent video captioning models.
CV · May 28, 2025
Autoregression-free video prediction using diffusion model for mitigating error propagation
Woonho Ko, Jin Bok Park, Il Yong Chun
Existing long-term video prediction methods often rely on an autoregressive video prediction mechanism. However, this approach suffers from error propagation, particularly in distant future frames. To address this limitation, this paper proposes the first AutoRegression-Free (ARFree) video prediction framework using diffusion models. Unlike an autoregressive video prediction mechanism, ARFree directly predicts any future frame tuple from the context frame tuple. The proposed ARFree consists of two key components: 1) a motion prediction module that predicts future motion using motion features extracted from the context frame tuple, and 2) a training method that improves motion continuity and contextual consistency between adjacent future frame tuples. Our experiments with two benchmark datasets show that the proposed ARFree video prediction framework outperforms several state-of-the-art video prediction methods.
CV · Apr 30, 2021
Improved Real-Time Monocular SLAM Using Semantic Segmentation on Selective Frames
Jinkyu Lee, Muhyun Back, Sung Soo Hwang et al.
Monocular simultaneous localization and mapping (SLAM) is emerging in advanced driver assistance systems and autonomous driving, because a single camera is cheap and easy to install. Conventional monocular SLAM has two major challenges that lead to inaccurate localization and mapping. First, it is challenging to estimate scale in localization and mapping. Second, conventional monocular SLAM uses inappropriate mapping factors, such as dynamic objects and low-parallax areas, in mapping. This paper proposes an improved real-time monocular SLAM that resolves these challenges by efficiently using deep learning-based semantic segmentation. To achieve real-time execution, we apply semantic segmentation only to downsampled keyframes, in parallel with mapping processes. In addition, the proposed method corrects the scales of camera poses and three-dimensional (3D) points using a ground plane estimated from road-labeled 3D points and the real camera height. The proposed method also removes inappropriate corner features labeled as moving objects or located in low-parallax areas. Experiments with eight video sequences demonstrate that the proposed monocular SLAM system achieves significantly improved trajectory tracking accuracy compared to existing state-of-the-art monocular SLAM systems, and accuracy comparable to existing state-of-the-art stereo SLAM systems. The proposed system can achieve real-time tracking on a standard CPU, potentially with standard GPU support, whereas existing segmentation-aided monocular SLAM systems do not.
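The scale-correction idea can be sketched as follows: fit a plane to road-labeled 3D points, measure the camera's height above that plane in the (scale-ambiguous) SLAM units, and divide the known real camera height by it. This is a minimal sketch assuming camera coordinates with the y-axis pointing toward the road and an ordinary least-squares plane fit; the paper's plane estimation may use a different (e.g. robust) fit.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_ground_plane(points):
    """Least-squares fit of y = a*x + b*z + c to (x, y, z) road points.

    Assumes camera coordinates with the y-axis pointing down toward the road.
    """
    normal = [[0.0] * 3 for _ in range(3)]  # sum of f f^T with f = (x, z, 1)
    rhs = [0.0] * 3                          # sum of f * y
    for x, y, z in points:
        f = (x, z, 1.0)
        for i in range(3):
            rhs[i] += f[i] * y
            for j in range(3):
                normal[i][j] += f[i] * f[j]
    d = det3(normal)
    if abs(d) < 1e-12:
        raise ValueError("degenerate road point set")
    coeffs = []
    for col in range(3):  # Cramer's rule, one unknown at a time
        m = [row[:] for row in normal]
        for i in range(3):
            m[i][col] = rhs[i]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)  # (a, b, c)


def scale_from_camera_height(points, real_height):
    """Scale factor = real camera height / estimated height above the fitted plane."""
    a, b, c = fit_ground_plane(points)
    # Plane a*x - y + b*z + c = 0; distance from the camera centre (origin):
    est_height = abs(c) / math.sqrt(a * a + 1.0 + b * b)
    return real_height / est_height
```

Multiplying camera translations and map points by the returned factor would then express them in metric units.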
CV · Apr 1, 2021
Improved and efficient inter-vehicle distance estimation using road gradients of both ego and target vehicles
Muhyun Back, Jinkyu Lee, Kyuho Bae et al.
In advanced driver assistance systems and autonomous driving, it is crucial to estimate distances between an ego vehicle and target vehicles. Existing inter-vehicle distance estimation methods assume that the ego and target vehicles drive on the same ground plane. In practical driving environments, however, they may drive on different ground planes. This paper proposes an inter-vehicle distance estimation framework that accounts for slope changes of the road ahead by estimating the road gradients of both the ego vehicle and the target vehicles and by using a 2D object detection deep net. Numerical experiments demonstrate that the proposed method significantly improves distance estimation accuracy and reduces time complexity compared to deep learning-based depth estimation methods.
IV · Dec 2, 2020
An Improved Iterative Neural Network for High-Quality Image-Domain Material Decomposition in Dual-Energy CT
Zhipeng Li, Yong Long, Il Yong Chun
Dual-energy computed tomography (DECT) has been widely used in many applications that need material decomposition. Image-domain methods directly decompose material images from high- and low-energy attenuation images and are thus susceptible to noise and artifacts in the attenuation images. The purpose of this study is to develop an improved iterative neural network (INN) for high-quality image-domain material decomposition in DECT and to study its properties. We propose a new INN architecture for DECT material decomposition. The proposed INN architecture uses distinct cross-material convolutional neural networks (CNNs) in its image refining modules and uses image decomposition physics in its image reconstruction modules. The distinct cross-material CNN refiners incorporate distinct encoding-decoding filters and a cross-material model that captures correlations between different materials. We study the distinct cross-material CNN refiner with patch-based reformulation and the tight-frame condition. Numerical experiments with extended cardiac-torso (XCAT) phantom and clinical data show that the proposed INN significantly improves image quality over several image-domain material decomposition methods, including a conventional model-based image decomposition (MBID) method using an edge-preserving regularizer, a recent MBID method using pre-learned material-wise sparsifying transforms, and a noniterative deep CNN method. Our study with patch-based reformulations reveals that the learned filters of distinct cross-material CNN refiners can approximately satisfy the tight-frame condition.
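Image-domain decomposition in its simplest form inverts a 2x2 linear model per pixel, mapping high- and low-energy attenuation values to two material densities. The sketch below uses a hypothetical material matrix `A` to show why such direct inversion is sensitive to noise, which is the weakness the proposed INN targets; it is not the proposed method.

```python
def decompose_pixel(x_high, x_low, A):
    """Solve [x_high, x_low] = A @ [m1, m2] for one pixel by 2x2 inversion.

    A[e][k] is the (hypothetical) attenuation of basis material k at energy e
    (row 0: high energy, row 1: low energy).
    """
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("material matrix is singular")
    m1 = (a22 * x_high - a12 * x_low) / det
    m2 = (-a21 * x_high + a11 * x_low) / det
    return m1, m2


def decompose_image(high, low, A):
    """Apply the per-pixel inversion to flattened high/low-energy images."""
    return [decompose_pixel(h, l, A) for h, l in zip(high, low)]
```

With the example matrix `[[0.2, 0.5], [0.25, 0.8]]` (determinant 0.035), an error of 0.01 in the high-energy value shifts the first material estimate by about 0.23, a more than twentyfold amplification.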
IV · Feb 27, 2020
Momentum-Net for Low-Dose CT Image Reconstruction
Siqi Ye, Yong Long, Il Yong Chun
This paper applies the recent fast iterative neural network framework, Momentum-Net, with appropriate models to low-dose X-ray computed tomography (LDCT) image reconstruction. At each layer of the proposed Momentum-Net, the model-based image reconstruction module solves a majorized penalized weighted least-squares problem, and the image refining module uses a four-layer convolutional neural network (CNN). Experimental results with the NIH AAPM-Mayo Clinic Low Dose CT Grand Challenge dataset show that the proposed Momentum-Net architecture significantly improves image reconstruction accuracy compared to a state-of-the-art noniterative image denoising deep neural network (NN) for LDCT, WavResNet. We also investigated applying the spectral normalization technique to image refining NN learning to satisfy the nonexpansive NN property; however, experimental results show that it does not improve the image reconstruction performance of Momentum-Net.
IV · Aug 4, 2019
BCD-Net for Low-dose CT Reconstruction: Acceleration, Convergence, and Generalization
Il Yong Chun, Xuehang Zheng, Yong Long et al.
Obtaining accurate and reliable images from low-dose computed tomography (CT) is challenging. Regression convolutional neural network (CNN) models that are learned from training data are increasingly gaining attention in low-dose CT reconstruction. This paper modifies the architecture of an iterative regression CNN, BCD-Net, for fast, stable, and accurate low-dose CT reconstruction, and presents the convergence property of the modified BCD-Net. Numerical results with phantom data show that applying faster numerical solvers to model-based image reconstruction (MBIR) modules of BCD-Net leads to faster and more accurate BCD-Net; BCD-Net significantly improves the reconstruction accuracy, compared to the state-of-the-art MBIR method using learned transforms; BCD-Net achieves better image quality, compared to a state-of-the-art iterative NN architecture, ADMM-Net. Numerical results with clinical data show that BCD-Net generalizes significantly better than a state-of-the-art deep (non-iterative) regression NN, FBPConvNet, that lacks MBIR modules.
IV · Jul 26, 2019
Momentum-Net: Fast and convergent iterative neural network for inverse problems
Il Yong Chun, Zhengyu Huang, Hongki Lim et al.
Iterative neural networks (INNs) are rapidly gaining attention for solving inverse problems in imaging, image processing, and computer vision. INNs combine regression NNs with an iterative model-based image reconstruction (MBIR) algorithm, often leading to both good generalization capability and better reconstruction quality than existing MBIR optimization models. This paper proposes the first fast and convergent INN architecture, Momentum-Net, by generalizing a block-wise MBIR algorithm that uses momentum and majorizers with regression NNs. For fast MBIR, Momentum-Net uses momentum terms in extrapolation modules and, by using majorizers, noniterative MBIR modules at each iteration; each iteration of Momentum-Net consists of three core modules: image refining, extrapolation, and MBIR. Momentum-Net guarantees convergence to a fixed point for general differentiable (non)convex MBIR functions (or data-fit terms) and convex feasible sets, under two asymptotic conditions. To account for data-fit variations across training and testing samples, we also propose a regularization parameter selection scheme based on the "spectral spread" of majorization matrices. Numerical experiments for light-field photography using a focal stack and for sparse-view computed tomography demonstrate that, given identical regression NN architectures, Momentum-Net significantly improves MBIR speed and accuracy over several existing INNs; it also significantly improves reconstruction quality compared to a state-of-the-art MBIR method in each application.
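The three-module iteration can be sketched on a toy least-squares problem. This is a simplified illustration, not the published Momentum-Net: it assumes a data fit 0.5*||A x - y||^2, a quadratic penalty (gamma/2)*||x - u||^2 pulling x toward the refined image u, a fixed momentum factor `rho`, the diagonal majorizer diag(|A|^T |A| 1), and a nonnegativity constraint; `refine` is a hypothetical stand-in for a trained regression NN.

```python
def matvec(A, x):
    """A @ x for A given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, r):
    """A^T @ r for A given as a list of rows."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def diag_majorizer(A):
    """Diagonal of diag(|A|^T |A| 1), which majorizes A^T A."""
    row_abs = [sum(abs(a) for a in row) for row in A]
    return [sum(abs(A[i][j]) * row_abs[i] for i in range(len(A)))
            for j in range(len(A[0]))]


def momentum_net_sketch(A, y, refine, gamma=0.5, rho=0.5, num_iters=50):
    """Refine -> extrapolate -> one majorized (non-iterative) MBIR step per iteration."""
    n = len(A[0])
    m = diag_majorizer(A)
    x_prev, x = [0.0] * n, [0.0] * n
    for _ in range(num_iters):
        u = refine(x)                                             # 1) image refining
        xt = [xi + rho * (xi - xp) for xi, xp in zip(x, x_prev)]  # 2) extrapolation
        resid = [axi - yi for axi, yi in zip(matvec(A, xt), y)]   # A x~ - y
        grad = [g + gamma * (xi - ui)
                for g, xi, ui in zip(rmatvec(A, resid), xt, u)]
        # 3) MBIR: closed-form step with majorizer M + gamma*I, then project onto x >= 0.
        x_prev, x = x, [max(0.0, xi - gi / (mi + gamma))
                        for xi, gi, mi in zip(xt, grad, m)]
    return x
```

Because the majorizer is diagonal, the MBIR step needs no inner solver, which is the source of the speed the abstract attributes to majorization.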
IV · Jun 5, 2019
Improved low-count quantitative PET reconstruction with an iterative neural network
Hongki Lim, Il Yong Chun, Yuni K. Dewaraja et al.
Image reconstruction in low-count PET is particularly challenging because gammas from natural radioactivity in Lu-based crystals cause high random fractions that lower the measurement signal-to-noise ratio (SNR). In model-based image reconstruction (MBIR), using more iterations of an unregularized method may increase the noise, so incorporating regularization into the image reconstruction is desirable to control the noise. New regularization methods based on learned convolutional operators are emerging in MBIR. We modify the architecture of an iterative neural network, BCD-Net, for PET MBIR, and demonstrate the efficacy of the trained BCD-Net using XCAT phantom data that simulate the low true coincidence count rates with high random fractions typical of Y-90 PET patient imaging after Y-90 microsphere radioembolization. Numerical results show that the proposed BCD-Net significantly improves the contrast-to-noise ratio (CNR) and root-mean-square error (RMSE) of the reconstructed images compared to MBIR methods using non-trained regularizers, namely total variation (TV) and non-local means (NLM). Moreover, BCD-Net successfully generalizes to test data that differ from the training data. Improvements were also demonstrated for clinically relevant phantom measurement data, for which we used training and testing datasets with very different activity distributions and count levels.
LG · Feb 21, 2019
Convolutional Analysis Operator Learning: Dependence on Training Data
Il Yong Chun, David Hong, Ben Adcock et al.
Convolutional analysis operator learning (CAOL) enables the unsupervised training of (hierarchical) convolutional sparsifying operators or autoencoders from large datasets. One can use many training images for CAOL, but a precise understanding of the impact of doing so has remained an open question. This paper presents a series of results that lend insight into the impact of dataset size on the filter update in CAOL. The first result is a general deterministic bound on errors in the estimated filters, and is followed by a bound on the expected errors as the number of training samples increases. The second result provides a high probability analogue. The bounds depend on properties of the training data, and we investigate their empirical values with real data. Taken together, these results provide evidence for the potential benefit of using more training data in CAOL.
ML · Feb 20, 2018
Deep BCD-Net Using Identical Encoding-Decoding CNN Structures for Iterative Image Recovery
Il Yong Chun, Jeffrey A. Fessler
In "extreme" computational imaging, which collects extremely undersampled or noisy measurements, obtaining an accurate image within a reasonable computing time is challenging. Incorporating image mapping convolutional neural networks (CNNs) into iterative image recovery has great potential to resolve this issue. This paper 1) incorporates an image mapping CNN using identical convolutional kernels in both encoders and decoders into a block coordinate descent (BCD) signal recovery method and 2) applies the alternating direction method of multipliers to train the aforementioned image mapping CNN. We refer to the proposed recurrent network as BCD-Net using identical encoding-decoding CNN structures. Numerical experiments show that, for a) denoising low signal-to-noise-ratio images and b) extremely undersampled magnetic resonance imaging, the proposed BCD-Net achieves significantly more accurate image recovery than BCD-Net using distinct encoding-decoding structures and/or the conventional image recovery model using both wavelets and total variation.
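The identical encoding-decoding structure can be illustrated in 1D: each filter encodes the signal, the coefficients are soft-thresholded, and the same filter (flipped, i.e., applied as the adjoint) decodes them. The sketch below assumes circular convolution and fixed Haar-like filters that satisfy a tight-frame condition instead of learned ones, so with zero thresholds the signal is reconstructed exactly; the thresholds are hypothetical.

```python
def circ_conv(d, x):
    """Circular convolution (d * x)[n] = sum_j d[j] x[n - j]."""
    N = len(x)
    return [sum(d[j] * x[(n - j) % N] for j in range(len(d))) for n in range(N)]


def circ_corr(d, z):
    """Adjoint of circ_conv: convolution with the flipped filter."""
    N = len(z)
    return [sum(d[j] * z[(n + j) % N] for j in range(len(d))) for n in range(N)]


def soft_threshold(v, t):
    return max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0)


def identical_enc_dec_denoiser(x, filters, thresholds):
    """u = sum_k flip(d_k) * soft(d_k * x, alpha_k): encoder and decoder share kernels."""
    u = [0.0] * len(x)
    for d, alpha in zip(filters, thresholds):
        z = [soft_threshold(v, alpha) for v in circ_conv(d, x)]  # encode + threshold
        u = [ui + vi for ui, vi in zip(u, circ_corr(d, z))]      # decode, same kernel
    return u


# Hand-picked Haar-like filters satisfying the tight-frame condition (assumed, not learned).
HAAR = [[0.5, 0.5], [0.5, -0.5]]
```

Raising only the high-pass threshold removes that branch, leaving a smoothed signal; for instance `identical_enc_dec_denoiser([1.0, 2.0, 3.0, 4.0], HAAR, [0.0, 100.0])` starts with `2.0`.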
IV · Feb 15, 2018
Convolutional Analysis Operator Learning: Acceleration and Convergence
Il Yong Chun, Jeffrey A. Fessler
Convolutional operator learning is gaining attention in many signal processing and computer vision applications. Learning kernels has mostly relied on so-called patch-domain approaches that extract and store many overlapping patches across training signals. Due to memory demands, patch-domain methods have limitations when learning kernels from large datasets (particularly with multi-layered structures, e.g., convolutional neural networks) or when applying the learned kernels to high-dimensional signal recovery problems. The so-called convolution approach does not store many overlapping patches and thus overcomes the memory problems, particularly with careful algorithmic designs; it has been studied within the "synthesis" signal model, e.g., convolutional dictionary learning. This paper proposes a new convolutional analysis operator learning (CAOL) framework that learns an analysis sparsifying regularizer with the convolution perspective, and develops a new convergent Block Proximal Extrapolated Gradient method using a Majorizer (BPEG-M) to solve the corresponding block multi-nonconvex problems. To learn diverse filters within the CAOL framework, this paper introduces an orthogonality constraint that enforces a tight-frame filter condition, and a regularizer that promotes diversity between filters. Numerical experiments show that, with sharp majorizers, BPEG-M significantly accelerates the CAOL convergence rate compared to the state-of-the-art block proximal gradient (BPG) method. Numerical experiments for sparse-view computed tomography show that a convolutional sparsifying regularizer learned via CAOL significantly improves reconstruction quality compared to a conventional edge-preserving regularizer. Using more and wider kernels in a learned regularizer better preserves edges in reconstructed images.
ML · Nov 2, 2017
Sparse-View X-Ray CT Reconstruction Using $\ell_1$ Prior with Learned Transform
Xuehang Zheng, Il Yong Chun, Zhipeng Li et al.
A major challenge in X-ray computed tomography (CT) is reducing radiation dose while maintaining high quality of reconstructed images. To reduce the radiation dose, one can reduce the number of projection views (sparse-view CT); however, high-quality image reconstruction becomes more difficult as the number of projection views decreases. Researchers have applied the concept of learning sparse representations from (high-quality) CT image datasets to sparse-view CT reconstruction. We propose a new statistical CT reconstruction model that combines penalized weighted-least squares (PWLS) and $\ell_1$ prior with learned sparsifying transform (PWLS-ST-$\ell_1$), and a corresponding efficient algorithm based on Alternating Direction Method of Multipliers (ADMM). To moderate the difficulty of tuning ADMM parameters, we propose a new ADMM parameter selection scheme based on approximated condition numbers. We interpret the proposed model by analyzing the minimum mean square error of its ($\ell_2$-norm relaxed) image update estimator. Our results with the extended cardiac-torso (XCAT) phantom data and clinical chest data show that, for sparse-view 2D fan-beam CT and 3D axial cone-beam CT, PWLS-ST-$\ell_1$ improves the quality of reconstructed images compared to the CT reconstruction methods using an edge-preserving regularizer and $\ell_2$ prior with learned ST. These results also show that, for sparse-view 2D fan-beam CT, PWLS-ST-$\ell_1$ achieves comparable or better image quality and requires much shorter runtime than PWLS-DL using a learned overcomplete dictionary. Our results with clinical chest data show that methods using the unsupervised learned prior generalize better than a state-of-the-art deep "denoising" neural network that does not use a physical imaging model.
LG · Jul 3, 2017
Convolutional Dictionary Learning: Acceleration and Convergence
Il Yong Chun, Jeffrey A. Fessler
Convolutional dictionary learning (CDL or sparsifying CDL) has many applications in image processing and computer vision. There has been growing interest in developing efficient algorithms for CDL, mostly relying on the augmented Lagrangian (AL) method or its variant, the alternating direction method of multipliers (ADMM). When their parameters are properly tuned, AL methods have shown fast convergence in CDL. However, the parameter tuning process is not trivial because it is data-dependent, and, in practice, the convergence of AL methods depends on the AL parameters for nonconvex CDL problems. To moderate these problems, this paper proposes a new practically feasible and convergent Block Proximal Gradient method using a Majorizer (BPG-M) for CDL. The BPG-M-based CDL is investigated with different block updating schemes and majorization matrix designs, and further accelerated by incorporating some momentum coefficient formulas and restarting techniques. All of the methods investigated incorporate a boundary artifacts removal (or, more generally, sampling) operator in the learning model. Numerical experiments show that, without needing any parameter tuning process, the proposed BPG-M approach converges more stably to desirable solutions of lower objective values than the existing state-of-the-art ADMM algorithm and its memory-efficient variant do. Compared to the ADMM approaches, the BPG-M method using a multi-block updating scheme is particularly useful in single-threaded CDL algorithms handling large datasets, due to its lower memory requirement and no polynomial computational complexity. Image denoising experiments show that, for relatively strong additive white Gaussian noise, the filters learned by BPG-M-based CDL outperform those trained by the ADMM approach.
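Majorizer-based block updates replace the exact Hessian of a quadratic term with an easily inverted upper bound. For circular convolution with a filter d, ||conv(d, x)||_2 <= ||d||_1 ||x||_2, so the scaled identity (sum_j |d_j|)^2 I is a valid (if loose) majorizer of A^T A. The sketch below uses it for one majorized gradient step on 0.5*||conv(d, x) - y||^2, which cannot increase the objective. The paper investigates other majorization matrix designs, so this is an assumed, simplest choice.

```python
def circ_conv(d, x):
    """Circular convolution (d * x)[n] = sum_j d[j] x[n - j]."""
    N = len(x)
    return [sum(d[j] * x[(n - j) % N] for j in range(len(d))) for n in range(N)]


def circ_corr(d, r):
    """Adjoint of circ_conv (correlation with d)."""
    N = len(r)
    return [sum(d[j] * r[(n + j) % N] for j in range(len(d))) for n in range(N)]


def scaled_identity_majorizer(d):
    """c = (sum_j |d_j|)^2, so c*I - A^T A is PSD for A x = circ_conv(d, x)."""
    return sum(abs(v) for v in d) ** 2


def objective(d, x, y):
    """0.5 * ||circ_conv(d, x) - y||^2."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(circ_conv(d, x), y))


def majorized_gradient_step(d, x, y):
    """x - (1/c) A^T (A x - y): the minimizer of the quadratic majorizer at x."""
    c = scaled_identity_majorizer(d)
    resid = [a - b for a, b in zip(circ_conv(d, x), y)]
    return [xi - gi / c for xi, gi in zip(x, circ_corr(d, resid))]
```

A tighter (e.g. diagonal) majorizer allows larger steps and faster convergence, which is why majorizer design matters in practice.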