CV | Jun 1 | Code
CanonCGT: Reference-Based Color Grading via Canonical Pivot Representation
Jinwon Ko, Keunsoo Ko, Chang-Su Kim
Reference-based color grading aims to reproduce the tonal mood and lighting of a reference while preserving color harmony and scene structure. Existing photorealistic and filter-based methods often produce unstable tone mappings -- over-shifting or inconsistently retaining colors -- leading to unnatural results. We propose CanonCGT, a two-stage framework built on a canonical pivot -- a style-neutral intermediate representation for stable color mapping. The first stage canonicalizes the input by removing intrinsic tonal bias, and the second color-grades it to match the reference style. A dual-phase training scheme, DP-CGT, combines supervised preset learning with self-supervised refinement on unpaired photographs. CanonCGT delivers photorealistic and tonally consistent results across diverse datasets, surpassing state-of-the-art methods in stability and visual fidelity. Our codes are available at https://github.com/Jinwon-Ko/CanonCGT.
CV | Mar 29, 2022 | Code
Eigenlanes: Data-Driven Lane Descriptors for Structurally Diverse Lanes
Dongkwon Jin, Wonhui Park, Seong-Gyun Jeong et al.
A novel algorithm to detect road lanes in the eigenlane space is proposed in this paper. First, we introduce the notion of eigenlanes, which are data-driven descriptors for structurally diverse lanes, including curved, as well as straight, lanes. To obtain eigenlanes, we perform the best rank-M approximation of a lane matrix containing all lanes in a training set. Second, we generate a set of lane candidates by clustering the training lanes in the eigenlane space. Third, using the lane candidates, we determine an optimal set of lanes by developing an anchor-based detection network, called SIIC-Net. Experimental results demonstrate that the proposed algorithm provides excellent detection performance for structurally diverse lanes. Our codes are available at https://github.com/dongkwonjin/Eigenlanes.
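The rank-M approximation step can be illustrated with a truncated SVD. The following is a minimal sketch, not the authors' implementation: it assumes each lane is a vector of x-coordinates sampled at N fixed heights and that the lane matrix stores one training lane per column. The toy quadratic lanes and the helper names `fit_eigenlanes`, `encode`, and `decode` are hypothetical.

```python
import numpy as np

def fit_eigenlanes(lane_matrix, m):
    """Return the first m left singular vectors of an (N, L) lane matrix.

    By the Eckart-Young theorem, projecting onto these vectors gives the
    best rank-m approximation of the matrix in the Frobenius norm.
    """
    u, _, _ = np.linalg.svd(lane_matrix, full_matrices=False)
    return u[:, :m]

def encode(lane, eigenlanes):
    """Project a lane (length N) onto the eigenlane basis: m coefficients."""
    return eigenlanes.T @ lane

def decode(coeffs, eigenlanes):
    """Reconstruct a lane from its m coefficients."""
    return eigenlanes @ coeffs

# Toy training set (assumption): L random quadratic lanes x(y) = a + b*y + c*y^2.
rng = np.random.default_rng(0)
N, L, M = 50, 200, 4
ys = np.linspace(0.0, 1.0, N)
basis = np.stack([np.ones(N), ys, ys ** 2], axis=1)   # (N, 3)
A = basis @ rng.normal(size=(3, L))                   # (N, L) lane matrix

U = fit_eigenlanes(A, M)
lane = A[:, 0]
recon = decode(encode(lane, U), U)
err = float(np.max(np.abs(lane - recon)))
print(U.shape, err)
```

Because the toy lanes span only a three-dimensional subspace, M = 4 coefficients reconstruct them up to floating-point error; real lanes would leave a residual that shrinks as M grows.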
CV | Apr 5, 2023 | Code
BiFormer: Learning Bilateral Motion Estimation via Bilateral Transformer for 4K Video Frame Interpolation
Junheum Park, Jintae Kim, Chang-Su Kim
A novel 4K video frame interpolator based on bilateral transformer (BiFormer) is proposed in this paper, which performs three steps: global motion estimation, local motion refinement, and frame synthesis. First, in global motion estimation, we predict symmetric bilateral motion fields at a coarse scale. To this end, we propose BiFormer, the first transformer-based bilateral motion estimator. Second, we refine the global motion fields efficiently using blockwise bilateral cost volumes (BBCVs). Third, we warp the input frames using the refined motion fields and blend them to synthesize an intermediate frame. Extensive experiments demonstrate that the proposed BiFormer algorithm achieves excellent interpolation performance on 4K datasets. The source codes are available at https://github.com/JunHeum/BiFormer.
IV | Mar 10, 2023 | Code
Context-Based Trit-Plane Coding for Progressive Image Compression
Seungmin Jeon, Kwang Pyo Choi, Youngo Park et al.
Trit-plane coding enables deep progressive image compression, but it cannot use autoregressive context models. In this paper, we propose the context-based trit-plane coding (CTC) algorithm to achieve progressive compression more compactly. First, we develop the context-based rate reduction module to estimate trit probabilities of latent elements accurately and thus encode the trit-planes compactly. Second, we develop the context-based distortion reduction module to refine partial latent tensors from the trit-planes and improve the reconstructed image quality. Third, we propose a retraining scheme for the decoder to attain better rate-distortion tradeoffs. Extensive experiments show that CTC outperforms the baseline trit-plane codec significantly in BD-rate on the Kodak lossless dataset, while increasing the time complexity only marginally. Our codes are available at https://github.com/seungminjeon-github/CTC.
CV | Aug 22, 2023 | Code
Recursive Video Lane Detection
Dongkwon Jin, Dahyun Kim, Chang-Su Kim
A novel algorithm to detect road lanes in videos, called recursive video lane detector (RVLD), is proposed in this paper, which propagates the state of a current frame recursively to the next frame. RVLD consists of an intra-frame lane detector (ILD) and a predictive lane detector (PLD). First, we design ILD to localize lanes in a still frame. Second, we develop PLD to exploit the information of the previous frame for lane detection in a current frame. To this end, we estimate a motion field and warp the previous output to the current frame. Using the warped information, we refine the feature map of the current frame to detect lanes more reliably. Experimental results show that RVLD outperforms existing detectors on video lane datasets. Our codes are available at https://github.com/dongkwonjin/RVLD.
CV | Mar 29, 2022 | Code
Semantic Line Detection Using Mirror Attention and Comparative Ranking and Matching
Dongkwon Jin, Jun-Tae Lee, Chang-Su Kim
A novel algorithm to detect semantic lines is proposed in this paper. We develop three networks: detection network with mirror attention (D-Net) and comparative ranking and matching networks (R-Net and M-Net). D-Net extracts semantic lines by exploiting rich contextual information. To this end, we design the mirror attention module. Then, through pairwise comparisons of extracted semantic lines, we iteratively select the most semantic line and remove redundant ones overlapping with the selected one. For the pairwise comparisons, we develop R-Net and M-Net in the Siamese architecture. Experiments demonstrate that the proposed algorithm outperforms the conventional semantic line detector significantly. Moreover, we apply the proposed algorithm to detect two important kinds of semantic lines successfully: dominant parallel lines and reflection symmetry axes. Our codes are available at https://github.com/dongkwonjin/Semantic-Line-DRM.
CV | Mar 24, 2022 | Code
Moving Window Regression: A Novel Approach to Ordinal Regression
Nyeong-Ho Shin, Seon-Ho Lee, Chang-Su Kim
A novel ordinal regression algorithm, called moving window regression (MWR), is proposed in this paper. First, we propose the notion of relative rank ($ρ$-rank), which is a new order representation scheme for input and reference instances. Second, we develop global and local relative regressors ($ρ$-regressors) to predict $ρ$-ranks within entire and specific rank ranges, respectively. Third, we refine an initial rank estimate iteratively by selecting two reference instances to form a search window and then estimating the $ρ$-rank within the window. Extensive experimental results show that the proposed algorithm achieves state-of-the-art performance on various benchmark datasets for facial age estimation and historical color image classification. The codes are available at https://github.com/nhshin-mcl/MWR.
CV | Aug 24, 2022 | Code
Applying Eigencontours to PolarMask-Based Instance Segmentation
Wonhui Park, Dongkwon Jin, Chang-Su Kim
Eigencontours are the first data-driven contour descriptors based on singular value decomposition. Based on the implementation of ESE-Seg, eigencontours were applied to the instance segmentation task successfully. In this report, we incorporate eigencontours into the PolarMask network for instance segmentation. Experimental results demonstrate that the proposed algorithm yields better results than PolarMask on two instance segmentation datasets of COCO2017 and SBD. Also, we analyze the characteristics of eigencontours qualitatively. Our codes are available at https://github.com/dnjs3594/Eigencontours.
CV | Jul 19, 2024 | Code
Forbes: Face Obfuscation Rendering via Backpropagation Refinement Scheme
Jintae Kim, Seungwon Yang, Seong-Gyun Jeong et al.
A novel algorithm for face obfuscation, called Forbes, which aims to obfuscate facial appearance recognizable by humans but preserve the identity and attributes decipherable by machines, is proposed in this paper. Forbes first applies multiple obfuscating transformations with random parameters to an image to remove the identity information distinguishable by humans. Then, it optimizes the parameters to make the transformed image decipherable by machines based on the backpropagation refinement scheme. Finally, it renders an obfuscated image by applying the transformations with the optimized parameters. Experimental results on various datasets demonstrate that Forbes achieves both human indecipherability and machine decipherability excellently. The source codes are available at https://github.com/mcljtkim/Forbes.
CV | Aug 23, 2022
Depth Map Decomposition for Monocular Depth Estimation
Jinyoung Jun, Jae-Han Lee, Chul Lee et al.
We propose a novel algorithm for monocular depth estimation that decomposes a metric depth map into a normalized depth map and scale features. The proposed network is composed of a shared encoder and three decoders, called G-Net, N-Net, and M-Net, which estimate gradient maps, a normalized depth map, and a metric depth map, respectively. M-Net learns to estimate metric depths more accurately using relative depth features extracted by G-Net and N-Net. The proposed algorithm has the advantage that it can use datasets without metric depth labels to improve the performance of metric depth estimation. Experimental results on various datasets demonstrate that the proposed algorithm not only provides competitive performance to state-of-the-art algorithms but also yields acceptable results even when only a small amount of metric depth data is available for its training.
CV | Aug 14, 2024 | Code
OMR: Occlusion-Aware Memory-Based Refinement for Video Lane Detection
Dongkwon Jin, Chang-Su Kim
A novel algorithm for video lane detection is proposed in this paper. First, we extract a feature map for a current frame and detect a latent mask for obstacles occluding lanes. Then, we enhance the feature map by developing an occlusion-aware memory-based refinement (OMR) module. It takes the obstacle mask and feature map from the current frame, previous output, and memory information as input, and processes them recursively in a video. Moreover, we apply a novel data augmentation scheme for training the OMR module effectively. Experimental results show that the proposed algorithm outperforms existing techniques on video lane datasets. Our codes are available at https://github.com/dongkwonjin/OMR.
CV | Mar 29, 2022
Eigencontours: Novel Contour Descriptors Based on Low-Rank Approximation
Wonhui Park, Dongkwon Jin, Chang-Su Kim
Novel contour descriptors, called eigencontours, based on low-rank approximation are proposed in this paper. First, we construct a contour matrix containing all object boundaries in a training set. Second, we decompose the contour matrix into eigencontours via the best rank-M approximation. Third, we represent an object boundary by a linear combination of the M eigencontours. We also incorporate the eigencontours into an instance segmentation framework. Experimental results demonstrate that the proposed eigencontours can represent object boundaries more effectively and more efficiently than existing descriptors in a low-dimensional space. Furthermore, the proposed algorithm yields meaningful performances on instance segmentation datasets.
IV | Mar 25, 2022
RD-Optimized Trit-Plane Coding of Deep Compressed Image Latent Tensors
Seungmin Jeon, Jae-Han Lee, Chang-Su Kim
DPICT is the first learning-based image codec supporting fine granular scalability. In this paper, we describe how to implement two key components of DPICT efficiently: trit-plane slicing and rate-distortion-optimized (RD-optimized) coding. In DPICT, we transform an image into a latent tensor, represent the tensor in ternary digits (trits), and encode the trits in the decreasing order of significance. For entropy encoding, it is necessary to compute the probability of each trit, which demands high time complexity in both the encoder and the decoder. To reduce the complexity, we develop a parallel computing scheme for the probabilities, which is described in detail with pseudo-codes. Moreover, we compare the trit-plane slicing in DPICT with the alternative bit-plane slicing. Experimental results show that the time complexity is reduced significantly by the parallel computing and that the trit-plane slicing provides better RD performances than the bit-plane slicing.
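Trit-plane slicing itself is simple to illustrate. The sketch below is not the DPICT codec: it assumes non-negative integer latent values, omits the probability estimation and RD-optimized ordering described in the abstract, and uses the hypothetical helper names `to_trit_planes` and `from_trit_planes`. It only shows how values split into ternary digit planes ordered from most to least significant, and how decoding a prefix of the planes yields a progressively refined estimate.

```python
def to_trit_planes(values, num_planes):
    """Slice non-negative integers into trit-planes, most significant first."""
    planes = []
    for p in range(num_planes - 1, -1, -1):
        planes.append([(v // 3 ** p) % 3 for v in values])
    return planes

def from_trit_planes(planes, num_planes):
    """Rebuild values from the leading planes; missing planes count as zero."""
    values = [0] * len(planes[0])
    for k, plane in enumerate(planes):
        weight = 3 ** (num_planes - 1 - k)
        values = [v + t * weight for v, t in zip(values, plane)]
    return values

latents = [5, 14, 26]          # toy values, each below 3**3 = 27
planes = to_trit_planes(latents, 3)

# Decoding more planes refines the estimate toward the true values.
for k in range(1, 4):
    print(k, from_trit_planes(planes[:k], 3))
```

Replacing the base 3 with 2 gives the bit-plane slicing that the abstract compares against.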
CV | Mar 20, 2023
Versatile Depth Estimator Based on Common Relative Depth Estimation and Camera-Specific Relative-to-Metric Depth Conversion
Jinyoung Jun, Jae-Han Lee, Chang-Su Kim
A typical monocular depth estimator is trained for a single camera, so its performance drops severely on images taken with different cameras. To address this issue, we propose a versatile depth estimator (VDE), composed of a common relative depth estimator (CRDE) and multiple relative-to-metric converters (R2MCs). The CRDE extracts relative depth information, and each R2MC converts the relative information to predict metric depths for a specific camera. The proposed VDE can cope with diverse scenes, including both indoor and outdoor scenes, with only a 1.12% parameter increase per camera. Experimental results demonstrate that VDE supports multiple cameras effectively and efficiently and also achieves state-of-the-art performance in the conventional single-camera scenario.
CV | Apr 29, 2024 | Code
MFP: Making Full Use of Probability Maps for Interactive Image Segmentation
Chaewon Lee, Seon-Ho Lee, Chang-Su Kim
In recent interactive segmentation algorithms, previous probability maps are used as network input to help predictions in the current segmentation round. However, despite the utilization of previous masks, useful information contained in the probability maps is not well propagated to the current predictions. In this paper, to overcome this limitation, we propose a novel and effective algorithm for click-based interactive image segmentation, called MFP, which attempts to make full use of probability maps. We first modulate previous probability maps to enhance their representations of user-specified objects. Then, we feed the modulated probability maps as additional input to the segmentation network. We implement the proposed MFP algorithm based on the ResNet-34, HRNet-18, and ViT-B backbones and assess the performance extensively on various datasets. It is demonstrated that MFP meaningfully outperforms the existing algorithms using identical backbones. The source codes are available at https://github.com/cwlee00/MFP.
CV | Apr 29, 2024 | Code
Semantic Line Combination Detector
Jinwon Ko, Dongkwon Jin, Chang-Su Kim
A novel algorithm, called semantic line combination detector (SLCD), to find an optimal combination of semantic lines is proposed in this paper. It processes all lines in each line combination at once to assess the overall harmony of the lines. First, we generate various line combinations from reliable lines. Second, we estimate the score of each line combination and determine the best one. Experimental results demonstrate that the proposed SLCD outperforms existing semantic line detectors on various datasets. Moreover, it is shown that SLCD can be applied effectively to three vision tasks of vanishing point detection, symmetry axis detection, and composition-based image retrieval. Our codes are available at https://github.com/Jinwon-Ko/SLCD.
IV | Dec 12, 2021 | Code
DPICT: Deep Progressive Image Compression Using Trit-Planes
Jae-Han Lee, Seungmin Jeon, Kwang Pyo Choi et al.
We propose the deep progressive image compression using trit-planes (DPICT) algorithm, which is the first learning-based codec supporting fine granular scalability (FGS). First, we transform an image into a latent tensor using an analysis network. Then, we represent the latent tensor in ternary digits (trits) and encode it into a compressed bitstream trit-plane by trit-plane in the decreasing order of significance. Moreover, within each trit-plane, we sort the trits according to their rate-distortion priorities and transmit more important information first. Since the compression network is less optimized for the cases of using fewer trit-planes, we develop a postprocessing network for refining reconstructed images at low rates. Experimental results show that DPICT outperforms conventional progressive codecs significantly, while enabling FGS transmission. Codes are available at https://github.com/jaehanlee-mcl/DPICT.
CV | Aug 15, 2021 | Code
Asymmetric Bilateral Motion Estimation for Video Frame Interpolation
Junheum Park, Chul Lee, Chang-Su Kim
We propose a novel video frame interpolation algorithm based on asymmetric bilateral motion estimation (ABME), which synthesizes an intermediate frame between two input frames. First, we predict symmetric bilateral motion fields to interpolate an anchor frame. Second, we estimate asymmetric bilateral motion fields from the anchor frame to the input frames. Third, we use the asymmetric fields to warp the input frames backward and reconstruct the intermediate frame. Last, to refine the intermediate frame, we develop a new synthesis network that generates a set of dynamic filters and a residual frame using local and global information. Experimental results show that the proposed algorithm achieves excellent performance on various datasets. The source codes and pretrained models are available at https://github.com/JunHeum/ABME.
CV | Apr 21, 2021 | Code
Guided Interactive Video Object Segmentation Using Reliability-Based Attention Maps
Yuk Heo, Yeong Jun Koh, Chang-Su Kim
We propose a novel guided interactive segmentation (GIS) algorithm for video objects to improve the segmentation accuracy and reduce the interaction time. First, we design the reliability-based attention module to analyze the reliability of multiple annotated frames. Second, we develop the intersection-aware propagation module to propagate segmentation results to neighboring frames. Third, we introduce the GIS mechanism for a user to select unsatisfactory frames quickly with less effort. Experimental results demonstrate that the proposed algorithm provides more accurate segmentation results at a faster speed than conventional algorithms. Codes are available at https://github.com/yuk6heo/GIS-RAmap.
CV | Apr 14, 2021 | Code
Harmonious Semantic Line Detection via Maximal Weight Clique Selection
Dongkwon Jin, Wonhui Park, Seong-Gyun Jeong et al.
A novel algorithm to detect an optimal set of semantic lines is proposed in this work. We develop two networks: selection network (S-Net) and harmonization network (H-Net). First, S-Net computes the probabilities and offsets of line candidates. Second, we filter out irrelevant lines through a selection-and-removal process. Third, we construct a complete graph, whose edge weights are computed by H-Net. Finally, we determine a maximal weight clique representing an optimal set of semantic lines. Moreover, to assess the overall harmony of detected lines, we propose a novel metric, called HIoU. Experimental results demonstrate that the proposed algorithm can detect harmonious semantic lines effectively and efficiently. Our codes are available at https://github.com/dongkwonjin/Semantic-Line-MWCS.
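The clique selection step can be sketched with a brute-force search over a small graph. This is an illustration under stated assumptions, not the paper's method: the clique score is taken to be the sum of pairwise edge weights, the weights are signed so that adding a poorly matching line lowers the score, and the matrix `W` stands in for hypothetical H-Net outputs. The abstract does not specify how the paper scores a clique or searches for it.

```python
from itertools import combinations

def max_weight_clique(num_nodes, weight):
    """Exhaustively find the node subset (size >= 2) with the largest
    sum of pairwise edge weights in a complete graph."""
    best_nodes, best_score = (), float("-inf")
    for size in range(2, num_nodes + 1):
        for nodes in combinations(range(num_nodes), size):
            score = sum(weight[i][j] for i, j in combinations(nodes, 2))
            if score > best_score:
                best_nodes, best_score = nodes, score
    return best_nodes, best_score

# Hypothetical symmetric harmony scores between four line candidates.
W = [
    [0.0, 0.9, -0.4, 0.2],
    [0.9, 0.0, -0.5, 0.6],
    [-0.4, -0.5, 0.0, 0.1],
    [0.2, 0.6, 0.1, 0.0],
]

nodes, score = max_weight_clique(4, W)
print(nodes, round(score, 3))
```

Here line 2 conflicts with lines 0 and 1, so the best set leaves it out. The exhaustive search grows exponentially with the number of candidates, so it is practical only after the candidates have been filtered down to a small set.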
CV | Jul 16, 2020 | Code
Interactive Video Object Segmentation Using Global and Local Transfer Modules
Yuk Heo, Yeong Jun Koh, Chang-Su Kim
An interactive video object segmentation algorithm, which takes scribble annotations on query objects as input, is proposed in this paper. We develop a deep neural network, which consists of the annotation network (A-Net) and the transfer network (T-Net). First, given user scribbles on a frame, A-Net yields a segmentation result based on the encoder-decoder architecture. Second, T-Net transfers the segmentation result bidirectionally to the other frames, by employing the global and local transfer modules. The global transfer module conveys the segmentation information in an annotated frame to a target frame, while the local transfer module propagates the segmentation information in a temporally adjacent frame to the target frame. By applying A-Net and T-Net alternately, a user can obtain desired segmentation results with minimal effort. We train the entire network in two stages, by emulating user scribbles and employing an auxiliary loss. Experimental results demonstrate that the proposed interactive video object segmentation algorithm outperforms the state-of-the-art conventional algorithms. Codes and models are available at https://github.com/yuk6heo/IVOS-ATNet.
CV | Apr 30, 2024
Masked Spatial Propagation Network for Sparsity-Adaptive Depth Refinement
Jinyoung Jun, Jae-Han Lee, Chang-Su Kim
The main function of depth completion is to compensate for an insufficient and unpredictable number of sparse depth measurements of hardware sensors. However, existing research on depth completion assumes that the sparsity -- the number of points or LiDAR lines -- is fixed for training and testing. Hence, the completion performance drops severely when the number of sparse depths changes significantly. To address this issue, we propose the sparsity-adaptive depth refinement (SDR) framework, which refines monocular depth estimates using sparse depth points. For SDR, we propose the masked spatial propagation network (MSPN) to perform SDR with a varying number of sparse depths effectively by gradually propagating sparse depth information throughout the entire depth map. Experimental results demonstrate that MSPN achieves state-of-the-art performance on both SDR and conventional depth completion scenarios.
IV | Feb 22, 2025
Patch Stitching Data Augmentation for Cancer Classification in Pathology Images
Jiamu Wang, Chang-Su Kim, Jin Tae Kwak
Computational pathology, integrating computational methods and digital imaging, has been shown to be effective in advancing disease diagnosis and prognosis. In recent years, the development of machine learning and deep learning has greatly bolstered the power of computational pathology. However, the issues of data scarcity and data imbalance remain, and they can have an adverse effect on any computational method. In this paper, we introduce an efficient and effective data augmentation strategy to generate new pathology images from the existing pathology images and thus enrich datasets without additional data collection or annotation costs. To evaluate the proposed method, we employed two sets of colorectal cancer datasets and obtained improved classification results, suggesting that the proposed simple approach holds the potential for alleviating the data scarcity and imbalance in computational pathology.
CV | Jun 30, 2025
Oneta: Multi-Style Image Enhancement Using Eigentransformation Functions
Jiwon Kim, Soohyun Hwang, Dong-O Kim et al.
The first algorithm, called Oneta, for a novel task of multi-style image enhancement is proposed in this work. Oneta uses two point operators sequentially: intensity enhancement with a transformation function (TF) and color correction with a color correction matrix (CCM). This two-step enhancement model, though simple, achieves a high performance upper bound. Also, we introduce eigentransformation function (eigenTF) to represent TF compactly. The Oneta network comprises Y-Net and C-Net to predict eigenTF and CCM parameters, respectively. To support $K$ styles, Oneta employs $K$ learnable tokens. During training, each style token is learned using image pairs from the corresponding dataset. In testing, Oneta selects one of the $K$ style tokens to enhance an image accordingly. Extensive experiments show that the single Oneta network can effectively undertake six enhancement tasks -- retouching, image signal processing, low-light image enhancement, dehazing, underwater image enhancement, and white balancing -- across 30 datasets.
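The two point operators can be sketched for a single pixel. This is an illustration only, with several assumptions not stated in the abstract: the TF is represented as a piecewise-linear curve sampled at evenly spaced knots on [0, 1], it is applied to each RGB channel independently, and the CCM is a plain 3x3 matrix. The helper names and the toy parameter values are hypothetical, and the networks that predict the parameters are omitted.

```python
def apply_tf(x, tf):
    """Evaluate a piecewise-linear transformation function at x in [0, 1].

    tf holds the curve's output values at evenly spaced input knots.
    """
    pos = x * (len(tf) - 1)
    i = min(int(pos), len(tf) - 2)   # index of the left knot
    frac = pos - i
    return tf[i] * (1.0 - frac) + tf[i + 1] * frac

def apply_ccm(rgb, ccm):
    """Multiply an RGB triple by a 3x3 color correction matrix."""
    return tuple(sum(ccm[r][c] * rgb[c] for c in range(3)) for r in range(3))

def enhance_pixel(rgb, tf, ccm):
    """Step 1: intensity enhancement with the TF. Step 2: color correction."""
    toned = tuple(apply_tf(v, tf) for v in rgb)
    return apply_ccm(toned, ccm)

tf = [0.0, 0.6, 1.0]                       # toy brightening curve
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

out = enhance_pixel((0.5, 0.25, 1.0), tf, identity)
print(out)
```

Both steps act on each pixel independently, which is what makes them point operators: the full enhancement is determined by a short curve and nine matrix entries.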
CV | Jun 5, 2025
Perfecting Depth: Uncertainty-Aware Enhancement of Metric Depth
Jinyoung Jun, Lei Chu, Jiahao Li et al.
We propose a novel two-stage framework for sensor depth enhancement, called Perfecting Depth. This framework leverages the stochastic nature of diffusion models to automatically detect unreliable depth regions while preserving geometric cues. In the first stage (stochastic estimation), the method identifies unreliable measurements and infers geometric structure by leveraging a training-inference domain gap. In the second stage (deterministic refinement), it enforces structural consistency and pixel-level accuracy using the uncertainty map derived from the first stage. By combining stochastic uncertainty modeling with deterministic refinement, our method yields dense, artifact-free depth maps with improved reliability. Experimental results demonstrate its effectiveness across diverse real-world scenarios. Furthermore, theoretical analysis, various experiments, and qualitative visualizations validate its robustness and scalability. Our framework sets a new baseline for sensor depth enhancement, with potential applications in autonomous driving, robotics, and immersive technologies.
CV | Apr 29, 2024
Clicks2Line: Using Lines for Interactive Image Segmentation
Chaewon Lee, Chang-Su Kim
For click-based interactive segmentation methods, reducing the number of clicks required to obtain a desired segmentation result is essential. Although recent click-based methods yield decent segmentation results, we observe that a substantial number of clicks is required to segment elongated regions. To reduce the user effort required, we propose using lines instead of clicks for such cases. In this paper, an interactive segmentation algorithm that adaptively adopts either clicks or lines as input is proposed. Experimental results demonstrate that using lines can generate better segmentation results than clicks in several cases.
IV | Sep 13, 2021
IceNet for Interactive Contrast Enhancement
Keunsoo Ko, Chang-Su Kim
A CNN-based interactive contrast enhancement algorithm, called IceNet, is proposed in this work, which enables a user to adjust image contrast easily according to his or her preference. Specifically, a user provides a parameter for controlling the global brightness and two types of scribbles to darken or brighten local regions in an image. Then, given these annotations, IceNet estimates a gamma map for the pixel-wise gamma correction. Finally, through color restoration, an enhanced image is obtained. The user may provide annotations iteratively to obtain a satisfactory image. IceNet is also capable of producing a personalized enhanced image automatically, which can serve as a basis for further adjustment if so desired. Moreover, to train IceNet effectively and reliably, we propose three differentiable losses. Extensive experiments show that IceNet can provide users with satisfactorily enhanced images.
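Pixel-wise gamma correction with a gamma map reduces to one exponentiation per pixel. The sketch below assumes a single-channel image with values in [0, 1] and a hand-written gamma map. In IceNet the map is estimated by the network from the user's annotations, and a color restoration step follows. Both are omitted here, and `apply_gamma_map` is a hypothetical helper name.

```python
def apply_gamma_map(image, gamma_map):
    """Raise each pixel (in [0, 1]) to the power given by its own gamma.

    Gamma below 1 brightens a pixel, gamma above 1 darkens it, and
    gamma equal to 1 leaves it unchanged.
    """
    return [
        [pixel ** gamma for pixel, gamma in zip(img_row, gamma_row)]
        for img_row, gamma_row in zip(image, gamma_map)
    ]

image = [[0.25, 0.25],
         [0.25, 0.25]]
gamma_map = [[0.5, 1.0],    # brighten top-left, keep top-right
             [2.0, 1.0]]    # darken bottom-left, keep bottom-right

enhanced = apply_gamma_map(image, gamma_map)
print(enhanced)
```

A spatially varying map is what lets local scribbles brighten one region and darken another within the same image.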
CV | Jul 17, 2020
BMBC: Bilateral Motion Estimation with Bilateral Cost Volume for Video Interpolation
Junheum Park, Keunsoo Ko, Chul Lee et al.
Video interpolation increases the temporal resolution of a video sequence by synthesizing intermediate frames between two consecutive frames. We propose a novel deep-learning-based video interpolation algorithm based on bilateral motion estimation. First, we develop the bilateral motion network with the bilateral cost volume to estimate bilateral motions accurately. Then, we approximate bi-directional motions to predict a different kind of bilateral motions. We then warp the two input frames using the estimated bilateral motions. Next, we develop the dynamic filter generation network to yield dynamic blending filters. Finally, we combine the warped frames using the dynamic blending filters to generate intermediate frames. Experimental results show that the proposed algorithm outperforms the state-of-the-art video interpolation algorithms on several benchmark datasets.