Dongwon Park

CV
13 papers
670 citations
Novelty 50%
AI Score 28

13 Papers

CV · Nov 7, 2022
Efficient Single-Image Depth Estimation on Mobile Devices, Mobile AI & AIM 2022 Challenge: Report

Andrey Ignatov, Grigory Malivenko, Radu Timofte et al. · tencent-ai

Various depth estimation models are now widely used on many mobile and IoT devices for image segmentation, bokeh effect rendering, object tracking and many other mobile tasks. Thus, it is crucial to have efficient and accurate depth estimation models that can run fast on low-power mobile chipsets. In this Mobile AI challenge, the target was to develop deep learning-based single image depth estimation solutions that can show real-time performance on IoT platforms and smartphones. For this, the participants used a large-scale RGB-to-depth dataset that was collected with the ZED stereo camera, which is capable of generating depth maps for objects located at up to 50 meters. The runtime of all models was evaluated on the Raspberry Pi 4 platform, where the developed solutions were able to generate VGA resolution depth maps at up to 27 FPS while achieving high fidelity results. All models developed in the challenge are also compatible with any Android or Linux-based mobile devices; their detailed description is provided in this paper.

IV · Nov 7, 2022
Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report

Andrey Ignatov, Radu Timofte, Maurizio Denna et al.

Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to do a high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating a rate of up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
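As a rough illustration of what "INT8 model" means in practice, the sketch below shows generic affine quantization of a single value to a signed 8-bit integer and back. The function names, the `scale` and `zero_point` parameters, and the example values are assumptions made for illustration; the abstract does not describe the quantization scheme used by the challenge entries or by the target NPU.

```python
def quantize_int8(value, scale, zero_point=0):
    """Map a real value to a signed 8-bit integer (generic affine scheme)."""
    q = round(value / scale) + zero_point
    # Clamp to the representable INT8 range.
    return max(-128, min(127, q))


def dequantize_int8(q, scale, zero_point=0):
    """Map a signed 8-bit integer back to an approximate real value."""
    return (q - zero_point) * scale


if __name__ == "__main__":
    scale = 0.1  # hypothetical step size between adjacent INT8 levels
    for x in (0.0, 0.5, 3.14, 100.0):
        q = quantize_int8(x, scale)
        print(x, "->", q, "->", dequantize_int8(q, scale))
```

Values that fall outside the representable range saturate at -128 or 127, which is one reason the choice of scale matters for image quality in quantized networks.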

IV · Jul 20, 2023
Efficient Unified Demosaicing for Bayer and Non-Bayer Patterned Image Sensors

Haechang Lee, Dongwon Park, Wongi Jeong et al.

As the physical size of recent CMOS image sensors (CIS) gets smaller, the latest mobile cameras are adopting unique non-Bayer color filter array (CFA) patterns (e.g., Quad, Nona, QxQ), which consist of homogeneous color units with adjacent pixels. These non-Bayer sensors are superior to conventional Bayer CFA thanks to their changeable pixel-bin sizes for different light conditions but may introduce visual artifacts during demosaicing due to their inherent pixel pattern structures and sensor hardware characteristics. Previous demosaicing methods have primarily focused on Bayer CFA, necessitating distinct reconstruction methods for non-Bayer patterned CIS with various CFA modes under different lighting conditions. In this work, we propose an efficient unified demosaicing method that can be applied to both conventional Bayer RAW and various non-Bayer CFAs' RAW data in different operation modes. Our Knowledge Learning-based demosaicing model for Adaptive Patterns, namely KLAP, utilizes CFA-adaptive filters for only 1% key filters in the network for each CFA, but still manages to effectively demosaic all the CFAs, yielding comparable performance to the large-scale models. Furthermore, by employing meta-learning during inference (KLAP-M), our model is able to eliminate unknown sensor-generic artifacts in real RAW data, effectively bridging the gap between synthetic images and real sensor RAW. Our KLAP and KLAP-M methods achieved state-of-the-art demosaicing performance in both synthetic and real RAW data of Bayer and non-Bayer CFAs.
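To make the difference between Bayer and non-Bayer layouts concrete, here is a minimal sketch that builds a colour filter array in which each colour covers a `unit` x `unit` block of adjacent pixels. The RGGB ordering, the function names, and the mapping of `unit=2` to a Quad-style and `unit=3` to a Nona-style layout are assumptions for illustration only; the sketch says nothing about how KLAP itself works.

```python
def cfa_pattern(height, width, unit=1):
    """Return a height x width grid of 'R', 'G', 'B' labels.

    unit=1 gives a conventional Bayer layout; larger units give layouts
    in which each colour occupies a unit x unit block of adjacent pixels.
    """
    base = (("R", "G"), ("G", "B"))  # assumed RGGB ordering
    return [
        [base[(r // unit) % 2][(c // unit) % 2] for c in range(width)]
        for r in range(height)
    ]


def mosaic(rgb, pattern):
    """Keep, at each pixel, only the channel selected by the CFA pattern."""
    channel = {"R": 0, "G": 1, "B": 2}
    return [
        [rgb[r][c][channel[pattern[r][c]]] for c in range(len(pattern[0]))]
        for r in range(len(pattern))
    ]


if __name__ == "__main__":
    for name, unit in (("Bayer", 1), ("Quad-style", 2)):
        print(name)
        for row in cfa_pattern(4, 4, unit):
            print(" ".join(row))
```

Demosaicing is the inverse problem: recovering all three channels at every pixel from the single-channel output of `mosaic`.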

IV · May 10, 2022
Self-supervised regression learning using domain knowledge: Applications to improving self-supervised denoising in imaging

Il Yong Chun, Dongwon Park, Xuehang Zheng et al.

Regression that predicts a continuous quantity is a central part of applications using computational imaging and computer vision technologies. Yet, studying and understanding self-supervised learning for regression tasks - except for a particular regression task, image denoising - have lagged behind. This paper proposes a general self-supervised regression learning (SSRL) framework that enables learning regression neural networks with only input data (but without ground-truth target data), by using a designable pseudo-predictor that encapsulates domain knowledge of a specific application. The paper underlines the importance of using domain knowledge by showing that under different settings, a better pseudo-predictor can bring the properties of SSRL closer to those of ordinary supervised learning. Numerical experiments for low-dose computational tomography denoising and camera image denoising demonstrate that the proposed SSRL significantly improves the denoising quality over several existing self-supervised denoising methods.

IV · Aug 16, 2022
Self-supervised training of deep denoisers in multi-coil MRI considering noise correlations

Juhyung Park, Dongwon Park, Sooyeon Ji et al.

Deep learning-based denoising methods have shown powerful results for improving the signal-to-noise ratio of magnetic resonance (MR) images, mostly by leveraging supervised learning with clean ground truth. However, acquiring clean ground truth images is often expensive and time-consuming. Self-supervised methods have been widely investigated to mitigate the dependency on clean images, but mostly rely on the suboptimal splitting of K-space measurements of an image to yield input and target images for ensuring statistical independence. In this study, we investigate an alternative self-supervised training method for deep denoisers in multi-coil MRI, dubbed Coil2Coil (C2C), that naturally splits and combines the multi-coil data among phased array coils, generating two noise-corrupted images for training. This novel approach allows exploiting multi-coil redundancy, but the images are statistically correlated and may not have the same clean image. To mitigate these issues, we propose methods to approximately decorrelate the statistical dependence of these images and match the underlying clean images, thus enabling them to be used as the training pairs. For synthetic denoising experiments, C2C yielded the best performance against prior self-supervised methods, reporting outcomes comparable even to supervised methods. For real-world denoising cases, C2C yielded performance consistent with the synthetic cases, removing only noise structures.

CV · Dec 23, 2020
Blur More To Deblur Better: Multi-Blur2Deblur For Efficient Video Deblurring

Dongwon Park, Dong Un Kang, Se Young Chun

One of the key components for video deblurring is how to exploit neighboring frames. Recent state-of-the-art methods either used aligned adjacent frames to the center frame or propagated the information on past frames to the current frame recurrently. Here we propose multi-blur-to-deblur (MB2D), a novel concept to exploit neighboring frames for efficient video deblurring. Firstly, inspired by unsharp masking, we argue that using more blurred images with long exposures as additional inputs significantly improves performance. Secondly, we propose a multi-blurring recurrent neural network (MBRNN) that can synthesize more blurred images from neighboring frames, yielding substantially improved performance with existing video deblurring methods. Lastly, we propose multi-scale deblurring with connecting recurrent feature maps from MBRNN (MSDR) to achieve state-of-the-art performance on the popular GoPro and Su datasets in fast and memory-efficient ways.
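The abstract cites unsharp masking as its inspiration. The sketch below shows classic unsharp masking on a 1-D signal: a more blurred copy is subtracted from the input and the difference is added back to boost edges. This is the textbook technique only, not the MB2D or MBRNN method; the box blur, the function names, and the `amount` and `radius` parameters are illustrative assumptions.

```python
def box_blur(signal, radius=1):
    """Blur a 1-D signal with a moving average, replicating edge values."""
    n = len(signal)
    out = []
    for i in range(n):
        window = [
            signal[min(max(j, 0), n - 1)]
            for j in range(i - radius, i + radius + 1)
        ]
        out.append(sum(window) / len(window))
    return out


def unsharp_mask(signal, amount=1.0, radius=1):
    """Sharpen by adding back the detail removed by an extra blur."""
    blurred = box_blur(signal, radius)
    return [s + amount * (s - b) for s, b in zip(signal, blurred)]


if __name__ == "__main__":
    edge = [0.0, 0.0, 1.0, 1.0]
    print(unsharp_mask(edge))  # values overshoot on both sides of the edge
```

The overshoot around the step is what makes the edge look sharper, and it is produced entirely by comparing the input with a more blurred version of itself.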

LG · Feb 11, 2020
Task-Aware Variational Adversarial Active Learning

Kwanyoung Kim, Dongwon Park, Kwang In Kim et al.

Often, labeling a large amount of data is challenging due to high labeling cost, limiting the application domain of deep learning techniques. Active learning (AL) tackles this by querying the most informative samples to be annotated from the unlabeled pool. Two promising directions for AL that have been recently explored are the task-agnostic approach, which selects data points that are far from the current labeled pool, and the task-aware approach, which relies on the perspective of the task model. Unfortunately, the former does not exploit structures from tasks and the latter does not seem to utilize the overall data distribution well. Here, we propose task-aware variational adversarial AL (TA-VAAL) that modifies the task-agnostic VAAL (which considers the data distribution of both labeled and unlabeled pools) by relaxing task learning loss prediction to ranking loss prediction and by using a ranking conditional generative adversarial network to embed normalized ranking loss information on VAAL. Our proposed TA-VAAL outperforms state-of-the-art methods on various benchmark datasets for classification with balanced / imbalanced labels as well as semantic segmentation, and its task-aware and task-agnostic AL properties were confirmed with our in-depth analyses.
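The idea of relaxing loss prediction to ranking loss prediction can be sketched with a generic pairwise margin ranking loss: the predictor is penalised only when it orders a pair of samples differently from their true task losses, not when it misses the exact values. The abstract does not give the exact formulation used in TA-VAAL, so the pairing scheme, the `margin` value, and the function name below are assumptions.

```python
def pairwise_ranking_loss(predicted, actual, margin=1.0):
    """Average hinge penalty over consecutive pairs (0, 1), (2, 3), ...

    A pair contributes zero when the predicted scores are ordered like
    the actual losses and separated by at least `margin`.
    """
    pairs = [(i, i + 1) for i in range(0, len(predicted) - 1, 2)]
    if not pairs:
        raise ValueError("need at least two samples to form a pair")
    total = 0.0
    for i, j in pairs:
        # +1 if sample i truly has the larger loss, otherwise -1.
        sign = 1.0 if actual[i] > actual[j] else -1.0
        total += max(0.0, -sign * (predicted[i] - predicted[j]) + margin)
    return total / len(pairs)


if __name__ == "__main__":
    actual = [5.0, 1.0]
    print(pairwise_ranking_loss([2.0, 0.0], actual))  # correct ordering
    print(pairwise_ranking_loss([0.0, 2.0], actual))  # wrong ordering
```

Because only the ordering of the actual losses enters the penalty, rescaling them (for example, multiplying every task loss by 100) leaves the ranking loss unchanged.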

IV · Nov 18, 2019
Multi-Temporal Recurrent Neural Networks For Progressive Non-Uniform Single Image Deblurring With Incremental Temporal Training

Dongwon Park, Dong Un Kang, Jisoo Kim et al.

Multi-scale (MS) approaches have been widely investigated for blind single image / video deblurring that sequentially recover deblurred images in low spatial scale first and then in high spatial scale later with the output of lower scales. MS approaches have been effective especially for severe blurs induced by large motions in high spatial scale since those can be seen as small blurs in low spatial scale. In this work, we investigate an alternative approach to MS, called the multi-temporal (MT) approach, for non-uniform single image deblurring. We propose incremental temporal training with an MT-level dataset constructed from a time-resolved dataset, develop novel MT-RNNs with recurrent feature maps, and investigate progressive single image deblurring over iterations. Our proposed MT methods outperform state-of-the-art MS methods on the GoPro dataset in PSNR with the smallest number of parameters.

CV · Sep 16, 2019
A Single Multi-Task Deep Neural Network with Post-Processing for Object Detection with Reasoning and Robotic Grasp Detection

Dongwon Park, Yonghyeok Seo, Dongju Shin et al.

Recently, robotic grasp detection (GD) and object detection (OD) with reasoning have been investigated using deep neural networks (DNNs). There have been works to combine these multiple tasks using separate networks so that robots can deal with situations of grasping specific target objects in the cluttered, stacked, complex piles of novel objects from a single RGB-D camera. We propose a single multi-task DNN that yields the information on GD, OD and relationship reasoning among objects with a simple post-processing. Our proposed methods yielded state-of-the-art performance with the accuracy of 98.6% and 74.2% and the computation speed of 33 and 62 frames per second on VMRD and Cornell datasets, respectively. Our methods also yielded a 95.3% grasp success rate for single novel object grasping with a 4-axis robot arm and an 86.7% grasp success rate in cluttered novel objects with a Baxter robot.

CV · Mar 25, 2019
Down-Scaling with Learned Kernels in Multi-Scale Deep Neural Networks for Non-Uniform Single Image Deblurring

Dongwon Park, Jisoo Kim, Se Young Chun

The multi-scale approach has been used for blind image / video deblurring problems to yield excellent performance for both conventional and recent deep-learning-based state-of-the-art methods. Bicubic down-sampling is a typical choice for the multi-scale approach to reduce spatial dimension after filtering with a fixed kernel. However, this fixed kernel may be sub-optimal since it may destroy important information for reliable deblurring such as strong edges. We propose convolutional neural network (CNN)-based down-scale methods for multi-scale deep-learning-based non-uniform single image deblurring. We argue that our CNN-based down-scaling effectively reduces the spatial dimension of the original image, while learned kernels with multiple channels may preserve the details necessary for deblurring tasks well. For each scale, we adopt RCAN (Residual Channel Attention Networks) as a backbone network to further improve performance. Our proposed method yielded state-of-the-art performance on the GoPro dataset by a large margin. Our proposed method was able to achieve 2.59dB higher PSNR than the current state-of-the-art method by Tao. Our proposed CNN-based down-scaling was the key factor for this excellent performance since the performance of our network without it decreased by 1.98dB. The same networks trained with the GoPro set were also evaluated on the large-scale Su dataset and our proposed method yielded 1.15dB better PSNR than Tao's method. Qualitative comparisons on the Lai dataset also confirmed the superior performance of our proposed method over other state-of-the-art methods.
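The results above are reported as PSNR differences in dB. As a reminder of what that metric measures, here is a minimal sketch of the standard PSNR definition for images stored as nested lists, plus a helper that converts a dB gain into the equivalent mean-squared-error ratio. The function names and the 8-bit peak value of 255 are assumptions; the abstract does not state the exact evaluation protocol.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    ref = [v for row in reference for v in row]
    est = [v for row in estimate for v in row]
    if len(ref) != len(est):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(ref, est)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain on one image."""
    return 10.0 ** (gain_db / 10.0)


if __name__ == "__main__":
    sharp = [[10, 20], [30, 40]]
    restored = [[12, 18], [33, 41]]
    print(round(psnr(sharp, restored), 2))
    print(round(mse_ratio_for_gain(2.59), 2))
```

On a single image, a gain of 2.59 dB corresponds to a mean squared error that is roughly 1.8 times lower; for PSNR values averaged over a dataset this conversion is only approximate.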

CV · Dec 19, 2018
Real-Time, Highly Accurate Robotic Grasp Detection using Fully Convolutional Neural Network with Rotation Ensemble Module

Dongwon Park, Yonghyeok Seo, Se Young Chun

Rotation invariance has been an important topic in computer vision tasks. Ideally, robot grasp detection should be rotation-invariant. However, rotation invariance in robotic grasp detection has only recently been studied, using rotation anchor boxes that are often time-consuming and unreliable for multiple objects. In this paper, we propose a rotation ensemble module (REM) for robotic grasp detection using convolutions that rotate network weights. Our proposed REM was able to outperform current state-of-the-art methods by achieving up to 99.2% (image-wise), 98.6% (object-wise) accuracies on the Cornell dataset with real-time computation (50 frames per second). Our proposed method was also able to yield reliable grasps for multiple objects and up to a 93.8% success rate for the real-time robotic grasping task with a 4-axis robot arm for small novel objects, which was significantly higher than the baseline methods by 11-56%.
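The general idea of convolving with rotated copies of the same weights can be sketched as follows: one kernel is rotated in 90-degree steps, each copy is applied to the image, and the strongest response is kept. This is a simplified illustration with four fixed rotations and a max over responses; the actual REM design (rotation angles, how responses are combined) is not specified in the abstract, and all function names here are hypothetical.

```python
def rotate90(grid):
    """Rotate a square 2-D grid by 90 degrees counter-clockwise."""
    n = len(grid)
    return [[grid[c][n - 1 - r] for c in range(n)] for r in range(n)]


def correlate_valid(image, kernel):
    """Slide the kernel over the image without padding (cross-correlation)."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def rotation_ensemble_response(image, kernel):
    """Element-wise max over responses from four rotated copies of a kernel."""
    responses = []
    current = kernel
    for _ in range(4):
        responses.append(correlate_valid(image, current))
        current = rotate90(current)
    rows, cols = len(responses[0]), len(responses[0][0])
    return [
        [max(resp[r][c] for resp in responses) for c in range(cols)]
        for r in range(rows)
    ]


if __name__ == "__main__":
    edge_kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
    patch = [[9, 9, 9], [5, 5, 5], [0, 0, 0]]
    print(rotation_ensemble_response(patch, edge_kernel))
    print(rotation_ensemble_response(rotate90(patch), edge_kernel))
```

When the patch is the same size as the kernel, rotating the patch by 90 degrees leaves the ensemble response unchanged, because the set of kernel orientations covers every such rotation.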

CV · Sep 16, 2018
Real-Time, Highly Accurate Robotic Grasp Detection using Fully Convolutional Neural Networks with High-Resolution Images

Dongwon Park, Yonghyeok Seo, Se Young Chun

Robotic grasp detection for novel objects is a challenging task, but for the last few years, deep learning based approaches have achieved remarkable performance improvements, up to 96.1% accuracy, with RGB-D data. In this paper, we propose fully convolutional neural network (FCNN) based methods for robotic grasp detection. Our methods also achieved state-of-the-art detection accuracy (up to 96.6%) with state-of-the-art real-time computation time for high-resolution images (6-20ms per 360x360 image) on the Cornell dataset. Due to FCNN, our proposed method can be applied to images of any size for detecting multigrasps on multiobjects. The proposed methods were evaluated using a 4-axis robot arm with a small parallel gripper and an RGB-D camera for grasping challenging small, novel objects. With accurate vision-robot coordinate calibration through our proposed learning-based, fully automatic approach, our proposed method yielded a 90% success rate.

CV · Mar 4, 2018
Classification based Grasp Detection using Spatial Transformer Network

Dongwon Park, Se Young Chun

The robotic grasp detection task is still challenging, particularly for novel objects. With the recent advance of deep learning, there have been several works on detecting robotic grasps using neural networks. Typically, regression based grasp detection methods have outperformed classification based detection methods in computational complexity with excellent accuracy. However, classification based robotic grasp detection still seems to have merits such as intermediate step observability and a straightforward back propagation routine for end-to-end training. In this work, we propose a novel classification based robotic grasp detection method with multiple-stage spatial transformer networks (STN). Our proposed method was able to achieve state-of-the-art performance in accuracy with real-time computation. Additionally, unlike other regression based grasp detection methods, our proposed method allows partial observation for intermediate results such as grasp location and orientation for a number of grasp configuration candidates.