Yeong-Jun Cho

h-index: 9

11 papers

161 citations

Novelty: 51%

AI Score: 47

Ranked #31,746 of 194,257 authors (top 16%); #11,360 in CV (top 19%)

11 Papers

3.6 · CV · Nov 11, 2025 · Code

CSF-Net: Context-Semantic Fusion Network for Large Mask Inpainting

Chae-Yeon Heo, Yeong-Jun Cho

In this paper, we propose a semantic-guided framework to address the challenging problem of large-mask image inpainting, where essential visual content is missing and contextual cues are limited. To compensate for the limited context, we leverage a pretrained Amodal Completion (AC) model to generate structure-aware candidates that serve as semantic priors for the missing regions. We introduce Context-Semantic Fusion Network (CSF-Net), a transformer-based fusion framework that fuses these candidates with contextual features to produce a semantic guidance image for image inpainting. This guidance improves inpainting quality by promoting structural accuracy and semantic consistency. CSF-Net can be seamlessly integrated into existing inpainting models without architectural changes and consistently enhances performance across diverse masking conditions. Extensive experiments on the Places365 and COCOA datasets demonstrate that CSF-Net effectively reduces object hallucination while enhancing visual realism and semantic alignment. The code for CSF-Net is available at https://github.com/chaeyeonheo/CSF-Net.

2.0 · CV · Jul 16, 2024

Flatfish Lesion Detection Based on Part Segmentation Approach and Lesion Image Generation

Seo-Bin Hwang, Han-Young Kim, Chae-Yeon Heo et al.

The flatfish is a major farmed species consumed globally in large quantities. However, due to the densely populated farming environment, flatfish are susceptible to lesions and diseases, making early lesion detection crucial. Traditionally, lesions were detected through visual inspection, but observing large numbers of fish is challenging. Automated approaches based on deep learning technologies have been widely used to address this problem, but accurate detection remains difficult due to the diversity of the fish and the lack of a fish lesion and disease dataset. This study augments fish lesion images using generative adversarial networks and image harmonization methods. Next, lesion detectors are trained separately for three body parts (head, fins, and body) to address individual lesions properly. Additionally, a flatfish lesion and disease image dataset, called FlatIMG, is created, and the proposed methods are verified on it. A flash salmon lesion dataset is also tested to validate the generalizability of the proposed methods. The proposed methods achieved 12% higher performance than the baseline framework. This study is the first attempt to create a high-quality flatfish lesion image dataset with detailed annotations and propose an effective lesion detection framework. Automatic lesion and disease monitoring can be achieved in farming environments using the proposed methods and dataset.

1.5 · CV · Feb 5

DroneKey++: A Size Prior-free Method and New Benchmark for Drone 3D Pose Estimation from Sequential Images

Seo-Bin Hwang, Yeong-Jun Cho

Accurate 3D pose estimation of drones is essential for security and surveillance systems. However, existing methods often rely on prior drone information such as physical sizes or 3D meshes. At the same time, current datasets are small-scale, limited to single models, and collected under constrained environments, which makes reliable validation of generalization difficult. We present DroneKey++, a prior-free framework that jointly performs keypoint detection, drone classification, and 3D pose estimation. The framework employs a keypoint encoder for simultaneous keypoint detection and classification, and a pose decoder that estimates 3D pose using ray-based geometric reasoning and class embeddings. To address dataset limitations, we construct 6DroneSyn, a large-scale synthetic benchmark with over 50K images covering 7 drone models and 88 outdoor backgrounds, generated using 360-degree panoramic synthesis. Experiments show that DroneKey++ achieves MAE 17.34 deg and MedAE 17.1 deg for rotation, MAE 0.135 m and MedAE 0.242 m for translation, with inference speeds of 19.25 FPS (CPU) and 414.07 FPS (GPU), demonstrating both strong generalization across drone models and suitability for real-time applications. The dataset is publicly available.

3.6 · CV · Nov 11, 2025

RS-Net: Context-Aware Relation Scoring for Dynamic Scene Graph Generation

Hae-Won Jo, Yeong-Jun Cho

Dynamic Scene Graph Generation (DSGG) models how object relations evolve over time in videos. However, existing methods are trained only on annotated object pairs and lack guidance for non-related pairs, making it difficult to identify meaningful relations during inference. In this paper, we propose Relation Scoring Network (RS-Net), a modular framework that scores the contextual importance of object pairs using both spatial interactions and long-range temporal context. RS-Net consists of a spatial context encoder with learnable context tokens and a temporal encoder that aggregates video-level information. The resulting relation scores are integrated into a unified triplet scoring mechanism to enhance relation prediction. RS-Net can be easily integrated into existing DSGG models without architectural changes. Experiments on the Action Genome dataset show that RS-Net consistently improves both Recall and Precision across diverse baselines, with notable gains in mean Recall, highlighting its ability to address the long-tailed distribution of relations. Despite the increased number of parameters, RS-Net maintains competitive efficiency, achieving superior performance over state-of-the-art methods.

2.0 · CV · Aug 10, 2024

Object Re-identification via Spatial-temporal Fusion Networks and Causal Identity Matching

Hye-Geun Kim, Yong-Hyuk Moon, Yeong-Jun Cho

Object re-identification (ReID) in large camera networks faces numerous challenges. First, the similar appearances of objects degrade ReID performance, a challenge that needs to be addressed by existing appearance-based ReID methods. Second, most ReID studies are performed in laboratory settings and do not consider real-world scenarios. To overcome these challenges, we introduce a novel ReID framework that leverages a spatial-temporal fusion network and causal identity matching (CIM). Our framework estimates camera network topology using a proposed adaptive Parzen window and combines appearance features with spatial-temporal cues within the fusion network. This approach has demonstrated outstanding performance across several datasets, including VeRi776, Vehicle-3I, and Market-1501, achieving up to 99.70% rank-1 accuracy and 95.5% mAP. Furthermore, the proposed CIM approach, which dynamically assigns gallery sets based on camera network topology, has further improved ReID accuracy and robustness in real-world settings, evidenced by a 94.95% mAP and a 95.19% F1 score on the Vehicle-3I dataset. The experimental results support the effectiveness of incorporating spatial-temporal information and CIM for real-world ReID scenarios, regardless of the data domain (e.g., vehicle, person).
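The rank-1 accuracy and mAP figures quoted above are standard retrieval metrics. As a rough illustration of how they are computed, here is a minimal sketch; the function names and the input layout (one ranked list of gallery identities per query) are assumptions made for this example, and the same-camera filtering that ReID benchmarks usually apply is omitted.

```python
def average_precision(ranked_ids, query_id):
    """Average precision of one query over a ranked list of gallery identities."""
    hits = 0
    precisions = []
    for rank, gallery_id in enumerate(ranked_ids, start=1):
        if gallery_id == query_id:
            hits += 1
            precisions.append(hits / rank)  # precision at each correct match
    return sum(precisions) / hits if hits else 0.0


def rank1_and_map(rankings):
    """rankings: list of (query_id, ranked_gallery_ids) pairs (hypothetical layout)."""
    n = len(rankings)
    # Rank-1: fraction of queries whose top-ranked gallery item has the right identity.
    rank1 = sum(1 for q, ranked in rankings if ranked and ranked[0] == q) / n
    # mAP: mean of the per-query average precision.
    mean_ap = sum(average_precision(ranked, q) for q, ranked in rankings) / n
    return rank1, mean_ap


rankings = [
    ("a", ["a", "b", "a"]),  # correct matches at ranks 1 and 3
    ("b", ["a", "b", "c"]),  # correct match at rank 2
]
rank1, mean_ap = rank1_and_map(rankings)
```

Rank-1 only looks at the top result, while mAP rewards placing every correct match early, which is why the two numbers can differ noticeably for the same system.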

2.8 · CV · Sep 3, 2023

Spatial-temporal Vehicle Re-identification

Hye-Geun Kim, YouKyoung Na, Hae-Won Joe et al.

Vehicle re-identification (ReID) in a large-scale camera network is important in public safety, traffic control, and security. However, due to the appearance ambiguities of vehicles, previous appearance-based ReID methods often fail to track vehicles across multiple cameras. To overcome this challenge, we propose a spatial-temporal vehicle ReID framework that estimates a reliable camera network topology based on the adaptive Parzen window method and optimally combines the appearance and spatial-temporal similarities through the fusion network. With the proposed methods, we achieved superior performance on the public dataset (VeRi776), with 99.64% rank-1 accuracy. The experimental results support that utilizing spatial and temporal information for ReID can improve the accuracy of appearance-based methods and effectively deal with appearance ambiguities.
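To give a feel for the Parzen window idea mentioned in this abstract, the sketch below estimates the distribution of transit times between one camera pair with a Gaussian kernel. It is only an illustration: the abstract does not describe how the window is adapted, so the normal-reference bandwidth rule used here is a stand-in assumption, and `parzen_density` and the sample values are hypothetical.

```python
import math
import statistics


def parzen_density(samples, bandwidth=None):
    """Return a Gaussian Parzen-window density estimate built from `samples`."""
    n = len(samples)
    if bandwidth is None:
        # Assumption: normal-reference rule of thumb, standing in for the
        # paper's adaptive window, whose details the abstract does not give.
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(t):
        # Sum of Gaussian kernels centered on each observed transit time.
        return norm * sum(
            math.exp(-0.5 * ((t - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical transit times (seconds) observed between two cameras.
transit_times = [28, 30, 31, 29, 33, 30]
density = parzen_density(transit_times)
likely = density(30)    # near the observed transit times
unlikely = density(60)  # far from any observation
```

A candidate match whose time gap falls in a high-density region is temporally plausible for that camera pair; such a score could then be combined with appearance similarity.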

8.7 · CV · Jul 21, 2021 · Code

Weighted Intersection over Union (wIoU) for Evaluating Image Segmentation

Yeong-Jun Cho

In recent years, many semantic segmentation methods have been proposed to predict the labels of pixels in a scene. In general, area prediction errors or boundary prediction errors are measured to compare methods. However, there is no intuitive evaluation metric that evaluates both aspects. In this work, we propose a new evaluation measure called weighted Intersection over Union (wIoU) for semantic segmentation. First, it builds a weight map generated from a boundary distance map, allowing weighted evaluation for each pixel based on a boundary importance factor. The proposed wIoU can evaluate both contour and region by setting a boundary importance factor. We validated the effectiveness of wIoU on a dataset of 33 scenes and demonstrated its flexibility. Using the proposed metric, we expect that more flexible and intuitive evaluation in the semantic segmentation field will be possible.
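The abstract describes the pipeline as: boundary distance map, then weight map, then a weighted IoU. A minimal sketch of that pipeline might look as follows. The exponential weight `exp(-d / alpha)` and the use of Manhattan distance are assumptions chosen for illustration and are not taken from the paper; `alpha` plays the role of a boundary importance factor (small values emphasize pixels near boundaries, large values approach plain IoU).

```python
import math
from collections import deque

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def boundary_distance_map(gt):
    """Manhattan distance from each pixel to the nearest ground-truth boundary pixel."""
    h, w = len(gt), len(gt[0])
    dist = [[None] * w for _ in range(h)]
    queue = deque()
    # Boundary pixels: any pixel with a 4-neighbor carrying a different label.
    for y in range(h):
        for x in range(w):
            for dy, dx in NEIGHBORS:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and gt[ny][nx] != gt[y][x]:
                    dist[y][x] = 0
                    queue.append((y, x))
                    break
    if not queue:  # single-label image: no boundary, so weight uniformly
        return [[0] * w for _ in range(h)]
    # Multi-source breadth-first search outward from the boundary pixels.
    while queue:
        y, x = queue.popleft()
        for dy, dx in NEIGHBORS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist


def weighted_iou(pred, gt, cls, alpha=1.0):
    """Weighted IoU for class `cls`; the weight form is an assumed example."""
    dist = boundary_distance_map(gt)
    inter = union = 0.0
    for y, row in enumerate(gt):
        for x, g in enumerate(row):
            p = pred[y][x]
            if p == cls or g == cls:
                weight = math.exp(-dist[y][x] / alpha)
                union += weight
                if p == cls and g == cls:
                    inter += weight
    return inter / union if union > 0 else float("nan")


gt = [[0, 0, 1, 1] for _ in range(4)]
pred = [row[:] for row in gt]
pred[0][0] = 1  # one mislabeled pixel, one step away from the boundary column
score_boundary_focused = weighted_iou(pred, gt, cls=0, alpha=1.0)
score_near_plain_iou = weighted_iou(pred, gt, cls=0, alpha=1e9)
```

In this toy case the error lies away from the boundary, so the boundary-focused score is higher than the plain IoU of 7/8; an error on the boundary column would be penalized more heavily instead.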

0.9 · CV · Dec 1, 2017

Distance-based Camera Network Topology Inference for Person Re-identification

Yeong-Jun Cho, Kuk-Jin Yoon

In this paper, we propose a novel distance-based camera network topology inference method for efficient person re-identification. To this end, we first calibrate each camera and estimate relative scales between cameras. Using the calibration results of multiple cameras, we calculate the speed of each person and infer the distance between cameras to generate a distance-based camera network topology. The proposed distance-based topology can be applied adaptively to each person according to their speed and can handle diverse transition times of people between non-overlapping cameras. To validate the proposed method, we tested it on an open person re-identification dataset and compared it with state-of-the-art methods. The experimental results show that the proposed method is effective for person re-identification in large-scale camera networks with varied transition times.
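The core arithmetic behind a distance-based topology can be shown with a small worked example: if each person's speed and transit time are known, the inter-camera distance is roughly speed times time, and a new person's expected transit time is that distance divided by their own speed. The sketch below assumes this simple model; the function names, the use of the median, and the sample values are hypothetical and not taken from the paper.

```python
import statistics


def infer_camera_distance(observations):
    """observations: (speed in m/s, transit time in s) pairs for one camera pair."""
    # Each matched person gives one distance estimate; the median damps outliers.
    return statistics.median(speed * time for speed, time in observations)


def expected_transit_time(distance, speed):
    """Per-person expected transit time under a constant-speed assumption."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return distance / speed


# Hypothetical matched observations between two non-overlapping cameras.
observations = [(1.0, 20.0), (2.0, 10.0), (1.25, 16.0)]
distance = infer_camera_distance(observations)

# A fast walker and a slow walker get different expected transit times.
fast = expected_transit_time(distance, speed=2.0)
slow = expected_transit_time(distance, speed=0.8)
```

Expressing the topology as a distance rather than a single time window lets one camera link adapt to people moving at different speeds.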

7.1 · CV · Oct 3, 2017

Joint Person Re-identification and Camera Network Topology Inference in Multiple Cameras

Yeong-Jun Cho, Su-A Kim, Jae-Han Park et al.

Person re-identification is the task of recognizing or identifying a person across multiple views in multi-camera networks. Although there has been much progress in person re-identification, person re-identification in large-scale multi-camera networks remains a challenging task because of the large spatio-temporal uncertainty and high complexity due to a large number of cameras and people. To handle these difficulties, additional information such as camera network topology should be provided, which is unfortunately also difficult to estimate automatically. In this study, we propose a unified framework which jointly solves both person re-identification and camera network topology inference problems with minimal prior knowledge about the environments. The proposed framework takes general multi-camera network environments into account and can be applied to online person re-identification in large-scale multi-camera networks. In addition, to effectively show the superiority of the proposed framework, we provide a new person re-identification dataset with full annotations, named SLP, captured in the multi-camera network consisting of nine non-overlapping cameras. Experimental results using our person re-identification and public datasets show that the proposed methods are promising for both person re-identification and camera topology inference tasks.

6.6 · CV · May 17, 2017

PaMM: Pose-aware Multi-shot Matching for Improving Person Re-identification

Yeong-Jun Cho, Kuk-Jin Yoon

Person re-identification is the problem of recognizing people across different images or videos with non-overlapping views. Although there has been much progress in person re-identification over the last decade, it remains a challenging task because appearances of people can seem extremely different across diverse camera viewpoints and person poses. In this paper, we propose a novel framework for person re-identification by analyzing camera viewpoints and person poses in a so-called Pose-aware Multi-shot Matching (PaMM), which robustly estimates people's poses and efficiently conducts multi-shot matching based on pose information. Experimental results using public person re-identification datasets show that the proposed methods outperform state-of-the-art methods and are promising for person re-identification from diverse viewpoints and pose variances.

2.4 · CV · Apr 24, 2017

Unified Framework for Automated Person Re-identification and Camera Network Topology Inference in Camera Networks

Yeong-Jun Cho, Jae-Han Park, Su-A Kim et al.

Person re-identification in large-scale multi-camera networks is a challenging task because of the spatio-temporal uncertainty and high complexity due to large numbers of cameras and people. To handle these difficulties, additional information such as camera network topology should be provided, which is also difficult to automatically estimate. In this paper, we propose a unified framework which jointly solves both person re-id and camera network topology inference problems. The proposed framework takes general multi-camera network environments into account. To effectively show the superiority of the proposed framework, we also provide a new person re-id dataset with full annotations, named SLP, captured in the synchronized multi-camera network. Experimental results show that the proposed methods are promising for both person re-id and camera topology inference tasks.