Minseok Seo

CV
h-index10
28papers
268citations
Novelty50%
AI Score57

28 Papers

CVDec 20, 2022Code
Self-Pair: Synthesizing Changes from Single Source for Object Change Detection in Remote Sensing Imagery

Minseok Seo, Hakjin Lee, Yongjin Jeon et al.

For change detection in remote sensing, constructing a training dataset for deep learning models is difficult due to the requirements of bi-temporal supervision. To overcome this issue, single-temporal supervision which treats change labels as the difference of two semantic masks has been proposed. This novel method trains a change detector using two spatially unrelated images with corresponding semantic labels such as building. However, training on unpaired datasets could confuse the change detector in the case of pixels that are labeled unchanged but are visually significantly different. In order to maintain the visual similarity in unchanged area, in this paper, we emphasize that the change originates from the source image and show that manipulating the source image as an after-image is crucial to the performance of change detection. Extensive experiments demonstrate the importance of maintaining visual information between pre- and post-event images, and our method outperforms existing methods based on single-temporal supervision. code is available at https://github.com/seominseok0429/Self-Pair-for-Change-Detection.

CVMar 21, 2023Code
CAFS: Class Adaptive Framework for Semi-Supervised Semantic Segmentation

Jingi Ju, Hyeoncheol Noh, Yooseung Wang et al.

Semi-supervised semantic segmentation learns a model for classifying pixels into specific classes using a few labeled samples and numerous unlabeled images. The recent leading approach is consistency regularization by selftraining with pseudo-labeling pixels having high confidences for unlabeled images. However, using only highconfidence pixels for self-training may result in losing much of the information in the unlabeled datasets due to poor confidence calibration of modern deep learning networks. In this paper, we propose a class-adaptive semisupervision framework for semi-supervised semantic segmentation (CAFS) to cope with the loss of most information that occurs in existing high-confidence-based pseudolabeling methods. Unlike existing semi-supervised semantic segmentation frameworks, CAFS constructs a validation set on a labeled dataset, to leverage the calibration performance for each class. On this basis, we propose a calibration aware class-wise adaptive thresholding and classwise adaptive oversampling using the analysis results from the validation set. Our proposed CAFS achieves state-ofthe-art performance on the full data partition of the base PASCAL VOC 2012 dataset and on the 1/4 data partition of the Cityscapes dataset with significant margins of 83.0% and 80.4%, respectively. The code is available at https://github.com/cjf8899/CAFS.

CVMar 26, 2023
VisDA 2022 Challenge: Domain Adaptation for Industrial Waste Sorting

Dina Bashkirova, Samarth Mishra, Diala Lteif et al.

Label-efficient and reliable semantic segmentation is essential for many real-life applications, especially for industrial settings with high visual diversity, such as waste sorting. In industrial waste sorting, one of the biggest challenges is the extreme diversity of the input stream depending on factors like the location of the sorting facility, the equipment available in the facility, and the time of year, all of which significantly impact the composition and visual appearance of the waste stream. These changes in the data are called ``visual domains'', and label-efficient adaptation of models to such domains is needed for successful semantic segmentation of industrial waste. To test the abilities of computer vision models on this task, we present the VisDA 2022 Challenge on Domain Adaptation for Industrial Waste Sorting. Our challenge incorporates a fully-annotated waste sorting dataset, ZeroWaste, collected from two real material recovery facilities in different locations and seasons, as well as a novel procedurally generated synthetic waste sorting dataset, SynthWaste. In this competition, we aim to answer two questions: 1) can we leverage domain adaptation techniques to minimize the domain gap? and 2) can synthetic data augmentation improve performance on this task and help adapt to changing data distributions? The results of the competition show that industrial waste detection poses a real domain adaptation problem, that domain generalization techniques such as augmentations, ensembling, etc., improve the overall performance on the unlabeled target domain examples, and that leveraging synthetic data effectively remains an open problem. See https://ai.bu.edu/visda-2022/

CVMay 31, 2022Code
Bag of Tricks for Domain Adaptive Multi-Object Tracking

Minseok Seo, Jeongwon Ryu, Kwangjin Yoon

In this paper, SIA_Track is presented which is developed by a research team from SI Analytics. The proposed method was built from pre-existing detector and tracker under the tracking-by-detection paradigm. The tracker we used is an online tracker that merely links newly received detections with existing tracks. The core part of our method is training procedure of the object detector where synthetic and unlabeled real data were only used for training. To maximize the performance on real data, we first propose to use pseudo-labeling that generates imperfect labels for real data using a model trained with synthetic dataset. After that model soups scheme was applied to aggregate weights produced during iterative pseudo-labeling. Besides, cross-domain mixed sampling also helped to increase detection performance on real data. Our method, SIA_Track, takes the first place on MOTSynth2MOT17 track at BMTT 2022 challenge. The code is available on https://github.com/SIAnalytics/BMTT2022_SIA_track.

CVNov 26, 2022Code
1st Place Solution to NeurIPS 2022 Challenge on Visual Domain Adaptation

Daehan Kim, Minseok Seo, YoungJin Jeon et al.

The Visual Domain Adaptation(VisDA) 2022 Challenge calls for an unsupervised domain adaptive model in semantic segmentation tasks for industrial waste sorting. In this paper, we introduce the SIA_Adapt method, which incorporates several methods for domain adaptive models. The core of our method in the transferable representation from large-scale pre-training. In this process, we choose a network architecture that differs from the state-of-the-art for domain adaptation. After that, self-training using pseudo-labels helps to make the initial adaptation model more adaptable to the target domain. Finally, the model soup scheme helped to improve the generalization performance in the target domain. Our method SIA_Adapt achieves 1st place in the VisDA2022 challenge. The code is available on https: //github.com/DaehanKim-Korea/VisDA2022_Winner_Solution.

CVMar 17, 2023
Bidirectional Domain Mixup for Domain Adaptive Semantic Segmentation

Daehan Kim, Minseok Seo, Kwanyong Park et al.

Mixup provides interpolated training samples and allows the model to obtain smoother decision boundaries for better generalization. The idea can be naturally applied to the domain adaptation task, where we can mix the source and target samples to obtain domain-mixed samples for better adaptation. However, the extension of the idea from classification to segmentation (i.e., structured output) is nontrivial. This paper systematically studies the impact of mixup under the domain adaptaive semantic segmentation task and presents a simple yet effective mixup strategy called Bidirectional Domain Mixup (BDM). In specific, we achieve domain mixup in two-step: cut and paste. Given the warm-up model trained from any adaptation techniques, we forward the source and target samples and perform a simple threshold-based cut out of the unconfident regions (cut). After then, we fill-in the dropped regions with the other domain region patches (paste). In doing so, we jointly consider class distribution, spatial structure, and pseudo label confidence. Based on our analysis, we found that BDM leaves domain transferable regions by cutting, balances the dataset-level class distribution while preserving natural scene context by pasting. We coupled our proposal with various state-of-the-art adaptation models and observe significant improvement consistently. We also provide extensive ablation experiments to empirically verify our main components of the framework. Visit our project page with the code at https://sites.google.com/view/bidirectional-domain-mixup

CVApr 4, 2022
Unsupervised Change Detection Based on Image Reconstruction Loss

Hyeoncheol Noh, Jingi Ju, Minseok Seo et al.

To train the change detector, bi-temporal images taken at different times in the same area are used. However, collecting labeled bi-temporal images is expensive and time consuming. To solve this problem, various unsupervised change detection methods have been proposed, but they still require unlabeled bi-temporal images. In this paper, we propose unsupervised change detection based on image reconstruction loss using only unlabeled single temporal single image. The image reconstruction model is trained to reconstruct the original source image by receiving the source image and the photometrically transformed source image as a pair. During inference, the model receives bi-temporal images as the input, and tries to reconstruct one of the inputs. The changed region between bi-temporal images shows high reconstruction loss. Our change detector showed significant performance in various change detection benchmark datasets even though only a single temporal single source image was used. The code and trained models will be publicly available for reproducibility.

CVMar 14, 2023
Implicit Stacked Autoregressive Model for Video Prediction

Minseok Seo, Hakjin Lee, Doyi Kim et al.

Future frame prediction has been approached through two primary methods: autoregressive and non-autoregressive. Autoregressive methods rely on the Markov assumption and can achieve high accuracy in the early stages of prediction when errors are not yet accumulated. However, their performance tends to decline as the number of time steps increases. In contrast, non-autoregressive methods can achieve relatively high performance but lack correlation between predictions for each time step. In this paper, we propose an Implicit Stacked Autoregressive Model for Video Prediction (IAM4VP), which is an implicit video prediction model that applies a stacked autoregressive method. Like non-autoregressive methods, stacked autoregressive methods use the same observed frame to estimate all future frames. However, they use their own predictions as input, similar to autoregressive methods. As the number of time steps increases, predictions are sequentially stacked in the queue. To evaluate the effectiveness of IAM4VP, we conducted experiments on three common future frame prediction benchmark datasets and weather\&climate prediction benchmark datasets. The results demonstrate that our proposed model achieves state-of-the-art performance.

CVDec 6, 2022
Domain Generalization Strategy to Train Classifiers Robust to Spatial-Temporal Shift

Minseok Seo, Doyi Kim, Seungheon Shin et al.

Deep learning-based weather prediction models have advanced significantly in recent years. However, data-driven models based on deep learning are difficult to apply to real-world applications because they are vulnerable to spatial-temporal shifts. A weather prediction task is especially susceptible to spatial-temporal shifts when the model is overfitted to locality and seasonality. In this paper, we propose a training strategy to make the weather prediction model robust to spatial-temporal shifts. We first analyze the effect of hyperparameters and augmentations of the existing training strategy on the spatial-temporal shift robustness of the model. Next, we propose an optimal combination of hyperparameters and augmentation based on the analysis results and a test-time augmentation. We performed all experiments on the W4C22 Transfer dataset and achieved the 1st performance.

CVDec 6, 2022
Simple Baseline for Weather Forecasting Using Spatiotemporal Context Aggregation Network

Minseok Seo, Doyi Kim, Seungheon Shin et al.

Traditional weather forecasting relies on domain expertise and computationally intensive numerical simulation systems. Recently, with the development of a data-driven approach, weather forecasting based on deep learning has been receiving attention. Deep learning-based weather forecasting has made stunning progress, from various backbone studies using CNN, RNN, and Transformer to training strategies using weather observations datasets with auxiliary inputs. All of this progress has contributed to the field of weather forecasting; however, many elements and complex structures of deep learning models prevent us from reaching physical interpretations. This paper proposes a SImple baseline with a spatiotemporal context Aggregation Network (SIANet) that achieved state-of-the-art in 4 parts of 5 benchmarks of W4C22. This simple but efficient structure uses only satellite images and CNNs in an end-to-end fashion without using a multi-model ensemble or fine-tuning. This simplicity of SIANet can be used as a solid baseline that can be easily applied in weather forecasting using deep learning.

CVJul 14, 2023
Improved Flood Insights: Diffusion-Based SAR to EO Image Translation

Minseok Seo, Youngtack Oh, Doyi Kim et al.

Driven by rapid climate change, the frequency and intensity of flood events are increasing. Electro-Optical (EO) satellite imagery is commonly utilized for rapid response. However, its utilities in flood situations are hampered by issues such as cloud cover and limitations during nighttime, making accurate assessment of damage challenging. Several alternative flood detection techniques utilizing Synthetic Aperture Radar (SAR) data have been proposed. Despite the advantages of SAR over EO in the aforementioned situations, SAR presents a distinct drawback: human analysts often struggle with data interpretation. To tackle this issue, this paper introduces a novel framework, Diffusion-Based SAR to EO Image Translation (DSE). The DSE framework converts SAR images into EO images, thereby enhancing the interpretability of flood insights for humans. Experimental results on the Sen1Floods11 and SEN12-FLOOD datasets confirm that the DSE framework not only delivers enhanced visual information but also improves performance across all tested flood segmentation baselines.

CVApr 30, 2022
Source Domain Subset Sampling for Semi-Supervised Domain Adaptation in Semantic Segmentation

Daehan Kim, Minseok Seo, Jinsun Park et al.

In this paper, we introduce source domain subset sampling (SDSS) as a new perspective of semi-supervised domain adaptation. We propose domain adaptation by sampling and exploiting only a meaningful subset from source data for training. Our key assumption is that the entire source domain data may contain samples that are unhelpful for the adaptation. Therefore, the domain adaptation can benefit from a subset of source data composed solely of helpful and relevant samples. The proposed method effectively subsamples full source data to generate a small-scale meaningful subset. Therefore, training time is reduced, and performance is improved with our subsampled source data. To further verify the scalability of our method, we construct a new dataset called Ocean Ship, which comprises 500 real and 200K synthetic sample images with ground-truth labels. The SDSS achieved a state-of-the-art performance when applied on GTA5 to Cityscapes and SYNTHIA to Cityscapes public benchmark datasets and a 9.13 mIoU improvement on our Ocean Ship dataset over a baseline model.

CVMar 8, 2023
Intermediate and Future Frame Prediction of Geostationary Satellite Imagery With Warp and Refine Network

Minseok Seo, Yeji Choi, Hyungon Ry et al.

Geostationary satellite imagery has applications in climate and weather forecasting, planning natural energy resources, and predicting extreme weather events. For precise and accurate prediction, higher spatial and temporal resolution of geostationary satellite imagery is important. Although recent geostationary satellite resolution has improved, the long-term analysis of climate applications is limited to using multiple satellites from the past to the present due to the different resolutions. To solve this problem, we proposed warp and refine network (WR-Net). WR-Net is divided into an optical flow warp component and a warp image refinement component. We used the TV-L1 algorithm instead of deep learning-based approaches to extract the optical flow warp component. The deep-learning-based model is trained on the human-centric view of the RGB channel and does not work on geostationary satellites, which is gray-scale one-channel imagery. The refinement network refines the warped image through a multi-temporal fusion layer. We evaluated WR-Net by interpolation of temporal resolution at 4 min intervals to 2 min intervals in large-scale GK2A geostationary meteorological satellite imagery. Furthermore, we applied WR-Net to the future frame prediction task and showed that the explicit use of optical flow can help future frame prediction.

CVDec 5, 2023Code
Deterministic Guidance Diffusion Model for Probabilistic Weather Forecasting

Donggeun Yoon, Minseok Seo, Doyi Kim et al.

Weather forecasting requires not only accuracy but also the ability to perform probabilistic prediction. However, deterministic weather forecasting methods do not support probabilistic predictions, and conversely, probabilistic models tend to be less accurate. To address these challenges, in this paper, we introduce the \textbf{\textit{D}}eterministic \textbf{\textit{G}}uidance \textbf{\textit{D}}iffusion \textbf{\textit{M}}odel (DGDM) for probabilistic weather forecasting, integrating benefits of both deterministic and probabilistic approaches. During the forward process, both the deterministic and probabilistic models are trained end-to-end. In the reverse process, weather forecasting leverages the predicted result from the deterministic model, using as an intermediate starting point for the probabilistic model. By fusing deterministic models with probabilistic models in this manner, DGDM is capable of providing accurate forecasts while also offering probabilistic predictions. To evaluate DGDM, we assess it on the global weather forecasting dataset (WeatherBench) and the common video frame prediction benchmark (Moving MNIST). We also introduce and evaluate the Pacific Northwest Windstorm (PNW)-Typhoon weather satellite dataset to verify the effectiveness of DGDM in high-resolution regional forecasting. As a result of our experiments, DGDM achieves state-of-the-art results not only in global forecasting but also in regional forecasting. The code is available at: \url{https://github.com/DongGeun-Yoon/DGDM}.

CVNov 20, 2023
A Multi-In-Single-Out Network for Video Frame Interpolation without Optical Flow

Jaemin Lee, Minseok Seo, Sangwoo Lee et al.

In general, deep learning-based video frame interpolation (VFI) methods have predominantly focused on estimating motion vectors between two input frames and warping them to the target time. While this approach has shown impressive performance for linear motion between two input frames, it exhibits limitations when dealing with occlusions and nonlinear movements. Recently, generative models have been applied to VFI to address these issues. However, as VFI is not a task focused on generating plausible images, but rather on predicting accurate intermediate frames between two given frames, performance limitations still persist. In this paper, we propose a multi-in-single-out (MISO) based VFI method that does not rely on motion vector estimation, allowing it to effectively model occlusions and nonlinear motion. Additionally, we introduce a novel motion perceptual loss that enables MISO-VFI to better capture the spatio-temporal correlations within the video frames. Our MISO-VFI method achieves state-of-the-art results on VFI benchmarks Vimeo90K, Middlebury, and UCF101, with a significant performance gap compared to existing approaches.

CVDec 16, 2024Code
Data-driven Precipitation Nowcasting Using Satellite Imagery

Young-Jae Park, Doyi Kim, Minseok Seo et al.

Accurate precipitation forecasting is crucial for early warnings of disasters, such as floods and landslides. Traditional forecasts rely on ground-based radar systems, which are space-constrained and have high maintenance costs. Consequently, most developing countries depend on a global numerical model with low resolution, instead of operating their own radar systems. To mitigate this gap, we propose the Neural Precipitation Model (NPM), which uses global-scale geostationary satellite imagery. NPM predicts precipitation for up to six hours, with an update every hour. We take three key channels to discriminate rain clouds as input: infrared radiation (at a wavelength of 10.5 $μm$), upper- (6.3 $μm$), and lower- (7.3 $μm$) level water vapor channels. Additionally, NPM introduces positional encoders to capture seasonal and temporal patterns, accounting for variations in precipitation. Our experimental results demonstrate that NPM can predict rainfall in real-time with a resolution of 2 km. The code and dataset are available at https://github.com/seominseok0429/Data-driven-Precipitation-Nowcasting-Using-Satellite-Imagery.

CLMay 13
Query-Conditioned Test-Time Self-Training for Large Language Models

Chaehee Song, Minseok Seo, Yeeun Seong et al.

Large language models (LLMs) are typically deployed with fixed parameters, and their performance is often improved by allocating more computation at inference time. While such test-time scaling can be effective, it cannot correct model misconceptions or adapt the model to the specific structure of an individual query. Test-time optimization addresses this limitation by enabling parameter updates during inference, but existing approaches either rely on external data or optimize generic self-supervised objectives that lack query-specific alignment. In this work, we propose Query-Conditioned Test-Time Self-Training (QueST), a framework that adapts model parameters during inference using supervision derived directly from the input query. Our key insight is that the input query itself encodes latent signals sufficient for constructing structurally related problem--solution pairs. Based on this, QueST generates such query-conditioned pairs and uses them as supervision for parameter-efficient fine-tuning at test time. The adapted model is then used to produce the final answer, enabling query-specific adaptation without any external data. Across seven mathematical reasoning benchmarks and the GPQA-Diamond scientific reasoning benchmark, QueST consistently outperforms strong test-time optimization baselines. These results demonstrate that query-conditioned self-training is an effective and practical paradigm for test-time adaptation in LLMs.

CVSep 30, 2024
Masked Autoregressive Model for Weather Forecasting

Doyi Kim, Minseok Seo, Hakjin Lee et al.

The growing impact of global climate change amplifies the need for accurate and reliable weather forecasting. Traditional autoregressive approaches, while effective for temporal modeling, suffer from error accumulation in long-term prediction tasks. The lead time embedding method has been suggested to address this issue, but it struggles to maintain crucial correlations in atmospheric events. To overcome these challenges, we propose the Masked Autoregressive Model for Weather Forecasting (MAM4WF). This model leverages masked modeling, where portions of the input data are masked during training, allowing the model to learn robust spatiotemporal relationships by reconstructing the missing information. MAM4WF combines the advantages of both autoregressive and lead time embedding methods, offering flexibility in lead time modeling while iteratively integrating predictions. We evaluate MAM4WF across weather, climate forecasting, and video frame prediction datasets, demonstrating superior performance on five test datasets.

CVMar 2
Efficient Test-Time Optimization for Depth Completion via Low-Rank Decoder Adaptation

Minseok Seo, Wonjun Lee, Jaehyuk Jang et al.

Zero-shot depth completion has gained attention for its ability to generalize across environments without sensor-specific datasets or retraining. However, most existing approaches rely on diffusion-based test-time optimization, which is computationally expensive due to iterative denoising. Recent visual-prompt-based methods reduce training cost but still require repeated forward--backward passes through the full frozen network to optimize input-level prompts, resulting in slow inference. In this work, we show that adapting only the decoder is sufficient for effective test-time optimization, as depth foundation models concentrate depth-relevant information within a low-dimensional decoder subspace. Based on this insight, we propose a lightweight test-time adaptation method that updates only this low-dimensional subspace using sparse depth supervision. Our approach achieves state-of-the-art performance, establishing a new Pareto frontier between accuracy and efficiency for test-time adaptation. Extensive experiments on five indoor and outdoor datasets demonstrate consistent improvements over prior methods, highlighting the practicality of fast zero-shot depth completion.

CVOct 15, 2023
Prototype-oriented Unsupervised Change Detection for Disaster Management

Youngtack Oh, Minseok Seo, Doyi Kim et al.

Climate change has led to an increased frequency of natural disasters such as floods and cyclones. This emphasizes the importance of effective disaster monitoring. In response, the remote sensing community has explored change detection methods. These methods are primarily categorized into supervised techniques, which yield precise results but come with high labeling costs, and unsupervised techniques, which eliminate the need for labeling but involve intricate hyperparameter tuning. To address these challenges, we propose a novel unsupervised change detection method named Prototype-oriented Unsupervised Change Detection for Disaster Management (PUCD). PUCD captures changes by comparing features from pre-event, post-event, and prototype-oriented change synthesis images via a foundational model, and refines results using the Segment Anything Model (SAM). Although PUCD is an unsupervised change detection, it does not require complex hyperparameter tuning. We evaluate PUCD framework on the LEVIR-Extension dataset and the disaster dataset and it achieves state-of-the-art performance compared to other methods on the LEVIR-Extension dataset.

CVMay 4
Rethinking Electro-Optical Vision Foundation Models for Remote Sensing Retrieval: A Controlled Comparison with Generalist VFM

Hyobin Park, Minseok Seo, Dong-Geol Choi

Vision foundation models have attracted significant attention for their ability to leverage large-scale unlabeled visual data. This advantage is particularly important in remote sensing, where data acquisition is costly and annotation often requires expert knowledge. Recent electro-optical vision foundation models aim to learn domain-specific representations from remote sensing imagery, but it remains unclear whether they are more effective than strong generalist vision foundation models under retrieval-based evaluation. In this study, we conduct a controlled comparison between representative EO-specific and generalist vision foundation models for remote sensing image retrieval. Using the same datasets, retrieval protocol, and evaluation metric, we evaluate both in-domain performance and cross-scene generalization. Our results show that strong generalist vision foundation models are competitive with, and in some cases outperform, existing EO-specific models. Moreover, EO-specific models often suffer from substantial degradation under cross-scene evaluation, while generalist models show more stable transfer. These findings suggest that EO pretraining alone does not guarantee stronger retrieval-oriented remote sensing representations. We discuss the limitations of current EO-specific pretraining strategies and highlight the need for future EO vision foundation models to better exploit the physical, spatial, spectral, and geographic characteristics of remote sensing imagery.

CVJan 28, 2024
Long-Term Typhoon Trajectory Prediction: A Physics-Conditioned Approach Without Reanalysis Data

Young-Jae Park, Minseok Seo, Doyi Kim et al.

In the face of escalating climate changes, typhoon intensities and their ensuing damage have surged. Accurate trajectory prediction is crucial for effective damage control. Traditional physics-based models, while comprehensive, are computationally intensive and rely heavily on the expertise of forecasters. Contemporary data-driven methods often rely on reanalysis data, which can be considered to be the closest to the true representation of weather conditions. However, reanalysis data is not produced in real-time and requires time for adjustment because prediction models are calibrated with observational data. This reanalysis data, such as ERA5, falls short in challenging real-world situations. Optimal preparedness necessitates predictions at least 72 hours in advance, beyond the capabilities of standard physics models. In response to these constraints, we present an approach that harnesses real-time Unified Model (UM) data, sidestepping the limitations of reanalysis data. Our model provides predictions at 6-hour intervals for up to 72 hours in advance and outperforms both state-of-the-art data-driven methods and numerical weather prediction models. In line with our efforts to mitigate adversities inflicted by \rthree{typhoons}, we release our preprocessed \textit{PHYSICS TRACK} dataset, which includes ERA5 reanalysis data, typhoon best-track, and UM forecast data.

CVNov 20, 2025
Upsample Anything: A Simple and Hard to Beat Baseline for Feature Upsampling

Minseok Seo, Mark Hamilton, Changick Kim

We present \textbf{Upsample Anything}, a lightweight test-time optimization (TTO) framework that restores low-resolution features to high-resolution, pixel-wise outputs without any training. Although Vision Foundation Models demonstrate strong generalization across diverse downstream tasks, their representations are typically downsampled by 14x/16x (e.g., ViT), which limits their direct use in pixel-level applications. Existing feature upsampling approaches depend on dataset-specific retraining or heavy implicit optimization, restricting scalability and generalization. Upsample Anything addresses these issues through a simple per-image optimization that learns an anisotropic Gaussian kernel combining spatial and range cues, effectively bridging Gaussian Splatting and Joint Bilateral Upsampling. The learned kernel acts as a universal, edge-aware operator that transfers seamlessly across architectures and modalities, enabling precise high-resolution reconstruction of features, depth, or probability maps. It runs in only $\approx0.419 \text{s}$ per 224x224 image and achieves state-of-the-art performance on semantic segmentation, depth estimation, and both depth and probability map upsampling.

LGAug 11, 2025
AIS-LLM: A Unified Framework for Maritime Trajectory Prediction, Anomaly Detection, and Collision Risk Assessment with Explainable Forecasting

Hyobin Park, Jinwook Jung, Minseok Seo et al.

With the increase in maritime traffic and the mandatory implementation of the Automatic Identification System (AIS), the importance and diversity of maritime traffic analysis tasks based on AIS data, such as vessel trajectory prediction, anomaly detection, and collision risk assessment, is rapidly growing. However, existing approaches tend to address these tasks individually, making it difficult to holistically consider complex maritime situations. To address this limitation, we propose a novel framework, AIS-LLM, which integrates time-series AIS data with a large language model (LLM). AIS-LLM consists of a Time-Series Encoder for processing AIS sequences, an LLM-based Prompt Encoder, a Cross-Modality Alignment Module for semantic alignment between time-series data and textual prompts, and an LLM-based Multi-Task Decoder. This architecture enables the simultaneous execution of three key tasks: trajectory prediction, anomaly detection, and risk assessment of vessel collisions within a single end-to-end system. Experimental results demonstrate that AIS-LLM outperforms existing methods across individual tasks, validating its effectiveness. Furthermore, by integratively analyzing task outputs to generate situation summaries and briefings, AIS-LLM presents the potential for more intelligent and efficient maritime traffic management.

CVJun 7, 2024
ACE Metric: Advection and Convection Evaluation for Accurate Weather Forecasting

Doyi Kim, Minseok Seo, Yeji Choi

Recently, data-driven weather forecasting methods have received significant attention for surpassing the RMSE performance of traditional NWP (Numerical Weather Prediction)-based methods. However, data-driven models are tuned to minimize the loss between forecasted data and ground truths, often using pixel-wise loss. This can lead to models that produce blurred outputs, which, despite being significantly different in detail from the actual weather conditions, still demonstrate low RMSE values. Although evaluation metrics from the computer vision field, such as PSNR, SSIM, and FVD, can be used, they are not entirely suitable for weather variables. This is because weather variables exhibit continuous physical changes over time and lack the distinct boundaries of objects typically seen in computer vision images. To resolve these issues, we propose the advection and convection Error (ACE) metric, specifically designed to assess how well models predict advection and convection, which are significant atmospheric transfer methods. We have validated the ACE evaluation metric on the WeatherBench2 and MovingMNIST datasets.

CVJan 19, 2022
PT4AL: Using Self-Supervised Pretext Tasks for Active Learning

John Seon Keun Yi, Minseok Seo, Jongchan Park et al.

Labeling a large set of data is expensive. Active learning aims to tackle this problem by asking to annotate only the most informative data from the unlabeled set. We propose a novel active learning approach that utilizes self-supervised pretext tasks and a unique data sampler to select data that are both difficult and representative. We discover that the loss of a simple self-supervised pretext task, such as rotation prediction, is closely correlated to the downstream task loss. Before the active learning iterations, the pretext task learner is trained on the unlabeled set, and the unlabeled data are sorted and split into batches by their pretext task losses. In each active learning iteration, the main task model is used to sample the most uncertain data in a batch to be annotated. We evaluate our method on various image classification and segmentation benchmarks and achieve compelling performances on CIFAR10, Caltech-101, ImageNet, and Cityscapes. We further show that our method performs well on imbalanced datasets, and can be an effective solution to the cold-start problem where active learning performance is affected by the randomly sampled initial labeled set.

CVAug 10, 2021
Exploiting Features with Split-and-Share Module

Jaemin Lee, Minseok Seo, Jongchan Park et al.

Deep convolutional neural networks (CNNs) have shown state-of-the-art performances in various computer vision tasks. Advances on CNN architectures have focused mainly on designing convolutional blocks of the feature extractors, but less on the classifiers that exploit extracted features. In this work, we propose Split-and-Share Module (SSM),a classifier that splits a given feature into parts, which are partially shared by multiple sub-classifiers. Our intuition is that the more the features are shared, the more common they will become, and SSM can encourage such structural characteristics in the split features. SSM can be easily integrated into any architecture without bells and whistles. We have extensively validated the efficacy of SSM on ImageNet-1K classification task, andSSM has shown consistent and significant improvements over baseline architectures. In addition, we analyze the effect of SSM using the Grad-CAM visualization.

CVJun 21, 2020
Sequential Feature Filtering Classifier

Minseok Seo, Jaemin Lee, Jongchan Park et al.

We propose Sequential Feature Filtering Classifier (FFC), a simple but effective classifier for convolutional neural networks (CNNs). With sequential LayerNorm and ReLU, FFC zeroes out low-activation units and preserves high-activation units. The sequential feature filtering process generates multiple features, which are fed into a shared classifier for multiple outputs. FFC can be applied to any CNNs with a classifier, and significantly improves performances with negligible overhead. We extensively validate the efficacy of FFC on various tasks: ImageNet-1K classification, MS COCO detection, Cityscapes segmentation, and HMDB51 action recognition. Moreover, we empirically show that FFC can further improve performances upon other techniques, including attention modules and augmentation techniques. The code and models will be publicly available.