Younghun Kim

CV
h-index10
5papers
16citations
Novelty48%
AI Score44

5 Papers

CVMay 7
Detecting AI-Generated Videos with Spiking Neural Networks

Minsuk Jang, Yujin Yang, Heeseon Kim et al.

Modern AI-generated videos are photorealistic at the single-frame level, leaving inter-frame dynamics as the main remaining axis for detection. Existing detectors typically handle this temporal evidence in three ways: feeding the full frame sequence to a generic temporal backbone, reducing one dominant temporal cue to fixed video-level descriptors, or comparing temporal features to real-video statistics through a detection metric. These strategies degrade sharply under cross-generator evaluation, where artifact type and timescale vary across generators. On caption-paired benchmark, GenVidBench, we identify two signatures that prior detectors do not jointly exploit: AI-generated videos exhibit smoother frame-to-frame temporal residuals at the pixel level, and more compact trajectories in the semantic feature space, indicating a temporal smoothness gap at both levels. We further observe that, when raw video is fed into a Spiking Neural Networks (SNNs), fake clips elicit firing predominantly at object and motion boundaries, unlike real clips, suggesting that the SNN responds to temporal artifacts localized at edges. These cues are sparse, asynchronous, and concentrated at moments of change, which makes SNNs a natural choice for this task: their event-driven, sparsely-activated dynamics align with the structure of the residual signal in a way that dense ANN backbones do not. Building on this observation, we propose MAST, a detector that processes multi-channel temporal residuals with a spike-driven temporal branch alongside a frozen semantic encoder for cross-generator generalization. On the GenVideo benchmark, MAST achieves 93.14\% mean accuracy across 10 unseen generators under strict cross-generator evaluation, matching or surpassing the strongest ANN-based detectors and demonstrating the practical applicability of SNNs to AI-generated video detection.

CVMar 21, 2024
PECI-Net: Bolus segmentation from video fluoroscopic swallowing study images using preprocessing ensemble and cascaded inference

Dougho Park, Younghun Kim, Harim Kang et al.

Bolus segmentation is crucial for the automated detection of swallowing disorders in videofluoroscopic swallowing studies (VFSS). However, it is difficult for the model to accurately segment a bolus region in a VFSS image because VFSS images are translucent, have low contrast and unclear region boundaries, and lack color information. To overcome these challenges, we propose PECI-Net, a network architecture for VFSS image analysis that combines two novel techniques: the preprocessing ensemble network (PEN) and the cascaded inference network (CIN). PEN enhances the sharpness and contrast of the VFSS image by combining multiple preprocessing algorithms in a learnable way. CIN reduces ambiguity in bolus segmentation by using context from other regions through cascaded inference. Moreover, CIN prevents undesirable side effects from unreliably segmented regions by referring to the context in an asymmetric way. In experiments, PECI-Net exhibited higher performance than four recently developed baseline models, outperforming TernausNet, the best among the baseline models, by 4.54\% and the widely used UNet by 10.83\%. The results of the ablation studies confirm that CIN and PEN are effective in improving bolus segmentation performance.

CVJun 21, 2025
SELFI: Selective Fusion of Identity for Generalizable Deepfake Detection

Younghun Kim, Minsuk Jang, Myung-Joon Kwon et al.

Face identity provides a powerful signal for deepfake detection. Prior studies show that even when not explicitly modeled, classifiers often learn identity features implicitly. This has led to conflicting views: some suppress identity cues to reduce bias, while others rely on them as forensic evidence. To reconcile these views, we analyze two hypotheses: (1) whether face identity alone is discriminative for detecting deepfakes, and (2) whether such identity features generalize poorly across manipulation methods. Our experiments confirm that identity is informative but context-dependent. While some manipulations preserve identity-consistent artifacts, others distort identity cues and harm generalization. We argue that identity features should neither be blindly suppressed nor relied upon, but instead be explicitly modeled and adaptively controlled based on per-sample relevance. We propose \textbf{SELFI} (\textbf{SEL}ective \textbf{F}usion of \textbf{I}dentity), a generalizable detection framework that dynamically modulates identity usage. SELFI consists of: (1) a Forgery-Aware Identity Adapter (FAIA) that extracts identity embeddings from a frozen face recognition model and projects them into a forgery-relevant space via auxiliary supervision; and (2) an Identity-Aware Fusion Module (IAFM) that selectively integrates identity and visual features using a relevance-guided fusion mechanism. Experiments on four benchmarks show that SELFI improves cross-manipulation generalization, outperforming prior methods by an average of 3.1\% AUC. On the challenging DFDC dataset, SELFI exceeds the previous best by 6\%. Code will be released upon paper acceptance.

CLJun 9, 2025
Safety-Aligned Weights Are Not Enough: Refusal-Teacher-Guided Finetuning Enhances Safety and Downstream Performance under Harmful Finetuning Attacks

Seokil Ham, Yubin Choi, Yujin Yang et al.

Recently, major AI providers such as Google and OpenAI have introduced Finetuning-as-a-Service (FaaS), which allows users to customize Large Language Models (LLMs) using their own data. However, this service is vulnerable to safety degradation when user data includes harmful prompts, a threat known as harmful finetuning attacks. Prior works attempt to mitigate this issue by first constructing safety-aligned model and then finetuning the model on user data. However, we observe that the safety-aligned weights provide weak initialization for downstream task learning, leading to suboptimal safety-alignment and downstream task performance. To address this, we propose a Refusal-Teacher (Ref-Teacher)-guided finetuning framework. Instead of finetuning a safety-aligned model on user data, our approach directly finetunes the base model under the guidance of a safety-aligned Ref-Teacher, which filters harmful prompts from user data and distills safety-alignment knowledge into the base model. Extensive experiments demonstrate that our Ref-Teacher-guided finetuning strategy effectively minimizes harmful outputs and enhances finetuning accuracy for user-specific tasks, offering a practical solution for secure and reliable deployment of LLMs in FaaS.

MLJun 9, 2017
Time Series Using Exponential Smoothing Cells

Avner Abrami, Aleksandr Y. Aravkin, Younghun Kim

Time series analysis is used to understand and predict dynamic processes, including evolving demands in business, weather, markets, and biological rhythms. Exponential smoothing is used in all these domains to obtain simple interpretable models of time series and to forecast future values. Despite its popularity, exponential smoothing fails dramatically in the presence of outliers, large amounts of noise, or when the underlying time series changes. We propose a flexible model for time series analysis, using exponential smoothing cells for overlapping time windows. The approach can detect and remove outliers, denoise data, fill in missing observations, and provide meaningful forecasts in challenging situations. In contrast to classic exponential smoothing, which solves a nonconvex optimization problem over the smoothing parameters and initial state, the proposed approach requires solving a single structured convex optimization problem. Recent developments in efficient convex optimization of large-scale dynamic models make the approach tractable. We illustrate new capabilities using synthetic examples, and then use the approach to analyze and forecast noisy real-world time series. Code for the approach and experiments is publicly available.