Woojoo Kim

HC
h-index: 11
5 papers
57 citations
Novelty: 55%
AI Score: 48

5 Papers

61.6 | IR | Mar 18
VLM2Rec: Resolving Modality Collapse in Vision-Language Model Embedders for Multimodal Sequential Recommendation

Junyoung Kim, Woojoo Kim, Jaehyung Lim et al.

Sequential Recommendation (SR) in multimodal settings typically relies on small frozen pretrained encoders, which limits semantic capacity and prevents Collaborative Filtering (CF) signals from being fully integrated into item representations. Inspired by the recent success of Large Language Models (LLMs) as high-capacity embedders, we investigate the use of Vision-Language Models (VLMs) as CF-aware multimodal encoders for SR. However, we find that standard contrastive supervised fine-tuning (SFT), which adapts VLMs for embedding generation and injects CF signals, can amplify its inherent modality collapse. In this state, optimization is dominated by a single modality while the other degrades, ultimately undermining recommendation accuracy. To address this, we propose VLM2Rec, a VLM embedder-based framework for multimodal sequential recommendation designed to ensure balanced modality utilization. Specifically, we introduce Weak-modality Penalized Contrastive Learning to rectify gradient imbalance during optimization and Cross-Modal Relational Topology Regularization to preserve geometric consistency between modalities. Extensive experiments demonstrate that VLM2Rec consistently outperforms state-of-the-art baselines in both accuracy and robustness across diverse scenarios.

40.5 | IR | Apr 5 | Code
FLAME: Condensing Ensemble Diversity into a Single Network for Efficient Sequential Recommendation

WooJoo Kim, JunYoung Kim, JaeHyung Lim et al.

Sequential recommendation requires capturing diverse user behaviors, which a single network often fails to capture. While ensemble methods mitigate this by leveraging multiple networks, training them all from scratch leads to high computational cost and instability from noisy mutual supervision. We propose Frozen and Learnable networks with Aligned Modular Ensemble (FLAME), a novel framework that condenses ensemble-level diversity into a single network for efficient sequential recommendation. During training, FLAME simulates exponential diversity using only two networks via modular ensemble. By decomposing each network into sub-modules (e.g., layers or blocks) and dynamically combining them, FLAME generates a rich space of diverse representation patterns. To stabilize this process, we pretrain and freeze one network to serve as a semantic anchor and employ guided mutual learning. This aligns the diverse representations into the space of the remaining learnable network, ensuring robust optimization. Consequently, at inference, FLAME utilizes only the learnable network, achieving ensemble-level performance with zero overhead compared to a single network. Experiments on six datasets show that FLAME outperforms state-of-the-art baselines, achieving up to 7.69× faster convergence and 9.70% improvement in NDCG@20. We provide the source code of FLAME at https://github.com/woo-joo/FLAME_SIGIR26.
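To illustrate why two networks can yield "exponential diversity", here is a minimal sketch of the combinatorics only. It is not the FLAME implementation: the toy scalar layers and the names `enumerate_paths` and `run_path` are hypothetical. It assumes each network is split into the same number of sub-modules and that a forward pass may pick, at every depth, the sub-module from either network, giving 2^L distinct paths for L sub-modules.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

# A "sub-module" here is just a scalar function (hypothetical stand-in for a layer or block).
Layer = Callable[[float], float]


def enumerate_paths(num_layers: int) -> List[Tuple[int, ...]]:
    """List every way to choose, per depth, network 0 (frozen) or network 1 (learnable)."""
    return list(product((0, 1), repeat=num_layers))


def run_path(x: float, path: Sequence[int],
             frozen: Sequence[Layer], learnable: Sequence[Layer]) -> float:
    """Apply the sub-module selected by `path` at each depth, in order."""
    networks = (frozen, learnable)
    for depth, choice in enumerate(path):
        x = networks[choice][depth](x)
    return x


# Toy sub-modules; the values are illustrative only.
frozen: List[Layer] = [lambda v: v + 1.0, lambda v: v * 2.0, lambda v: v - 3.0]
learnable: List[Layer] = [lambda v: v + 10.0, lambda v: v * 0.5, lambda v: v + 3.0]

paths = enumerate_paths(len(frozen))
outputs: Dict[Tuple[int, ...], float] = {
    p: run_path(1.0, p, frozen, learnable) for p in paths
}
print(len(paths), "paths from 2 networks with", len(frozen), "sub-modules each")
```

With three sub-modules per network the sketch enumerates eight paths. In practice a training procedure would presumably sample paths rather than enumerate them all, but that detail is not stated in the abstract.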

LG | Aug 6, 2025
Federated Continual Recommendation

Jaehyung Lim, Wonbin Kweon, Woojoo Kim et al.

The increasing emphasis on privacy in recommendation systems has led to the adoption of Federated Learning (FL) as a privacy-preserving solution, enabling collaborative training without sharing user data. While Federated Recommendation (FedRec) effectively protects privacy, existing methods struggle with non-stationary data streams, failing to maintain consistent recommendation quality over time. On the other hand, Continual Learning Recommendation (CLRec) methods address evolving user preferences but typically assume centralized data access, making them incompatible with FL constraints. To bridge this gap, we introduce Federated Continual Recommendation (FCRec), a novel task that integrates FedRec and CLRec, requiring models to learn from streaming data while preserving privacy. As a solution, we propose F3CRec, a framework designed to balance knowledge retention and adaptation under the strict constraints of FCRec. F3CRec introduces two key components: Adaptive Replay Memory on the client side, which selectively retains past preferences based on user-specific shifts, and Item-wise Temporal Mean on the server side, which integrates new knowledge while preserving prior information. Extensive experiments demonstrate that F3CRec outperforms existing approaches in maintaining recommendation quality over time in a federated environment.

HC | Dec 21, 2021
Pseudo-Haptic Button for Improving User Experience of Mid-Air Interaction in VR

Woojoo Kim, Shuping Xiong

Mid-air interaction is one of the promising interaction modalities in virtual reality (VR) due to its merits in naturalness and intuitiveness, but the interaction suffers from the lack of haptic feedback as no force or vibrotactile feedback can be provided in mid-air. As a breakthrough to compensate for this insufficiency, the application of pseudo-haptic features which create the visuo-haptic illusion without actual physical haptic stimulus can be explored. Therefore, this study aimed to investigate the effect of four pseudo-haptic features: proximity feedback, protrusion, hit effect, and penetration blocking on user experience for free-hand mid-air button interaction in VR. We conducted a user study on 21 young subjects to collect user ratings on various aspects of user experience while users were freely interacting with 16 buttons with different combinations of four features. Results indicated that all investigated features significantly improved user experience in terms of haptic illusion, embodiment, sense of reality, spatiotemporal perception, satisfaction, and hedonic quality. In addition, protrusion and hit effect were more beneficial in comparison with the other two features. It is recommended to utilize the four proposed pseudo-haptic features in 3D user interfaces (UIs) to make users feel more pleased and amused, but caution is needed when using proximity feedback together with other features. The findings of this study could be helpful for VR developers and UI designers in providing better interactive buttons in the 3D interfaces.

HC | Oct 6, 2021
ViewfinderVR: Configurable Viewfinder for Selection of Distant Objects in VR

Woojoo Kim, Shuping Xiong

Selection is one of the fundamental user interactions in virtual reality (VR) and 3D user interaction, and raycasting has been one of the most popular object selection techniques in VR. However, the selection of small or distant objects through raycasting has been known to be difficult. To overcome this limitation, this study proposed a new technique called ViewfinderVR for improved selection of distant objects in VR, utilizing a virtual viewfinder panel with a modern adaptation of the through-the-lens metaphor. ViewfinderVR enables faster and more accurate target selection by allowing customization of the interaction space projected onto a virtual panel within reach, and users can select objects reflected on the panel with either ray-based or touch interaction. Experimental results of Fitts' law-based tests with 20 participants showed that ViewfinderVR outperformed traditional raycasting in terms of task performance (movement time, error rate, and throughput) and perceived workload (NASA-TLX ratings), where touch interaction was superior to ray-based interaction. The associated user behavior was also recorded and analyzed to understand the underlying reasons for the improved task performance and reduced workload. The proposed technique can be used in VR applications to enhance the selection of distant objects.
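For readers unfamiliar with the throughput measure used in Fitts' law-based tests, the following is a minimal sketch using the common Shannon formulation, ID = log2(D/W + 1) and TP = ID / MT. The abstract does not state which variant the study used (for example, effective width rather than nominal width), so the formula choice and the function names are assumptions.

```python
import math


def index_of_difficulty(distance: float, width: float) -> float:
    """Shannon-formulation index of difficulty in bits (assumes nominal target width)."""
    return math.log2(distance / width + 1.0)


def throughput(distance: float, width: float, movement_time_s: float) -> float:
    """Throughput in bits per second: index of difficulty divided by movement time."""
    return index_of_difficulty(distance, width) / movement_time_s


# Illustrative numbers only, not data from the study.
example_id = index_of_difficulty(distance=7.0, width=1.0)
example_tp = throughput(distance=7.0, width=1.0, movement_time_s=1.5)
print(f"ID = {example_id:.2f} bits, TP = {example_tp:.2f} bits/s")
```

Under this formulation, a technique that shortens movement time for the same target distance and width raises throughput, which matches the direction of the comparison the abstract reports.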