77.5IRApr 20
Bayesian Active Learning with Gaussian Processes Guided by LLM Relevance Scoring for Dense Passage RetrievalJunyoung Kim, Anton Korikov, Jiazhou Liang et al.
While Large Language Models (LLMs) exhibit exceptional zero-shot relevance modeling, their high computational cost necessitates framing passage retrieval as a budget-constrained global optimization problem. Existing approaches passively rely on first-stage dense retrievers, which leads to two limitations: (1) failing to retrieve relevant passages in semantically distinct clusters, and (2) failing to propagate relevance signals to the broader corpus. To address these limitations, we propose Bayesian Active Learning with Gaussian Processes guided by LLM relevance scoring (BAGEL), a novel framework that propagates sparse LLM relevance signals across the embedding space to guide global exploration. BAGEL models the multimodal relevance distribution across the entire embedding space with a query-specific Gaussian Process (GP) based on LLM relevance scores. Subsequently, it iteratively selects passages for scoring by strategically balancing the exploitation of high-confidence regions with the exploration of uncertain areas. Extensive experiments across four benchmark datasets and two LLM backbones demonstrate that BAGEL effectively explores and captures complex relevance distributions and outperforms LLM reranking methods under the same LLM budget on all four datasets.
61.6IRMar 18
VLM2Rec: Resolving Modality Collapse in Vision-Language Model Embedders for Multimodal Sequential RecommendationJunyoung Kim, Woojoo Kim, Jaehyung Lim et al.
Sequential Recommendation (SR) in multimodal settings typically relies on small frozen pretrained encoders, which limits semantic capacity and prevents Collaborative Filtering (CF) signals from being fully integrated into item representations. Inspired by the recent success of Large Language Models (LLMs) as high-capacity embedders, we investigate the use of Vision-Language Models (VLMs) as CF-aware multimodal encoders for SR. However, we find that standard contrastive supervised fine-tuning (SFT), which adapts VLMs for embedding generation and injects CF signals, can amplify its inherent modality collapse. In this state, optimization is dominated by a single modality while the other degrades, ultimately undermining recommendation accuracy. To address this, we propose VLM2Rec, a VLM embedder-based framework for multimodal sequential recommendation designed to ensure balanced modality utilization. Specifically, we introduce Weak-modality Penalized Contrastive Learning to rectify gradient imbalance during optimization and Cross-Modal Relational Topology Regularization to preserve geometric consistency between modalities. Extensive experiments demonstrate that VLM2Rec consistently outperforms state-of-the-art baselines in both accuracy and robustness across diverse scenarios.
40.5IRApr 5Code
FLAME: Condensing Ensemble Diversity into a Single Network for Efficient Sequential RecommendationWooJoo Kim, JunYoung Kim, JaeHyung Lim et al.
Sequential recommendation requires capturing diverse user behaviors, which a single network often fails to capture. While ensemble methods mitigate this by leveraging multiple networks, training them all from scratch leads to high computational cost and instability from noisy mutual supervision. We propose {\bf F}rozen and {\bf L}earnable networks with {\bf A}ligned {\bf M}odular {\bf E}nsemble ({\bf FLAME}), a novel framework that condenses ensemble-level diversity into a single network for efficient sequential recommendation. During training, FLAME simulates exponential diversity using only two networks via {\it modular ensemble}. By decomposing each network into sub-modules (e.g., layers or blocks) and dynamically combining them, FLAME generates a rich space of diverse representation patterns. To stabilize this process, we pretrain and freeze one network to serve as a semantic anchor and employ {\it guided mutual learning}. This aligns the diverse representations into the space of the remaining learnable network, ensuring robust optimization. Consequently, at inference, FLAME utilizes only the learnable network, achieving ensemble-level performance with zero overhead compared to a single network. Experiments on six datasets show that FLAME outperforms state-of-the-art baselines, achieving up to 7.69$\times$ faster convergence and 9.70\% improvement in NDCG@20. We provide the source code of FLAME at https://github.com/woo-joo/FLAME_SIGIR26.
SDJun 11, 2025Code
BemaGANv2: A Tutorial and Comparative Survey of GAN-based Vocoders for Long-Term Audio GenerationTaesoo Park, Mungwi Jeong, Mingyu Park et al.
This paper presents a tutorial-style survey and implementation guide of BemaGANv2, an advanced GAN-based vocoder designed for high-fidelity and long-term audio generation. Built upon the original BemaGAN architecture, BemaGANv2 incorporates major architectural innovations by replacing traditional ResBlocks in the generator with the Anti-aliased Multi-Periodicity composition (AMP) module, which internally applies the Snake activation function to better model periodic structures. In the discriminator framework, we integrate the Multi-Envelope Discriminator (MED), a novel architecture we originally proposed, to extract rich temporal envelope features crucial for periodicity detection. Coupled with the Multi-Resolution Discriminator (MRD), this combination enables more accurate modeling of long-range dependencies in audio. We systematically evaluate various discriminator configurations, including MSD + MED, MSD + MRD, and MPD + MED + MRD, using objective metrics (FAD, SSIM, PLCC, MCD) and subjective evaluations (MOS, SMOS). This paper also provides a comprehensive tutorial on the model architecture, training methodology, and implementation to promote reproducibility. The code and pre-trained models are available at: https://github.com/dinhoitt/BemaGANv2.
PLMay 12, 2023Code
AMULET: Adaptive Matrix-Multiplication-Like TasksJunyoung Kim, Kenneth Ross, Eric Sedlar et al.
Many useful tasks in data science and machine learning applications can be written as simple variations of matrix multiplication. However, users have difficulty performing such tasks as existing matrix/vector libraries support only a limited class of computations hand-tuned for each unique hardware platform. Users can alternatively write the task as a simple nested loop but current compilers are not sophisticated enough to generate fast code for the task written in this way. To address these issues, we extend an open-source compiler to recognize and optimize these matrix multiplication-like tasks. Our framework, called Amulet, uses both database-style and compiler optimization techniques to generate fast code tailored to its execution environment. We show through experiments that Amulet achieves speedups on a variety of matrix multiplication-like tasks compared to existing compilers. For large matrices Amulet typically performs within 15% of hand-tuned matrix multiplication libraries, while handling a much broader class of computations.
CLAug 28, 2024
ConCSE: Unified Contrastive Learning and Augmentation for Code-Switched EmbeddingsJangyeong Jeon, Sangyeon Cho, Minuk Ma et al.
This paper examines the Code-Switching (CS) phenomenon where two languages intertwine within a single utterance. There exists a noticeable need for research on the CS between English and Korean. We highlight that the current Equivalence Constraint (EC) theory for CS in other languages may only partially capture English-Korean CS complexities due to the intrinsic grammatical differences between the languages. We introduce a novel Koglish dataset tailored for English-Korean CS scenarios to mitigate such challenges. First, we constructed the Koglish-GLUE dataset to demonstrate the importance and need for CS datasets in various tasks. We found the differential outcomes of various foundation multilingual language models when trained on a monolingual versus a CS dataset. Motivated by this, we hypothesized that SimCSE, which has shown strengths in monolingual sentence embedding, would have limitations in CS scenarios. We construct a novel Koglish-NLI (Natural Language Inference) dataset using a CS augmentation-based approach to verify this. From this CS-augmented dataset Koglish-NLI, we propose a unified contrastive learning and augmentation method for code-switched embeddings, ConCSE, highlighting the semantics of CS sentences. Experimental results validate the proposed ConCSE with an average performance enhancement of 1.77\% on the Koglish-STS(Semantic Textual Similarity) tasks.
ROMar 21, 2024
Evidential Semantic Mapping in Off-road Environments with Uncertainty-aware Bayesian Kernel InferenceJunyoung Kim, Junwon Seo, Jihong Min
Robotic mapping with Bayesian Kernel Inference (BKI) has shown promise in creating semantic maps by effectively leveraging local spatial information. However, existing semantic mapping methods face challenges in constructing reliable maps in unstructured outdoor scenarios due to unreliable semantic predictions. To address this issue, we propose an evidential semantic mapping, which can enhance reliability in perceptually challenging off-road environments. We integrate Evidential Deep Learning into the semantic segmentation network to obtain the uncertainty estimate of semantic prediction. Subsequently, this semantic uncertainty is incorporated into an uncertainty-aware BKI, tailored to prioritize more confident semantic predictions when accumulating semantic information. By adaptively handling semantic uncertainties, the proposed framework constructs robust representations of the surroundings even in previously unseen environments. Comprehensive experiments across various off-road datasets demonstrate that our framework enhances accuracy and robustness, consistently outperforming existing methods in scenes with high perceptual uncertainties.
ROMay 10, 2024
Uncertainty-aware Semantic Mapping in Off-road Environments with Dempster-Shafer Theory of EvidenceJunyoung Kim, Junwon Seo
Semantic mapping with Bayesian Kernel Inference (BKI) has shown promise in providing a richer understanding of environments by effectively leveraging local spatial information. However, existing methods face challenges in constructing accurate semantic maps or reliable uncertainty maps in perceptually challenging environments due to unreliable semantic predictions. To address this issue, we propose an evidential semantic mapping framework, which integrates the evidential reasoning of Dempster-Shafer Theory of Evidence (DST) into the entire mapping pipeline by adopting Evidential Deep Learning (EDL) and Dempster's rule of combination. Additionally, the extended belief is devised to incorporate local spatial information based on their uncertainty during the mapping process. Comprehensive experiments across various off-road datasets demonstrate that our framework enhances the reliability of uncertainty maps, consistently outperforming existing methods in scenes with high perceptual uncertainties while showing semantic accuracy comparable to the best-performing semantic mapping techniques.
CLOct 3, 2025
Transparent Reference-free Automated Evaluation of Open-Ended User Survey ResponsesSubin An, Yugyeong Ji, Junyoung Kim et al.
Open-ended survey responses provide valuable insights in marketing research, but low-quality responses not only burden researchers with manual filtering but also risk leading to misleading conclusions, underscoring the need for effective evaluation. Existing automatic evaluation methods target LLM-generated text and inadequately assess human-written responses with their distinct characteristics. To address such characteristics, we propose a two-stage evaluation framework specifically designed for human survey responses. First, gibberish filtering removes nonsensical responses. Then, three dimensions-effort, relevance, and completeness-are evaluated using LLM capabilities, grounded in empirical analysis of real-world survey data. Validation on English and Korean datasets shows that our framework not only outperforms existing metrics but also demonstrates high practical applicability for real-world applications such as response quality prediction and response rejection, showing strong correlations with expert assessment.
LGAug 6, 2025
Federated Continual RecommendationJaehyung Lim, Wonbin Kweon, Woojoo Kim et al.
The increasing emphasis on privacy in recommendation systems has led to the adoption of Federated Learning (FL) as a privacy-preserving solution, enabling collaborative training without sharing user data. While Federated Recommendation (FedRec) effectively protects privacy, existing methods struggle with non-stationary data streams, failing to maintain consistent recommendation quality over time. On the other hand, Continual Learning Recommendation (CLRec) methods address evolving user preferences but typically assume centralized data access, making them incompatible with FL constraints. To bridge this gap, we introduce Federated Continual Recommendation (FCRec), a novel task that integrates FedRec and CLRec, requiring models to learn from streaming data while preserving privacy. As a solution, we propose F3CRec, a framework designed to balance knowledge retention and adaptation under the strict constraints of FCRec. F3CRec introduces two key components: Adaptive Replay Memory on the client side, which selectively retains past preferences based on user-specific shifts, and Item-wise Temporal Mean on the server side, which integrates new knowledge while preserving prior information. Extensive experiments demonstrate that F3CRec outperforms existing approaches in maintaining recommendation quality over time in a federated environment.
CVApr 9, 2025
Crafting Query-Aware Selective Attention for Single Image Super-ResolutionJunyoung Kim, Youngrok Kim, Siyeol Jung et al.
Single Image Super-Resolution (SISR) reconstructs high-resolution images from low-resolution inputs, enhancing image details. While Vision Transformer (ViT)-based models improve SISR by capturing long-range dependencies, they suffer from quadratic computational costs or employ selective attention mechanisms that do not explicitly focus on query-relevant regions. Despite these advancements, prior work has overlooked how selective attention mechanisms should be effectively designed for SISR. We propose SSCAN, which dynamically selects the most relevant key-value windows based on query similarity, ensuring focused feature extraction while maintaining efficiency. In contrast to prior approaches that apply attention globally or heuristically, our method introduces a query-aware window selection strategy that better aligns attention computation with important image regions. By incorporating fixed-sized windows, SSCAN reduces memory usage and enforces linear token-to-token complexity, making it scalable for large images. Our experiments demonstrate that SSCAN outperforms existing attention-based SISR methods, achieving up to 0.14 dB PSNR improvement on urban datasets, guaranteeing both computational efficiency and reconstruction quality in SISR.
CVMar 11, 2025
A Data Aggregation Visualization System supported by Processing-in-MemoryJunyoung Kim, Madhulika Balakumar, Kenneth Ross
Data visualization of aggregation queries is one of the most common ways of doing data exploration and data science as it can help identify correlations and patterns in the data. We propose DIVAN, a system that automatically normalizes the one-dimensional axes by frequency to generate large numbers of two-dimensional visualizations. DIVAN normalizes the input data via binning to allocate more pixels to data values that appear more frequently in the dataset. DIVAN can utilize either CPUs or Processing-in-Memory (PIM) architectures to quickly calculate aggregates to support the visualizations. On real world datasets, we show that DIVAN generates visualizations that highlight patterns and correlations, some expected and some unexpected. By using PIM, we can calculate aggregates 45%-64% faster than modern CPUs on large datasets. For use cases with 100 million rows and 32 columns, our system is able to compute 4,960 aggregates (each of size 128x128x128) in about a minute.
LGNov 22, 2024
FLARE: FP-Less PTQ and Low-ENOB ADC Based AMS-PiM for Error-Resilient, Fast, and Efficient Transformer AccelerationDonghyeon Yi, Seoyoung Lee, Jongho Kim et al.
Encoder-based transformers, powered by self-attention layers, have revolutionized machine learning with their context-aware representations. However, their quadratic growth in computational and memory demands presents significant bottlenecks. Analog-Mixed-Signal Process-in-Memory (AMS-PiM) architectures address these challenges by enabling efficient on-chip processing. Traditionally, AMS-PiM relies on Quantization-Aware Training (QAT), which is hardware-efficient but requires extensive retraining to adapt models to AMS-PiMs, making it increasingly impractical for transformer models. Post-Training Quantization (PTQ) mitigates this training overhead but introduces significant hardware inefficiencies. PTQ relies on dequantization-quantization (DQ-Q) processes, floating-point units (FPUs), and high-ENOB (Effective Number of Bits) analog-to-digital converters (ADCs). Particularly, High-ENOB ADCs scale exponentially in area and energy ($2^{ENOB}$), reduce sensing margins, and increase susceptibility to process, voltage, and temperature (PVT) variations, further compounding PTQ's challenges in AMS-PiM systems. To overcome these limitations, we propose RAP, an AMS-PiM architecture that eliminates DQ-Q processes, introduces FPU- and division-free nonlinear processing, and employs a low-ENOB-ADC-based sparse Matrix Vector multiplication technique. Using the proposed techniques, RAP improves error resiliency, area/energy efficiency, and computational speed while preserving numerical stability. Experimental results demonstrate that RAP outperforms state-of-the-art GPUs and conventional PiM architectures in energy efficiency, latency, and accuracy, making it a scalable solution for the efficient deployment of transformers.
IVJun 17, 2019
Differentiated Backprojection Domain Deep Learning for Conebeam Artifact RemovalYoseob Han, Junyoung Kim, Jong Chul Ye
Conebeam CT using a circular trajectory is quite often used for various applications due to its relative simple geometry. For conebeam geometry, Feldkamp, Davis and Kress algorithm is regarded as the standard reconstruction method, but this algorithm suffers from so-called conebeam artifacts as the cone angle increases. Various model-based iterative reconstruction methods have been developed to reduce the cone-beam artifacts, but these algorithms usually require multiple applications of computational expensive forward and backprojections. In this paper, we develop a novel deep learning approach for accurate conebeam artifact removal. In particular, our deep network, designed on the differentiated backprojection domain, performs a data-driven inversion of an ill-posed deconvolution problem associated with the Hilbert transform. The reconstruction results along the coronal and sagittal directions are then combined using a spectral blending technique to minimize the spectral leakage. Experimental results show that our method outperforms the existing iterative methods despite significantly reduced runtime complexity.
CVJan 28, 2019
CollaGAN : Collaborative GAN for Missing Image Data ImputationDongwook Lee, Junyoung Kim, Won-Jin Moon et al.
In many applications requiring multiple inputs to obtain a desired output, if any of the input data is missing, it often introduces large amounts of bias. Although many techniques have been developed for imputing missing data, the image imputation is still difficult due to complicated nature of natural images. To address this problem, here we proposed a novel framework for missing image data imputation, called Collaborative Generative Adversarial Network (CollaGAN). CollaGAN converts an image imputation problem to a multi-domain images-to-image translation task so that a single generator and discriminator network can successfully estimate the missing data using the remaining clean data set. We demonstrate that CollaGAN produces the images with a higher visual quality compared to the existing competing approaches in various image imputation tasks.