ROFeb 24, 2025Code
Tidiness Score-Guided Monte Carlo Tree Search for Visual Tabletop RearrangementHogun Kee, Wooseok Oh, Minjae Kang et al.
In this paper, we present the tidiness score-guided Monte Carlo tree search (TSMCTS), a novel framework designed to address the tabletop tidying up problem using only an RGB-D camera. We address two major problems for tabletop tidying up problem: (1) the lack of public datasets and benchmarks, and (2) the difficulty of specifying the goal configuration of unseen objects. We address the former by presenting the tabletop tidying up (TTU) dataset, a structured dataset collected in simulation. Using this dataset, we train a vision-based discriminator capable of predicting the tidiness score. This discriminator can consistently evaluate the degree of tidiness across unseen configurations, including real-world scenes. Addressing the second problem, we employ Monte Carlo tree search (MCTS) to find tidying trajectories without specifying explicit goals. Instead of providing specific goals, we demonstrate that our MCTS-based planner can find diverse tidied configurations using the tidiness score as a guidance. Consequently, we propose TSMCTS, which integrates a tidiness discriminator with an MCTS-based tidying planner to find optimal tidied arrangements. TSMCTS has successfully demonstrated its capability across various environments, including coffee tables, dining tables, office desks, and bathrooms. The TTU dataset is available at: https://github.com/rllab-snu/TTU-Dataset.
RODec 6, 2023
Diffused Task-Agnostic Milestone PlannerMineui Hong, Minjae Kang, Songhwai Oh
Addressing decision-making problems using sequence modeling to predict future trajectories shows promising results in recent years. In this paper, we take a step further to leverage the sequence predictive method in wider areas such as long-term planning, vision-based control, and multi-task decision-making. To this end, we propose a method to utilize a diffusion-based generative sequence model to plan a series of milestones in a latent space and to have an agent to follow the milestones to accomplish a given task. The proposed method can learn control-relevant, low-dimensional latent representations of milestones, which makes it possible to efficiently perform long-term planning and vision-based control. Furthermore, our approach exploits generation flexibility of the diffusion model, which makes it possible to plan diverse trajectories for multi-task decision-making. We demonstrate the proposed method across offline reinforcement learning (RL) benchmarks and an visual manipulation environment. The results show that our approach outperforms offline RL methods in solving long-horizon, sparse-reward tasks and multi-task problems, while also achieving the state-of-the-art performance on the most challenging vision-based manipulation benchmark.
CLJun 13, 2025
Personalized LLM Decoding via Contrasting Personal PreferenceHyungjune Bu, Chanjoo Jung, Minjae Kang et al.
As large language models (LLMs) are progressively deployed in various real-world applications, personalization of LLMs has become increasingly important. While various approaches to LLM personalization such as prompt-based and training-based methods have been actively explored, the development of effective decoding-time algorithms remains largely overlooked, despite their demonstrated potential. In this paper, we propose CoPe (Contrasting Personal Preference), a novel decoding-time approach applied after performing parameter-efficient fine-tuning (PEFT) on user-specific data. Our core idea is to leverage reward-guided decoding specifically for personalization by maximizing each user's implicit reward signal. We evaluate CoPe across five open-ended personalized text generation tasks. Our empirical results demonstrate that CoPe achieves strong performance, improving personalization by an average of 10.57% in ROUGE-L, without relying on external reward models or additional training procedures.
LGAug 25, 2025
Riemannian Optimization for LoRA on the Stiefel ManifoldJuneyoung Park, Minjae Kang, Seongbae Lee et al.
While powerful, large language models (LLMs) present significant fine-tuning challenges due to their size. Parameter-efficient fine-tuning (PEFT) methods like LoRA provide solutions, yet suffer from critical optimizer inefficiencies; notably basis redundancy in LoRA's $B$ matrix when using AdamW, which fundamentally limits performance. We address this by optimizing the $B$ matrix on the Stiefel manifold, imposing explicit orthogonality constraints that achieve near-perfect orthogonality and full effective rank. This geometric approach dramatically enhances parameter efficiency and representational capacity. Our Stiefel optimizer consistently outperforms AdamW across benchmarks with both LoRA and DoRA, demonstrating that geometric constraints are the key to unlocking LoRA's full potential for effective LLM fine-tuning.
CVApr 25, 2025
Seeing Soundscapes: Audio-Visual Generation and Separation from Soundscapes Using Audio-Visual SeparatorMinjae Kang, Martim Brandão
Recent audio-visual generative models have made substantial progress in generating images from audio. However, existing approaches focus on generating images from single-class audio and fail to generate images from mixed audio. To address this, we propose an Audio-Visual Generation and Separation model (AV-GAS) for generating images from soundscapes (mixed audio containing multiple classes). Our contribution is threefold: First, we propose a new challenge in the audio-visual generation task, which is to generate an image given a multi-class audio input, and we propose a method that solves this task using an audio-visual separator. Second, we introduce a new audio-visual separation task, which involves generating separate images for each class present in a mixed audio input. Lastly, we propose new evaluation metrics for the audio-visual generation task: Class Representation Score (CRS) and a modified R@K. Our model is trained and evaluated on the VGGSound dataset. We show that our method outperforms the state-of-the-art, achieving 7% higher CRS and 4% higher R@2* in generating plausible images with mixed audio.
LGNov 1, 2024
Fast Adaptation with Kernel and Gradient based Meta LeaningJuneYoung Park, MinJae Kang
Model Agnostic Meta Learning or MAML has become the standard for few-shot learning as a meta-learning problem. MAML is simple and can be applied to any model, as its name suggests. However, it often suffers from instability and computational inefficiency during both training and inference times. In this paper, we propose two algorithms to improve both the inner and outer loops of MAML, then pose an important question about what 'meta' learning truly is. Our first algorithm redefines the optimization problem in the function space to update the model using closed-form solutions instead of optimizing parameters through multiple gradient steps in the inner loop. In the outer loop, the second algorithm adjusts the learning of the meta-learner by assigning weights to the losses from each task of the inner loop. This method optimizes convergence during both the training and inference stages of MAML. In conclusion, our algorithms offer a new perspective on meta-learning and make significant discoveries in both theory and experiments. This research suggests a more efficient approach to few-shot learning and fast task adaptation compared to existing methods. Furthermore, it lays the foundation for establishing a new paradigm in meta-learning.