CLMay 27
GeneralThinker: Domain-General Reasoning through Likelihood-Guided Answer-Conditioned OptimizationShengmin Piao, Sanghyun Park
Reinforcement learning with verifiable rewards improves language model reasoning, but its reliance on domain-specific verifiers, sparse outcome rewards, and coarse-grained credit assignment limits its applicability. We introduce GeneralThinker, an on-policy framework that reformulates reasoning supervision as dense answer-conditioned optimization, enabling response-level evaluation and token-level credit assignment without domain-specific verifiers. GeneralThinker evaluates generated reasoning trajectories using the likelihood of the ground-truth answer and derives token-wise compatibility signals for fine-grained credit assignment. To stabilize optimization, it constrains token-level updates through clipping and direction-preserving modulation. Across 11 benchmarks spanning mathematics, STEM, and general reasoning, GeneralThinker achieves the best average performance. Further analyses show that uncontrolled token-level modulation can destabilize training, whereas controlled modulation makes fine-grained credit assignment consistently effective.
CVSep 24, 2024Code
VideoPatchCore: An Effective Method to Memorize Normality for Video Anomaly DetectionSunghyun Ahn, Youngwan Jo, Kijung Lee et al.
Video anomaly detection (VAD) is a crucial task in video analysis and surveillance within computer vision. Currently, VAD is gaining attention with memory techniques that store the features of normal frames. The stored features are utilized for frame reconstruction, identifying an abnormality when a significant difference exists between the reconstructed and input frames. However, this approach faces several challenges due to the simultaneous optimization required for both the memory and encoder-decoder model. These challenges include increased optimization difficulty, complexity of implementation, and performance variability depending on the memory size. To address these challenges,we propose an effective memory method for VAD, called VideoPatchCore. Inspired by PatchCore, our approach introduces a structure that prioritizes memory optimization and configures three types of memory tailored to the characteristics of video data. This method effectively addresses the limitations of existing memory-based methods, achieving good performance comparable to state-of-the-art methods. Furthermore, our method requires no training and is straightforward to implement, making VAD tasks more accessible. Our code is available online at github.com/SkiddieAhn/Paper-VideoPatchCore.
CVMar 6, 2025Code
AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLMSunghyun Ahn, Youngwan Jo, Kijung Lee et al.
Video anomaly detection (VAD) is crucial for video analysis and surveillance in computer vision. However, existing VAD models rely on learned normal patterns, which makes them difficult to apply to diverse environments. Consequently, users should retrain models or develop separate AI models for new environments, which requires expertise in machine learning, high-performance hardware, and extensive data collection, limiting the practical usability of VAD. To address these challenges, this study proposes customizable video anomaly detection (C-VAD) technique and the AnyAnomaly model. C-VAD considers user-defined text as an abnormal event and detects frames containing a specified event in a video. We effectively implemented AnyAnomaly using a context-aware visual question answering without fine-tuning the large vision language model. To validate the effectiveness of the proposed model, we constructed C-VAD datasets and demonstrated the superiority of AnyAnomaly. Furthermore, our approach showed competitive results on VAD benchmarks, achieving state-of-the-art performance on UBnormal and UCF-Crime and surpassing other methods in generalization across all datasets. Our code is available online at github.com/SkiddieAhn/Paper-AnyAnomaly.
LGJul 10, 2024
SPIN: SE(3)-Invariant Physics Informed Network for Binding Affinity PredictionSeungyeon Choi, Sangmin Seo, Sanghyun Park
Accurate prediction of protein-ligand binding affinity is crucial for rapid and efficient drug development. Recently, the importance of predicting binding affinity has led to increased attention on research that models the three-dimensional structure of protein-ligand complexes using graph neural networks to predict binding affinity. However, traditional methods often fail to accurately model the complex's spatial information or rely solely on geometric features, neglecting the principles of protein-ligand binding. This can lead to overfitting, resulting in models that perform poorly on independent datasets and ultimately reducing their usefulness in real drug development. To address this issue, we propose SPIN, a model designed to achieve superior generalization by incorporating various inductive biases applicable to this task, beyond merely training on empirical data from datasets. For prediction, we defined two types of inductive biases: a geometric perspective that maintains consistent binding affinity predictions regardless of the complexs rotations and translations, and a physicochemical perspective that necessitates minimal binding free energy along their reaction coordinate for effective protein-ligand binding. These prior knowledge inputs enable the SPIN to outperform comparative models in benchmark sets such as CASF-2016 and CSAR HiQ. Furthermore, we demonstrated the practicality of our model through virtual screening experiments and validated the reliability and potential of our proposed model based on experiments assessing its interpretability.
CVDec 26, 2024Code
Advanced Knowledge Transfer: Refined Feature Distillation for Zero-Shot Quantization in Edge ComputingInpyo Hong, Youngwan Jo, Hyojeong Lee et al.
We introduce AKT (Advanced Knowledge Transfer), a novel method to enhance the training ability of low-bit quantized (Q) models in the field of zero-shot quantization (ZSQ). Existing research in ZSQ has focused on generating high-quality data from full-precision (FP) models. However, these approaches struggle with reduced learning ability in low-bit quantization due to its limited information capacity. To overcome this limitation, we propose effective training strategy compared to data generation. Particularly, we analyzed that refining feature maps in the feature distillation process is an effective way to transfer knowledge to the Q model. Based on this analysis, AKT efficiently transfer core information from the FP model to the Q model. AKT is the first approach to utilize both spatial and channel attention information in feature distillation in ZSQ. Our method addresses the fundamental gradient exploding problem in low-bit Q models. Experiments on CIFAR-10 and CIFAR-100 datasets demonstrated the effectiveness of the AKT. Our method led to significant performance enhancement in existing generative models. Notably, AKT achieved significant accuracy improvements in low-bit Q models, achieving state-of-the-art in the 3,5bit scenarios on CIFAR-10. The code is available at https://github.com/Inpyo-Hong/AKT-Advanced-knowledge-Transfer.
CLDec 11, 2024Code
TinyThinker: Distilling Reasoning through Coarse-to-Fine Knowledge Internalization with Self-ReflectionShengmin Piao, Sanghyun Park
Large Language Models exhibit impressive reasoning capabilities across diverse tasks, motivating efforts to distill these capabilities into smaller models through generated reasoning data. However, direct training on such synthesized reasoning data may lead to superficial imitation of reasoning process, rather than fostering a genuine integration of reasoning capabilities with underlying knowledge. To address this, we propose TinyThinker, a framework introducing two novel approaches. First, we introduce a three-stage process that incrementally guides the student model through the reasoning process, progressively refining knowledge from coarse to fine granularity. Second, we develop a two-phase training framework comprising an initial reasoning acquisition phase followed by a self-reflection phase utilizing self-generated data. Experiments on commonsense reasoning benchmarks demonstrate that TinyThinker achieves superior performance compared to baselines. Ablation studies further validate the effectiveness of each component in our framework. We expect that TinyThinker can be extended to other knowledge-intensive reasoning tasks, offering an alternative strategy for developing effective reasoning capabilities in smaller language models. Codes are available at https://github.com/shengminp/TinyThinker
CVDec 18, 2024Code
Mitigating Adversarial Attacks in LLMs through Defensive Suffix GenerationMinkyoung Kim, Yunha Kim, Hyeram Seo et al.
Large language models (LLMs) have exhibited outstanding performance in natural language processing tasks. However, these models remain susceptible to adversarial attacks in which slight input perturbations can lead to harmful or misleading outputs. A gradient-based defensive suffix generation algorithm is designed to bolster the robustness of LLMs. By appending carefully optimized defensive suffixes to input prompts, the algorithm mitigates adversarial influences while preserving the models' utility. To enhance adversarial understanding, a novel total loss function ($L_{\text{total}}$) combining defensive loss ($L_{\text{def}}$) and adversarial loss ($L_{\text{adv}}$) generates defensive suffixes more effectively. Experimental evaluations conducted on open-source LLMs such as Gemma-7B, mistral-7B, Llama2-7B, and Llama2-13B show that the proposed method reduces attack success rates (ASR) by an average of 11\% compared to models without defensive suffixes. Additionally, the perplexity score of Gemma-7B decreased from 6.57 to 3.93 when applying the defensive suffix generated by openELM-270M. Furthermore, TruthfulQA evaluations demonstrate consistent improvements with Truthfulness scores increasing by up to 10\% across tested configurations. This approach significantly enhances the security of LLMs in critical applications without requiring extensive retraining.
ROMar 19
Benchmarking Visual Feature Representations for LiDAR-Inertial-Visual Odometry Under Challenging ConditionsEunseon Choi, Junwoo Hong, Daehan Lee et al.
Accurate localization in autonomous driving is critical for successful missions including environmental mapping and survivor searches. In visually challenging environments, including low-light conditions, overexposure, illumination changes, and high parallax, the performance of conventional visual odometry methods significantly degrade undermining robust robotic navigation. Researchers have recently proposed LiDAR-inertial-visual odometry (LIVO) frameworks, that integrate LiDAR, IMU, and camera sensors, to address these challenges. This paper extends the FAST-LIVO2-based framework by introducing a hybrid approach that integrates direct photometric methods with descriptor-based feature matching. For the descriptor-based feature matching, this work proposes pairs of ORB with the Hamming distance, SuperPoint with SuperGlue, SuperPoint with LightGlue, and XFeat with the mutual nearest neighbor. The proposed configurations are benchmarked by accuracy, computational cost, and feature tracking stability, enabling a quantitative comparison of the adaptability and applicability of visual descriptors. The experimental results reveal that the proposed hybrid approach outperforms the conventional sparse-direct method. Although the sparse-direct method often fails to converge in regions where photometric inconsistency arises due to illumination changes, the proposed approach still maintains robust performance under the same conditions. Furthermore, the hybrid approach with learning-based descriptors enables robust and reliable visual state estimation across challenging environments.
COMay 6
Hamilton decompositions of all directed tori at odd modulusSangHyun Park
Let $D_d(m) = \operatorname{Cay}((\mathbb{Z}/m\mathbb{Z})^d, \{e_0, \ldots, e_{d-1}\})$ be the directed Cartesian product of $d$ directed $m$-cycles. We prove that $D_d(m)$ admits a directed Hamilton decomposition for every dimension $d \geq 2$ and every odd modulus $m \geq 3$. The proof combines two new closure mechanisms with a small set of base dimensions. The high-modulus count branch handles every odd $d \geq 5$ and every odd $m \geq d$ via triangular prefix coordinates and a primitivity criterion controlled by gcd conditions on symbol counts. The base-tail modular-trade branch handles the complementary range $m < d$ by decomposing a base multigraph into cylinders and scheduling active tail residues by local symbol trades; it yields the successor closure $b \mapsto 2b+1$ for $b \geq 5$. Together with multiplicative product closure, these reduce the all-dimensions theorem to the four base dimensions $d \in \{2, 3, 5, 7\}$. Dimensions $2$ and $3$ are proved here; dimensions $5$ and $7$ are imported from companion arXiv preprints. A Lean 4 formalization records the same all-dimensions endpoint. As an independent consequence, the dimensions $2$ and $3$ alone solve every odd $d \geq 29$, by a dyadic-triadic interval-hitting argument.
CVMar 28
Let Triggers Control: Frequency-Aware Dropout for Effective Token ControlJunyoung Koh, Hoyeon Moon, Dongha Kim et al.
Text-to-image models such as Stable Diffusion have achieved unprecedented levels of high-fidelity visual synthesis. As these models advance, personalization of generative models -- commonly facilitated through Low-Rank Adaptation (LoRA) with a dedicated trigger token -- has become a significant area of research. Previous works have naively assumed that fine-tuning with a single trigger token to represent new concepts. However, this often results in poor controllability, where the trigger token alone fails to reliably evoke the intended concept. We attribute this issue to the frequent co-occurrence of the trigger token with the surrounding context during fine-tuning, which entangles their representations and compromises the token's semantic distinctiveness. To disentangle this, we propose Frequency-Aware Dropout (FAD) -- a novel regularization technique that improves prompt controllability without adding new parameters. FAD consists of two key components: co-occurrence analysis and curriculum-inspired scheduling. Qualitative and quantitative analyses across token-based diffusion models (SD~1.5 and SDXL) and natural language--driven backbones (FLUX and Qwen-Image) demonstrate consistent gains in prompt fidelity, stylistic precision, and user-perceived quality. Our method provides a simple yet effective dropout strategy that enhances controllability and personalization in text-to-image generation. Notably, it achieves these improvements without introducing additional parameters or architectural modifications, making it readily applicable to existing models with minimal computational overhead.
COMay 1
Hamilton decompositions of the directed 7-torus at odd modulus via root-flat certificates and a prefix-count constructionSangHyun Park
We prove that the directed seven-dimensional equal-side torus D_7(m) = Cay((Z/mZ)^7, {e_0, e_1, ..., e_6}) admits a directed Hamilton decomposition for every odd integer m >= 3. The proof has two main contributions. First, we introduce the root-flat certificate: a named verification framework in which a Hamilton decomposition of D_n(m) follows from three local conditions on a single root flat -- row Latinness, layer bijectivity, and primitive return maps. This abstraction was used informally in the earlier odd D_5(m) construction; here it appears as a definition and a theorem, providing a common verification interface for prime-dimensional base cases. Second, for every odd m >= 7, we give a uniform prefix-coordinate construction: one-layer prefix maps, a symbol-count criterion, and explicit 7x7 count matrices produce all seven Hamilton factors without a finite search. The remaining moduli m = 3 and m = 5 are exactly the boundary where the prefix-count method provably cannot work; they are handled by finite root-flat certificates whose validity is checked in Lean 4. A Lean 4 formalization verifies the Cayley statement, with the symbolic branch and the finite boundary certificates checked in the same development.
ROMar 17
GenZ-LIO: Generalizable LiDAR-Inertial Odometry Beyond Indoor--Outdoor BoundariesDaehan Lee, Hyungtae Lim, Seongjun Kim et al.
Light detection and ranging (LiDAR)-inertial odometry (LIO) enables accurate localization and mapping for autonomous navigation in various scenes. However, its performance remains sensitive to variations in spatial scale, which refers to the spatial extent of the scene reflected in the distribution of point ranges in a LiDAR scan. Transitions between confined indoor and expansive outdoor spaces induce substantial variations in point density, which may reduce robustness and computational efficiency. To address this issue, we propose GenZ-LIO, a LIO framework generalizable across both indoor and outdoor environments. GenZ-LIO comprises three key components. First, inspired by the principle of the proportional-integral-derivative (PID) controller, it adaptively regulates the voxel size for downsampling via feedback control, driving the voxelized point count toward a scale-informed setpoint while enabling stable and efficient processing across varying scene scales. Second, we formulate a hybrid-metric state update that jointly leverages point-to-plane and point-to-point residuals to mitigate LiDAR degeneracy arising from directionally insufficient geometric constraints. Third, to alleviate the computational burden introduced by point-to-point matching, we introduce a voxel-pruned correspondence search strategy that discards non-promising voxel candidates and reduces unnecessary computations. Experimental results demonstrate that GenZ-LIO achieves robust odometry estimation and improved computational efficiency across confined indoor, open outdoor, and transitional environments. Our code will be made publicly available upon publication.
ROMar 19
ROFT-VINS: Robust Feature Tracking-based Visual-Inertial State Estimation for Harsh EnvironmentSanghyun Park, Soohee Han
SLAM (Simultaneous Localization and Mapping) and Odometry are important systems for estimating the position of mobile devices, such as robots and cars, utilizing one or more sensors. Particularly in camera-based SLAM or Odometry, effectively tracking visual features is important as it significantly impacts system performance. In this paper, we propose a method that leverages deep learning to robustly track visual features in monocular camera images. This method operates reliably even in textureless environments and situations with rapid lighting changes. Additionally, we evaluate the performance of our proposed method by integrating it into VINS-Fusion (Monocular-Inertial), a commonly used Visual-Inertial Odometry (VIO) system.
LGOct 31, 2025
Relation-Aware Bayesian Optimization of DBMS Configurations Guided by Affinity ScoresSein Kwon, Seulgi Baek, Hyunseo Yang et al.
Database Management Systems (DBMSs) are fundamental for managing large-scale and heterogeneous data, and their performance is critically influenced by configuration parameters. Effective tuning of these parameters is essential for adapting to diverse workloads and maximizing throughput while minimizing latency. Recent research has focused on automated configuration optimization using machine learning; however, existing approaches still exhibit several key limitations. Most tuning frameworks disregard the dependencies among parameters, assuming that each operates independently. This simplification prevents optimizers from leveraging relational effects across parameters, limiting their capacity to capture performancesensitive interactions. Moreover, to reduce the complexity of the high-dimensional search space, prior work often selects only the top few parameters for optimization, overlooking others that contribute meaningfully to performance. Bayesian Optimization (BO), the most common method for automatic tuning, is also constrained by its reliance on surrogate models, which can lead to unstable predictions and inefficient exploration. To overcome these limitations, we propose RelTune, a novel framework that represents parameter dependencies as a Relational Graph and learns GNN-based latent embeddings that encode performancerelevant semantics. RelTune further introduces Hybrid-Score-Guided Bayesian Optimization (HBO), which combines surrogate predictions with an Affinity Score measuring proximity to previously high-performing configurations. Experimental results on multiple DBMSs and workloads demonstrate that RelTune achieves faster convergence and higher optimization efficiency than conventional BO-based methods, achieving state-of-the-art performance across all evaluated scenarios.
CLNov 12, 2025
SpiralThinker: Latent Reasoning through an Iterative Process with Text-Latent InterleavingShengmin Piao, Sanghyun Park
Recent advances in large reasoning models have been driven by reinforcement learning and test-time scaling, accompanied by growing interest in latent rather than purely textual reasoning. However, existing latent reasoning methods lack mechanisms to ensure stable evolution of latent representations and a systematic way to interleave implicit and explicit reasoning. We introduce SpiralThinker, a unified framework that performs iterative updates over latent representations, enabling extended implicit reasoning without generating additional tokens. A progressive alignment objective combined with structured annotations maintains coherence between latent and textual reasoning. Across mathematical, logical, and commonsense reasoning tasks, SpiralThinker achieves the best overall performance among latent reasoning approaches, consistently surpassing previous methods across all benchmarks. Detailed analyses reveal that both iteration and alignment are indispensable, the numbers of latent tokens and iterations exhibit dataset-specific optima, and appropriate alignment proves critical for an effective iterative process. Overall, SpiralThinker bridges iterative computation and latent reasoning, demonstrating that aligned iterative updates can reliably steer reasoning in the latent space.
ROApr 3
ALIVE-LIO: Degeneracy-Aware Learning of Inertial Velocity for Enhancing ESKF-Based LiDAR-Inertial OdometrySeongjun Kim, Daehan Lee, Junwoo Hong et al.
Odometry estimation using light detection and ranging (LiDAR) and an inertial measurement unit (IMU), known as LiDAR-inertial odometry (LIO), often suffers from performance degradation in degenerate environments, such as long corridors or single-wall scenarios with narrow field-of-view LiDAR. To address this limitation, we propose ALIVE-LIO, a degeneracy-aware LiDAR-inertial odometry framework that explicitly enhances state estimation in degenerate directions. The key contribution of ALIVE-LIO is the strategic integration of a deep neural network into a classical error-state Kalman filter (ESKF) to compensate for the loss of LiDAR observability. Specifically, ALIVE-LIO employs a neural network to predict the body-frame velocity and selectively fuses this prediction into the ESKF only when degeneracy is detected, providing effective state updates along degenerate directions. This design enables ALIVE-LIO to utilize the probabilistic structure and consistency of the ESKF while benefiting from learning-based motion estimation. The proposed method was evaluated on publicly available datasets exhibiting degeneracy, as well as on our own collected data. Experimental results demonstrate that ALIVE-LIO substantially reduces pose drift in degenerate environments, yielding the most competitive results in 22 out of 32 sequences. The implementation of ALIVE-LIO will be publicly available.
CVApr 21, 2025
Real-Time Sleepiness Detection for Driver State Monitoring SystemDeepak Ghimire, Sunghwan Jeong, Sunhong Yoon et al.
A driver face monitoring system can detect driver fatigue, which is a significant factor in many accidents, using computer vision techniques. In this paper, we present a real-time technique for driver eye state detection. First, the face is detected, and the eyes are located within the face region for tracking. A normalized cross-correlation-based online dynamic template matching technique, combined with Kalman filter tracking, is proposed to track the detected eye positions in subsequent image frames. A support vector machine with histogram of oriented gradients (HOG) features is used to classify the state of the eyes as open or closed. If the eyes remain closed for a specified period, the driver is considered to be asleep, and an alarm is triggered.
CVFeb 19, 2024
NOTE: Notable generation Of patient Text summaries through Efficient approach based on direct preference optimizationImjin Ahn, Hansle Gwon, Young-Hak Kim et al.
The discharge summary is a one of critical documents in the patient journey, encompassing all events experienced during hospitalization, including multiple visits, medications, tests, surgery/procedures, and admissions/discharge. Providing a summary of the patient's progress is crucial, as it significantly influences future care and planning. Consequently, clinicians face the laborious and resource-intensive task of manually collecting, organizing, and combining all the necessary data for a discharge summary. Therefore, we propose "NOTE", which stands for "Notable generation Of patient Text summaries through an Efficient approach based on direct preference optimization". NOTE is based on Medical Information Mart for Intensive Care- III dataset and summarizes a single hospitalization of a patient. Patient events are sequentially combined and used to generate a discharge summary for each hospitalization. In the present circumstances, large language models' application programming interfaces (LLMs' APIs) are widely available, but importing and exporting medical data presents significant challenges due to privacy protection policies in healthcare institutions. Moreover, to ensure optimal performance, it is essential to implement a lightweight model for internal server or program within the hospital. Therefore, we utilized DPO and parameter efficient fine tuning (PEFT) techniques to apply a fine-tuning method that guarantees superior performance. To demonstrate the practical application of the developed NOTE, we provide a webpage-based demonstration software. In the future, we will aim to deploy the software available for actual use by clinicians in hospital. NOTE can be utilized to generate various summaries not only discharge summaries but also throughout a patient's journey, thereby alleviating the labor-intensive workload of clinicians and aiming for increased efficiency.
COMar 25
Hamilton decompositions of the directed 3-torus: a return-map and odometer viewSangHyun Park
We prove that the directed 3-torus D_3(m), or equivalently the Cartesian product of three directed m-cycles, admits a decomposition into three arc-disjoint directed Hamilton cycles for every integer m >= 3. The proof reduces Hamiltonicity to the m-step return maps on the layer section S=i+j+k=0. For odd m, five Kempe swaps of the canonical coloring produce return maps that are explicitly affine-conjugate to the standard 2-dimensional odometer. For even m, a sign-product invariant rules out Kempe-from-canonical constructions, and a different low-layer witness reduces after one further first-return map to a finite-defect clock-and-carry system. The remaining closure is a finite splice analysis, and the case m=4 is handled separately by a finite witness. A Lean 4 formalization accompanies the construction.
CLJan 4, 2025
Thinking with Many Minds: Using Large Language Models for Multi-Perspective Problem-SolvingSanghyun Park, Boris Maciejovsky, Phanish Puranam
Complex problem-solving requires cognitive flexibility--the capacity to entertain multiple perspectives while preserving their distinctiveness. This flexibility replicates the "wisdom of crowds" within a single individual, allowing them to "think with many minds." While mental simulation enables imagined deliberation, cognitive constraints limit its effectiveness. We propose synthetic deliberation, a Large Language Model (LLM)-based method that simulates discourse between agents embodying diverse perspectives, as a solution. Using a custom GPT-based model, we showcase its benefits: concurrent processing of multiple viewpoints without cognitive degradation, parallel exploration of perspectives, and precise control over viewpoint synthesis. By externalizing the deliberative process and distributing cognitive labor between parallel search and integration, synthetic deliberation transcends mental simulation's limitations. This approach shows promise for strategic planning, policymaking, and conflict resolution.
ROMar 13
Motion-Specific Battery Health Assessment for Quadrotors Using High-Fidelity Battery ModelsJoonhee Kim, Sanghyun Park, Donghyeong Kim et al.
Quadrotor endurance is ultimately limited by battery behavior, yet most energy aware planning treats the battery as a simple energy reservoir and overlooks how flight motions induce dynamic current loads that accelerate battery degradation. This work presents an end to end framework for motion aware battery health assessment in quadrotors. We first design a wide range current sensing module to capture motion specific current profiles during real flights, preserving transient features. In parallel, a high fidelity battery model is calibrated using reference performance tests and a metaheuristic based on a degradation coupled electrochemical model.By simulating measured flight loads in the calibrated model, we systematically resolve how different flight motions translate into degradation modes loss of lithium inventory and loss of active material as well as internal side reactions. The results demonstrate that even when two flight profiles consume the same average energy, their transient load structures can drive different degradation pathways, emphasizing the need for motion-aware battery management that balances efficiency with battery degradation.
CLOct 10, 2025
LitE-SQL: A Lightweight and Efficient Text-to-SQL Framework with Vector-based Schema Linking and Execution-Guided Self-CorrectionShengmin Piao, Jieun Lee, Sanghyun Park
The Text-to-SQL task translates natural language questions into SQL queries, enabling intuitive database interaction for non-experts. While recent methods leveraging Large Language Models (LLMs) achieve strong performance, their reliance on proprietary models raise concerns about deployment feasibility and data privacy. In this work, we introduce LitE-SQL, a Lightweight and Efficient framework with two components: (i) a Schema Retriever that performs efficient schema linking using a vector database of pre-computed schema embeddings, and (ii) a SQL Generator fine-tuned in two stages-supervised fine-tuning followed by execution-guided reinforcement-enabling self-correction without costly multi-candidate generation. On BIRD, LitE-SQL achieves 72.10% execution accuracy, and on Spider 1.0 it reaches 88.45%, demonstrating comparable or superior performance to LLM-based methods despite using 2x to 30x fewer parameters. Our findings demonstrate that high-quality Text-to-SQL generation is feasible with lightweight models, offering a practical solution for privacy-sensitive and resource-constrained settings.
LGAug 29, 2025
Controllable 3D Molecular Generation for Structure-Based Drug Design Through Bayesian Flow Networks and Gradient IntegrationSeungyeon Choi, Hwanhee Kim, Chihyun Park et al.
Recent advances in Structure-based Drug Design (SBDD) have leveraged generative models for 3D molecular generation, predominantly evaluating model performance by binding affinity to target proteins. However, practical drug discovery necessitates high binding affinity along with synthetic feasibility and selectivity, critical properties that were largely neglected in previous evaluations. To address this gap, we identify fundamental limitations of conventional diffusion-based generative models in effectively guiding molecule generation toward these diverse pharmacological properties. We propose CByG, a novel framework extending Bayesian Flow Network into a gradient-based conditional generative model that robustly integrates property-specific guidance. Additionally, we introduce a comprehensive evaluation scheme incorporating practical benchmarks for binding affinity, synthetic feasibility, and selectivity, overcoming the limitations of conventional evaluation methods. Extensive experiments demonstrate that our proposed CByG framework significantly outperforms baseline models across multiple essential evaluation criteria, highlighting its effectiveness and practicality for real-world drug discovery applications.
CVMar 24, 2025
GranQ: Efficient Channel-wise Quantization via Vectorized Pre-Scaling for Zero-Shot QATInpyo Hong, Youngwan Jo, Hyojeong Lee et al.
Zero-shot quantization (ZSQ) enables neural network compression without original training data, making it a promising solution for restricted data access scenarios. To compensate for the lack of data, recent ZSQ methods typically rely on synthetic inputs generated from the full-precision model. However, these synthetic inputs often lead to activation distortion, especially under low-bit settings. To mitigate this, existing methods typically employ per-channel scaling, but they still struggle due to the severe computational overhead during the accumulation process. To overcome this critical bottleneck, we propose GranQ, a novel activation quantization framework that introduces an efficient pre-scaling strategy. Unlike conventional channel-wise methods that repeatedly perform scaling operations during accumulation, GranQ applies scaling factors in a pre-scaling step through fully vectorized computation, eliminating runtime scaling overhead. This design enables GranQ to maintain fine-grained quantization accuracy while significantly reducing computational burden, particularly in low-bit quantization settings. Extensive experiments under quantization-aware training (QAT) settings demonstrate that GranQ consistently outperforms state-of-the-art ZSQ methods across CIFAR and ImageNet. In particular, our method achieves up to 5.45% higher accuracy in the 3-bit setting on CIFAR-100 and even surpasses the full-precision baseline on CIFAR-10.
CLFeb 12, 2025
Style Extraction on Text Embeddings Using VAE and Parallel DatasetInJin Kong, Shinyee Kang, Yuna Park et al.
This study investigates the stylistic differences among various Bible translations using a Variational Autoencoder (VAE) model. By embedding textual data into high-dimensional vectors, the study aims to detect and analyze stylistic variations between translations, with a specific focus on distinguishing the American Standard Version (ASV) from other translations. The results demonstrate that each translation exhibits a unique stylistic distribution, which can be effectively identified using the VAE model. These findings suggest that the VAE model is proficient in capturing and differentiating textual styles, although it is primarily optimized for distinguishing a single style. The study highlights the model's potential for broader applications in AI-based text generation and stylistic analysis, while also acknowledging the need for further model refinement to address the complexity of multi-dimensional stylistic relationships. Future research could extend this methodology to other text domains, offering deeper insights into the stylistic features embedded within various types of textual data.
CLDec 10, 2024
Multi-Response Preference Optimization with Augmented Ranking DatasetHansle Gwon, Imjin Ahn, Young-Hak Kim et al.
Recent advancements in Large Language Models (LLMs) have been remarkable, with new models consistently surpassing their predecessors. These advancements are underpinned by extensive research on various training mechanisms. Among these, Preference Optimization has played a significant role in improving the performance of LLMs by incorporating human preferences into the training process. However, constructing preference optimization datasets is challenging and the optimization process is highly sensitive to the dataset quality. In this study, we propose a novel approach to augment Preference Optimization datasets. Additionally, we introduce a Multi-response-based Preference Optimization training method that enables the simultaneous learning of multiple responses.
IVDec 4, 2024
MRNet: Multifaceted Resilient Networks for Medical Image-to-Image TranslationHyojeong Lee, Youngwan Jo, Inpyo Hong et al.
We propose a Multifaceted Resilient Network(MRNet), a novel architecture developed for medical image-to-image translation that outperforms state-of-the-art methods in MRI-to-CT and MRI-to-MRI conversion. MRNet leverages the Segment Anything Model (SAM) to exploit frequency-based features to build a powerful method for advanced medical image transformation. The architecture extracts comprehensive multiscale features from diverse datasets using a powerful SAM image encoder and performs resolution-aware feature fusion that consistently integrates U-Net encoder outputs with SAM-derived features. This fusion optimizes the traditional U-Net skip connection while leveraging transformer-based contextual analysis. The translation is complemented by an innovative dual-mask configuration incorporating dynamic attention patterns and a specialized loss function designed to address regional mapping mismatches, preserving both the gross anatomy and tissue details. Extensive validation studies have shown that MRNet outperforms state-of-the-art architectures, particularly in maintaining anatomical fidelity and minimizing translation artifacts.
THJul 30, 2020
Learning what they think vs. learning what they do: The micro-foundations of vicarious learningSanghyun Park, Phanish Puranam
Vicarious learning is a vital component of organizational learning. We theorize and model two fundamental processes underlying vicarious learning: observation of actions (learning what they do) vs. belief sharing (learning what they think). The analysis of our model points to three key insights. First, vicarious learning through either process is beneficial even when no agent in a system of vicarious learners begins with a knowledge advantage. Second, vicarious learning through belief sharing is not universally better than mutual observation of actions and outcomes. Specifically, enabling mutual observability of actions and outcomes is superior to sharing of beliefs when the task environment features few alternatives with large differences in their value and there are no time pressures. Third, symmetry in vicarious learning in fact adversely affects belief sharing but improves observational learning. All three results are shown to be the consequence of how vicarious learning affects self-confirming biased beliefs.