Dongseok Kim

LG
h-index17
13papers
11citations
Novelty54%
AI Score51

13 Papers

78.4CVApr 14
Motif-Video 2B: Technical Report

Junghwan Lim, Wai Ting Cheung, Minsu Ha et al.

Training strong video generation models usually requires massive datasets, large parameter counts, and substantial compute. In this work, we ask whether strong text-to-video quality is possible at a much smaller budget: fewer than 10M clips and less than 100,000 H200 GPU hours. Our core claim is that part of the answer lies in how model capacity is organized, not only in how much of it is used. In video generation, prompt alignment, temporal consistency, and fine-detail recovery can interfere with one another when they are handled through the same pathway. Motif-Video 2B addresses this by separating these roles architecturally, rather than relying on scale alone. The model combines two key ideas. First, Shared Cross-Attention strengthens text control when video token sequences become long. Second, a three-part backbone separates early fusion, joint representation learning, and detail refinement. To make this design effective under a limited compute budget, we pair it with an efficient training recipe based on dynamic token routing and early-phase feature alignment to a frozen pretrained video encoder. Our analysis shows that later blocks develop clearer cross-frame attention structure than standard single-stream baselines. On VBench, Motif-Video~2B reaches 83.76\%, surpassing Wan2.1 14B while using 7$\times$ fewer parameters and substantially less training data. These results suggest that careful architectural specialization, combined with an efficiency-oriented training recipe, can narrow or exceed the quality gap typically associated with much larger video models.

LGDec 1, 2025
CLAPS: Posterior-Aware Conformal Intervals via Last-Layer Laplace

Dongseok Kim, Hyoungsun Choi, Mohamed Jismy Aashik Rasool et al.

We present CLAPS, a posterior-aware conformal regression method that pairs a Last-Layer Laplace Approximation with split-conformal calibration. From the resulting Gaussian posterior, CLAPS defines a simple two-sided posterior CDF score that aligns the conformity metric with the full predictive shape, not just a point estimate. This alignment yields narrower prediction intervals at the same target coverage, especially on small to medium tabular datasets where data are scarce and uncertainty modeling matters. We also provide a lightweight diagnostic suite that separates aleatoric and epistemic components and visualizes posterior behavior, helping practitioners understand why intervals shrink when they do. Across multiple benchmarks using the same MLP backbone, CLAPS consistently attains nominal coverage with improved efficiency and minimal overhead, offering a clear, practical upgrade to residual-based conformal baselines.

MLDec 8, 2025
$φ$-test: Global Feature Selection and Inference for Shapley Additive Explanations

Dongseok Kim, Hyoungsun Choi, Mohamed Jismy Aashik Rasool et al.

We propose $φ$-test, a global feature-selection and significance procedure for black-box predictors that combines Shapley attributions with selective inference. Given a trained model and an evaluation dataset, $φ$-test performs SHAP-guided screening and fits a linear surrogate on the screened features via a selection rule with a tractable selective-inference form. For each retained feature, it outputs a Shapley-based global score, a surrogate coefficient, and post-selection $p$-values and confidence intervals in a global feature-importance table. Experiments on real tabular regression tasks with tree-based and neural backbones suggest that $φ$-test can retain much of the predictive ability of the original model while using only a few features and producing feature sets that remain fairly stable across resamples and backbone classes. In these settings, $φ$-test acts as a practical global explanation layer linking Shapley-based importance summaries with classical statistical inference.

CLNov 7, 2025
Motif 2 12.7B technical report

Junghwan Lim, Sungmin Lee, Dongseok Kim et al.

We introduce Motif-2-12.7B, a new open-weight foundation model that pushes the efficiency frontier of large language models by combining architectural innovation with system-level optimization. Designed for scalable language understanding and robust instruction generalization under constrained compute budgets, Motif-2-12.7B builds upon Motif-2.6B with the integration of Grouped Differential Attention (GDA), which improves representational efficiency by disentangling signal and noise-control attention pathways. The model is pre-trained on 5.5 trillion tokens spanning diverse linguistic, mathematical, scientific, and programming domains using a curriculum-driven data scheduler that gradually changes the data composition ratio. The training system leverages the MuonClip optimizer alongside custom high-performance kernels, including fused PolyNorm activations and the Parallel Muon algorithm, yielding significant throughput and memory efficiency gains in large-scale distributed environments. Post-training employs a three-stage supervised fine-tuning pipeline that successively enhances general instruction adherence, compositional understanding, and linguistic precision. Motif-2-12.7B demonstrates competitive performance across diverse benchmarks, showing that thoughtful architectural scaling and optimized training design can rival the capabilities of much larger models.

LGOct 8, 2025
Grouped Differential Attention

Junghwan Lim, Sungmin Lee, Dongseok Kim et al.

The self-attention mechanism, while foundational to modern Transformer architectures, suffers from a critical inefficiency: it frequently allocates substantial attention to redundant or noisy context. Differential Attention addressed this by using subtractive attention maps for signal and noise, but its required balanced head allocation imposes rigid constraints on representational flexibility and scalability. To overcome this, we propose Grouped Differential Attention (GDA), a novel approach that introduces unbalanced head allocation between signal-preserving and noise-control groups. GDA significantly enhances signal focus by strategically assigning more heads to signal extraction and fewer to noise-control, stabilizing the latter through controlled repetition (akin to GQA). This design achieves stronger signal fidelity with minimal computational overhead. We further extend this principle to group-differentiated growth, a scalable strategy that selectively replicates only the signal-focused heads, thereby ensuring efficient capacity expansion. Through large-scale pretraining and continual training experiments, we demonstrate that moderate imbalance ratios in GDA yield substantial improvements in generalization and stability compared to symmetric baselines. Our results collectively establish that ratio-aware head allocation and selective expansion offer an effective and practical path toward designing scalable, computation-efficient Transformer architectures.

AIDec 11, 2025
Motif-2-12.7B-Reasoning: A Practitioner's Guide to RL Training Recipes

Junghwan Lim, Sungmin Lee, Dongseok Kim et al.

We introduce Motif-2-12.7B-Reasoning, a 12.7B parameter language model designed to bridge the gap between open-weight systems and proprietary frontier models in complex reasoning and long-context understanding. Addressing the common challenges of model collapse and training instability in reasoning adaptation, we propose a comprehensive, reproducible training recipe spanning system, data, and algorithmic optimizations. Our approach combines memory-efficient infrastructure for 64K-token contexts using hybrid parallelism and kernel-level optimizations with a two-stage Supervised Fine-Tuning (SFT) curriculum that mitigates distribution mismatch through verified, aligned synthetic data. Furthermore, we detail a robust Reinforcement Learning Fine-Tuning (RLFT) pipeline that stabilizes training via difficulty-aware data filtering and mixed-policy trajectory reuse. Empirical results demonstrate that Motif-2-12.7B-Reasoning achieves performance comparable to models with significantly larger parameter counts across mathematics, coding, and agentic benchmarks, offering the community a competitive open model and a practical blueprint for scaling reasoning capabilities under realistic compute constraints.

LGDec 14, 2025
Theoretical Foundations of Prompt Engineering: From Heuristics to Expressivity

Dongseok Kim, Hyoungsun Choi, Mohamed Jismy Aashik Rasool et al.

Prompts can switch a model's behavior even when the weights are fixed, yet this phenomenon is rarely treated as a clean theoretical object rather than a heuristic. We study the family of functions obtainable by holding a Transformer backbone fixed as an executor and varying only the prompt. Our core idea is to view the prompt as an externally injected program and to construct a simplified Transformer that interprets it to implement different computations. The construction exposes a mechanism-level decomposition: attention performs selective routing from prompt memory, the FFN performs local arithmetic conditioned on retrieved fragments, and depth-wise stacking composes these local updates into a multi-step computation. Under this viewpoint, we prove a constructive existential result showing that a single fixed backbone can approximate a broad class of target behaviors via prompts alone. The framework provides a unified starting point for formalizing trade-offs under prompt length/precision constraints and for studying structural limits of prompt-based switching, while remaining distinct from empirical claims about pretrained LLMs.

LGSep 13, 2025
FACTORS: Factorial Approximation for Complementary Two-factor Optimization with Risk-aware Scoring

Dongseok Kim, Wonjun Jeong, Gisung Oh

We propose FACTORS, a framework that combines design of experiments with Shapley decomposition to address performance and stability issues that are sensitive to combinations of training factors. Our approach consistently estimates main effects and two-factor interactions, then integrates them into a risk-adjusted objective function that jointly accounts for uncertainty and cost, enabling reliable selection of configurations under a fixed budget. Effect estimation is implemented through two complementary paths: a plug-in path based on conditional means, and a least-squares path that reconstructs Shapley contributions from samples. These paths are designed to work complementarily even when design density and bias levels differ. By incorporating standardization of estimates, bias correction, and uncertainty quantification, our procedure ensures comparability across heterogeneous factor spaces and designs, while a lightweight search routine yields configurations within practical time even for large factor spaces. On the theoretical side, we provide error decompositions, sample complexity analysis, and upper bounds on optimality gaps. On the interpretive side, we summarize main effects and interactions in map form, highlighting adjustment priorities and safe improvement pathways. Across diverse datasets and design conditions, our approach improves rank preservation and optimal configuration identification, reduces decision-making risks, and offers a tuning foundation that delivers interpretable justification alongside stable performance gains even under budget constraints.

CLSep 4, 2025
Expanding Foundational Language Capabilities in Open-Source LLMs through a Korean Case Study

Junghwan Lim, Gangwon Jo, Sungmin Lee et al.

We introduce Llama-3-Motif, a language model consisting of 102 billion parameters, specifically designed to enhance Korean capabilities while retaining strong performance in English. Developed on the Llama 3 architecture, Llama-3-Motif employs advanced training techniques, including LlamaPro and Masked Structure Growth, to effectively scale the model without altering its core Transformer architecture. Using the MoAI platform for efficient training across hyperscale GPU clusters, we optimized Llama-3-Motif using a carefully curated dataset that maintains a balanced ratio of Korean and English data. Llama-3-Motif shows decent performance on Korean-specific benchmarks, outperforming existing models and achieving results comparable to GPT-4.

LGSep 2, 2025
Gaming and Cooperation in Federated Learning: What Can Happen and How to Monitor It

Dongseok Kim, Wonjun Jeong, Gisung Oh

The success of Federated Learning depends on the actions that participants take out of sight. We model Federated Learning not as a mere optimization task but as a strategic system entangled with rules and incentives. From this perspective, we present an analytical framework that makes it possible to clearly identify where behaviors that genuinely improve performance diverge from those that merely target metrics. We introduce two indices that respectively quantify behavioral incentives and collective performance loss, and we use them as the basis for consistently interpreting the impact of operational choices such as rule design, the level of information disclosure, evaluation methods, and aggregator switching. We further summarize thresholds, auto-switch rules, and early warning signals into a checklist that can be applied directly in practice, and we provide both a practical algorithm for allocating limited audit resources and a performance guarantee. Simulations conducted across diverse environments consistently validate the patterns predicted by our framework, and we release all procedures for full reproducibility. While our approach operates most strongly under several assumptions, combining periodic recalibration, randomization, and connectivity-based alarms enables robust application under the variability of real-world operations. We present both design principles and operational guidelines that lower the incentives for metric gaming while sustaining and expanding stable cooperation.

LGAug 24, 2025
Convergence and Generalization of Anti-Regularization for Parametric Models

Dongseok Kim, Wonjun Jeong, Gisung Oh

Anti-regularization introduces a reward term with a reversed sign into the loss function, deliberately amplifying model expressivity in small-sample regimes while ensuring that the intervention gradually vanishes as the sample size grows through a power-law decay schedule. We formalize spectral safety conditions and trust-region constraints, and we design a lightweight safeguard that combines a projection operator with gradient clipping to guarantee stable intervention. Theoretical analysis extends to linear smoothers and the Neural Tangent Kernel regime, providing practical guidance on the choice of decay exponents through the balance between empirical risk and variance. Empirical results show that Anti-regularization mitigates underfitting in both regression and classification while preserving generalization and improving calibration. Ablation studies confirm that the decay schedule and safeguards are essential to avoiding overfitting and instability. As an alternative, we also propose a degrees-of-freedom targeting schedule that maintains constant per-sample complexity. Anti-regularization constitutes a simple and reproducible procedure that integrates seamlessly into standard empirical risk minimization pipelines, enabling robust learning under limited data and resource constraints by intervening only when necessary and vanishing otherwise.

LGAug 2, 2025
Motif 2.6B Technical Report

Junghwan Lim, Sungmin Lee, Dongseok Kim et al.

Recent advancements in Large Language Models (LLMs) have revolutionized artificial intelligence, yet developing an effective foundational LLM that balances high performance with computational efficiency remains challenging, especially for emerging research groups. To address this gap, we introduce Motif-2.6B, a 2.6-billion-parameter foundation model designed to democratize advanced LLM capabilities. Motif-2.6B incorporates several innovative architectural enhancements, including Differential Attention and PolyNorm activation functions, which improve long-context comprehension, reduce hallucination, and enhance in-context learning capabilities. We rigorously tested multiple novel architectural components through extensive experimentation to determine the optimal architecture for Motif-2.6B. Comprehensive evaluations demonstrate that Motif-2.6B consistently meets or exceeds the performance of similarly sized state-of-the-art models across diverse benchmarks, showcasing its effectiveness, scalability, and real-world applicability. Through detailed experiments and tailored techniques, Motif-2.6B significantly advances the landscape of efficient, scalable, and powerful foundational LLMs, offering valuable insights and a robust foundation for future research and deployment.

CLJul 24, 2025
SCOPE: Stochastic and Counterbiased Option Placement for Evaluating Large Language Models

Wonjun Jeong, Dongseok Kim, Taegkeun Whangbo

Large Language Models (LLMs) can achieve inflated scores on multiple-choice tasks by exploiting inherent biases in option positions or labels, rather than demonstrating genuine understanding. This study introduces SCOPE, an evaluation framework designed to measure and mitigate such selection bias in a dataset-independent manner. By repeatedly invoking a null prompt that lacks semantic content, SCOPE estimates each model's unique position-bias distribution. It then redistributes the answer slot according to the inverse-bias distribution, thereby equalizing the lucky-rate, the probability of selecting the correct answer by chance. Furthermore, it prevents semantically similar distractors from being placed adjacent to the answer, thereby blocking near-miss guesses based on superficial proximity cues. Across multiple benchmark experiments, SCOPE consistently outperformed existing debiasing methods in terms of stable performance improvements and showed clearer confidence distributions over correct options. This framework thus offers a new standard for enhancing the fairness and reliability of LLM evaluations.