Jinwoo Baek

LG
h-index: 1
4 papers
3 citations
Novelty: 55%
AI Score: 47

4 Papers

CL · Jan 14 · Code
Mi:dm 2.0 Korea-centric Bilingual Language Models

Donghoon Shin, Sejung Lee, Soonmin Bae et al.

We introduce Mi:dm 2.0, a bilingual large language model (LLM) specifically engineered to advance Korea-centric AI. This model goes beyond Korean text processing by integrating the values, reasoning patterns, and commonsense knowledge inherent to Korean society, enabling nuanced understanding of cultural contexts, emotional subtleties, and real-world scenarios to generate reliable and culturally appropriate responses. To address limitations of existing LLMs, often caused by insufficient or low-quality Korean data and lack of cultural alignment, Mi:dm 2.0 emphasizes robust data quality through a comprehensive pipeline that includes proprietary data cleansing, high-quality synthetic data generation, strategic data mixing with curriculum learning, and a custom Korean-optimized tokenizer to improve efficiency and coverage. To realize this vision, we offer two complementary configurations: Mi:dm 2.0 Base (11.5B parameters), built with a depth-up scaling strategy for general-purpose use, and Mi:dm 2.0 Mini (2.3B parameters), optimized for resource-constrained environments and specialized tasks. Mi:dm 2.0 achieves state-of-the-art performance on Korean-specific benchmarks, with top-tier zero-shot results on KMMLU and strong internal evaluation results across language, humanities, and social science tasks. The Mi:dm 2.0 lineup is released under the MIT license to support extensive research and commercial use. By offering accessible and high-performance Korea-centric LLMs, KT aims to accelerate AI adoption across Korean industries, public services, and education, strengthen the Korean AI developer community, and lay the groundwork for the broader vision of K-intelligence. Our models are available at https://huggingface.co/K-intelligence. For technical inquiries, please contact midm-llm@kt.com.

NA · Oct 19, 2025
Matrix Phylogeny: Compact Spectral Fingerprints for Trap-Robust Preconditioner Selection

Jinwoo Baek

Matrix Phylogeny introduces compact spectral fingerprints (CSF/ASF) that characterize matrices at the family level. These fingerprints are low-dimensional, eigendecomposition-free descriptors built from Chebyshev trace moments estimated by Hutchinson sketches. A simple affine rescaling to [-1,1] makes them permutation/similarity invariant and robust to global scaling. Across synthetic and real tests, we observe phylogenetic compactness: only a few moments are needed. CSF with K=3-5 already yields perfect clustering (ARI=1.0; silhouettes ~0.89) on four synthetic families and a five-family set including BA vs ER, while ASF adapts the dimension on demand (median K*~9). On a SuiteSparse mini-benchmark (Hutchinson p~100), both CSF-H and ASF-H reach ARI=1.0. Against strong alternatives (eigenvalue histograms + Wasserstein, heat-kernel traces, WL-subtree), CSF-K=5 matches or exceeds accuracy while avoiding eigendecompositions and using far fewer features (K<=10 vs 64/9153). The descriptors are stable to noise (log-log slope ~1.03, R^2~0.993) and support a practical trap->recommend pipeline for automated preconditioner selection. In an adversarial E6+ setting with a probe-and-switch mechanism, our physics-guided recommender attains near-oracle iteration counts (p90 regret=0), whereas a Frobenius 1-NN baseline exhibits large spikes (p90~34-60). CSF/ASF deliver compact (K<=10), fast, invariant fingerprints that enable scalable, structure-aware search and recommendation over large matrix repositories. We recommend CSF with K=5 by default, and ASF when domain-specific adaptivity is desired.
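
The abstract's core ingredient, Chebyshev trace moments estimated with Hutchinson probes after an affine rescaling to [-1,1], can be illustrated with a minimal pure-Python sketch. This is not the paper's implementation: the function names, the use of Gershgorin discs to enclose the spectrum, the Rademacher probes, and the default of 100 probes are all assumptions made for illustration, and the input is assumed to be a small dense symmetric matrix.

```python
import random


def matvec(A, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gershgorin_interval(A):
    """Enclose the spectrum of a symmetric matrix without an eigendecomposition.

    Assumption: Gershgorin discs stand in for whatever spectral bounds the
    paper actually uses for its rescaling.
    """
    lo, hi = float("inf"), float("-inf")
    for i, row in enumerate(A):
        radius = sum(abs(x) for j, x in enumerate(row) if j != i)
        lo = min(lo, row[i] - radius)
        hi = max(hi, row[i] + radius)
    return lo, hi


def chebyshev_fingerprint(A, K=5, probes=100, seed=0):
    """Estimate m_k = tr(T_k(B)) / n for k = 1..K (K >= 1), where B is A
    affinely rescaled so that its spectrum lies in [-1, 1]."""
    n = len(A)
    lo, hi = gershgorin_interval(A)
    if hi <= lo:
        raise ValueError("degenerate spectral enclosure; cannot rescale")
    center, half_width = (hi + lo) / 2.0, (hi - lo) / 2.0

    def apply_B(v):
        # B v = (A v - center * v) / half_width
        return [(av - center * x) / half_width for av, x in zip(matvec(A, v), v)]

    rng = random.Random(seed)
    sums = [0.0] * K
    for _ in range(probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]  # Rademacher probe
        t_prev, t_curr = z, apply_B(z)                   # T_0(B) z and T_1(B) z
        sums[0] += dot(z, t_curr)
        for k in range(1, K):
            # Three-term recurrence: T_{k+1}(B) z = 2 B T_k(B) z - T_{k-1}(B) z
            t_next = [2.0 * b - p for b, p in zip(apply_B(t_curr), t_prev)]
            sums[k] += dot(z, t_next)
            t_prev, t_curr = t_curr, t_next
    return [s / (probes * n) for s in sums]
```

For a diagonal matrix such as `diag(0, 1, 2)` the Rademacher estimate is exact, so `chebyshev_fingerprint(A, K=3)` returns approximately `[0, 1/3, 0]`. Multiplying the matrix by a positive constant leaves the rescaled operator, and hence the fingerprint, unchanged, which is the scaling robustness the abstract describes.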

LG · Oct 17, 2025
Chebyshev Moment Regularization (CMR): Condition-Number Control with Moment Shaping

Jinwoo Baek

We introduce Chebyshev Moment Regularization (CMR), a simple, architecture-agnostic loss that directly optimizes layer spectra. CMR jointly controls spectral edges via a log-condition proxy and shapes the interior via Chebyshev moments, with a decoupled, capped mixing rule that preserves task gradients. We prove strictly monotone descent for the condition proxy, bounded moment gradients, and orthogonal invariance. In an adversarial "$κ$-stress" setting (MNIST, 15-layer MLP), compared to vanilla training, CMR reduces mean layer condition numbers by $\sim 10^3$ (from $\approx 3.9\times 10^3$ to $\approx 3.4$ in 5 epochs), increases average gradient magnitude, and restores test accuracy ($\approx 10\% \to \approx 86\%$). These results support optimization-driven spectral preconditioning: directly steering models toward well-conditioned regimes for stable, accurate learning.
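
The abstract does not spell out the log-condition proxy, so the following is only a hedged illustration of the quantity it targets: $\log κ(W) = \tfrac{1}{2}\log(λ_{\max}/λ_{\min})$ of the Gram matrix $W^\top W$, computed here by plain power iteration. The function names, the shifted power iteration used for $λ_{\min}$, and the iteration count are assumptions of this sketch, not details from the paper. It also checks numerically the orthogonal invariance that the abstract claims.

```python
import math


def matvec(G, v):
    return [sum(g * x for g, x in zip(row, v)) for row in G]


def gram(W):
    """Return G = W^T W for a list-of-rows matrix W."""
    rows, cols = len(W), len(W[0])
    return [[sum(W[r][i] * W[r][j] for r in range(rows)) for j in range(cols)]
            for i in range(cols)]


def top_eigenvalue(G, iters=500):
    """Largest eigenvalue of a symmetric PSD matrix via power iteration."""
    v = [1.0 / (i + 1) for i in range(len(G))]  # fixed, non-uniform start vector
    for _ in range(iters):
        w = matvec(G, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0  # G annihilates v; treat the top eigenvalue as zero
        v = [x / norm for x in w]
    return sum(x * y for x, y in zip(v, matvec(G, v)))  # Rayleigh quotient


def log_condition(W):
    """Illustrative log kappa(W) = 0.5 * log(lam_max / lam_min) of W^T W.

    lam_min is recovered as lam_max - top_eigenvalue(lam_max * I - G); this
    subtraction loses accuracy when the matrix is extremely ill-conditioned.
    """
    G = gram(W)
    n = len(G)
    lam_max = top_eigenvalue(G)
    shifted = [[(lam_max if i == j else 0.0) - G[i][j] for j in range(n)]
               for i in range(n)]
    lam_min = lam_max - top_eigenvalue(shifted)
    if lam_min <= 0.0:
        raise ValueError("matrix is numerically singular")
    return 0.5 * math.log(lam_max / lam_min)


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


W = [[1.0, 0.0], [0.0, 10.0]]
c, s = math.cos(0.7), math.sin(0.7)
Q = [[c, -s], [s, c]]  # an orthogonal (rotation) matrix
base = log_condition(W)               # close to log(10)
rotated = log_condition(matmul(Q, W)) # agrees with base up to rounding
```

Because $W^\top Q^\top Q W = W^\top W$ for any orthogonal $Q$, any penalty built from the spectrum of the Gram matrix inherits this invariance; the same holds for right-multiplication, which only rotates the eigenvectors.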

LG · Oct 17, 2025
Numerical Fragility in Transformers: A Layer-wise Theory for Explaining, Forecasting, and Mitigating Instability

Jinwoo Baek

Transformers trained in low precision can suffer forward-error amplification. We give a first-order, module-wise theory that predicts when and where errors grow. For self-attention we derive a per-layer bound that factorizes into three interpretable diagnostics: a score-scale ratio $κ_{\rm score}$, a rowwise softmax sensitivity $κ_{\rm softmax}$, and value conditioning $κ(V)$. We prove a residual relaxation inequality showing that residual blocks attenuate depth-wise accumulation, and we introduce a precision- and width-aware LayerNorm indicator $ρ_{\rm LN}$ with a matching first-order bound in the $ε$-dominated regime. These pieces yield a unified forward-stability bound whose right-hand side is directly estimable during training. On Tiny-ViT/CIFAR-10 we evaluate the bound and components. (1) The combined predictor $κ_{\rm softmax}\,(1+κ_{\rm score})\,κ(V)\,\|W_O\|_2+κ_{\rm eff}+C_{\rm LN}$ tracks FP32$\leftrightarrow$LP mismatches across seeds, widths, and precisions; scaling by $ε_{\rm mach}$ collapses mixed-precision points. (2) The time-series maximum of $κ_{\rm softmax}$ acts as an early-warning signal, leading error spikes by 16-24 steps (corr. 0.65-0.82; permutation $p\!\approx\!10^{-3}$; Precision@K 0.89-1.00). (3) Guided by $ρ_{\rm LN}$, a small LayerNorm-$ε$ tweak targeting $ρ_\star$ gives consistent stabilization (mean tail-loss $\downarrow\ \approx0.010$ at $ρ_\star\!=\!0.6$, cap$=10^{-2}$) with negligible overhead. Overall, our theory supplies actionable, unitless diagnostics that (i) explain when self-attention is fragile, (ii) forecast instability, and (iii) motivate a minimally invasive mitigation.
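
To make "rowwise softmax sensitivity" concrete, here is a small sketch of the standard first-order analysis of one softmax row. The abstract does not define $κ_{\rm softmax}$, so this should not be read as the paper's diagnostic. It uses only the textbook Jacobian $J = \mathrm{diag}(p) - pp^\top$, whose induced $\infty$-norm is $\max_i 2p_i(1-p_i)$, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax of one row of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_jacobian(p):
    """J[i][j] = d p_i / d s_j = p_i * (delta_ij - p_j)."""
    n = len(p)
    return [[p[i] * ((1.0 if i == j else 0.0) - p[j]) for j in range(n)]
            for i in range(n)]


def jacobian_inf_norm(p):
    """Max absolute row sum of J; row i sums to 2 * p_i * (1 - p_i)."""
    return max(2.0 * pi * (1.0 - pi) for pi in p)


def first_order_change(p, delta):
    """Predicted change in the softmax output for a small score perturbation."""
    return [sum(jij * d for jij, d in zip(row, delta))
            for row in softmax_jacobian(p)]


scores = [1.0, 2.0, 3.0]
delta = [1e-3, -1e-3, 0.0]  # stand-in for a low-precision rounding error
p = softmax(scores)
actual = [a - b for a, b in zip(softmax([s + d for s, d in zip(scores, delta)]), p)]
predicted = first_order_change(p, delta)
```

The linearization `predicted` matches `actual` up to second-order terms in `delta`. The norm is largest, 0.5, when some probability equals 0.5, and it shrinks as a row saturates toward a one-hot distribution.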