Sungmin Cho

CV
h-index11
9papers
15,622citations
Novelty52%
AI Score53

9 Papers

AIJul 31, 2024
The Llama 3 Herd of Models

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri et al. · allen-ai, berkeley

Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.

CLJan 21
Rewarding How Models Think Pedagogically: Integrating Pedagogical Reasoning and Thinking Rewards for LLMs in Education

Unggi Lee, Jiyeong Bae, Jaehyeon Park et al.

Large language models (LLMs) are increasingly deployed as intelligent tutoring systems, yet research on optimizing LLMs specifically for educational contexts remains limited. Recent works have proposed reinforcement learning approaches for training LLM tutors, but these methods focus solely on optimizing visible responses while neglecting the model's internal thinking process. We introduce PedagogicalRL-Thinking, a framework that extends pedagogical alignment to reasoning LLMs in education through two novel approaches: (1) Pedagogical Reasoning Prompting, which guides internal reasoning using domain-specific educational theory rather than generic instructions; and (2) Thinking Reward, which explicitly evaluates and reinforces the pedagogical quality of the model's reasoning traces. Our experiments reveal that domain-specific, theory-grounded prompting outperforms generic prompting, and that Thinking Reward is most effective when combined with pedagogical prompting. Furthermore, models trained only on mathematics tutoring dialogues show improved performance on educational benchmarks not seen during training, while preserving the base model's factual knowledge. Our quantitative and qualitative analyses reveal that pedagogical thinking reward produces systematic reasoning trace changes, with increased pedagogical reasoning and more structured instructional decision-making in the tutor's thinking process.

CLJan 20
OpenLearnLM Benchmark: A Unified Framework for Evaluating Knowledge, Skill, and Attitude in Educational Large Language Models

Unggi Lee, Sookbun Lee, Heungsoo Choi et al.

Large Language Models are increasingly deployed as educational tools, yet existing benchmarks focus on narrow skills and lack grounding in learning sciences. We introduce OpenLearnLM Benchmark, a theory-grounded framework evaluating LLMs across three dimensions derived from educational assessment theory: Knowledge (curriculum-aligned content and pedagogical understanding), Skills (scenario-based competencies organized through a four-level center-role-scenario-subscenario hierarchy), and Attitude (alignment consistency and deception resistance). Our benchmark comprises 124K+ items spanning multiple subjects, educational roles, and difficulty levels based on Bloom's taxonomy. The Knowledge domain prioritizes authentic assessment items from established benchmarks, while the Attitude domain adapts Anthropic's Alignment Faking methodology to detect behavioral inconsistency under varying monitoring conditions. Evaluation of seven frontier models reveals distinct capability profiles: Claude-Opus-4.5 excels in practical skills despite lower content knowledge, while Grok-4.1-fast leads in knowledge but shows alignment concerns. Notably, no single model dominates all dimensions, validating the necessity of multi-axis evaluation. OpenLearnLM provides an open, comprehensive framework for advancing LLM readiness in authentic educational contexts.

CVAug 19, 2025
MF-LPR$^2$: Multi-Frame License Plate Image Restoration and Recognition using Optical Flow

Kihyun Na, Junseok Oh, Youngkwan Cho et al.

License plate recognition (LPR) is important for traffic law enforcement, crime investigation, and surveillance. However, license plate areas in dash cam images often suffer from low resolution, motion blur, and glare, which make accurate recognition challenging. Existing generative models that rely on pretrained priors cannot reliably restore such poor-quality images, frequently introducing severe artifacts and distortions. To address this issue, we propose a novel multi-frame license plate restoration and recognition framework, MF-LPR$^2$, which addresses ambiguities in poor-quality images by aligning and aggregating neighboring frames instead of relying on pretrained knowledge. To achieve accurate frame alignment, we employ a state-of-the-art optical flow estimator in conjunction with carefully designed algorithms that detect and correct erroneous optical flow estimations by leveraging the spatio-temporal consistency inherent in license plate image sequences. Our approach enhances both image quality and recognition accuracy while preserving the evidential content of the input images. In addition, we constructed a novel Realistic LPR (RLPR) dataset to evaluate MF-LPR$^2$. The RLPR dataset contains 200 pairs of low-quality license plate image sequences and high-quality pseudo ground-truth images, reflecting the complexities of real-world scenarios. In experiments, MF-LPR$^2$ outperformed eight recent restoration models in terms of PSNR, SSIM, and LPIPS by significant margins. In recognition, MF-LPR$^2$ achieved an accuracy of 86.44%, outperforming both the best single-frame LPR (14.04%) and the multi-frame LPR (82.55%) among the eleven baseline models. The results of ablation studies confirm that our filtering and refinement algorithms significantly contribute to these improvements.

CVOct 15, 2025
MUSE: Model-based Uncertainty-aware Similarity Estimation for zero-shot 2D Object Detection and Segmentation

Sungmin Cho, Sungbum Park, Insoo Oh

In this work, we introduce MUSE (Model-based Uncertainty-aware Similarity Estimation), a training-free framework designed for model-based zero-shot 2D object detection and segmentation. MUSE leverages 2D multi-view templates rendered from 3D unseen objects and 2D object proposals extracted from input query images. In the embedding stage, it integrates class and patch embeddings, where the patch embeddings are normalized using generalized mean pooling (GeM) to capture both global and local representations efficiently. During the matching stage, MUSE employs a joint similarity metric that combines absolute and relative similarity scores, enhancing the robustness of matching under challenging scenarios. Finally, the similarity score is refined through an uncertainty-aware object prior that adjusts for proposal reliability. Without any additional training or fine-tuning, MUSE achieves state-of-the-art performance on the BOP Challenge 2025, ranking first across the Classic Core, H3, and Industrial tracks. These results demonstrate that MUSE offers a powerful and generalizable framework for zero-shot 2D object detection and segmentation.

CLFeb 19, 2025
UniKnow: A Unified Framework for Reliable Language Model Behavior across Parametric and External Knowledge

Youna Kim, Hyuhng Joon Kim, Minjoon Choi et al.

Language models often benefit from external knowledge beyond parametric knowledge. While this combination enhances performance, achieving reliable knowledge utilization remains challenging, as it requires assessing the state of each knowledge source based on the presence of relevant information. Yet, prior work on knowledge integration often overlooks this challenge by assuming ideal conditions and provides limited coverage of knowledge scenarios. To address this gap, we introduce UniKnow, a Unified framework for reliable LM behavior across parametric and external Knowledge. UniKnow enables controlled evaluation across knowledge scenarios such as knowledge conflict, distraction, and absence conditions that are rarely addressed together. Beyond evaluating existing methods under this setting, we extend our work by introducing UniKnow-Aware methods to support comprehensive evaluation. Experiments on UniKnow reveal that existing methods struggle to generalize across a broader range of knowledge configurations and exhibit scenario-specific biases. UniKnow thus provides a foundation for systematically exploring and improving reliability under knowledge scenarios.

CVJun 1, 2024
Upright adjustment with graph convolutional networks

Raehyuk Jung, Sungmin Cho, Junseok Kwon

We present a novel method for the upright adjustment of 360 images. Our network consists of two modules, which are a convolutional neural network (CNN) and a graph convolutional network (GCN). The input 360 images is processed with the CNN for visual feature extraction, and the extracted feature map is converted into a graph that finds a spherical representation of the input. We also introduce a novel loss function to address the issue of discrete probability distributions defined on the surface of a sphere. Experimental results demonstrate that our method outperforms fully connected based methods.

CVFeb 10, 2022
Spherical Transformer

Sungmin Cho, Raehyuk Jung, Junseok Kwon

Using convolutional neural networks for 360images can induce sub-optimal performance due to distortions entailed by a planar projection. The distortion gets deteriorated when a rotation is applied to the 360image. Thus, many researches based on convolutions attempt to reduce the distortions to learn accurate representation. In contrast, we leverage the transformer architecture to solve image classification problems for 360images. Using the proposed transformer for 360images has two advantages. First, our method does not require the erroneous planar projection process by sampling pixels from the sphere surface. Second, our sampling method based on regular polyhedrons makes low rotation equivariance errors, because specific rotations can be reduced to permutations of faces. In experiments, we validate our network on two aspects, as follows. First, we show that using a transformer with highly uniform sampling methods can help reduce the distortion. Second, we demonstrate that the transformer architecture can achieve rotation equivariance on specific rotations. We compare our method to other state-of-the-art algorithms using the SPH-MNIST, SPH-CIFAR, and SUN360 datasets and show that our method is competitive with other methods.

IRDec 29, 2021
On the Overlooked Significance of Underutilized Contextual Features in Recent News Recommendation Models

Sungmin Cho, Hongjun Lim, Keunchan Park et al.

Personalized news recommendation aims to provide attractive articles for readers by predicting their likelihood of clicking on a certain article. To accurately predict this probability, plenty of studies have been proposed that actively utilize content features of articles, such as words, categories, or entities. However, we observed that the articles' contextual features, such as CTR (click-through-rate), popularity, or freshness, were either neglected or underutilized recently. To prove that this is the case, we conducted an extensive comparison between recent deep-learning models and naive contextual models that we devised and surprisingly discovered that the latter easily outperforms the former. Furthermore, our analysis showed that the recent tendency to apply overly sophisticated deep-learning operations to contextual features was actually hindering the recommendation performance. From this knowledge, we design a purposefully simple contextual module that can boost the previous news recommendation models by a large margin.