Gayoung Kim

1.4 | CV | Apr 20, 2022

Vision System of Curling Robots: Thrower and Skip

Seongwook Yoon, Gayoung Kim, Myungpyo Hong et al.

We built a vision system for curling robots that can be expected to play against human curling players. Specifically, we built two vision systems, one for the thrower robot and one for the skip robot. First, the thrower robot drives towards a given point on the curling sheet to release a stone. The vision system in the thrower robot initializes a 3-DoF pose on the two-dimensional curling sheet and updates that pose to inform the stone-release decision. Second, the skip robot stands at the opposite end of the sheet from the thrower robot and monitors the state of the game to make strategic decisions. The vision system in the skip robot precisely recognizes every stone on the curling sheet. Because the view is strongly perspective, many stones occlude one another, which makes it challenging to estimate stone positions accurately. We therefore detect the elliptical outlines of the stone handles and find the exact midpoint of each stone using a perspective Hough transform. Furthermore, we track a thrown stone to produce a trajectory for ice condition analysis. Finally, we implemented our vision systems on two mobile robots and successfully performed a single turn and even careful gameplay. In total, our vision system includes three cameras with different viewpoints, each serving its own purpose.

2.7 | CL | Sep 23, 2025

Prior-based Noisy Text Data Filtering: Fast and Strong Alternative For Perplexity

Yeongbin Seo, Gayoung Kim, Jaehyung Kim et al.

As large language models (LLMs) are pretrained on massive web corpora, careful selection of data becomes essential to ensure effective and efficient learning. While perplexity (PPL)-based filtering has shown strong performance, it suffers from two drawbacks: substantial time costs and the inherent unreliability of the model when handling noisy or out-of-distribution samples. In this work, we propose a simple yet powerful alternative: a prior-based data filtering method that estimates token priors using corpus-level term frequency statistics, inspired by linguistic insights on word roles and lexical density. Our approach filters documents based on the mean and standard deviation of token priors, serving as a fast proxy for PPL while requiring no model inference. Despite its simplicity, the prior-based filter achieves the highest average performance across 20 downstream benchmarks, while reducing time cost by over 1000x compared to PPL-based filtering. We further demonstrate its applicability to symbolic languages such as code and math, and its dynamic adaptability to multilingual corpora without supervision.
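
The filtering idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: whitespace tokenization, the use of raw relative frequencies as priors, and the threshold parameters `min_mean` and `max_std` are all assumptions made here for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter


def token_priors(corpus):
    """Estimate each token's prior as its relative frequency in the corpus.

    Assumption: tokens are whitespace-separated words; the abstract does not
    say which tokenizer is used.
    """
    counts = Counter(tok for doc in corpus for tok in doc.split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def doc_stats(doc, priors):
    """Return (mean, std) of the priors of the tokens in one document.

    Returns None for a document with no known tokens.
    """
    values = [priors[tok] for tok in doc.split() if tok in priors]
    if not values:
        return None
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)


def filter_by_prior(corpus, min_mean, max_std):
    """Keep documents whose prior statistics fall inside the given bounds.

    Assumption: simple fixed thresholds; the abstract does not state how the
    cut-offs on the mean and standard deviation are chosen.
    """
    priors = token_priors(corpus)
    kept = []
    for doc in corpus:
        stats = doc_stats(doc, priors)
        if stats is None:
            continue  # empty documents are dropped
        mean, std = stats
        if mean >= min_mean and std <= max_std:
            kept.append(doc)
    return kept
```

For example, `filter_by_prior(["a a b", "a b", "zzz qqq"], min_mean=0.2, max_std=1.0)` drops the third document, whose tokens are all rare in the corpus. No model inference is involved: one counting pass builds the priors and a second pass scores each document.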


2 Papers