Sungchul Choi

CL
h-index18
12papers
891citations
Novelty41%
AI Score44

12 Papers

CVSep 25, 2024Code
DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling

Kyuheon Jung, Yongdeuk Seo, Seongwoo Cho et al.

In this paper, we present an effective data augmentation framework leveraging the Large Language Model (LLM) and Diffusion Model (DM) to tackle the challenges inherent in data-scarce scenarios. Recently, DMs have opened up the possibility of generating synthetic images to complement a few training images. However, increasing the diversity of synthetic images also raises the risk of generating samples outside the target distribution. Our approach addresses this issue by embedding novel semantic information into text prompts via LLM and utilizing real images as visual prompts, thus generating semantically rich images. To ensure that the generated images remain within the target distribution, we dynamically adjust the guidance weight based on each image's CLIPScore to control the diversity. Experimental results show that our method produces synthetic images with enhanced diversity while maintaining adherence to the target distribution. Consequently, our approach proves to be more efficient in the few-shot setting on several benchmarks. Our code is available at https://github.com/kkyuhun94/dalda .

CLFeb 13, 2023
Bag of Tricks for In-Distribution Calibration of Pretrained Transformers

Jaeyoung Kim, Dongbin Na, Sungchul Choi et al.

While pre-trained language models (PLMs) have become a de-facto standard promoting the accuracy of text classification tasks, recent studies find that PLMs often predict over-confidently. Although various calibration methods have been proposed, such as ensemble learning and data augmentation, most of the methods have been verified in computer vision benchmarks rather than in PLM-based text classification tasks. In this paper, we present an empirical study on confidence calibration for PLMs, addressing three categories, including confidence penalty losses, data augmentations, and ensemble methods. We find that the ensemble model overfitted to the training set shows sub-par calibration performance and also observe that PLMs trained with confidence penalty loss have a trade-off between calibration and accuracy. Building on these observations, we propose the Calibrated PLM (CALL), a combination of calibration techniques. The CALL complements the drawbacks that may occur when utilizing a calibration method individually and boosts both classification and calibration accuracy. Design choices in CALL's training procedures are extensively studied, and we provide a detailed analysis of how calibration techniques affect the calibration performance of PLMs.

CLJul 18, 2023
Pseudo Outlier Exposure for Out-of-Distribution Detection using Pretrained Transformers

Jaeyoung Kim, Kyuheon Jung, Dongbin Na et al.

For real-world language applications, detecting an out-of-distribution (OOD) sample is helpful to alert users or reject such unreliable samples. However, modern over-parameterized language models often produce overconfident predictions for both in-distribution (ID) and OOD samples. In particular, language models suffer from OOD samples with a similar semantic representation to ID samples since these OOD samples lie near the ID manifold. A rejection network can be trained with ID and diverse outlier samples to detect test OOD samples, but explicitly collecting auxiliary OOD datasets brings an additional burden for data collection. In this paper, we propose a simple but effective method called Pseudo Outlier Exposure (POE) that constructs a surrogate OOD dataset by sequentially masking tokens related to ID classes. The surrogate OOD sample introduced by POE shows a similar representation to ID data, which is most effective in training a rejection network. Our method does not require any external OOD data and can be easily implemented within off-the-shelf Transformers. A comprehensive comparison with state-of-the-art algorithms demonstrates POE's competitiveness on several text classification benchmarks.

CLJan 8Code
PILOT-Bench: A Benchmark for Legal Reasoning in the Patent Domain with IRAC-Aligned Classification Tasks

Yehoon Jang, Chaewon Lee, Hyun-seok Min et al.

The Patent Trial and Appeal Board (PTAB) of the USPTO adjudicates thousands of ex parte appeals each year, requiring the integration of technical understanding and legal reasoning. While large language models (LLMs) are increasingly applied in patent and legal practice, their use has remained limited to lightweight tasks, with no established means of systematically evaluating their capacity for structured legal reasoning in the patent domain. In this work, we introduce PILOT-Bench, the first PTAB-centric benchmark that aligns PTAB decisions with USPTO patent data at the case-level and formalizes three IRAC-aligned classification tasks: Issue Type, Board Authorities, and Subdecision. We evaluate a diverse set of closed-source (commercial) and open-source LLMs and conduct analyses across multiple perspectives, including input-variation settings, model families, and error tendencies. Notably, on the Issue Type task, closed-source models consistently exceed 0.75 in Micro-F1 score, whereas the strongest open-source model (Qwen-8B) achieves performance around 0.56, highlighting a substantial gap in reasoning capabilities. PILOT-Bench establishes a foundation for the systematic evaluation of patent-domain legal reasoning and points toward future directions for improving LLMs through dataset design and model alignment. All data, code, and benchmark resources are available at https://github.com/TeamLab/pilot-bench.

CLJan 5, 2025Code
Multi-LLM Collaborative Caption Generation in Scientific Documents

Jaeyoung Kim, Jongho Lee, Hong-Jun Choi et al.

Scientific figure captioning is a complex task that requires generating contextually appropriate descriptions of visual content. However, existing methods often fall short by utilizing incomplete information, treating the task solely as either an image-to-text or text summarization problem. This limitation hinders the generation of high-quality captions that fully capture the necessary details. Moreover, existing data sourced from arXiv papers contain low-quality captions, posing significant challenges for training large language models (LLMs). In this paper, we introduce a framework called Multi-LLM Collaborative Figure Caption Generation (MLBCAP) to address these challenges by leveraging specialized LLMs for distinct sub-tasks. Our approach unfolds in three key modules: (Quality Assessment) We utilize multimodal LLMs to assess the quality of training data, enabling the filtration of low-quality captions. (Diverse Caption Generation) We then employ a strategy of fine-tuning/prompting multiple LLMs on the captioning task to generate candidate captions. (Judgment) Lastly, we prompt a prominent LLM to select the highest quality caption from the candidates, followed by refining any remaining inaccuracies. Human evaluations demonstrate that informative captions produced by our approach rank better than human-written captions, highlighting its effectiveness. Our code is available at https://github.com/teamreboott/MLBCAP

CVNov 13, 2025
STELLAR: Scene Text Editor for Low-Resource Languages and Real-World Data

Yongdeuk Seo, Hyun-seok Min, Sungchul Choi

Scene Text Editing (STE) is the task of modifying text content in an image while preserving its visual style, such as font, color, and background. While recent diffusion-based approaches have shown improvements in visual quality, key limitations remain: lack of support for low-resource languages, domain gap between synthetic and real data, and the absence of appropriate metrics for evaluating text style preservation. To address these challenges, we propose STELLAR (Scene Text Editor for Low-resource LAnguages and Real-world data). STELLAR enables reliable multilingual editing through a language-adaptive glyph encoder and a multi-stage training strategy that first pre-trains on synthetic data and then fine-tunes on real images. We also construct a new dataset, STIPLAR(Scene Text Image Pairs of Low-resource lAnguages and Real-world data), for training and evaluation. Furthermore, we propose Text Appearance Similarity (TAS), a novel metric that assesses style preservation by independently measuring font, color, and background similarity, enabling robust evaluation even without ground truth. Experimental results demonstrate that STELLAR outperforms state-of-the-art models in visual consistency and recognition accuracy, achieving an average TAS improvement of 2.2% across languages over the baselines.

IROct 21, 2020
Deep learning-based citation recommendation system for patents

Jaewoong Choi, Sion Jang, Jaeyoung Kim et al.

In this study, we address the challenges in developing a deep learning-based automatic patent citation recommendation system. Although deep learning-based recommendation systems have exhibited outstanding performance in various domains (such as movies, products, and paper citations), their validity in patent citations has not been investigated, owing to the lack of a freely available high-quality dataset and relevant benchmark model. To solve these problems, we present a novel dataset called PatentNet that includes textual information and metadata for approximately 110,000 patents from the Google Big Query service. Further, we propose strong benchmark models considering the similarity of textual information and metadata (such as cooperative patent classification code). Compared with existing recommendation methods, the proposed benchmark method achieved a mean reciprocal rank of 0.2377 on the test set, whereas the existing state-of-the-art recommendation method achieved 0.2073.

LGSep 29, 2020
Machine-Learning Approach to Analyze the Status of Forklift Vehicles with Irregular Movement in a Shipyard

Hyeonju Lee, Jongho Lee, Minji An et al.

In large shipyards, the management of equipment, which are used for building a variety of ships, is critical. Because orders vary year to year, shipyard managers are required to determine methods to make the most of their limited resources. A particular difficulty that arises because of the nature and size of shipyards is the management of moving vehicles. In recent years, shipbuilding companies have attempted to manage and track the locations and movements of vehicles using Global Positioning System (GPS) modules. However, because certain vehicles, such as forklifts, roam irregularly around a yard, identifying their working status without being onsite is difficult. Location information alone is not sufficient to determine whether a vehicle is working, moving, waiting, or resting. This study proposes an approach based on machine learning to identify the work status of each forklift. We use the DBSCAN and k-means algorithms to identify the area in which a particular forklift is operating and the type of work it is performing. We developed a business intelligence system to collect information from forklifts equipped with GPS and Internet of Things (IoT) devices. The system provides visual information on the status of individual forklifts and helps in the efficient management of their movements within large shipyards.

CLMar 15, 2019
A Context-Aware Citation Recommendation Model with BERT and Graph Convolutional Networks

Chanwoo Jeong, Sion Jang, Hyuna Shin et al.

With the tremendous growth in the number of scientific papers being published, searching for references while writing a scientific paper is a time-consuming process. A technique that could add a reference citation at the appropriate place in a sentence will be beneficial. In this perspective, context-aware citation recommendation has been researched upon for around two decades. Many researchers have utilized the text data called the context sentence, which surrounds the citation tag, and the metadata of the target paper to find the appropriate cited research. However, the lack of well-organized benchmarking datasets and no model that can attain high performance has made the research difficult. In this paper, we propose a deep learning based model and well-organized dataset for context-aware paper citation recommendation. Our model comprises a document encoder and a context encoder, which uses Graph Convolutional Networks (GCN) layer and Bidirectional Encoder Representations from Transformers (BERT), which is a pre-trained model of textual data. By modifying the related PeerRead dataset, we propose a new dataset called FullTextPeerRead containing context sentences to cited references and paper metadata. To the best of our knowledge, This dataset is the first well-organized dataset for context-aware paper recommendation. The results indicate that the proposed model with the proposed datasets can attain state-of-the-art performance and achieve a more than 28% improvement in mean average precision (MAP) and recall@k.

CLMar 14, 2019
Deep Patent Landscaping Model Using Transformer and Graph Embedding

Seokkyu Choi, Hyeonju Lee, Eunjeong Lucy Park et al.

Patent landscaping is a method used for searching related patents during a research and development (R&D) project. To avoid the risk of patent infringement and to follow current trends in technology, patent landscaping is a crucial task required during the early stages of an R&D project. As the process of patent landscaping requires advanced resources and can be tedious, the demand for automated patent landscaping has been gradually increasing. However, a shortage of well-defined benchmark datasets and comparable models makes it difficult to find related research studies. In this paper, we propose an automated patent landscaping model based on deep learning. To analyze the text of patents, the proposed model uses a modified transformer structure. To analyze the metadata of patents, we propose a graph embedding method that uses a diffusion graph called Diff2Vec. Furthermore, we introduce four benchmark datasets for comparing related research studies in patent landscaping. The datasets are produced by querying Google BigQuery, based on a search formula from a Korean patent attorney. The obtained results indicate that the proposed model and datasets can attain state-of-the-art performance, as compared with current patent landscaping models.

LGJan 28, 2019
Hybrid Machine Learning Approach to Popularity Prediction of Newly Released Contents for Online Video Streaming Service

Hongjun Jeon, Wonchul Seo, Eunjeong Lucy Park et al.

In the industry of video content providers such as VOD and IPTV, predicting the popularity of video contents in advance is critical not only from a marketing perspective but also from a network optimization perspective. By predicting whether the content will be successful or not in advance, the content file, which is large, is efficiently deployed in the proper service providing server, leading to network cost optimization. Many previous studies have done view count prediction research to do this. However, the studies have been making predictions based on historical view count data from users. In this case, the contents had been published to the users and already deployed on a service server. These approaches make possible to efficiently deploy a content already published but are impossible to use for a content that is not be published. To address the problems, this research proposes a hybrid machine learning approach to the classification model for the popularity prediction of newly video contents which is not published. In this paper, we create a new variable based on the related content of the specific content and divide entire dataset by the characteristics of the contents. Next, the prediction is performed using XGBoosting and deep neural net based model according to the data characteristics of the cluster. Our model uses metadata for contents for prediction, so we use categorical embedding techniques to solve the sparsity of categorical variables and make them learn efficiently for the deep neural net model. As well, we use the FTRL-proximal algorithm to solve the problem of the view-count volatility of video content. We achieve overall better performance than the previous standalone method with a dataset from one of the top streaming service company.

CLAug 12, 2018
Text Classification using Capsules

Jaeyoung Kim, Sion Jang, Sungchul Choi et al.

This paper presents an empirical exploration of the use of capsule networks for text classification. While it has been shown that capsule networks are effective for image classification, their validity in the domain of text has not been explored. In this paper, we show that capsule networks indeed have the potential for text classification and that they have several advantages over convolutional neural networks. We further suggest a simple routing method that effectively reduces the computational complexity of dynamic routing. We utilized seven benchmark datasets to demonstrate that capsule networks, along with the proposed routing method provide comparable results.