CL · Jan 27, 2025

LCTG Bench: LLM Controlled Text Generation Benchmark

arXiv:2501.15875v1 · 2 citations · h-index: 18
Originality: Synthesis-oriented
AI Analysis

The paper addresses a domain-specific problem for users who need controlled text generation in Japanese, but it is incremental: it extends existing benchmark concepts to a new language.

This research tackles the lack of benchmarks for evaluating LLM controllability in low-resource languages like Japanese by introducing LCTG Bench, the first Japanese benchmark for this purpose. It finds a significant controllability gap between multilingual and Japanese-specific models.

The rise of large language models (LLMs) has led to more diverse and higher-quality machine-generated text. However, their high expressive power makes it difficult to control outputs based on specific business instructions. In response, benchmarks focusing on the controllability of LLMs have been developed, but several issues remain: (1) They primarily cover major languages like English and Chinese, neglecting low-resource languages like Japanese; (2) Current benchmarks employ task-specific evaluation metrics, lacking a unified framework for selecting models based on controllability across different use cases. To address these challenges, this research introduces LCTG Bench, the first Japanese benchmark for evaluating the controllability of LLMs. LCTG Bench provides a unified framework for assessing control performance, enabling users to select the most suitable model for their use cases based on controllability. By evaluating nine diverse Japanese-specific and multilingual LLMs like GPT-4, we highlight the current state and challenges of controllability in Japanese LLMs and reveal the significant gap between multilingual models and Japanese-specific models.

Code Implementations: 1 repo