CL AI LG | Dec 3, 2024

Scaling BERT Models for Turkish Automatic Punctuation and Capitalization Correction

Abdulkader Saoud, Mahmut Alomeyr, Himmet Toprak Kesgin, Mehmet Fatih Amasyali

arXiv:2412.02698v1 | 1.0 | h-index: 16 | 2024 Innovations in Intelligent Systems and Applications Conference (ASYU)

Originality: Synthesis-oriented

AI Analysis

This work addresses text quality enhancement for Turkish-language users, but it is incremental because it applies an existing method to a new domain.

This paper tackles automated punctuation and capitalization correction in Turkish texts by scaling BERT models across five sizes. It finds that text readability and accuracy improve as model size increases, with the Base model achieving the highest correction precision.

This paper investigates the effectiveness of BERT-based models for automated punctuation and capitalization correction in Turkish texts across five distinct model sizes: Tiny, Mini, Small, Medium, and Base. The design and capabilities of each model are tailored to address the specific challenges of the Turkish language, with a focus on optimizing performance while minimizing computational overhead. The study presents a systematic comparison of each model's performance metrics (precision, recall, and F1 score), offering insights into their applicability in diverse operational contexts. The results demonstrate a significant improvement in text readability and accuracy as model size increases, with the Base model achieving the highest correction precision. This research provides a comprehensive guide for selecting the appropriate model size based on specific user needs and computational resources, establishing a framework for deploying these models in real-world applications to enhance the quality of written Turkish.
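The abstract compares models by precision, recall, and F1 score. As a minimal sketch of how such token-level metrics are commonly computed for punctuation restoration, the Python snippet below scores a predicted label sequence against a gold one. The label names (`O`, `COMMA`, `PERIOD`, `QUESTION`), the function `prf_per_label`, and the sample data are assumptions made for illustration only; the listing does not state the paper's actual label set or evaluation code.

```python
from collections import Counter


def prf_per_label(gold, pred, ignore="O"):
    """Token-level precision, recall and F1 for every label except `ignore`."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            if g != ignore:
                tp[g] += 1          # correct punctuation mark
        else:
            if p != ignore:
                fp[p] += 1          # predicted a mark that is not in the gold text
            if g != ignore:
                fn[g] += 1          # missed (or mislabeled) a gold mark
    scores = {}
    for label in sorted(set(tp) | set(fp) | set(fn)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Hypothetical data: the tag predicted after each token ("O" = no punctuation).
gold = ["O", "COMMA", "O", "O", "PERIOD", "O", "QUESTION"]
pred = ["O", "COMMA", "COMMA", "O", "PERIOD", "O", "PERIOD"]
scores = prf_per_label(gold, pred)

for label, s in scores.items():
    print(f"{label:9s} P={s['precision']:.2f} R={s['recall']:.2f} F1={s['f1']:.2f}")
```

Excluding the `O` class keeps the scores from being dominated by the many tokens that carry no punctuation. Capitalization correction could be scored the same way with a separate, equally hypothetical, label set.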
