CL · AI · CY · DB · May 21

A Fine-Tuned BERT Classifier for Personal-Letter Titles in Late-Ming and Early-Qing Collected Works

arXiv:2605.23103 · 13.2
Predicted impact: top 87% in CL · last 90 days
Originality: Synthesis-oriented
AI Analysis

For scholars of late-Ming and early-Qing Chinese history, this tool automates a previously manual classification task, enabling large-scale extraction of personal letters from collected works.

The paper presents Lepton, a fine-tuned BERT classifier that distinguishes personal letters from prefaces in Classical Chinese wenji titles. The model is deployed on Hugging Face and has identified ~55,000 letters for the Ming Letter Platform.

I present Lepton (Letter Prediction), a fine-tuned BERT classifier that predicts whether a title in a Classical Chinese wenji table of contents is a personal letter or a closely confusable preface (particularly the farewell-preface). Lepton fine-tunes bert-base-chinese on 5,438 hand-labeled wenji titles from thirty-three late-Ming and early-Qing literati. The model is deployed on Hugging Face and has been used at the China Biographical Database (CBDB) to identify approximately fifty-five thousand letters across mid-Ming through early-Qing wenji, populating the Ming Letter Platform.
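The setup described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the author's code: only the base checkpoint name bert-base-chinese comes from the abstract. The label names and their order, the helper names (split_examples, load_base_model, predict_labels), the hold-out fraction, and the max_length value are all assumptions made for the example. The published Lepton checkpoint is not named in this listing, so the sketch loads the base model, whose classification head is randomly initialised until it is fine-tuned on labeled titles.

```python
import random

# Assumed label set and order; the source only says "letter" vs. "preface".
LABELS = ["preface", "letter"]
LABEL2ID = {name: i for i, name in enumerate(LABELS)}
ID2LABEL = {i: name for name, i in LABEL2ID.items()}


def split_examples(examples, eval_fraction=0.1, seed=0):
    """Shuffle (title, label) pairs and hold out a fraction for evaluation.

    Hypothetical helper: the source does not describe its train/eval split.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


def load_base_model(model_name="bert-base-chinese"):
    """Load the base checkpoint with a fresh two-class classification head.

    Requires the `transformers` package and a network connection on first use.
    The head is untrained until the model is fine-tuned on labeled titles.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=len(LABELS),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
    )
    return tokenizer, model


def predict_labels(titles, tokenizer, model, max_length=64):
    """Return one predicted label name per title (max_length is an assumption)."""
    import torch

    model.eval()
    batch = tokenizer(
        titles,
        padding=True,
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**batch).logits
    return [ID2LABEL[i] for i in logits.argmax(dim=-1).tolist()]
```

A possible usage, with made-up titles in the two genres: after fine-tuning the model returned by load_base_model() on the training split, predict_labels(["與友人書", "送友人序"], tokenizer, model) would return one label per title. Calling it before fine-tuning yields essentially arbitrary labels, since the head has not been trained.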

