A Fine-Tuned BERT Classifier for Personal-Letter Titles in Late-Ming and Early-Qing Collected Works
For scholars of late-Ming and early-Qing Chinese history, this tool automates a previously manual classification task, enabling large-scale extraction of personal letters from collected works.
The paper presents Lepton, a fine-tuned BERT classifier that distinguishes personal letters from prefaces in Classical Chinese wenji titles. The model is deployed on Hugging Face and has been used to identify ~55,000 letters for the Ming Letter Platform.
I present Lepton (Letter Prediction), a fine-tuned BERT classifier that predicts whether a title in a Classical Chinese wenji table of contents is a personal letter or a closely confusable preface (particularly the farewell-preface). Lepton fine-tunes bert-base-chinese on 5,438 hand-labeled wenji titles from thirty-three late-Ming and early-Qing literati. I have deployed the model on Hugging Face, and it has been used at the China Biographical Database (CBDB) to identify approximately fifty-five thousand letters across mid-Ming through early-Qing wenji, populating the Ming Letter Platform.
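As a rough illustration of the letter-versus-preface decision described above, the following Python sketch shows how a two-class sequence classifier of this kind might be applied to a list of titles. It is a sketch under stated assumptions, not the released implementation: the label order in `LABELS`, the 0.5 threshold, and the function names `decide` and `classify_titles` are all hypothetical, and the Hugging Face model identifier is not given here, so the caller must supply it.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical label order: the model's actual id-to-label mapping is not
# stated in the text and should be read from the model's config.
LABELS = ("preface", "letter")


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits: Sequence[float], threshold: float = 0.5) -> Tuple[str, float]:
    """Return (label, P(letter)) for one title's two-class logits.

    The threshold is an assumed default, not a value from the paper.
    """
    p_letter = softmax(logits)[LABELS.index("letter")]
    label = "letter" if p_letter >= threshold else "preface"
    return label, p_letter


def classify_titles(titles: Sequence[str], model_id: str) -> List[Tuple[str, float]]:
    """Classify wenji titles with a fine-tuned sequence classifier.

    Requires the `transformers` and `torch` packages. `model_id` is supplied
    by the caller because the model's Hub identifier is not given in the text.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    batch = tokenizer(list(titles), padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits  # shape: (number of titles, 2)
    return [decide(row.tolist()) for row in logits]
```

Keeping the probability alongside the label makes it straightforward to route low-confidence titles, such as farewell-prefaces that resemble letters, to manual review.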