CL · Apr 16

Domain Fine-Tuning FinBERT on Finnish Histopathological Reports: Train-Time Signals and Downstream Correlations

Rami Luisto, Liisa Petäinen, Tommi Grönholm, Jan Böhm, Maarit Ahtiainen, Tomi Lilja, Ilkka Pölönen, Sami Äyrämö

arXiv:2604.14815 · 6.6 · h-index: 18

AI Analysis

For NLP practitioners working with low-resource medical text in Finnish, this provides observations on domain fine-tuning dynamics, but the results are preliminary and lack quantitative validation.

The authors fine-tuned Finnish BERT on Finnish histopathological reports and attempted to predict downstream task performance from embedding geometry changes during fine-tuning, but no concrete performance numbers are reported.

In NLP classification tasks where little labeled data exists, domain fine-tuning of transformer models on unlabeled data is an established approach. In this paper we have two aims. (1) We describe our observations from fine-tuning the Finnish BERT model on Finnish medical text data. (2) We report on our attempts to predict the benefit of domain-specific pre-training of Finnish BERT from observing the geometry of embedding changes due to domain fine-tuning. Our driving motivation is the common situation in healthcare AI where we might experience long delays in acquiring datasets, especially with respect to labels.
