CL · Feb 6, 2025

Quantification of Biodiversity from Historical Survey Text with LLM-based Best-Worst Scaling

Thomas Haider, Tobias Perschl, Malte Rehbein

arXiv:2502.04022v1 · 4.91 citations · h-index: 7

Originality: Incremental advance

AI Analysis

This work addresses automated biodiversity quantification from text, a problem relevant to ecologists and historians. It is an incremental advance because it applies existing LLM and scaling methods to a new domain.

The study estimates species frequency from historical survey text. After formulating classification tasks, it frames the problem as a regression task using Best-Worst Scaling (BWS) with LLMs. Of the three models tested (Ministral-8B, DeepSeek-V3, and GPT-4), DeepSeek-V3 and GPT-4 reached reasonable agreement with humans and with each other. The authors conclude that the approach is more cost-effective than a fine-grained multi-class approach and similarly robust.

In this study, we evaluate methods to determine the frequency of species via quantity estimation from historical survey text. To that end, we formulate classification tasks and finally show that this problem can be adequately framed as a regression task using Best-Worst Scaling (BWS) with Large Language Models (LLMs). We test Ministral-8B, DeepSeek-V3, and GPT-4, finding that the latter two have reasonable agreement with humans and each other. We conclude that this approach is more cost-effective and similarly robust compared to a fine-grained multi-class approach, allowing automated quantity estimation across species.
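
To illustrate how Best-Worst Scaling can turn comparative judgments into a continuous score suitable for regression, here is a minimal sketch of the standard BWS counting procedure: each item's score is the number of times it was picked as "best" minus the number of times it was picked as "worst", divided by the number of tuples it appeared in. This is not taken from the paper; the function name `bws_scores`, the tuple layout, and the placeholder item labels are all assumptions made for illustration, and the authors' exact scoring and prompting setup may differ.

```python
from collections import Counter


def bws_scores(judgments):
    """Compute counting-based Best-Worst Scaling scores.

    `judgments` is an iterable of (items, best, worst) triples, where
    `items` is the tuple shown to the annotator (human or LLM), `best`
    is the item judged to express the highest quantity, and `worst`
    the lowest. This layout is a hypothetical choice for the sketch.
    """
    best_counts = Counter()
    worst_counts = Counter()
    appearances = Counter()

    for items, best, worst in judgments:
        if best not in items or worst not in items:
            raise ValueError("best and worst must be members of the tuple")
        appearances.update(items)
        best_counts[best] += 1
        worst_counts[worst] += 1

    # Score lies in [-1, 1]: +1 means always chosen as best, -1 always worst.
    return {
        item: (best_counts[item] - worst_counts[item]) / appearances[item]
        for item in appearances
    }


# Placeholder labels standing in for text snippets about species frequency.
judgments = [
    (("A", "B", "C", "D"), "A", "D"),
    (("A", "B", "C", "D"), "A", "C"),
    (("A", "B", "D", "E"), "B", "D"),
]
scores = bws_scores(judgments)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In practice each tuple would hold short text passages rather than letters, and the best and worst picks would come from a human annotator or an LLM prompt. Because every judgment yields only two choices per tuple, the annotation cost per item stays low compared with assigning a fine-grained class label to each passage.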
