CL, Feb 6, 2025

Quantification of Biodiversity from Historical Survey Text with LLM-based Best-Worst Scaling

arXiv:2502.04022v1, 1 citation, h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses the challenge of automated biodiversity quantification from text for ecologists and historians, but it is incremental as it applies existing LLM and scaling methods to a new domain.

The study estimates species frequency from historical survey text by framing the problem as a regression task using Best-Worst Scaling (BWS) with LLMs. Of the three models tested (Ministral-8B, DeepSeek-V3, and GPT-4), DeepSeek-V3 and GPT-4 reached reasonable agreement with humans and with each other. The authors conclude that the approach is more cost-effective than a fine-grained multi-class approach while being similarly robust.

In this study, we evaluate methods to determine the frequency of species via quantity estimation from historical survey text. To that end, we formulate classification tasks and finally show that this problem can be adequately framed as a regression task using Best-Worst Scaling (BWS) with Large Language Models (LLMs). We test Ministral-8B, DeepSeek-V3, and GPT-4, finding that the latter two have reasonable agreement with humans and each other. We conclude that this approach is more cost-effective and similarly robust compared to a fine-grained multi-class approach, allowing automated quantity estimation across species.
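To make the regression framing concrete, here is a minimal sketch of the standard counting procedure for Best-Worst Scaling, which turns best/worst picks over small item tuples into a real-valued score per item. It is not taken from the paper: the function name `bws_scores`, the passage identifiers, and the hard-coded judgments are all hypothetical, and in practice each pick would come from a human annotator or an LLM asked which passage implies the highest and which the lowest species frequency.

```python
from collections import defaultdict


def bws_scores(judgments):
    """Score items by Best-Worst Scaling counting.

    judgments: iterable of (items, best, worst), where `items` is the tuple
    of item ids shown together and `best` / `worst` are the two picks.
    Returns a dict mapping each item to (#best - #worst) / #appearances,
    a value in [-1, 1].
    """
    best_count = defaultdict(int)
    worst_count = defaultdict(int)
    appearances = defaultdict(int)

    for items, best, worst in judgments:
        if best not in items or worst not in items:
            raise ValueError("best and worst picks must come from the tuple")
        for item in items:
            appearances[item] += 1
        best_count[best] += 1
        worst_count[worst] += 1

    return {
        item: (best_count[item] - worst_count[item]) / appearances[item]
        for item in appearances
    }


# Hypothetical judgments over four text passages (ids only, for illustration).
# "best" = passage implying the highest frequency, "worst" = the lowest.
judgments = [
    (("p1", "p2", "p3", "p4"), "p1", "p4"),
    (("p1", "p2", "p3", "p4"), "p1", "p3"),
    (("p1", "p2", "p3", "p4"), "p2", "p4"),
]

scores = bws_scores(judgments)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Each judgment asks for only two picks per tuple, which is the kind of saving over labelling every passage on a fine-grained multi-class scale that the cost argument above rests on; the resulting scores can then serve as regression targets.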
