Embedding-based Retrieval with LLM for Effective Agriculture Information Extracting from Unstructured Data
This work addresses the challenge of pest identification for farmers by automating the extraction of structured data from unstructured agricultural documents, since few structured sources exist. The contribution is incremental in that it builds on existing retrieval and LLM methods.
The paper tackles the problem of extracting structured agricultural information from unstructured documents by combining embedding-based retrieval with LLM question-answering, and reports consistently better accuracy than existing methods on the benchmark while maintaining efficiency.
Pest identification is a crucial aspect of pest control in agriculture. However, most farmers are not capable of accurately identifying pests in the field, and only a limited number of structured data sources are available for rapid querying. In this work, we explore using a domain-agnostic, general pre-trained large language model (LLM) to extract structured data from agricultural documents with minimal or no human intervention. We propose a methodology that involves text retrieval and filtering using embedding-based retrieval, followed by LLM question-answering to automatically extract entities and attributes from the documents and transform them into structured data. In comparison to existing methods, our approach achieves consistently better accuracy on the benchmark while maintaining efficiency.
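The two-stage pipeline described above (retrieve and filter passages by embedding similarity, then ask an LLM one question per attribute) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model, `ask_llm` is a hypothetical callable supplied by the caller, and the question template, `top_k`, and `min_score` values are illustrative choices that do not come from the paper.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Optional


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, passages: List[str],
             top_k: int = 2, min_score: float = 0.1) -> List[str]:
    """Keep passages scoring at least min_score, return the top_k best first."""
    q = embed(query)
    scored = [(cosine(q, embed(p)), p) for p in passages]
    kept = [(s, p) for s, p in scored if s >= min_score]
    kept.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in kept[:top_k]]


def extract_record(entity: str, attributes: List[str], passages: List[str],
                   ask_llm: Callable[[str, List[str]], str]) -> Dict[str, Optional[str]]:
    """Build one structured record by asking one question per attribute.

    ask_llm is a hypothetical callable: (question, context passages) -> answer.
    An attribute is left as None when no passage passes the retrieval filter.
    """
    record: Dict[str, Optional[str]] = {"entity": entity}
    for attr in attributes:
        question = f"What is the {attr} of {entity}?"  # illustrative template
        context = retrieve(question, passages)
        record[attr] = ask_llm(question, context) if context else None
    return record
```

In practice, `ask_llm` would wrap a call to whichever LLM is used, with the retrieved passages placed in the prompt. Filtering before the LLM call keeps the prompt short and skips the call entirely when nothing relevant is found, which is one plausible way to keep the extraction efficient.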