Evaluation of Embedding-Based and Generative Methods for LLM-Driven Document Classification: Opportunities and Challenges

arXiv:2604.04997 · 66.1 · h-index: 2

Predicted impact: top 44% in IR · last 90 days
Originality: Incremental advance

AI Analysis

This work addresses document classification for geoscience applications, but it is incremental because it focuses on a comparative analysis of existing methods.

This work compared embedding-based and generative models for classifying geoscience documents, finding that generative Vision-Language Models with Chain-of-Thought prompting achieved 82% zero-shot accuracy, outperforming state-of-the-art multimodal embedding models at 63%.

This work presents a comparative analysis of embedding-based and generative models for classifying geoscience technical documents. Using a multi-disciplinary benchmark dataset, we evaluate the trade-offs between model accuracy, stability, and computational cost. We find that generative Vision-Language Models (VLMs) such as Qwen2.5-VL, enhanced with Chain-of-Thought (CoT) prompting, achieve higher zero-shot accuracy (82%) than state-of-the-art multimodal embedding models such as QQMM (63%). We also demonstrate that while supervised fine-tuning (SFT) can improve VLM performance, it is sensitive to training data imbalance.
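As a rough illustration of the embedding-based side of this comparison, the sketch below assigns each document to the label whose embedding is closest by cosine similarity, then computes accuracy against gold labels. It is not the paper's pipeline: the label names, the three-dimensional vectors, and the helper names (`cosine`, `classify`, `accuracy`) are all hypothetical, and in practice the vectors would come from a multimodal embedding model rather than being written by hand.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(doc_vec, label_vecs):
    """Return the label whose embedding is most similar to the document."""
    return max(label_vecs, key=lambda label: cosine(doc_vec, label_vecs[label]))


def accuracy(predictions, gold):
    """Fraction of predictions that match the gold labels."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Hypothetical label embeddings (toy values, not from the paper).
label_vecs = {
    "geophysics": [1.0, 0.0, 0.0],
    "geology": [0.0, 1.0, 0.0],
    "petrophysics": [0.0, 0.0, 1.0],
}

# Hypothetical document embeddings with their gold labels.
docs = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.1, 0.6, 0.3],
    [0.5, 0.4, 0.1],
]
gold = ["geophysics", "geology", "petrophysics", "geology"]

predictions = [classify(d, label_vecs) for d in docs]
print(predictions)
print(f"zero-shot accuracy: {accuracy(predictions, gold):.2f}")
```

No training step is involved, which is what makes this a zero-shot setup: adding a class only requires adding another label embedding. A generative VLM with CoT prompting would instead be asked to reason about the document and name the class directly, and the same `accuracy` function could score its answers.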
