CV · Dec 31, 2025
CPJ: Explainable Agricultural Pest Diagnosis via Caption-Prompt-Judge with LLM-Judged Refinement
Wentao Zhang, Tao Fang, Lina Lu et al.
Accurate and interpretable crop disease diagnosis is essential for agricultural decision-making, yet existing methods often rely on costly supervised fine-tuning and perform poorly under domain shifts. We propose Caption-Prompt-Judge (CPJ), a training-free few-shot framework that enhances Agri-Pest VQA through structured, interpretable image captions. CPJ employs large vision-language models to generate multi-angle captions, refined iteratively via an LLM-as-Judge module, which then inform a dual-answer VQA process for both recognition and management responses. Evaluated on CDDMBench, CPJ significantly improves performance: using GPT-5-mini captions, GPT-5-Nano achieves +22.7 pp in disease classification and +19.5 points in QA score over no-caption baselines. The framework provides transparent, evidence-based reasoning, advancing robust and explainable agricultural diagnosis without fine-tuning. Our code and data are publicly available at: https://github.com/CPJ-Agricultural/CPJ-Agricultural-Diagnosis.
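The caption-then-judge refinement loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: `captioner` and `judge` are hypothetical callables standing in for the vision-language model and the LLM-as-Judge, and the round limit and acceptance score are assumed values chosen only for illustration.

```python
from typing import Any, Callable, Optional, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
#   captioner(image, feedback) -> caption text
#   judge(caption) -> (score in [0, 1], textual feedback)
Captioner = Callable[[Any, Optional[str]], str]
Judge = Callable[[str], Tuple[float, str]]


def refine_caption(image: Any, captioner: Captioner, judge: Judge,
                   max_rounds: int = 3,
                   accept_score: float = 0.8) -> Tuple[str, float]:
    """Regenerate a caption until the judge accepts it or rounds run out."""
    caption = captioner(image, None)  # first draft, no feedback yet
    best_caption, best_score = caption, float("-inf")
    for round_idx in range(max_rounds):
        score, feedback = judge(caption)
        if score > best_score:
            best_caption, best_score = caption, score
        # Stop when the caption is good enough or the budget is spent.
        if score >= accept_score or round_idx == max_rounds - 1:
            break
        caption = captioner(image, feedback)  # revise using the critique
    return best_caption, best_score
```

The function returns the best judged caption rather than the last one, so a revision that scores worse than an earlier draft is discarded.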
LG · May 13
CoRe-Gen: Robust Spectrum-to-Structure Generation under Imperfect Fingerprint Conditions
Tianbo Liu, Chixiang Lu, Jing Hao et al.
Molecular structure elucidation from tandem mass spectra (MS/MS) remains challenging, particularly for de novo generation beyond database coverage. A common approach decomposes the task into spectrum-to-fingerprint prediction followed by fingerprint-to-structure decoding, enabling the use of large-scale molecular corpora. However, at deployment, the decoder relies on predicted rather than oracle fingerprints, introducing structured errors that propagate into generation. This results in a fundamental condition mismatch, where models trained on clean inputs must operate under noisy, biased predictions, especially for long-tail substructures. We present CoRe-Gen, which explicitly addresses this gap. CoRe-Gen improves the intermediate condition via synthetic-spectrum pretraining of the encoder, matches deployment-time noise through frequency-aware fingerprint corruption during decoder training, and mitigates residual errors using structure-aware autoregressive decoding with compositional SELFIES representations, auxiliary structural supervision, and lightweight chemical constraints. Experiments on standard benchmarks show that CoRe-Gen establishes a new state of the art on NPLIB1, achieving 19.54% Top-1 and 29.92% Top-10 exact-match accuracy, while remaining competitive on the more challenging MassSpecGym benchmark. Importantly, CoRe-Gen preserves the efficiency advantages of autoregressive decoding, providing a practical and scalable solution for robust spectrum-to-structure generation under realistic conditions.
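To make the idea of frequency-aware fingerprint corruption concrete, here is a minimal sketch under stated assumptions. The abstract does not give the corruption rule, so the scheme below (rare on-bits are dropped more often, frequent off-bits are switched on more often) and the parameters `base_rate` and `max_rate` are illustrative choices, not CoRe-Gen's actual procedure.

```python
import random
from typing import List, Sequence


def bit_frequencies(fingerprints: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of training fingerprints in which each bit is set."""
    n = len(fingerprints)
    return [sum(column) / n for column in zip(*fingerprints)]


def corrupt_fingerprint(fp: Sequence[int], freq: Sequence[float],
                        rng: random.Random,
                        base_rate: float = 0.05,
                        max_rate: float = 0.5) -> List[int]:
    """Flip bits with a probability that depends on how common each bit is.

    Assumed rule: a set bit is dropped with probability base_rate / freq
    (capped at max_rate), so long-tail bits are lost more often; an unset
    bit is switched on with probability base_rate * freq.
    """
    noisy = []
    for bit, f in zip(fp, freq):
        if bit == 1:
            p = min(max_rate, base_rate / max(f, 1e-6))
        else:
            p = base_rate * f
        noisy.append(1 - bit if rng.random() < p else bit)
    return noisy


# Usage: corrupt clean fingerprints on the fly during decoder training.
rng = random.Random(0)
train_fps = [[1 if rng.random() < 0.3 else 0 for _ in range(16)]
             for _ in range(200)]
freq = bit_frequencies(train_fps)
noisy_fp = corrupt_fingerprint(train_fps[0], freq, rng)
```

Training the decoder on `noisy_fp` instead of the clean fingerprint is one way to expose it to the kind of errors a predicted fingerprint would carry at deployment.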
CV · Jan 8
Agri-R1: Empowering Generalizable Agricultural Reasoning in Vision-Language Models with Reinforcement Learning
Wentao Zhang, Lifei Wang, Lina Lu et al.
Agricultural disease diagnosis challenges VLMs, as conventional fine-tuning requires extensive labels, lacks interpretability, and generalizes poorly. While reasoning improves model robustness, existing methods rely on costly expert annotations and rarely address the open-ended, diverse nature of agricultural queries. To address these limitations, we propose Agri-R1, a reasoning-enhanced large model for agriculture. Our framework automates high-quality reasoning data generation via vision-language synthesis and LLM-based filtering, using only 19% of available samples. Training employs Group Relative Policy Optimization (GRPO) with a novel reward function that integrates domain-specific lexicons and fuzzy matching to assess both correctness and linguistic flexibility in open-ended responses. Evaluated on CDDMBench, our resulting 3B-parameter model achieves performance competitive with 7B- to 13B-parameter baselines, showing a +23.2% relative gain in disease recognition accuracy, +33.3% in agricultural knowledge QA, and a +26.10-point improvement in cross-domain generalization over standard fine-tuning. Ablation studies confirm that the synergy between structured reasoning data and GRPO-driven exploration underpins these gains, with benefits scaling as question complexity increases.
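A reward that combines a domain lexicon with fuzzy matching might look roughly like the sketch below. It is only an illustration of the general idea: the lexicon entries, the 0.85 threshold, and the partial-credit rule are assumptions, and the paper's actual reward function is not specified in the abstract. Fuzzy matching uses `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher
from typing import Dict, List

# Hypothetical lexicon mapping a gold label to accepted synonyms.
LEXICON: Dict[str, List[str]] = {
    "late blight": ["phytophthora infestans", "potato late blight"],
}


def fuzzy_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1] between two strings, case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_window_match(term: str, response: str) -> float:
    """Best fuzzy match of `term` against any word window of equal length."""
    words = response.lower().split()
    n = len(term.split())
    windows = [" ".join(words[i:i + n])
               for i in range(max(1, len(words) - n + 1))]
    return max(fuzzy_ratio(term, w) for w in windows)


def reward(response: str, gold_label: str,
           lexicon: Dict[str, List[str]] = LEXICON,
           threshold: float = 0.85) -> float:
    """Full reward for a close match to the label or a synonym,
    otherwise partial credit proportional to the best similarity."""
    candidates = [gold_label] + lexicon.get(gold_label.lower(), [])
    score = max(best_window_match(c, response) for c in candidates)
    return 1.0 if score >= threshold else 0.5 * score
```

Because the comparison runs over word windows rather than the whole response, a correct label embedded in a longer open-ended answer still earns the full reward, and an answer that uses a lexicon synonym is treated the same as one that uses the gold label.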
HC · Oct 7, 2025
Evaluating Node-tree Interfaces for AI Explainability
Lifei Wang, Natalie Friedman, Chengchao Zhu et al.
As large language models (LLMs) become ubiquitous in workplace tools and decision-making processes, ensuring explainability and fostering user trust are critical. Although advancements in LLM engineering continue, human-centered design is still catching up, particularly when it comes to embedding transparency and trust into AI interfaces. This study evaluates user experiences with two distinct AI interfaces - node-tree interfaces and chatbot interfaces - to assess their performance in exploratory, follow-up inquiry, decision-making, and problem-solving tasks. Our design-driven approach introduces a node-tree interface that visually structures AI-generated responses into hierarchically organized, interactive nodes, allowing users to navigate, refine, and follow up on complex information. In a comparative study with n=20 business users, we observed that while the chatbot interface effectively supports linear, step-by-step queries, it is the node-tree interface that enhances brainstorming. Quantitative and qualitative findings indicate that node-tree interfaces not only improve task performance and decision-making support but also promote higher levels of user trust by preserving context. Our findings suggest that adaptive AI interfaces capable of switching between structured visualizations and conversational formats based on task requirements can significantly enhance transparency and user confidence in AI-powered systems. This work contributes actionable insights to the fields of human-robot interaction and AI design, particularly for enterprise applications where trust-building is critical for teams.
IV · Dec 23, 2021
KFWC: A Knowledge-Driven Deep Learning Model for Fine-grained Classification of Wet-AMD
Haihong E, Jiawen He, Tianyi Hu et al.
Automated diagnosis using deep neural networks can help ophthalmologists detect the blinding eye disease wet Age-related Macular Degeneration (AMD). Wet-AMD has two similar subtypes, Neovascular AMD and Polypoidal Choroidal Vessels (PCV). However, due to the difficulty of data collection and the similarity between images, most studies have achieved only coarse-grained classification of wet-AMD rather than finer-grained classification of its subtypes. To solve this issue, we propose a Knowledge-driven Fine-grained Wet-AMD Classification Model (KFWC) to classify fine-grained diseases with insufficient data. By introducing a priori knowledge of 10 lesion signs of the input images into the KFWC, we aim to accelerate the KFWC through multi-label classification pre-training, so that it locates the decisive image features in the fine-grained disease classification task and therefore achieves better classification. At the same time, the KFWC provides good interpretability and effectively alleviates the pressure of data collection and annotation in fine-grained disease classification for wet-AMD. Experiments demonstrate the effectiveness of the KFWC, which reaches 99.71% AU-ROC and improves considerably over both the data-driven model without knowledge and ophthalmologists, by 6.69% over the strongest baseline and 4.14% over ophthalmologists.