4.3CVMay 26
Hallucination Behavior in Multimodal LLMs Across Agricultural Image Interpretation and Generation TasksPartho Ghose, Al Bashir, Prem Raj et al.
Large Language Models (LLMs) are being rapidly adopted in agricultural imaging applications, ranging from crop interpretation to synthetic field image generation. However, these models frequently exhibit hallucinations outputs that appear confident yet deviate from biological or environmental reality potentially leading to misinformed agronomic insights. This study investigates such hallucinations in two complementary directions: image-to-text, where LLMs interpret crop or field imagery to describe conditions such as biotic and abiotic stresses, and text-to-image, where models generate synthetic agricultural scenes based on descriptive prompts. We examine errors involving biological inconsistency, contextual inaccuracy, and agronomic implausibility, evaluating the outputs under domain-informed criteria across multiple imaging modalities. Our analysis identifies recurring hallucination patterns within both interpretive and generative tasks. In image interpretation, LLMs (e.g., Gemma, LLAVA, Qwen, and MiniCPM) achieved modest zero-shot accuracy (63 to 75 percent), whereas few-shot prompting improved performance up to 86.8 percent, exhibiting false detections and missed infections, indicating residual hallucination effects. In text-to-image tasks, advanced models such as GPT-5 and Gemini 2.5 Flash generate up to 91 percent biologically inconsistent scenes under relaxed prompt constraints, revealing fundamental weaknesses in current LLMs. This systematic assessment of visual reasoning and generation offers critical insights toward enhancing the reliability and trustworthiness of LLM-based agricultural imaging platforms.
5.7ROMay 22
Robotic Strawberry Harvesting with Robust Vision and Deep Reinforcement Learning based Sim-to-Real ControlAl Bashir, Shao-Yang Chang, Partho Ghose et al.
This study presents a closed-loop robotic strawberry harvesting system that combines a robust vision module, simulation-trained deep reinforcement learning (DRL) control, and ROS-based realrobot execution. For perception, we propose HRAttnEdge-YOLO26-seg, a modified YOLO26-seg architecture that incorporates a high-resolution P2 branch, segmentation-path attention, and edgesupervised prototype learning to improve instance segmentation in cluttered scenes. For control, we train a target-conditioned Proximal Policy Optimization (PPO) policy in Isaac Lab to produce smooth joint-position commands for a UR10e manipulator and deploy it on a UR10e robot for targetfruit reaching and harvesting. This simulation-based approach reduces hardware dependency, lowers development cost, and allows scalable policy training without exhaustive physical trials before real deployment. The proposed vision model demonstrated the highest overall performance among the evaluated methods. On both self-collected and public datasets, the model showed a 10 to 14% improvement in segmentation performance. In controlled in-house tests, the PPO controller produced stable and dynamically smoother motion than a inverse kinematics (IK)-based MoveIt baseline. In greenhouse trials, the proposed integrated system harvested 281 strawberries, achieving 96.6% reaching success, 91.3% grasp-and-pull success, and 84.3% overall harvesting success. These results illustrate that task-specific perception combined with simulation-trained PPO can serve as a practical and resource-efficient alternative to conventional planner-dependent reaching in manipulation, enabling reliable closed-loop robotic harvesting in complex agricultural environments.
29.5IRMay 11
An LLM-RAG Approach for Healthy Eating Index-Informed Personalized Food RecommendationsYibin Wang, Yanjie Yang, Grace Melo Guerrero et al.
Diet quality is a leading determinant of chronic disease risk. Advances in artificial intelligence (AI) have enabled food recommendation systems to adapt suggestions to user preferences and health goals. However, most current systems rely on loosely curated food databases and provide limited connection to a validated index. In this study, we propose a Healthy Eating Index (HEI) informed retrieval-augmented generation (RAG) framework that combines standardized nutrition databases with large language models (LLMs) for personalized food recommendations. Our proposed method anchors retrieval in the National Health and Nutrition Examination Survey (NHANES) and the Food Patterns Equivalents Database (FPED). A food-level embedding space is constructed from FPED-derived textual descriptions. For each entity, the system computes baseline HEI scores, retrieves candidate foods for intake recommendations, and estimates the HEI impact of simple substitutions or additions. A constrained RAG pipeline instantiated with a pretrained OpenAI LLM generates personalized recommendations and sources based on nutrient profiles and HEI contributions. The simulation results showed a mean HEI improvement of 6.45, with the proportion of users HEI over 50 increasing from 45.12 to 61.26. Quantile analysis revealed consistent improved shifts across the HEI distribution. Our findings suggest that the proposed LLM-RAG-based AI systems can support more precise, explainable, and personalized nutrition guidance to improve diet quality.
62.5GNMay 7
A Linear-Transformer Hybrid for SNP-Based Genotype-to-Phenotype Prediction in GrapevineYibin Wang, Murukarthick Jayakodi, Silvas Kirubakaran et al.
Robust genotype-to-phenotype (G2P) prediction is essential for accelerating breeding decisions and genetic gain. However, it remains challenging to measure complex traits under variable field conditions and across years. In this study, we propose a linear-Transformer approach, LiT-G2P (Linear-Transformer Genotype-to-Phenotype), an automated predictive framework that integrates additive genetic variance effects with Transformer-based nonlinear interactions using genome-wide single-nucleotide polymorphisms (SNPs) data. We evaluated LiT-G2P on a panel of diverse grape accessions, genotyped with SNP markers and measured for phenotypes across two consecutive years. Target phenotypic traits include leaf hair density and trichome density of grapevines. Across both single-year and cross-year testing scenarios, LiT-G2P consistently improves prediction performance compared with baseline models. For hair density, LiT-G2P achieves the lowest error in both single-year and cross-year evaluations, with RMSEs of 0.469 and 0.454, respectively, while maintaining strong tolerance accuracies of 79.2% and 74.6%, respectively. For trichome density, LiT-G2P also presents the best overall G2P performance. In addition, we extract model-prioritized SNPs from attention weights and apply genotype-stratified analysis to provide interpretable candidate marker for downstream validation. These results demonstrate that integrating stable additive effects with learned interaction patterns can enhance cross-year robustness and support practical SNP-based predictive modeling for genomic selection.