CV | Apr 10, 2023 | Code
Agronav: Autonomous Navigation Framework for Agricultural Robots and Vehicles using Semantic Segmentation and Semantic Line Detection
Shivam K Panda, Yongkyu Lee, M. Khalid Jawed
The successful implementation of vision-based navigation in agricultural fields hinges upon two critical components: 1) the accurate identification of key components within the scene, and 2) the identification of lanes through the detection of boundary lines that separate the crops from the traversable ground. We propose Agronav, an end-to-end vision-based autonomous navigation framework, which outputs the centerline from the input image by sequentially processing it through semantic segmentation and semantic line detection models. We also present Agroscapes, a pixel-level annotated dataset collected across six different crops, captured from varying heights and angles. This ensures that the framework trained on Agroscapes is generalizable across both ground and aerial robotic platforms. Code, models, and dataset will be released at https://github.com/shivamkumarpanda/agronav.
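As a rough illustration of the final step in such a pipeline, the sketch below derives a centerline from two detected boundary lines. It assumes each boundary is given as a pair of image-space endpoints (bottom, top) and places the centerline through the midpoints of corresponding endpoints. The helper name `centerline_from_boundaries` and the midpoint rule are illustrative assumptions, not the method described in the paper.

```python
from typing import Tuple

Point = Tuple[float, float]
Line = Tuple[Point, Point]  # (bottom endpoint, top endpoint) in pixel coordinates


def midpoint(p: Point, q: Point) -> Point:
    """Return the point halfway between p and q."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def centerline_from_boundaries(left: Line, right: Line) -> Line:
    """Hypothetical helper: line through the midpoints of matching endpoints."""
    return (midpoint(left[0], right[0]), midpoint(left[1], right[1]))


# Made-up boundary lines for a 640x480 image (illustration only).
left_boundary = ((100.0, 480.0), (280.0, 200.0))
right_boundary = ((540.0, 480.0), (360.0, 200.0))

center = centerline_from_boundaries(left_boundary, right_boundary)
print(center)
```

A midpoint rule like this only makes sense when both boundaries are expressed with endpoints at matching image rows; other parameterizations would need a different construction.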
LG | Jan 3, 2023
Metalearning generalizable dynamics from trajectories
Qiaofeng Li, Tianyi Wang, Vwani Roychowdhury et al.
We present the interpretable meta neural ordinary differential equation (iMODE) method to rapidly learn generalizable (i.e., not parameter-specific) dynamics from trajectories of multiple dynamical systems that vary in their physical parameters. The iMODE method learns meta-knowledge, the functional variations of the force field of dynamical system instances without knowing the physical parameters, by adopting a bi-level optimization framework: an outer level capturing the common force field form among studied dynamical system instances and an inner level adapting to individual system instances. A priori physical knowledge can be conveniently embedded in the neural network architecture as inductive bias, such as conservative force field and Euclidean symmetry. With the learned meta-knowledge, iMODE can model an unseen system within seconds and can inversely reveal knowledge of the physical parameters of a system, acting as a Neural Gauge that "measures" the physical parameters of an unseen system from observed trajectories. We test the validity of the iMODE method on bistable, double pendulum, Van der Pol, Slinky, and reaction-diffusion systems.
CV | Dec 15, 2025 | Code
DePT3R: Joint Dense Point Tracking and 3D Reconstruction of Dynamic Scenes in a Single Forward Pass
Vivek Alumootil, Tuan-Anh Vu, M. Khalid Jawed
Current methods for dense 3D point tracking in dynamic scenes typically rely on pairwise processing, require known camera poses, or assume a temporal ordering of input frames, constraining their flexibility and applicability. Additionally, recent advances have successfully enabled efficient 3D reconstruction from large-scale, unposed image collections, underscoring opportunities for unified approaches to dynamic scene understanding. Motivated by this, we propose DePT3R, a novel framework that simultaneously performs dense point tracking and 3D reconstruction of dynamic scenes from multiple images in a single forward pass. This multi-task learning is achieved by extracting deep spatio-temporal features with a powerful backbone and regressing pixel-wise maps with dense prediction heads. Crucially, DePT3R operates without requiring camera poses, substantially enhancing its adaptability and efficiency, which is especially important in dynamic environments with rapid changes. We validate DePT3R on several challenging benchmarks involving dynamic scenes, demonstrating strong performance and significant improvements in memory efficiency over existing state-of-the-art methods. Data and code are available via the open repository: https://github.com/StructuresComp/DePT3R
82.0 | RO | Mar 14
Your Vision-Language-Action Model Already Has Attention Heads For Path Deviation Detection
Jaehwan Jeong, Evelyn Zhu, Jinying Lin et al.
Vision-Language-Action (VLA) models have shown strong potential for predicting semantic actions in navigation tasks, demonstrating the ability to reason over complex linguistic instructions and visual contexts. However, they are fundamentally hindered by visual-reasoning hallucinations that lead to trajectory deviations. Addressing this issue has conventionally required training external critic modules or relying on complex uncertainty heuristics. In this work, we discover that monitoring a few attention heads within a frozen VLA model can accurately detect path deviations without incurring additional computational overhead. We refer to these heads, which inherently capture the spatiotemporal causality between historical visual sequences and linguistic instructions, as Navigation Heads. Using these heads, we propose an intuitive, training-free anomaly-detection framework that monitors their signals to detect hallucinations in real time. Surprisingly, among over a thousand attention heads, a combination of just three is sufficient to achieve a 44.6% deviation detection rate with a low false-positive rate of 11.7%. Furthermore, upon detecting a deviation, we bypass the heavy VLA model and trigger a lightweight Reinforcement Learning (RL) policy to safely execute a shortest-path rollback. By integrating this entire detection-to-recovery pipeline onto a physical robot, we demonstrate its practical robustness. All source code will be publicly available.
CV | Aug 20, 2025 | Code
Reconstruction Using the Invisible: Intuition from NIR and Metadata for Enhanced 3D Gaussian Splatting
Gyusam Chang, Tuan-Anh Vu, Vivek Alumootil et al.
While 3D Gaussian Splatting (3DGS) has rapidly advanced, its application in agriculture remains underexplored. Agricultural scenes present unique challenges for 3D reconstruction methods, particularly due to uneven illumination, occlusions, and a limited field of view. To address these limitations, we introduce NIRPlant, a novel multimodal dataset encompassing Near-Infrared (NIR) imagery, RGB imagery, textual metadata, Depth, and LiDAR data collected under varied indoor and outdoor lighting conditions. By integrating NIR data, our approach enhances robustness and provides crucial botanical insights that extend beyond the visible spectrum. Additionally, we leverage text-based metadata derived from vegetation indices, such as NDVI, NDWI, and the chlorophyll index, which significantly enriches the contextual understanding of complex agricultural environments. To fully exploit these modalities, we propose NIRSplat, an effective multimodal Gaussian splatting architecture employing a cross-attention mechanism combined with 3D point-based positional encoding, providing robust geometric priors. Comprehensive experiments demonstrate that NIRSplat outperforms existing landmark methods, including 3DGS, CoR-GS, and InstantSplat, highlighting its effectiveness in challenging agricultural scenarios. The code and dataset are publicly available at: https://github.com/StructuresComp/3D-Reconstruction-NIR
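For readers unfamiliar with vegetation indices, the sketch below computes NDVI from NIR and red reflectance using the standard definition NDVI = (NIR - Red) / (NIR + Red), then turns the mean value into a short text string of the kind that could serve as metadata. The sample reflectance values, the thresholds, and the wording in `describe_ndvi` are illustrative assumptions and are not taken from the paper.

```python
from typing import List


def ndvi(nir: float, red: float, eps: float = 1e-9) -> float:
    """Standard NDVI for one pixel; eps guards against division by zero."""
    return (nir - red) / (nir + red + eps)


def mean_ndvi(nir_band: List[float], red_band: List[float]) -> float:
    """Average NDVI over paired NIR/red reflectance samples."""
    values = [ndvi(n, r) for n, r in zip(nir_band, red_band)]
    return sum(values) / len(values)


def describe_ndvi(value: float) -> str:
    """Map an NDVI value to text. Thresholds are hypothetical."""
    if value > 0.5:
        return "dense, healthy vegetation"
    if value > 0.2:
        return "sparse or stressed vegetation"
    return "little or no vegetation"


# Made-up reflectance samples in [0, 1] (illustration only).
nir_band = [0.60, 0.55, 0.50]
red_band = [0.10, 0.15, 0.10]

score = mean_ndvi(nir_band, red_band)
print(f"NDVI={score:.2f}: {describe_ndvi(score)}")
```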
RO | Aug 26, 2025 | Code
AgriChrono: A Multi-modal Dataset Capturing Crop Growth and Lighting Variability with a Field Robot
Jaehwan Jeong, Tuan-Anh Vu, Mohammad Jony et al.
Existing datasets for precision agriculture have primarily been collected in static or controlled environments such as indoor labs or greenhouses, often with limited sensor diversity and restricted temporal span. These conditions fail to reflect the dynamic nature of real farmland, including illumination changes, crop growth variation, and natural disturbances. As a result, models trained on such data often lack robustness and generalization when applied to real-world field scenarios. In this paper, we present AgriChrono, a novel robotic data collection platform and multi-modal dataset designed to capture the dynamic conditions of real-world agricultural environments. Our platform integrates multiple sensors and enables remote, time-synchronized acquisition of RGB, Depth, LiDAR, and IMU data, supporting efficient and repeatable long-term data collection across varying illumination and crop growth stages. We benchmark a range of state-of-the-art 3D reconstruction models on the AgriChrono dataset, highlighting the difficulty of reconstruction in real-world field environments and demonstrating its value as a research asset for advancing model generalization under dynamic conditions. The code and dataset are publicly available at: https://github.com/StructuresComp/agri-chrono
CV | Dec 28, 2021 | Code
Deep-CNN based Robotic Multi-Class Under-Canopy Weed Control in Precision Farming
Yayun Du, Guofeng Zhang, Darren Tsang et al.
Smart weeding systems that perform plant-specific operations can contribute to the sustainability of agriculture and the environment. Despite monumental advances in autonomous robotic technologies for precision weed management in recent years, work on under-canopy weeding in fields is yet to be realized. A prerequisite of such systems is reliable detection and classification of weeds to avoid mistakenly spraying and, thus, damaging the surrounding plants. Real-time multi-class weed identification enables species-specific treatment of weeds and significantly reduces the amount of herbicide use. Here, our first contribution is the first adequately large realistic image dataset AIWeeds (one/multiple kinds of weeds in one image), a library of about 10,000 annotated images of flax and the 14 most common weeds in fields and gardens taken from 20 different locations in North Dakota, California, and Central China. Second, we provide a full pipeline from model training with maximum efficiency to deploying the TensorRT-optimized model onto a single board computer. Based on AIWeeds and the pipeline, we present a baseline for classification performance using five benchmark CNN models. Among them, MobileNetV2, with both the shortest inference time and lowest memory consumption, is the qualified candidate for real-time applications. Finally, we deploy MobileNetV2 onto our own compact autonomous robot SAMBot for real-time weed detection. The 90% test accuracy realized in previously unseen scenes in flax fields (with a row spacing of 0.2-0.3 m), with crops and weeds, distortion, blur, and shadows, is a milestone towards precision weed control in the real world. We have publicly released the dataset and code to generate the results at https://github.com/StructuresComp/Multi-class-Weed-Classification.
40.7 | RO | May 5
Neural Control: Adjoint Learning Through Equilibrium Constraints
Dezhong Tong, Jiawen Wang, Hengyi Zhou et al.
Many physical AI tasks are governed by implicit equilibrium: an agent actuates a subset of degrees of freedom (boundary DoFs), while the remaining free DoFs settle by minimizing a total potential energy. Even seemingly basic tasks such as bending a deformable linear object (DLO) to a target shape can exhibit strongly nonlinear behavior due to multi-stability: the same boundary conditions may yield multiple equilibrium shapes depending on the actuation trajectory. However, learning and control in such systems is brittle because the actuation-to-configuration map is defined only implicitly, and naive backpropagation through iterative equilibrium solvers is memory- and compute-intensive. We propose Neural Control, a boundary-control framework that computes trajectory-dependent, memory-efficient proxy gradients by differentiating equilibrium conditions via an adjoint formulation, avoiding unrolling of solver iterations. To improve robustness over long horizons, we integrate these sensitivities into a receding-horizon MPC scheme that repeatedly re-anchors optimization to realized equilibria and mitigates basin-switching in multi-stable regimes. We evaluate Neural Control in simulation and on physical robots manipulating DLOs, and show improved performance over gradient-free baselines such as SPSA and CEM.
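As background for the adjoint idea mentioned above, the following is the textbook implicit-differentiation form for an energy-minimizing equilibrium. It is a sketch under stated assumptions and not necessarily the paper's formulation, which describes trajectory-dependent proxy gradients. The symbols are introduced here for illustration: free DoFs $x \in \mathbb{R}^n$, boundary DoFs $u \in \mathbb{R}^m$, potential energy $E(x, u)$, and a task loss $J(x)$ that depends on $u$ only through the equilibrium $x^*(u)$. The Hessian $H$ is assumed invertible, i.e. the equilibrium is isolated and stable.

```latex
% Equilibrium condition on the free DoFs
\nabla_x E\bigl(x^*(u),\, u\bigr) = 0

% Differentiate with respect to u (implicit function theorem)
H \,\frac{\partial x^*}{\partial u} + B = 0,
\qquad
H = \nabla^2_{xx} E \in \mathbb{R}^{n \times n},
\quad
B = \frac{\partial\, (\nabla_x E)}{\partial u} \in \mathbb{R}^{n \times m}

% Adjoint system: one linear solve, independent of the number of solver iterations
H^{\top} \lambda = \nabla_x J

% Gradient of the loss with respect to the boundary DoFs
\frac{\mathrm{d} J}{\mathrm{d} u} = -\,B^{\top} \lambda
```

Because only one linear solve with $H$ is needed, memory use does not grow with the number of iterations the equilibrium solver took, which is the usual motivation for preferring an adjoint over unrolled backpropagation.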
CV | Aug 28, 2025
HiddenObject: Modality-Agnostic Fusion for Multimodal Hidden Object Detection
Harris Song, Tuan-Anh Vu, Sanjith Menon et al.
Detecting hidden or partially concealed objects remains a fundamental challenge in multimodal environments, where factors like occlusion, camouflage, and lighting variations significantly hinder performance. Traditional RGB-based detection methods often fail under such adverse conditions, motivating the need for more robust, modality-agnostic approaches. In this work, we present HiddenObject, a fusion framework that integrates RGB, thermal, and depth data using a Mamba-based fusion mechanism. Our method captures complementary signals across modalities, enabling enhanced detection of obscured or camouflaged targets. Specifically, the proposed approach identifies modality-specific features and fuses them in a unified representation that generalizes well across challenging scenarios. We validate HiddenObject across multiple benchmark datasets, demonstrating state-of-the-art or competitive performance compared to existing methods. These results highlight the efficacy of our fusion design and expose key limitations in current unimodal and naïve fusion strategies. More broadly, our findings suggest that Mamba-based fusion architectures can significantly advance the field of multimodal object detection, especially under visually degraded or complex conditions.
RO | Dec 27, 2021
Mechanics-based Analysis on Flagellated Robots
Yayun Du, Andrew Miller, M. Khalid Jawed
We explore the locomotion of soft robots in granular medium (GM) resulting from the elastic deformation of slender rods. A low-cost, rapidly fabricable robot inspired by the physiological structure of bacteria is presented. It consists of a rigid head, with a motor and batteries embedded, and multiple elastic rods (our model for flagella) to investigate locomotion in GM. The elastic flagella are rotated at one end by the motor, and they deform due to the drag from GM, propelling the robot. The external drag is determined by the flagellar shape, while the latter changes due to the competition between external loading and elastic forces. In this coupled fluid-structure interaction problem, we observe that increasing the number of flagella can decrease or increase the propulsive speed of the robot, depending on the physical parameters of the system. This nonlinearity in the functional relation between propulsion and the parameters of this simple robot motivates us to fundamentally analyze its mechanics using theory, numerical simulation, and experiments. We present a simple Euler-Bernoulli beam theory-based analytical framework that is capable of qualitatively capturing both cases. Theoretical prediction quantitatively matches experiments when the flagellar deformation is small. To account for the geometrically nonlinear deformation often encountered in soft robots and microbes, we implement a simulation framework that incorporates discrete differential geometry-based simulations of elastic rods, a resistive force theory-based model for drag, and a modified Stokes law for the hydrodynamics of the robot head. Comparison with experimental data indicates that the simulations can quantitatively predict robotic motion. Overall, the theoretical and numerical tools presented in this paper can shed light on the design and control of this class of articulated robots in granular or fluid media.
RO | Dec 16, 2021
Automated stability testing of elastic rods with helical centerlines using a robotic system
Dezhong Tong, Andy Borum, M. Khalid Jawed
Experimental analysis of the mechanics of a deformable object, and particularly its stability, requires repetitive testing and, depending on the complexity of the object's shape, a testing setup that can manipulate many degrees of freedom at the object's boundary. Motivated by recent advancements in robotic manipulation of deformable objects, this paper addresses these challenges by constructing a method for automated stability testing of a slender elastic rod (a canonical example of a deformable object) using a robotic system. We focus on rod configurations with helical centerlines since the stability of a helical rod can be described using only three parameters, but experimentally determining the stability requires manipulation of both the position and orientation at one end of the rod, which is not possible using traditional experimental methods that only actuate a limited number of degrees of freedom. Using a recent geometric characterization of stability for helical rods, we construct and implement a manipulation scheme to explore the space of stable helices, and we use a vision system to detect the onset of instabilities within this space. The experimental results obtained by our automated testing system show good agreement with numerical simulations of elastic rods in helical configurations. The methods described in this paper lay the groundwork for automation to grow within the field of experimental mechanics.
RO | Oct 7, 2018
Control of uniflagellar soft robots at low Reynolds number using buckling instability
Mojtaba Forghani, Weicheng Huang, M. Khalid Jawed
In this paper, we analyze the inverse dynamics and control of a bacteria-inspired uniflagellar robot in a fluid medium at low Reynolds number. Inspired by the mechanism behind the locomotion of flagellated bacteria, we consider a robot comprised of a flagellum (a flexible helical filament) attached to a spherical head. The flagellum rotates about the head at a controlled angular velocity and generates a propulsive force that moves the robot forward. When the angular velocity exceeds a threshold value, the hydrodynamic force exerted by the fluid can cause the soft flagellum to buckle, characterized by a dramatic change in shape. In this computational study, a fluid-structure interaction model that combines the Discrete Elastic Rods (DER) algorithm with Lighthill's Slender Body Theory (LSBT) is employed to simulate the locomotion and deformation of the robot. We demonstrate that the robot can follow a prescribed path in three-dimensional space by exploiting buckling of the flagellum. The control scheme involves only a single (binary) scalar input: the angular velocity of the flagellum. By triggering the buckling instability at the right moment, the robot can follow an arbitrary path in three-dimensional space. We also show that the complexity of the dynamics of the helical filament can be captured using a deep neural network, from which we identify the input-output functional relationship between the control inputs and the trajectory of the robot. Furthermore, our study underscores the potential role of buckling in the locomotion of natural bacteria.