RO Dec 28, 2022
Learning When to Use Adaptive Adversarial Image Perturbations against Autonomous Vehicles
Hyung-Jin Yoon, Hamidreza Jafarnejadsani, Petros Voulgaris
Deep neural network (DNN) models for object detection using camera images are widely adopted in autonomous vehicles. However, DNN models have been shown to be susceptible to adversarial image perturbations. Existing methods generate an adversarial image perturbation by solving an optimization that takes each incoming image frame as the decision variable. Therefore, given a new image, the typically computationally expensive optimization must start over, as there is no learning between the independent optimizations. Very few approaches have been developed for attacking online image streams while considering the underlying physical dynamics of autonomous vehicles, their mission, and the environment. We propose a multi-level stochastic optimization framework that monitors an attacker's capability of generating the adversarial perturbations. Based on this capability level, a binary decision (attack or do not attack) is introduced to enhance the effectiveness of the attacker. We evaluate our proposed multi-level image attack framework using simulations for vision-guided autonomous vehicles and actual tests with a small indoor drone in an office environment. The results show our method's capability to generate the image attack in real time while monitoring when the attacker is proficient given state estimates.
LG Sep 15, 2022
Multi-time Predictions of Wildfire Grid Map using Remote Sensing Local Data
Hyung-Jin Yoon, Petros Voulgaris
Due to recent climate change, wildfires in the United States have become more frequent and severe. Predicting wildfires is critical for natural disaster prevention and mitigation. Advances in data processing and communication technologies have made remote sensing data accessible. With the remote sensing data, valuable spatiotemporal statistical models can be created and used for resource management practices. This paper proposes a distributed learning framework that shares local data collected in ten locations in the western USA among the local agents. The local agents aim to predict wildfire grid maps one, two, three, and four weeks in advance while processing the remote sensing data stream online. The proposed model has distinct features that address the characteristic need in prediction evaluations, including dynamic online estimation and time-series modeling. Local fire event triggers are not isolated between locations, and there are confounding factors when local data is analyzed due to incomplete state observations. Compared to existing approaches that do not account for incomplete state observation within wildfire time-series data, we achieve higher prediction performance on average.
CL Apr 17
DiZiNER: Disagreement-guided Instruction Refinement via Pilot Annotation Simulation for Zero-shot Named Entity Recognition
Siun Kim, Hyung-Jin Yoon
Large language models (LLMs) have advanced information extraction (IE) by enabling zero-shot and few-shot named entity recognition (NER), yet their generative outputs still show persistent and systematic errors. Despite progress through instruction fine-tuning, zero-shot NER still lags far behind supervised systems. These recurring errors mirror inconsistencies observed in early-stage human annotation processes, which resolve disagreements through pilot annotation. Motivated by this analogy, we introduce DiZiNER (Disagreement-guided Instruction Refinement via Pilot Annotation Simulation for Zero-shot Named Entity Recognition), a framework that simulates the pilot annotation process, employing LLMs to act as both annotators and supervisors. Multiple heterogeneous LLMs annotate shared texts, and a supervisor model analyzes inter-model disagreements to refine task instructions. Across 18 benchmarks, DiZiNER achieves zero-shot SOTA results on 14 datasets, improving prior bests by +8.0 F1 and reducing the zero-shot to supervised gap by over 11 points. It also consistently outperforms its supervisor, GPT-5 mini, indicating that improvements stem from disagreement-guided instruction refinement rather than model capacity. Pairwise agreement between models shows a strong correlation with NER performance, further supporting this finding.
RO Dec 10, 2025
Development and Testing for Perception Based Autonomous Landing of a Long-Range QuadPlane
Ashik E Rasul, Humaira Tasnim, Ji Yu Kim et al.
QuadPlanes combine the range efficiency of fixed-wing aircraft with the maneuverability of multi-rotor platforms for long-range autonomous missions. In GPS-denied or cluttered urban environments, perception-based landing is vital for reliable operation. Unlike structured landing zones, real-world sites are unstructured and highly variable, requiring strong generalization capabilities from the perception system. Deep neural networks (DNNs) provide a scalable solution for learning landing site features across diverse visual and environmental conditions. While perception-driven landing has been shown in simulation, real-world deployment introduces significant challenges. Payload and volume constraints limit high-performance edge AI devices like the NVIDIA Jetson Orin Nano, which are crucial for real-time detection and control. Accurate pose estimation during descent is necessary, especially in the absence of GPS, and relies on dependable visual-inertial odometry. Achieving this with limited edge AI resources requires careful optimization of the entire deployment framework. The flight characteristics of large QuadPlanes further complicate the problem: these aircraft exhibit high inertia, reduced thrust vectoring, and slow response times, all of which make stable landing maneuvers more difficult. This work presents a lightweight QuadPlane system for efficient vision-based autonomous landing and visual-inertial odometry, specifically developed for long-range QuadPlane operations such as aerial monitoring. It describes the hardware platform, sensor configuration, and embedded computing architecture designed to meet demanding real-time and physical constraints. This establishes a foundation for deploying autonomous landing in dynamic, unstructured, GPS-denied environments.
RO Dec 16, 2025
Expert Switching for Robust AAV Landing: A Dual-Detector Framework in Simulation
Humaira Tasnim, Ashik E Rasul, Bruce Jo et al.
Reliable helipad detection is essential for Autonomous Aerial Vehicle (AAV) landing, especially under GPS-denied or visually degraded conditions. While modern detectors such as YOLOv8 offer strong baseline performance, single-model pipelines struggle to remain robust across the extreme scale transitions that occur during descent, where helipads appear small at high altitude and large near touchdown. To address this limitation, we propose a scale-adaptive dual-expert perception framework that decomposes the detection task into far-range and close-range regimes. Two YOLOv8 experts are trained on scale-specialized versions of the HelipadCat dataset, enabling one model to excel at detecting small, low-resolution helipads and the other to provide high-precision localization when the target dominates the field of view. During inference, both experts operate in parallel, and a geometric gating mechanism selects the expert whose prediction is most consistent with the AAV's viewpoint. This adaptive routing prevents the degradation commonly observed in single-detector systems when operating across wide altitude ranges. The dual-expert perception module is evaluated in a closed-loop landing environment that integrates CARLA's photorealistic rendering with NASA's GUAM flight-dynamics engine. Results show substantial improvements in alignment stability, landing accuracy, and overall robustness compared to single-detector baselines. By introducing a scale-aware expert routing strategy tailored to the landing problem, this work advances resilient vision-based perception for autonomous descent and provides a foundation for future multi-expert AAV frameworks.
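As a rough illustration of the geometric gating idea described above, the sketch below chooses between two detector outputs by comparing each predicted box width with the width expected at the current altitude. The pinhole-camera model, the nadir-pointing camera, the names `Detection`, `expected_width_px`, and `select_expert`, and all numeric values are assumptions made for illustration; the abstract does not state how the gate is actually computed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Detection:
    """Hypothetical detector output: box width in pixels and a confidence."""
    width_px: float
    confidence: float


def expected_width_px(helipad_diameter_m: float, altitude_m: float,
                      focal_length_px: float) -> float:
    """Apparent helipad width under an assumed pinhole, nadir-pointing camera."""
    return focal_length_px * helipad_diameter_m / altitude_m


def select_expert(far: Optional[Detection], close: Optional[Detection],
                  helipad_diameter_m: float, altitude_m: float,
                  focal_length_px: float) -> Optional[Tuple[str, Detection]]:
    """Pick the expert whose box width is closest to the expected width.

    Returns None when neither expert produced a detection.
    """
    expected = expected_width_px(helipad_diameter_m, altitude_m, focal_length_px)
    candidates = [(name, det)
                  for name, det in (("far", far), ("close", close))
                  if det is not None]
    if not candidates:
        return None
    # Relative size error acts as the (assumed) geometric consistency score.
    return min(candidates,
               key=lambda item: abs(item[1].width_px - expected) / expected)
```

With a 10 m helipad and an 800 px focal length, the expected width is 80 px at 100 m altitude and 800 px at 10 m, so the gate would favor the far-range expert early in the descent and the close-range expert near touchdown.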
CV May 30, 2025
State Estimation and Control of Dynamic Systems from High-Dimensional Image Data
Ashik E Rasul, Hyung-Jin Yoon
Accurate state estimation is critical for optimal policy design in dynamic systems. However, obtaining true system states is often impractical or infeasible, complicating the policy learning process. This paper introduces a novel neural architecture that integrates spatial feature extraction using convolutional neural networks (CNNs) and temporal modeling through gated recurrent units (GRUs), enabling effective state representation from sequences of images and corresponding actions. These learned state representations are used to train a reinforcement learning agent with a Deep Q-Network (DQN). Experimental results demonstrate that our proposed approach enables real-time, accurate estimation and control without direct access to ground-truth states. Additionally, we provide a quantitative evaluation methodology for assessing the accuracy of the learned states, highlighting their impact on policy performance and control stability.
RO Dec 10, 2024
Bayesian Data Augmentation and Training for Perception DNN in Autonomous Aerial Vehicles
Ashik E Rasul, Humaira Tasnim, Hyung-Jin Yoon et al.
Learning-based solutions have enabled incredible capabilities for autonomous systems. Autonomous vehicles, both aerial and ground, rely on DNNs for various integral tasks, including perception. The efficacy of supervised learning solutions hinges on the quality of the training data. Discrepancies between training data and operating conditions result in faults that can lead to catastrophic incidents. However, collecting vast amounts of context-sensitive data, with broad coverage of possible operating environments, is prohibitively difficult. Synthetic data generation techniques for DNNs allow for the easy exploration of diverse scenarios. However, synthetic data generation solutions for aerial vehicles are still lacking. This work presents a data augmentation framework for training aerial vehicle perception, leveraging photorealistic simulation integrated with high-fidelity vehicle dynamics. Safe landing is a crucial challenge in the development of autonomous air taxis; therefore, the landing maneuver is chosen as the focus of this work. With repeated simulations of landing in varying scenarios, we assess the landing performance of a VTOL-type UAV and gather valuable data. The landing performance is used as the objective function to optimize the DNN through retraining. Given the high computational cost of DNN retraining, we incorporated Bayesian optimization into our framework, which systematically explores the data augmentation parameter space to retrain the best-performing models. The framework allowed us to identify high-performing data augmentation parameters that are consistently effective across different landing scenarios. Utilizing the capabilities of this data augmentation framework, we obtained a robust perception model. The model consistently improved the perception-based landing success rate by at least 20% under different lighting and weather conditions.
LG Nov 28, 2021
Learning Wildfire Model from Incomplete State Observations
Alissa Chavalithumrong, Hyung-Jin Yoon, Petros Voulgaris
As wildfires are expected to become more frequent and severe, improved prediction models are vital to mitigating risk and allocating resources. With remote sensing data, valuable spatiotemporal statistical models can be created and used for resource management practices. In this paper, we create a dynamic model for future wildfire predictions at five locations within the western United States through a deep neural network using historical burned area and climate data. The proposed model has distinct features that address the characteristic need in prediction evaluations, including dynamic online estimation and time-series modeling. Between locations, local fire event triggers are not isolated, and there are confounding factors when local data is analyzed due to incomplete state observations. Compared to existing approaches that do not account for incomplete state observation within wildfire time-series data, we achieve higher prediction performance on average.
RO May 9, 2021
Learning Image Attacks toward Vision Guided Autonomous Vehicles
Hyung-Jin Yoon, Hamidreza Jafarnejadsani, Petros Voulgaris
While adversarial neural networks have been shown to be successful for static image attacks, very few approaches have been developed for attacking online image streams while taking into account the underlying physical dynamics of autonomous vehicles, their mission, and the environment. This paper presents an online adversarial machine learning framework that can effectively misguide autonomous vehicles' missions. In the existing image attack methods devised toward autonomous vehicles, optimization steps are repeated for every image frame. This framework removes the need for fully converged optimization at every frame, so that image attacks can be realized in real time. Using reinforcement learning, a generative neural network is trained over a set of image frames to obtain an attack policy that is more robust to dynamic and uncertain environments. A state estimator is introduced for processing image streams to reduce the attack policy's sensitivity to physical variables such as unknown position and velocity. A simulation study is provided to validate the results.
RO Mar 4, 2021
Estimation and Planning of Exploration Over Grid Map Using A Spatiotemporal Model with Incomplete State Observations
Hyung-Jin Yoon, Hunmin Kim, Kripash Shrestha et al.
Path planning over spatiotemporal models can be applied to a variety of applications, such as UAVs searching for a spreading wildfire in the mountains or a network of balloons deployed in a time-varying atmosphere for inexpensive internet service. A notable aspect of such applications is the dynamically changing environment. However, path planning algorithms often assume static environments and consider only the dynamics of the vehicle exploring the environment. We present a spatiotemporal model that uses a cross-correlation operator to consider spatiotemporal dependence. We also present an adaptive state estimator for path planning. Since the state estimation depends on the vehicle's path, the path planning needs to consider the trade-off between exploration and exploitation. We use a high-level decision-maker to choose an explorative path or an exploitative path. The overall proposed framework consists of an adaptive state estimator, a short-term path planner, and a high-level decision-maker. We tested the framework with a spatiotemporal model simulation in which the state of each grid cell transitions among normal, latent, and fire states. For the mission objective of visiting the grid cells with fire, the proposed framework outperformed the random walk (baseline) and the single-minded exploitation (or exploration) path.
QM Aug 2, 2019
High Accuracy Tumor Diagnoses and Benchmarking of Hematoxylin and Eosin Stained Prostate Core Biopsy Images Generated by Explainable Deep Neural Networks
Aman Rana, Alarice Lowe, Marie Lithgow et al.
Histopathological diagnosis of tumors in tissue biopsy after Hematoxylin and Eosin (H&E) staining is the gold standard for oncology care. H&E staining is slow and uses dyes, reagents and precious tissue samples that cannot be reused. Thousands of native nonstained RGB Whole Slide Image (RWSI) patches of prostate core tissue biopsies were registered with their H&E stained versions. Conditional Generative Adversarial Neural Networks (cGANs) that automate conversion of native nonstained RWSI to computational H&E stained images were then trained. High similarities between computational and H&E dye stained images were calculated, with Structural Similarity Index (SSIM) 0.902, Pearson's Correlation Coefficient (CC) 0.962 and Peak Signal to Noise Ratio (PSNR) 22.821 dB. A second cGAN performed accurate computational destaining of H&E dye stained images back to their native nonstained form with SSIM 0.9, CC 0.963 and PSNR 25.646 dB. A single-blind study computed more than 95% pixel-by-pixel overlap between prostate tumor annotations on computationally stained images, provided by five board-certified MD pathologists, and those on H&E dye stained counterparts. We report the first visualization and explanation of neural network kernel activation maps during H&E staining and destaining of RGB images by cGANs. High similarities between kernel activation maps of computational and H&E stained images (Mean-Squared Errors <0.0005) provide additional mathematical and mechanistic validation of the staining system. Our neural network framework is thus automated and explainable, performs high-precision H&E staining and destaining of low-cost native RGB images, and is computer vision and physician authenticated for rapid and accurate tumor diagnoses.
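For readers unfamiliar with two of the similarity metrics quoted above, the following minimal sketch computes PSNR and Pearson's correlation coefficient on flat lists of pixel intensities. It uses the standard textbook definitions and assumes 8-bit intensities (peak value 255); the function names are illustrative and nothing here is taken from the authors' evaluation code. SSIM is omitted because it needs windowed local statistics.

```python
import math
from typing import Sequence


def mean_squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared difference between two equally sized pixel sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a: Sequence[float], b: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    err = mean_squared_error(a, b)
    if err == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / err)


def pearson_cc(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two pixel sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(var_a * var_b)
```

Higher PSNR and a correlation closer to 1 both indicate that the generated image is closer to the reference image.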
SY Jun 11, 2019
Towards Resilient UAV: Escape Time in GPS Denied Environment with Sensor Drift
Hyung-Jin Yoon, Wenbin Wan, Hunmin Kim et al.
This paper considers a resilient state estimation framework for unmanned aerial vehicles (UAVs) that integrates a Kalman filter-like state estimator and an attack detector. When an attack is detected, the state estimator uses only IMU signals as the GPS signals do not contain legitimate information. This limited sensor availability induces a sensor drift problem questioning the reliability of the sensor estimates. We propose a new resilience measure, escape time, as the safe time within which the estimation errors remain in a tolerable region with high probability. This paper analyzes the stability of the proposed resilient estimation framework and quantifies a lower bound for the escape time. Moreover, simulations of the UAV model demonstrate the performance of the proposed framework and provide analytical results.
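To give a feel for what a lower bound on an escape time can look like, the sketch below treats the estimation error as a scalar random walk with independent, zero-mean increments of variance `increment_variance`. Under that assumption, Kolmogorov's maximal inequality gives P(max over j <= k of |e_j| >= tolerance) <= k * increment_variance / tolerance^2, so the error stays inside the tolerance for the first k steps with probability at least 1 - failure_prob whenever k <= failure_prob * tolerance^2 / increment_variance. This toy error model and the function name are assumptions for illustration only; they are not the UAV model or the bound derived in the paper.

```python
import math


def escape_time_lower_bound(tolerance: float, increment_variance: float,
                            failure_prob: float) -> int:
    """Number of steps the toy random-walk error is guaranteed to stay
    within +/- tolerance with probability at least 1 - failure_prob.

    Derived from Kolmogorov's maximal inequality for independent,
    zero-mean increments (an assumed error model, not the paper's).
    """
    if tolerance <= 0 or increment_variance <= 0:
        raise ValueError("tolerance and increment_variance must be positive")
    if not 0 < failure_prob < 1:
        raise ValueError("failure_prob must lie strictly between 0 and 1")
    return math.floor(failure_prob * tolerance ** 2 / increment_variance)
```

In this toy setting the bound grows with the square of the tolerance and shrinks as the per-step drift variance grows, which matches the intuition that a noisier IMU-only estimate leaves less safe time.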
RO Mar 12, 2019
A Path Planning Framework for a Flying Robot in Close Proximity of Humans
Hyung-Jin Yoon, Christopher Widdowson, Thiago Marinho et al.
We present a path planning framework that takes into account the human's safety perception in the presence of a flying robot. The framework addresses two objectives: (i) estimation of the uncertain parameters of the proposed safety perception model based on test data collected using a Virtual Reality (VR) testbed, and (ii) offline optimal control computation using the estimated safety perception model. Due to the unknown factors in the human test data, it is not suitable to use standard regression techniques that minimize the mean squared error (MSE). We propose to use a Hidden Markov model (HMM) approach in which the human's attention is considered as a hidden state to infer whether the data samples are relevant for learning the safety perception model. The HMM approach improved log-likelihood over the standard least squares solution. For path planning, we use Bernstein polynomials for discretization, as the resulting path remains within the convex hull of the control points, providing guarantees for deconfliction with obstacles at low computational cost. An example of optimal trajectory generation using the learned human model is presented. The optimal trajectory generated using the proposed model results in a reasonable safety distance from the human. In contrast, the paths generated using the standard regression model have undesirable shapes due to overfitting. The example demonstrates that the HMM approach is more robust to the unknown factors than the standard MSE model.
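The convex hull property mentioned above follows from the fact that the Bernstein basis functions are nonnegative on [0, 1] and sum to one, so every path point is a convex combination of the control points. The sketch below evaluates a path in Bernstein form and is only an illustration: the function names and the example control points are hypothetical, and the paper's actual discretization scheme is not shown in the abstract.

```python
from math import comb
from typing import List, Sequence, Tuple


def bernstein_basis(n: int, i: int, t: float) -> float:
    """i-th Bernstein basis polynomial of degree n evaluated at t in [0, 1]."""
    return comb(n, i) * t ** i * (1.0 - t) ** (n - i)


def bernstein_path_point(control_points: Sequence[Sequence[float]],
                         t: float) -> Tuple[float, ...]:
    """Point on the path: a convex combination of the control points."""
    n = len(control_points) - 1
    dim = len(control_points[0])
    return tuple(
        sum(bernstein_basis(n, i, t) * p[d] for i, p in enumerate(control_points))
        for d in range(dim)
    )


def sample_path(control_points: Sequence[Sequence[float]],
                num_samples: int) -> List[Tuple[float, ...]]:
    """Evenly spaced samples of the path for t from 0 to 1 inclusive."""
    return [bernstein_path_point(control_points, k / (num_samples - 1))
            for k in range(num_samples)]
```

For example, with the control points (0, 0), (1, 2), (3, 2), (4, 0), the path starts at (0, 0), ends at (4, 0), and passes through (2, 1.5) at t = 0.5. A practical consequence is that keeping obstacles outside the convex hull of the control points is sufficient to keep them off the path, without checking every sampled point.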
RO Dec 13, 2018
Learning to Communicate: A Machine Learning Framework for Heterogeneous Multi-Agent Robotic Systems
Hyung-Jin Yoon, Huaiyu Chen, Kehan Long et al.
We present a machine learning framework for multi-agent systems to learn both the optimal policy for maximizing the rewards and the encoding of the high dimensional visual observation. The encoding is useful for sharing local visual observations with other agents under communication resource constraints. The actor-encoder encodes the raw images and chooses an action based on local observations and messages sent by the other agents. The machine learning agent generates not only an actuator command to the physical device, but also a communication message to the other agents. We formulate a reinforcement learning problem, which extends the action space to consider the communication action as well. The feasibility of the reinforcement learning framework is demonstrated using a 3D simulation environment with two collaborating agents. The environment provides realistic visual observations to be used and shared between the two agents.
LG Sep 17, 2018
Hidden Markov Model Estimation-Based Q-learning for Partially Observable Markov Decision Process
Hyung-Jin Yoon, Donghwan Lee, Naira Hovakimyan
The objective is to study an online Hidden Markov model (HMM) estimation-based Q-learning algorithm for partially observable Markov decision processes (POMDPs) on finite state and action sets. When the full state observation is available, Q-learning finds the optimal action-value function (Q function). However, Q-learning can perform poorly when the full state observation is not available. In this paper, we formulate the POMDP estimation as an HMM estimation problem and propose a recursive algorithm to estimate both the POMDP parameters and the Q function concurrently. We also show that the POMDP estimation converges to a set of stationary points for the maximum likelihood estimate, and that the Q function estimation converges to a fixed point that satisfies the Bellman optimality equation weighted on the invariant distribution of the state belief determined by the HMM estimation process.
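As background for the fully observed case mentioned above, the following minimal sketch runs standard tabular Q-learning on a made-up two-state, two-action deterministic MDP. The environment, the uniformly random behavior policy, and all parameter values are assumptions chosen for illustration; this is the textbook baseline, not the HMM estimation-based algorithm proposed in the paper.

```python
import random
from typing import List, Tuple


def step(state: int, action: int) -> Tuple[int, float]:
    """Toy dynamics: action 0 stays, action 1 switches state.

    The reward is 1 whenever the next state is state 1, otherwise 0.
    """
    next_state = state if action == 0 else 1 - state
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward


def q_learning(num_steps: int = 20000, alpha: float = 0.5,
               gamma: float = 0.9, seed: int = 0) -> List[List[float]]:
    """Tabular Q-learning with a uniformly random exploration policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[state][action]
    state = 0
    for _ in range(num_steps):
        action = rng.randrange(2)
        next_state, reward = step(state, action)
        # Bellman optimality target: r + gamma * max_a' Q(s', a')
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q
```

For this toy problem the Bellman optimality equation can be solved by hand: staying in state 1 forever is worth 1 / (1 - 0.9) = 10, so the optimal values are Q(0, stay) = 9, Q(0, switch) = 10, Q(1, stay) = 10, and Q(1, switch) = 9, which the learned table should approach.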