Donald Sofge

h-index: 5

Papers: 5

Citations: 5

Novelty: 32%

AI Score: 32

Ranked #137,451 of 201,018 authors (top 68%); #9,360 in AI (top 66%)

5 Papers

AI · Sep 26, 2023

Learning NEAT Emergent Behaviors in Robot Swarms

Pranav Rajbhandari, Donald Sofge

When researching robot swarms, many studies observe complex group behavior emerging from the individual agents' simple local actions. However, the task of learning an individual policy to produce a desired group behavior remains a challenging problem. We present a method of training distributed robotic swarm algorithms to produce emergent behavior. Inspired by the biological evolution of emergent behavior in animals, we use an evolutionary algorithm to train a population of individual behaviors to produce a desired group behavior. We perform experiments using simulations of the Georgia Tech Miniature Autonomous Blimps (GT-MABs) aerial robotics platforms conducted in the CoppeliaSim simulator. Additionally, we test on simulations of Anki Vector robots to display our algorithm's effectiveness on various modes of actuation. We evaluate our algorithm on various tasks where a somewhat complex group behavior is required for success. These tasks include an Area Coverage task and a Wall Climb task. We compare behaviors evolved using our algorithm against designed policies, which we create in order to exhibit the emergent behaviors we desire.
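The training loop described in this abstract can be illustrated with a minimal sketch: a population of individual policies is scored by a group-level fitness and the best performers are mutated to form the next generation. This is not NEAT (which also evolves network topology) and is not the authors' implementation; it is a plain elitist evolutionary loop over fixed-length parameter vectors. The names `evolve` and `toy_group_score`, and the target vector, are hypothetical stand-ins for a real swarm simulation.

```python
import random


def evolve(fitness, dim, pop_size=20, generations=30, sigma=0.3, elite=5, seed=0):
    """Elitist evolutionary loop over real-valued policy parameters."""
    rng = random.Random(seed)
    # Each individual is one candidate policy, encoded as a parameter vector.
    population = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        # Keep the best individuals unchanged (elitism) ...
        parents = ranked[:elite]
        # ... and fill the rest with Gaussian-mutated copies of random parents.
        children = [
            [gene + rng.gauss(0, sigma) for gene in rng.choice(parents)]
            for _ in range(pop_size - elite)
        ]
        population = parents + children
    best = max(population, key=fitness)
    return best, history


def toy_group_score(params):
    # Hypothetical stand-in for a swarm simulation: in practice this would
    # run every robot with the same policy and score the group behavior.
    target = [0.5, -0.25, 0.75]
    return -sum((p - t) ** 2 for p, t in zip(params, target))


best, history = evolve(toy_group_score, dim=3)
```

Because the elite individuals are carried over unchanged and the toy fitness is deterministic, the best score recorded in `history` never decreases between generations. With a stochastic simulator as the fitness function, that guarantee would no longer hold.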

RO · May 15

Bayesian Networks for Path-Based Sensors: Gathering Information and Path Planning in Communication Denied Environments

Alkesh K. Srivastava, George P. Kontoudis, Donald Sofge et al.

A "path-based sensor" produces a single observation along a continuous path. For example, a boolean path-based sensor returns a single "1" if an event of interest is detected at any point along the path and a "0" otherwise. Notably, a "1" provides no direct information about where along the path the event(s) may have occurred. Previous work has demonstrated that observations from multiple path-based sensors can be fused to create a Bayesian belief map over the spatial locations of the underlying event or phenomenon. Moreover, path planning can employ Shannon information theory to accelerate the rate of convergence of the belief map. In this paper, we present a new method to update the belief map based on a path-based sensor observation, and then plan paths to increase information gain. In contrast to prior work that approximates the posterior by averaging over the alternative event histories, we introduce a Bayesian Network (BN) formulation that models the probabilistic relationships between the latent variables and path-based sensor measurements, enabling a more principled Bayesian belief update. We consider static hazard detection in a communication-denied environment as a representative problem setting. The event of a robot returning from its path corresponds to a path-based hazard sensor reading of "0" (hazard not detected), while a robot failing to return corresponds to a reading of "1" (hazard detected). We consider false positives and false negatives. We find that the new method leads to quicker convergence of the belief map than prior work in both single- and multi-robot cases.
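To make the boolean path-based sensor concrete, the sketch below computes exact posterior marginals for a very small path by enumerating every hazard configuration. It assumes a simplified model that is not taken from the paper: cells hold hazards independently under the prior, the true event is "at least one hazard on the path", and the reading flips with fixed false-positive and false-negative rates. The function name `update_marginals` and the parameters `p_fp` and `p_fn` are hypothetical, and this is not the Bayesian Network formulation the abstract introduces.

```python
from itertools import product


def update_marginals(prior, path, z, p_fp=0.05, p_fn=0.1):
    """Posterior marginals P(hazard in cell | z) for the cells on `path`.

    prior: dict mapping cell -> prior hazard probability (assumed independent)
    path:  iterable of cells the sensor traversed
    z:     1 if the sensor reported a detection, 0 otherwise
    """
    cells = list(path)
    evidence = 0.0                       # accumulates P(z)
    joint = {c: 0.0 for c in cells}      # accumulates P(hazard in c, z)
    # Enumerate every hazard/no-hazard assignment over the path cells.
    for config in product([0, 1], repeat=len(cells)):
        p_config = 1.0
        for c, h in zip(cells, config):
            p_config *= prior[c] if h else 1.0 - prior[c]
        event = any(config)              # did any hazard lie on the path?
        p_one = (1.0 - p_fn) if event else p_fp
        p_z = p_one if z == 1 else 1.0 - p_one
        weight = p_config * p_z
        evidence += weight
        for c, h in zip(cells, config):
            if h:
                joint[c] += weight
    posterior = dict(prior)              # cells off the path keep their prior
    for c in cells:
        posterior[c] = joint[c] / evidence
    return posterior


prior = {"a": 0.2, "b": 0.2, "c": 0.2}
after_one = update_marginals(prior, ["a", "b"], z=1)
after_zero = update_marginals(prior, ["a", "b"], z=0)
```

A reading of "1" raises the marginals of both cells on the path without saying which one holds the hazard, while a "0" lowers them; the off-path cell `"c"` is unchanged. Enumeration costs grow exponentially with path length, so this is only useful as an illustration. Note also that the returned marginals discard the correlations a "1" induces between path cells.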

LG · Feb 7, 2025

Seasonal Station-Keeping of Short Duration High Altitude Balloons using Deep Reinforcement Learning

Tristan K. Schuler, Chinthan Prasad, Georgiy Kiselev et al.

Station-keeping short-duration high-altitude balloons (HABs) in a region of interest is a challenging path-planning problem due to partially observable, complex, and dynamic wind flows. Deep reinforcement learning is a popular strategy for solving the station-keeping problem. A custom simulation environment was developed to train and evaluate Deep Q-Learning (DQN) agents for short-duration HABs. To train the agents on realistic winds, synthetic wind forecasts were generated from aggregated historical radiosonde data to apply horizontal kinematics to simulated agents. The synthetic forecasts were closely correlated with ECMWF ERA5 Reanalysis forecasts, providing a realistic simulated wind field and seasonal and altitudinal variances between the wind models. DQN HAB agents were then trained and evaluated across different seasonal months. To highlight differences and trends in months with vastly different wind fields, a Forecast Score algorithm was introduced to independently classify forecasts based on wind diversity, and trends between station-keeping success and the Forecast Score were evaluated across all seasons.

AI · Oct 17, 2024

Transformer Guided Coevolution: Improved Team Selection in Multiagent Adversarial Team Games

Pranav Rajbhandari, Prithviraj Dasgupta, Donald Sofge

We consider the problem of team selection within multiagent adversarial team games. We propose BERTeam, a novel algorithm that uses a transformer-based deep neural network with Masked Language Model training to select the best team of players from a trained population. We integrate this with coevolutionary deep reinforcement learning, which trains a diverse set of individual players to choose from. We test our algorithm in the multiagent adversarial game Marine Capture-The-Flag, and find that BERTeam learns non-trivial team compositions that perform well against unseen opponents. For this game, we find that BERTeam outperforms MCAA, an algorithm that similarly optimizes team selection.

NE · Nov 16, 2024

Fine Tuning Swimming Locomotion Learned from Mosquito Larvae

Pranav Rajbhandari, Karthick Dhileep, Sridhar Ravi et al.

In prior research, we analyzed the backwards swimming motion of mosquito larvae, parameterized it, and replicated it in a Computational Fluid Dynamics (CFD) model. Since the parameterized swimming motion is copied from observed larvae, it is not necessarily the most efficient locomotion for the model of the swimmer. In this project, we further optimize this copied solution for the swimmer model. We utilize Reinforcement Learning to guide local parameter updates. Since the majority of the computation cost arises from the CFD model, we additionally train a deep learning model to replicate the forces acting on the swimmer model. We find that this method is effective at performing local search to improve the parameterized swimming locomotion.