Dario Pompili

LG
h-index: 12
15 papers
331 citations
Novelty: 41%
AI Score: 39

15 Papers

RO Aug 12, 2024
Retrieval-Augmented Hierarchical in-Context Reinforcement Learning and Hindsight Modular Reflections for Task Planning with LLMs

Chuanneng Sun, Songjun Huang, Dario Pompili

Large Language Models (LLMs) have demonstrated remarkable abilities in various language tasks, making them promising candidates for decision-making in robotics. Inspired by Hierarchical Reinforcement Learning (HRL), we propose Retrieval-Augmented Hierarchical in-context reinforcement Learning (RAHL), a novel framework in which an LLM-based high-level policy decomposes a complex task into sub-tasks on the fly. The sub-tasks, defined by goals, are assigned to the low-level policy to complete. To improve the agent's performance in multi-episode execution, we propose Hindsight Modular Reflection (HMR), where, instead of reflecting on the full trajectory, we let the agent reflect on shorter sub-trajectories to improve reflection efficiency. We evaluated the decision-making ability of the proposed RAHL in three benchmark environments: ALFWorld, Webshop, and HotpotQA. The results show that RAHL improves performance by 9%, 42%, and 10%, respectively, over strong baselines within 5 episodes of execution. Furthermore, we also implemented RAHL on the Boston Dynamics SPOT robot. The experiment shows that the robot, controlled by the LLM policy, can scan the environment, find entrances, and navigate to new rooms.

41.0 SP Apr 18
E2E-WAVE: End-to-End Learned Waveform Generation for Underwater Video Multicasting

Khizar Anjum, Tingcong Jiang, Dario Pompili

We present E2E-WAVE, the first end-to-end learned waveform generation system for underwater video multicasting. Acoustic channels exhibit 20-46% bit error rates, a regime where forward error correction becomes counterproductive: LDPC increases rather than decreases errors beyond its decoding threshold. E2E-WAVE addresses this by embedding semantic similarity directly into physical-layer waveforms: when decoding errors are unavoidable, the system preferentially selects semantically similar tokens rather than arbitrary corruption. Combining VideoGPT tokenization (1024x compression) with a trainable waveform bank and fully differentiable OFDM transmission, E2E-WAVE achieves +5 dB (19.26%) PSNR and +0.10 (14.28%) SSIM over the strongest FEC-protected baseline in the less challenging underwater channel (NOF1) while delivering real-time 16 FPS video at 128x128 resolution over 2.3 kbps channels, which is impossible for conventional digital modulation. The performance gap only increases in harsher channels (BCH1, NCS1). Trained on a single channel, E2E-WAVE generalizes to unseen underwater environments without retraining, while HEVC fails at sub-5 kbps rates and SoftCast's AWGN assumptions collapse on frequency-selective channels.

MA May 17, 2024
LLM-based Multi-Agent Reinforcement Learning: Current and Future Directions

Chuanneng Sun, Songjun Huang, Dario Pompili

In recent years, Large Language Models (LLMs) have shown great abilities in various tasks, including question answering, arithmetic problem solving, and poem writing, among others. Although research on LLM-as-an-agent has shown that LLMs can be applied to Reinforcement Learning (RL) and achieve decent results, the extension of LLM-based RL to Multi-Agent Systems (MAS) is not trivial, as many aspects, such as coordination and communication between agents, are not considered in single-agent RL frameworks. To inspire more research on LLM-based Multi-Agent Reinforcement Learning (MARL), in this letter, we survey the existing LLM-based single-agent and multi-agent RL frameworks and outline potential directions for future research. In particular, we focus on cooperative tasks in which multiple agents share a common goal and communicate with one another. We also consider human-in/on-the-loop scenarios enabled by the language component in the framework.

RO Apr 1, 2025
Real-Time Navigation for Autonomous Aerial Vehicles Using Video

Khizar Anjum, Parul Pandey, Vidyasagar Sadhu et al.

Most applications in autonomous navigation using mounted cameras rely on the construction and processing of geometric 3D point clouds, which is an expensive process. However, there is a simpler way to make a space navigable quickly: using semantic information (e.g., traffic signs) to guide the agent. Detecting and acting on semantic information, in turn, involves Computer Vision (CV) algorithms such as object detection, which are themselves demanding for agents such as aerial drones with limited onboard resources. To solve this problem, we introduce a novel Markov Decision Process (MDP) framework to reduce the workload of these CV approaches. We apply our proposed framework to both feature-based and neural-network-based object-detection tasks, using open-loop and closed-loop simulations as well as hardware-in-the-loop emulations. These holistic tests show significant benefits in energy consumption and speed with only a limited loss in accuracy compared to models based on static features and neural networks.

CV Dec 28, 2021
Source Feature Compression for Object Classification in Vision-Based Underwater Robotics

Xueyuan Zhao, Mehdi Rahmati, Dario Pompili

New efficient source feature compression solutions are proposed based on a two-stage Walsh-Hadamard Transform (WHT) for Convolutional Neural Network (CNN)-based object classification in underwater robotics. The object images are first transformed by WHT following a two-stage process. The transform-domain tensors have large values concentrated in the upper left corner of the matrices in the RGB channels. Based on this property, the transform-domain matrix is partitioned into inner and outer regions. Two novel partitioning methods are proposed in this work: (i) fixing the size of inner and outer regions; and (ii) adjusting the size of inner and outer regions adaptively per image. The proposals are evaluated with an underwater object dataset captured from the Raritan River in New Jersey, USA. It is demonstrated and verified that the proposals effectively reduce the training time for learning-based underwater object classification tasks and increase the accuracy compared with the competing methods. Object classification is an essential part of a vision-based underwater robot that can sense the environment and navigate autonomously. Therefore, the proposed method is well-suited for efficient computer vision-based tasks in underwater robotics applications.
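
The transform-and-partition idea in this abstract can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: the "two-stage" process is taken to be a 1D WHT over rows followed by a 1D WHT over columns, the transform uses natural (Hadamard) ordering without normalization, and the inner region is a fixed top-left k x k block. The function names `fwht`, `wht2d`, and `inner_region` are hypothetical and are not taken from the paper.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform (natural ordering) of a sequence."""
    out = list(values)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    h = 1
    while h < n:
        # Butterfly step: combine elements that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def wht2d(matrix):
    """Assumed two-stage transform: WHT over every row, then over every column."""
    row_stage = [fwht(row) for row in matrix]
    col_stage = [fwht(col) for col in zip(*row_stage)]
    # col_stage holds transformed columns; transpose back to row-major layout.
    return [list(row) for row in zip(*col_stage)]


def inner_region(coeffs, k):
    """Keep the top-left k x k block (fixed-size partitioning, as assumed here)."""
    return [row[:k] for row in coeffs[:k]]


# A flat 4x4 single-channel patch: all energy lands in the top-left coefficient.
flat_patch = [[1, 1, 1, 1] for _ in range(4)]
coeffs = wht2d(flat_patch)
compressed = inner_region(coeffs, 2)
```

For an RGB image the same steps would be applied per channel. The adaptive variant described in the abstract would choose `k` per image, for example from the fraction of coefficient energy retained, which this sketch does not attempt.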

SP Aug 3, 2020
Configuration Learning in Underwater Optical Links

Xueyuan Zhao, Zhuoran Qi, Dario Pompili

A new research problem named configuration learning is described in this work, and a novel algorithm is proposed to address it. The configuration learning problem is defined to be the optimization of the Machine Learning (ML) classifier to maximize the ML performance metric while optimizing the transmitter configuration in signal processing/communication systems. Specifically, this configuration learning problem is investigated in an underwater optical communication system, with the physical-layer communication throughput as the signal processing performance metric. The proposed algorithm performs configuration learning by alternating optimization of key design parameters and by switching between several Recurrent Neural Network (RNN) classifiers depending on the learning objective. The proposed ML algorithm is validated with the datasets of an underwater optical communication system and is compared with competing ML algorithms. Performance results indicate that the proposal outperforms the competing algorithms for binary and multi-class configuration learning in underwater optical communication datasets. The proposed configuration learning framework can be further investigated and applied to a broad range of topics in signal processing and communications.

SP Apr 3, 2020
On-board Deep-learning-based Unmanned Aerial Vehicle Fault Cause Detection and Identification

Vidyasagar Sadhu, Saman Zonouz, Dario Pompili

With the increasing use of Unmanned Aerial Vehicles (UAVs)/drones, it is important to detect and identify causes of failure in real time for proper recovery from a potential crash-like scenario or for post-incident forensic analysis. The cause of a crash could be a fault in the sensor/actuator system, physical damage/attack, or a cyber attack on the drone's software. In this paper, we propose novel architectures based on deep Convolutional and Long Short-Term Memory Neural Networks (CNNs and LSTMs) to detect (via Autoencoder) and classify drone mis-operations based on sensor data. The proposed architectures are able to learn high-level features automatically from the raw sensor data and learn the spatial and temporal dynamics in the sensor data. We validate the proposed deep-learning architectures via simulations and experiments on a real drone. Empirical results show that our solution is able to detect drone mis-operations with over 90% accuracy and to classify their various types with about 99% accuracy on simulation data and up to 88% accuracy on experimental data.

GN Dec 31, 2019
Transform-Domain Classification of Human Cells based on DNA Methylation Datasets

Xueyuan Zhao, Dario Pompili

A novel method to classify human cells is presented in this work based on a transform-domain analysis of DNA methylation data. DNA methylation profile variations are observed in human cells with the progression of disease stages, and the proposal uses this variation to classify normal and disease cells, including cancer cells. The cancer cell types investigated in this work cover hepatocellular (sample size n = 40), colorectal (n = 44), lung (n = 70) and endometrial (n = 87) cancer cells. A new pipeline is proposed that integrates the DNA methylation intensity measurements on all the CpG islands using the Walsh-Hadamard Transform (WHT). The study reveals a three-step property of the transform-domain DNA methylation data and an association between the step values and the cell status. Further assessments have been carried out on the proposed machine learning pipeline to classify normal and cancer tissue cells. A number of machine learning classifiers are compared for whole sequence and WHT sequence classification based on public Whole-Genome Bisulfite Sequencing (WGBS) DNA methylation datasets. The WHT-based method can reduce the computation time by more than one order of magnitude compared with whole original sequence classification, while maintaining comparable classification accuracy for the selected machine learning classifiers. The proposed method has broad applications in expedited classification of disease and normal human cells from epigenome and genome datasets.

CR Jun 30, 2019
Secure Mobile Technologies for Proactive Critical Infrastructure Situational Awareness

Gabriel Salles-Loustau, Vidyasagar Sadhu, Dario Pompili et al.

Trustworthy operation of our national critical infrastructures, such as the electricity grid, against adversarial parties and accidental failures requires constant and secure monitoring capabilities. In this paper, Eyephone is presented to leverage secure smartphone sensing and data acquisition capabilities and enable pervasive sensing of the national critical infrastructures. The information reported by smartphone users will notify the control center operators about particular accidental or malicious remote critical infrastructure incidents. The reporting will be proactive regarding potentially upcoming failures given the system's current risky situation, e.g., a tree about to fall on a power grid transmission line. The information will include various modalities such as images, video, audio, time and location. Eyephone will use system-wide information flow analysis and policy enforcement to prevent user privacy violations during incident reporting. A working proof-of-concept prototype of Eyephone is implemented. Our results show that Eyephone allows secure and effective use of smartphones for real-time situational awareness of our national critical infrastructures.

LG Jun 28, 2019
Deep Multi-Task Learning for Anomalous Driving Detection Using CAN Bus Scalar Sensor Data

Vidyasagar Sadhu, Teruhisa Misu, Dario Pompili

Corner cases are the main bottlenecks when applying Artificial Intelligence (AI) systems to safety-critical applications. An AI system should be intelligent enough to detect such situations so that system developers can prepare for subsequent planning. In this paper, we propose semi-supervised anomaly detection considering the imbalance of normal situations. In particular, driving data consists of multiple positive/normal situations (e.g., right turn, going straight), some of which (e.g., U-turn) could be as rare as anomalous situations. Existing machine-learning-based anomaly detection approaches do not fare sufficiently well when applied to such imbalanced data. We present a novel multi-task-learning-based approach that leverages domain knowledge (maneuver labels) for anomaly detection in driving data. We evaluate the proposed approach both quantitatively and qualitatively on 150 hours of real-world driving data and show improved performance over baseline approaches.

CY Apr 29, 2019
Argus: Smartphone-enabled Human Cooperation via Multi-Agent Reinforcement Learning for Disaster Situational Awareness

Vidyasagar Sadhu, Gabriel Salles-Loustau, Dario Pompili et al.

Argus exploits a Multi-Agent Reinforcement Learning (MARL) framework to create a 3D mapping of the disaster scene using agents present around the incident zone to facilitate the rescue operations. The agents can be human bystanders at the disaster scene as well as drones or robots that can assist the humans. The agents capture images of the scene using their smartphones (or on-board cameras in the case of drones) as directed by the MARL algorithm. These images are used to build a 3D map of the disaster scene in real time. Via both simulations and real experiments, an evaluation of the framework in terms of effectiveness in tracking random dynamicity of the environment is presented.

LG Apr 21, 2019
HCFContext: Smartphone Context Inference via Sequential History-based Collaborative Filtering

Vidyasagar Sadhu, Saman Zonouz, Vincent Sritapan et al.

Mobile context determination is an important step for many context-aware services such as location-based services, enterprise policy enforcement, building or room occupancy detection for power or HVAC operation, etc. Especially in enterprise scenarios where policies (e.g., attending a confidential meeting only when the user is in "Location X") are defined based on mobile context, it is paramount to verify the accuracy of the mobile context. To this end, two stochastic models based on the theory of Hidden Markov Models (HMMs) are proposed to obtain mobile context: a personalized model (HPContext) and a collaborative filtering model (HCFContext). The former predicts the current context using the sequential history of the user's past context observations; the latter enhances HPContext with collaborative filtering features, which enable it to predict the current context of the primary user based on the context observations of users related to the primary user, e.g., same-team colleagues in a company, gym friends, family members, etc. Each of the proposed models can also be used to enhance or complement the context obtained from sensors. Furthermore, since privacy is a concern in collaborative filtering, a privacy-preserving method is proposed to derive HCFContext model parameters based on the concepts of homomorphic encryption. Finally, these models are thoroughly validated on a real-life dataset.
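
As a rough illustration of predicting the current context from a sequential history with an HMM, the sketch below runs standard forward filtering over a short observation sequence. It is a generic textbook HMM filter, not the HPContext or HCFContext model from the paper. The state names, observation symbols, and all probabilities are made-up values, and the function name `forward_filter` is hypothetical.

```python
def forward_filter(prior, transition, emission, observations):
    """Return P(current state | all observations so far) for a discrete HMM.

    prior[s]          -- belief over states before the first observation
    transition[p][s]  -- probability of moving from state p to state s
    emission[s][o]    -- probability of observing o while in state s
    """
    states = list(prior)
    belief = dict(prior)
    for obs in observations:
        # Predict: push the current belief through the transition model.
        predicted = {
            s: sum(belief[p] * transition[p][s] for p in states) for s in states
        }
        # Update: weight each state by how well it explains the observation.
        unnormalized = {s: predicted[s] * emission[s][obs] for s in states}
        total = sum(unnormalized.values())
        if total == 0:
            raise ValueError("observation has zero probability under the model")
        belief = {s: v / total for s, v in unnormalized.items()}
    return belief


# Illustrative (made-up) two-context model driven by the Wi-Fi network seen.
prior = {"office": 0.5, "home": 0.5}
transition = {
    "office": {"office": 0.8, "home": 0.2},
    "home": {"office": 0.3, "home": 0.7},
}
emission = {
    "office": {"wifi_office": 0.9, "wifi_home": 0.1},
    "home": {"wifi_office": 0.2, "wifi_home": 0.8},
}

belief = forward_filter(
    prior, transition, emission, ["wifi_office", "wifi_office", "wifi_home"]
)
most_likely_context = max(belief, key=belief.get)
```

A collaborative variant in the spirit of the abstract would add the observed contexts of related users as further evidence in the update step. How the paper does this, and how it protects those observations with homomorphic encryption, is not specified in the abstract and is not modeled here.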

NI Sep 29, 2017
CollabLoc: Privacy-Preserving Multi-Modal Localization via Collaborative Information Fusion

Vidyasagar Sadhu, Dario Pompili, Saman Zonouz et al.

Mobile phones provide an excellent opportunity for building context-aware applications. In particular, location-based services are important context-aware services that are increasingly used for enforcing security policies, for supporting indoor room navigation, and for providing personalized assistance. However, a major problem remains unaddressed: the lack of solutions that work across buildings without additional infrastructure while also accounting for privacy and reliability needs. In this paper, a privacy-preserving, multi-modal, cross-building, collaborative localization platform is proposed based on Wi-Fi RSSI (existing infrastructure), cellular RSSI, and sound and light levels, which enables room-level localization as the main application (though sub-room-level granularity is possible). Privacy is inherently built into the solution through onion routing and perturbation/randomization techniques, and the solution exploits the idea of weighted collaboration to increase reliability as well as to limit the effect of noisy devices (due to sensor noise/privacy). The proposed solution has been analyzed in terms of privacy, accuracy, optimum parameters, and other overheads on location data collected at multiple indoor and outdoor locations using an Android app.

LG Feb 17, 2017
Cloud-based Deep Learning of Big EEG Data for Epileptic Seizure Prediction

Mohammad-Parsa Hosseini, Hamid Soltanian-Zadeh, Kost Elisevich et al.

Developing a Brain-Computer Interface (BCI) for seizure prediction can help epileptic patients have a better quality of life. However, there are many difficulties and challenges in developing such a system as a real-life support for patients. Because of the nonstationary nature of EEG signals, normal and seizure patterns vary across different patients. Thus, finding a group of manually extracted features for the prediction task is not practical. Moreover, when using implanted electrodes for brain recording, massive amounts of data are produced. This big data calls for safe storage and high computational resources for real-time processing. To address these challenges, a cloud-based BCI system for the analysis of this big EEG data is presented. First, a dimensionality-reduction technique is developed to increase classification accuracy as well as to decrease the communication bandwidth and computation time. Second, following a deep-learning approach, a stacked autoencoder is trained in two steps for unsupervised feature extraction and classification. Third, a cloud-computing solution is proposed for real-time analysis of big EEG data. The results on a benchmark clinical dataset illustrate the superiority of the proposed patient-specific BCI as an alternative method and its expected usefulness in real-life support of epilepsy patients.

CV Oct 24, 2016
Automatic and Manual Segmentation of Hippocampus in Epileptic Patients MRI

Mohammad-Parsa Hosseini, Mohammad-Reza Nazem-Zadeh, Dario Pompili et al.

The hippocampus is a seminal structure in the most common surgically-treated form of epilepsy. Accurate segmentation of the hippocampus aids in establishing asymmetry regarding size and signal characteristics in order to disclose the likely site of epileptogenicity. With sufficient refinement, it may ultimately aid in the avoidance of invasive monitoring with its expense and risk for the patient. To this end, a reliable and consistent method for segmentation of the hippocampus from magnetic resonance imaging (MRI) is needed. In this work, we present a systematic and statistical analysis approach for evaluation of automated segmentation methods in order to establish one that reliably approximates the results achieved by manual tracing of the hippocampus.