GTMay 29
From Talking Words to Sharing Thoughts: Scalable Multi-LLM Aggregation via Structured Message PassingNiloufar Mehrabi, Sayed Pedram Haeri Boroujeni, Abolfazl Razi
The emergence of specialized, domain-tuned Large Language Models (LLMs) has demonstrated that smaller models can achieve expert-level performance in specific tasks, while struggling in out-of-domain settings. Current ensemble methods to combine their complementary expertise primarily rely on iterative re-prompting or cross-model refinement. These approaches suffer from high computational costs and latency because they require repeated LLM inference calls. Furthermore, naive aggregation often leads to anchor corruption, in which noise propagated from weaker models degrades the performance of the most accurate expert. To address these challenges, we propose a framework that integrates model predictions at the semantic layer using a bipartite factor graph. In this architecture, individual LLMs are represented as variable nodes, while a set of check nodes assess their consistency based on diverse epistemic criteria. We develop a message-passing protocol inspired by error-recovery systems to resolve disagreements iteratively. Furthermore, we introduce an asymmetric damping mechanism that protects high-reliability anchor nodes from being overridden by the ensemble majority. Unlike existing methods, our approach operates on output distributions and requires no additional LLM calls during the refinement phase. Evaluating on four benchmarks, including MMLU, MMLU-Pro, GPQA, and MedMCQA, our method demonstrates a 97% reduction in token usage and up to a 6X decrease in API calls, reducing inference time from several minutes to mere milliseconds while consistently outperforming leading multi-agent baselines. These results suggest that graph-based belief propagation is a robust, high-speed, and scalable alternative to the current multi-agent LLM systems. The full pipeline and code will be made public.
LGFeb 16, 2023
A Hybrid Chimp Optimization Algorithm and Generalized Normal Distribution Algorithm with Opposition-Based Learning Strategy for Solving Data Clustering ProblemsSayed Pedram Haeri Boroujeni, Elnaz Pashaei
This paper is concerned with data clustering to separate clusters based on the connectivity principle for categorizing similar and dissimilar data into different groups. Although classical clustering algorithms such as K-means are efficient techniques, they often trap in local optima and have a slow convergence rate in solving high-dimensional problems. To address these issues, many successful meta-heuristic optimization algorithms and intelligence-based methods have been introduced to attain the optimal solution in a reasonable time. They are designed to escape from a local optimum problem by allowing flexible movements or random behaviors. In this study, we attempt to conceptualize a powerful approach using the three main components: Chimp Optimization Algorithm (ChOA), Generalized Normal Distribution Algorithm (GNDA), and Opposition-Based Learning (OBL) method. Firstly, two versions of ChOA with two different independent groups' strategies and seven chaotic maps, entitled ChOA(I) and ChOA(II), are presented to achieve the best possible result for data clustering purposes. Secondly, a novel combination of ChOA and GNDA algorithms with the OBL strategy is devised to solve the major shortcomings of the original algorithms. Lastly, the proposed ChOAGNDA method is a Selective Opposition (SO) algorithm based on ChOA and GNDA, which can be used to tackle large and complex real-world optimization problems, particularly data clustering applications. The results are evaluated against seven popular meta-heuristic optimization algorithms and eight recent state-of-the-art clustering techniques. Experimental results illustrate that the proposed work significantly outperforms other existing methods in terms of the achievement in minimizing the Sum of Intra-Cluster Distances (SICD), obtaining the lowest Error Rate (ER), accelerating the convergence speed, and finding the optimal cluster centers.
LGAug 18, 2023
An Efficient High-Dimensional Gene Selection Approach based on Binary Horse Herd Optimization Algorithm for Biological Data ClassificationNiloufar Mehrabi, Sayed Pedram Haeri Boroujeni, Elnaz Pashaei
The Horse Herd Optimization Algorithm (HOA) is a new meta-heuristic algorithm based on the behaviors of horses at different ages. The HOA was introduced recently to solve complex and high-dimensional problems. This paper proposes a binary version of the Horse Herd Optimization Algorithm (BHOA) in order to solve discrete problems and select prominent feature subsets. Moreover, this study provides a novel hybrid feature selection framework based on the BHOA and a minimum Redundancy Maximum Relevance (MRMR) filter method. This hybrid feature selection, which is more computationally efficient, produces a beneficial subset of relevant and informative features. Since feature selection is a binary problem, we have applied a new Transfer Function (TF), called X-shape TF, which transforms continuous problems into binary search spaces. Furthermore, the Support Vector Machine (SVM) is utilized to examine the efficiency of the proposed method on ten microarray datasets, namely Lymphoma, Prostate, Brain-1, DLBCL, SRBCT, Leukemia, Ovarian, Colon, Lung, and MLL. In comparison to other state-of-the-art, such as the Gray Wolf (GW), Particle Swarm Optimization (PSO), and Genetic Algorithm (GA), the proposed hybrid method (MRMR-BHOA) demonstrates superior performance in terms of accuracy and minimum selected features. Also, experimental results prove that the X-Shaped BHOA approach outperforms others methods.
IVSep 30, 2024
Adaptive Data Transport Mechanism for UAV Surveillance Missions in Lossy EnvironmentsNiloufar Mehrabi, Sayed Pedram Haeri Boroujeni, Jenna Hofseth et al.
Unmanned Aerial Vehicles (UAVs) play an increasingly critical role in Intelligence, Surveillance, and Reconnaissance (ISR) missions such as border patrolling and criminal detection, thanks to their ability to access remote areas and transmit real-time imagery to processing servers. However, UAVs are highly constrained by payload size, power limits, and communication bandwidth, necessitating the development of highly selective and efficient data transmission strategies. This has driven the development of various compression and optimal transmission technologies for UAVs. Nevertheless, most methods strive to preserve maximal information in transferred video frames, missing the fact that only certain parts of images/video frames might offer meaningful contributions to the ultimate mission objectives in the ISR scenarios involving moving object detection and tracking (OD/OT). This paper adopts a different perspective, and offers an alternative AI-driven scheduling policy that prioritizes selecting regions of the image that significantly contributes to the mission objective. The key idea is tiling the image into small patches and developing a deep reinforcement learning (DRL) framework that assigns higher transmission probabilities to patches that present higher overlaps with the detected object of interest, while penalizing sharp transitions over consecutive frames to promote smooth scheduling shifts. Although we used Yolov-8 object detection and UDP transmission protocols as a benchmark testing scenario the idea is general and applicable to different transmission protocols and OD/OT methods. To further boost the system's performance and avoid OD errors for cluttered image patches, we integrate it with interframe interpolations.
LGNov 4, 2024Code
Enhancing Graph Neural Networks in Large-scale Traffic Incident Analysis with Concurrency HypothesisXiwen Chen, Sayed Pedram Haeri Boroujeni, Xin Shu et al.
Despite recent progress in reducing road fatalities, the persistently high rate of traffic-related deaths highlights the necessity for improved safety interventions. Leveraging large-scale graph-based nationwide road network data across 49 states in the USA, our study first posits the Concurrency Hypothesis from intuitive observations, suggesting a significant likelihood of incidents occurring at neighboring nodes within the road network. To quantify this phenomenon, we introduce two novel metrics, Average Neighbor Crash Density (ANCD) and Average Neighbor Crash Continuity (ANCC), and subsequently employ them in statistical tests to validate the hypothesis rigorously. Building upon this foundation, we propose the Concurrency Prior (CP) method, a powerful approach designed to enhance the predictive capabilities of general Graph Neural Network (GNN) models in semi-supervised traffic incident prediction tasks. Our method allows GNNs to incorporate concurrent incident information, as mentioned in the hypothesis, via tokenization with negligible extra parameters. The extensive experiments, utilizing real-world data across states and cities in the USA, demonstrate that integrating CP into 12 state-of-the-art GNN architectures leads to significant improvements, with gains ranging from 3% to 13% in F1 score and 1.3% to 9% in AUC metrics. The code is publicly available at https://github.com/xiwenc1/Incident-GNN-CP.
ROMar 27
Meta-Adaptive Beam Search Planning for Transformer-Based Reinforcement Learning Control of UAVs with Overhead Manipulators under Flight DisturbancesHazim Alzorgan, Sayed Pedram Haeri Boroujeni, Abolfazl Razi
Drones equipped with overhead manipulators offer unique capabilities for inspection, maintenance, and contact-based interaction. However, the motion of the drone and its manipulator is tightly linked, and even small attitude changes caused by wind or control imperfections shift the end-effector away from its intended path. This coupling makes reliable tracking difficult and also limits the direct use of learning-based arm controllers that were originally designed for fixed-base robots. These effects appear consistently in our tests whenever the UAV body experiences drift or rapid attitude corrections. To address this behavior, we develop a reinforcement-learning (RL) framework with a transformer-based double deep Q learning (DDQN), with the core idea of using an adaptive beam-search planner that applies a short-horizon beam search over candidate control sequences using the learned critic as the forward estimator. This allows the controller to anticipate the end-effector's motion through simulated rollouts rather than executing those actions directly on the actual model, realizing a software-in-the-loop (SITL) approach. The lookahead relies on value estimates from a Transformer critic that processes short sequences of states, while a DDQN backbone provides the one-step targets needed to keep the learning process stable. Evaluated on a 3-DoF aerial manipulator under identical training conditions, the proposed meta-adaptive planner shows the strongest overall performance with a 10.2% reward increase, a substantial reduction in mean tracking error (from about 6% to 3%), and a 29.6% improvement in the combined reward-error metric relative to the DDQN baseline. Our method exhibits elevated stability in tracking target tip trajectory (by maintaining 5 cm tracking error) when the drone base exhibits drifts due to external disturbances, as opposed to the fixed-beam and Transformer-only variants.
CVOct 30, 2025
All You Need for Object Detection: From Pixels, Points, and Prompts to Next-Gen Fusion and Multimodal LLMs/VLMs in Autonomous VehiclesSayed Pedram Haeri Boroujeni, Niloufar Mehrabi, Hazim Alzorgan et al.
Autonomous Vehicles (AVs) are transforming the future of transportation through advances in intelligent perception, decision-making, and control systems. However, their success is tied to one core capability, reliable object detection in complex and multimodal environments. While recent breakthroughs in Computer Vision (CV) and Artificial Intelligence (AI) have driven remarkable progress, the field still faces a critical challenge as knowledge remains fragmented across multimodal perception, contextual reasoning, and cooperative intelligence. This survey bridges that gap by delivering a forward-looking analysis of object detection in AVs, emphasizing emerging paradigms such as Vision-Language Models (VLMs), Large Language Models (LLMs), and Generative AI rather than re-examining outdated techniques. We begin by systematically reviewing the fundamental spectrum of AV sensors (camera, ultrasonic, LiDAR, and Radar) and their fusion strategies, highlighting not only their capabilities and limitations in dynamic driving environments but also their potential to integrate with recent advances in LLM/VLM-driven perception frameworks. Next, we introduce a structured categorization of AV datasets that moves beyond simple collections, positioning ego-vehicle, infrastructure-based, and cooperative datasets (e.g., V2V, V2I, V2X, I2I), followed by a cross-analysis of data structures and characteristics. Ultimately, we analyze cutting-edge detection methodologies, ranging from 2D and 3D pipelines to hybrid sensor fusion, with particular attention to emerging transformer-driven approaches powered by Vision Transformers (ViTs), Large and Small Language Models (SLMs), and VLMs. By synthesizing these perspectives, our survey delivers a clear roadmap of current capabilities, open challenges, and future opportunities.
LGJan 4, 2024
A comprehensive survey of research towards AI-enabled unmanned aerial systems in pre-, active-, and post-wildfire managementSayed Pedram Haeri Boroujeni, Abolfazl Razi, Sahand Khoshdel et al.
Wildfires have emerged as one of the most destructive natural disasters worldwide, causing catastrophic losses in both human lives and forest wildlife. Recently, the use of Artificial Intelligence (AI) in wildfires, propelled by the integration of Unmanned Aerial Vehicles (UAVs) and deep learning models, has created an unprecedented momentum to implement and develop more effective wildfire management. Although some of the existing survey papers have explored various learning-based approaches, a comprehensive review emphasizing the application of AI-enabled UAV systems and their subsequent impact on multi-stage wildfire management is notably lacking. This survey aims to bridge these gaps by offering a systematic review of the recent state-of-the-art technologies, highlighting the advancements of UAV systems and AI models from pre-fire, through the active-fire stage, to post-fire management. To this aim, we provide an extensive analysis of the existing remote sensing systems with a particular focus on the UAV advancements, device specifications, and sensor technologies relevant to wildfire management. We also examine the pre-fire and post-fire management approaches, including fuel monitoring, prevention strategies, as well as evacuation planning, damage assessment, and operation strategies. Additionally, we review and summarize a wide range of computer vision techniques in active-fire management, with an emphasis on Machine Learning (ML), Reinforcement Learning (RL), and Deep Learning (DL) algorithms for wildfire classification, segmentation, detection, and monitoring tasks. Ultimately, we underscore the substantial advancement in wildfire modeling through the integration of cutting-edge AI techniques and UAV-based data, providing novel insights and enhanced predictive capabilities to understand dynamic wildfire behavior.
CVApr 6
Don't Waste Bits! Adaptive KV-Cache Quantization for Lightweight On-Device LLMsSayed Pedram Haeri Boroujeni, Niloufar Mehrabi, Patrick Woods et al.
Large Language Models (LLMs) have achieved remarkable progress across reasoning, generation, and decision-making tasks, yet deploying them on mobile, embedded, and edge devices remains particularly challenging. On-device LLM inference is heavily constrained by the memory and bandwidth overhead of the key-value (KV) cache, which grows linearly with context length and often dominates decoding cost. Existing KV-cache quantization schemes typically rely on fixed precision or hand-crafted heuristics, thereby wasting bits on low-impact tokens while over-compressing informative ones, leading to avoidable accuracy degradation. Inspired by Huffman coding's principle of variable-length allocation, we propose adaptive KV-cache quantization, a learned policy that assigns bit-width proportional to token importance, minimizing expected memory and latency without sacrificing competitive accuracy. Our framework extracts lightweight token-level features, including token frequency, quality score, attention variance, and entropy-based uncertainty, and feeds them into a compact data-driven controller that dynamically selects KV precision from {2-bit, 4-bit, 8-bit, FP16} during decoding. This adaptive precision policy reduces KV memory footprint and latency while improving accuracy compared to static KV quantization and rule-based baselines, and maintaining competitive accuracy close to FP16 inference across standard LLM benchmarks. Extensive experiments across multiple commonsense reasoning benchmarks using SmolLM-135M, SmolLM-360M, and SmolLM-1.7B demonstrate that our controller consistently improves the accuracy-latency trade-off. For instance, with SmolLM-360M on HellaSwag, our method reduces decoding latency (ms/token) by 17.75% relative to static KV quantization, improves accuracy by 7.60 points, and remains within only 0.30 points of FP16 inference.
CVMar 6, 2024
FLAME Diffuser: Wildfire Image Synthesis using Mask Guided DiffusionHao Wang, Sayed Pedram Haeri Boroujeni, Xiwen Chen et al.
Wildfires are a significant threat to ecosystems and human infrastructure, leading to widespread destruction and environmental degradation. Recent advancements in deep learning and generative models have enabled new methods for wildfire detection and monitoring. However, the scarcity of annotated wildfire images limits the development of robust models for these tasks. In this work, we present the FLAME Diffuser, a training-free, diffusion-based framework designed to generate realistic wildfire images with paired ground truth. Our framework uses augmented masks, sampled from real wildfire data, and applies Perlin noise to guide the generation of realistic flames. By controlling the placement of these elements within the image, we ensure precise integration while maintaining the original images style. We evaluate the generated images using normalized Frechet Inception Distance, CLIP Score, and a custom CLIP Confidence metric, demonstrating the high quality and realism of the synthesized wildfire images. Specifically, the fusion of Perlin noise in this work significantly improved the quality of synthesized images. The proposed method is particularly valuable for enhancing datasets used in downstream tasks such as wildfire detection and monitoring.
CVMar 17, 2025
Eyes on the Environment: AI-Driven Analysis for Fire and Smoke Classification, Segmentation, and DetectionSayed Pedram Haeri Boroujeni, Niloufar Mehrabi, Fatemeh Afghah et al.
Fire and smoke phenomena pose a significant threat to the natural environment, ecosystems, and global economy, as well as human lives and wildlife. In this particular circumstance, there is a demand for more sophisticated and advanced technologies to implement an effective strategy for early detection, real-time monitoring, and minimizing the overall impacts of fires on ecological balance and public safety. Recently, the rapid advancement of Artificial Intelligence (AI) and Computer Vision (CV) frameworks has substantially revolutionized the momentum for developing efficient fire management systems. However, these systems extensively rely on the availability of adequate and high-quality fire and smoke data to create proficient Machine Learning (ML) methods for various tasks, such as detection and monitoring. Although fire and smoke datasets play a critical role in training, evaluating, and testing advanced Deep Learning (DL) models, a comprehensive review of the existing datasets is still unexplored. For this purpose, we provide an in-depth review to systematically analyze and evaluate fire and smoke datasets collected over the past 20 years. We investigate the characteristics of each dataset, including type, size, format, collection methods, and geographical diversities. We also review and highlight the unique features of each dataset, such as imaging modalities (RGB, thermal, infrared) and their applicability for different fire management tasks (classification, segmentation, detection). Furthermore, we summarize the strengths and weaknesses of each dataset and discuss their potential for advancing research and technology in fire management. Ultimately, we conduct extensive experimental analyses across different datasets using several state-of-the-art algorithms, such as ResNet-50, DeepLab-V3, and YoloV8.
CVApr 10, 2024
Enhanced Cooperative Perception for Autonomous Vehicles Using Imperfect CommunicationAhmad Sarlak, Hazim Alzorgan, Sayed Pedram Haeri Boroujeni et al.
Sharing and joint processing of camera feeds and sensor measurements, known as Cooperative Perception (CP), has emerged as a new technique to achieve higher perception qualities. CP can enhance the safety of Autonomous Vehicles (AVs) where their individual visual perception quality is compromised by adverse weather conditions (haze as foggy weather), low illumination, winding roads, and crowded traffic. To cover the limitations of former methods, in this paper, we propose a novel approach to realize an optimized CP under constrained communications. At the core of our approach is recruiting the best helper from the available list of front vehicles to augment the visual range and enhance the Object Detection (OD) accuracy of the ego vehicle. In this two-step process, we first select the helper vehicles that contribute the most to CP based on their visual range and lowest motion blur. Next, we implement a radio block optimization among the candidate vehicles to further improve communication efficiency. We specifically focus on pedestrian detection as an exemplary scenario. To validate our approach, we used the CARLA simulator to create a dataset of annotated videos for different driving scenarios where pedestrian detection is challenging for an AV with compromised vision. Our results demonstrate the efficacy of our two-step optimization process in improving the overall performance of cooperative perception in challenging scenarios, substantially improving driving safety under adverse conditions. Finally, we note that the networking assumptions are adopted from LTE Release 14 Mode 4 side-link communication, commonly used for Vehicle-to-Vehicle (V2V) communication. Nonetheless, our method is flexible and applicable to arbitrary V2V communications.
HCJan 26, 2024
Driving Towards Inclusion: A Systematic Review of AI-powered Accessibility Enhancements for People with Disability in Autonomous VehiclesAshish Bastola, Hao Wang, Sayed Pedram Haeri Boroujeni et al.
This paper provides a comprehensive and, to our knowledge, the first review of inclusive human-computer interaction (HCI) within autonomous vehicles (AVs) and human-driven cars with partial autonomy, emphasizing accessibility and user-centered design principles. We explore the current technologies and HCI systems designed to enhance passenger experience, particularly for individuals with accessibility needs. Key technologies discussed include brain-computer interfaces, anthropomorphic interaction, virtual reality, augmented reality, mode adaptation, voice-activated interfaces, haptic feedback, etc. Each technology is evaluated for its role in creating an inclusive in-vehicle environment. Furthermore, we highlight recent interface designs by leading companies and review emerging concepts and prototypes under development or testing, which show significant potential to address diverse accessibility requirements. Safety considerations, ethical concerns, and adoption of AVs are other major issues that require thorough investigation. Building on these findings, we propose an end-to-end design framework that addresses accessibility requirements across diverse user demographics, including older adults and individuals with physical or cognitive impairments. This work provides actionable insights for designers, researchers, and policymakers aiming to create safer and more comfortable environments in autonomous and regular vehicles accessible to all users.