SYOct 20, 2017
Data-Driven Robust Taxi Dispatch under Demand UncertaintiesFei Miao, Shuo Han, Shan Lin et al.
In modern taxi networks, large amounts of taxi occupancy status and location data are collected from networked in-vehicle sensors in real-time. They provide knowledge of system models on passenger demand and mobility patterns for efficient taxi dispatch and coordination strategies. Such approaches face new challenges: how to deal with uncertainties of predicted customer demand while fulfilling the system's performance requirements, including minimizing taxis' total idle mileage and maintaining service fairness across the whole city; how to formulate a computationally tractable problem. To address this problem, we develop a data-driven robust taxi dispatch framework to consider spatial-temporally correlated demand uncertainties. The robust vehicle dispatch problem we formulate is concave in the uncertain demand and convex in the decision variables. Uncertainty sets of random demand vectors are constructed from data based on theories in hypothesis testing, and provide a desired probabilistic guarantee level for the performance of robust taxi dispatch solutions. We prove equivalent computationally tractable forms of the robust dispatch problem using the minimax theorem and strong duality. Evaluations on four years of taxi trip data for New York City show that by selecting a probabilistic guarantee level at 75%, the average demand-supply ratio error is reduced by 31.7%, and the average total idle driving distance is reduced by 10.13% or about 20 million miles annually, compared with non-robust dispatch solutions.
CVOct 31, 2024
YOLO Evolution: A Comprehensive Benchmark and Architectural Review of YOLOv12, YOLO11, and Their Previous VersionsNidhal Jegham, Chan Young Koh, Marwan Abdelatti et al.
This study presents a comprehensive benchmark analysis of various YOLO (You Only Look Once) algorithms. It represents the first comprehensive experimental evaluation of YOLOv3 to the latest version, YOLOv12, on various object detection challenges. The challenges considered include varying object sizes, diverse aspect ratios, and small-sized objects of a single class, ensuring a comprehensive assessment across datasets with distinct challenges. To ensure a robust evaluation, we employ a comprehensive set of metrics, including Precision, Recall, Mean Average Precision (mAP), Processing Time, GFLOPs count, and Model Size. Our analysis highlights the distinctive strengths and limitations of each YOLO version. For example: YOLOv9 demonstrates substantial accuracy but struggles with detecting small objects and efficiency whereas YOLOv10 exhibits relatively lower accuracy due to architectural choices that affect its performance in overlapping object detection but excels in speed and efficiency. Additionally, the YOLO11 family consistently shows superior performance maintaining a remarkable balance of accuracy and efficiency. However, YOLOv12 delivered underwhelming results, with its complex architecture introducing computational overhead without significant performance gains. These results provide critical insights for both industry and academia, facilitating the selection of the most suitable YOLO algorithm for diverse applications and guiding future enhancements.
CYMay 14, 2025
How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM InferenceNidhal Jegham, Marwan Abdelatti, Chan Young Koh et al.
This paper introduces an infrastructure-aware benchmarking framework for quantifying the environmental footprint of LLM inference across 30 state-of-the-art models in commercial datacenters. The framework combines public API performance data with company-specific environmental multipliers and statistical inference of hardware configurations. We additionally utilize cross-efficiency Data Envelopment Analysis (DEA) to rank models by performance relative to environmental cost and provide a dynamically updated dashboard that visualizes model-level energy, water, and carbon metrics. Results show the most energy-intensive models exceed 29 Wh per long prompt, over 65 times the most efficient systems. Even a 0.42 Wh short query, when scaled to 700M queries/day, aggregates to annual electricity comparable to 35{,}000 U.S. homes, evaporative freshwater equal to the annual drinking needs of 1.2M people, and carbon emissions requiring a Chicago-sized forest to offset. These findings highlight a growing paradox: as AI becomes cheaper and faster, global adoption drives disproportionate resource consumption. Our methodology offers a standardized, empirically grounded basis for sustainability benchmarking and accountability in AI deployment.
CVFeb 23, 2025
Visual Reasoning Evaluation of Grok, Deepseek Janus, Gemini, Qwen, Mistral, and ChatGPTNidhal Jegham, Marwan Abdelatti, Abdeltawab Hendawi
Traditional evaluations of multimodal large language models (LLMs) have been limited by their focus on single-image reasoning, failing to assess crucial aspects like contextual understanding, reasoning stability, and uncertainty calibration. This study addresses these limitations by introducing a novel benchmark that integrates multi-image reasoning tasks with rejection-based evaluation and positional bias detection. To evaluate these dimensions, we further introduce entropy as a novel metric for quantifying reasoning consistency across reordered answer variants. We applied this benchmark to assess Grok 3, ChatGPT-4o, ChatGPT-o1, Gemini 2.0 Flash Experimental, DeepSeek Janus models, Qwen2.5-VL-72B-Instruct, QVQ-72B-Preview, and Pixtral 12B across eight visual reasoning tasks, including difference spotting and diagram interpretation. Our findings reveal ChatGPT-o1 leading in overall accuracy (82.5\%) and rejection accuracy (70.0\%), closely followed by Gemini 2.0 Flash Experimental (70.8\%). QVQ-72B-Preview demonstrated superior rejection accuracy (85.5\%). Notably, Pixtral 12B (51.7\%) showed promise in specific domains, while Janus models exhibited challenges in bias and uncertainty calibration, reflected in low rejection accuracies and high entropy scores. High entropy scores in Janus models (Janus 7B: 0.8392, Janus 1B: 0.787) underscore their susceptibility to positional bias and unstable reasoning, contrasting with the low entropy and robust reasoning of ChatGPT models. The study further demonstrates that model size is not the sole determinant of performance, as evidenced by Grok 3 underperformance despite its substantial parameter count. By employing multi-image contexts, rejection mechanisms, and entropy-based consistency metrics, this benchmark sets a new standard for evaluating multimodal LLMs, enabling a more robust and reliable assessment of next-generation AI systems.
CRMar 12, 2019
Hardware/Software Security Patches for Internet of Trillions of ThingsJohn A. Stankovic, Tu Le, Abdeltawab Hendawi et al.
With the rapid development of the Internet of Things, there are many interacting devices and applications. One crucial challenge is how to provide security. Our proposal for a new direction is to create "smart buttons" and collections of them called "smart blankets" as hardware/software security patches rather than software-only patches.