DSMar 16, 2025
Improving Merge Sort and Quick Sort Performance by Utilizing Alphadev's Sorting Networks as Base CasesAnas Gamal Aly, Anders E. Jensen, Hala ElAarag
Recent work by Google DeepMind introduced assembly-optimized sorting networks that achieve faster performance for small fixed-size arrays (3-8). In this research, we investigate the integration of these networks as base cases in classical divide-and-conquer sorting algorithms, specifically Merge Sort and Quick Sort, to leverage these efficient sorting networks for small subarrays generated during the recursive process. We conducted benchmarks with 11 different optimization configurations and compared them to classical Merge Sort and Quick Sort. We tested the configurations with random, sorted and nearly sorted arrays. Our optimized Merge Sort, using a configuration of three sorting networks (sizes 6, 7, and 8), achieves at least 1.5x speedup for random and nearly sorted arrays, and at least 2x speedup for sorted arrays, in comparison to classical Merge Sort. This optimized Merge Sort surpasses both classical Quick Sort and similarly optimized Quick Sort variants when sorting random arrays of size 10,000 and larger. When comparing our optimized Quick Sort to classical Quick Sort, we observe a 1.5x speedup using the 3-to-5 configuration on sorted arrays of size 10,000. The 6-to-8 configuration maintains a consistent 1.5x improvement across sorted arrays from 25,000 to 1 million elements. Our findings demonstrate the potential of integrating AI-optimized sorting networks to enhance the performance of classical sorting algorithms.
CVApr 28
No Pedestrian Left Behind: Real-Time Detection and Tracking of Vulnerable Road Users for Adaptive Traffic Signal ControlAnas Gamal Aly, Hala ElAarag
Current pedestrian crossing signals operate on fixed timing without adjustment to pedestrian behavior, which can leave vulnerable road users (VRUs) such as the elderly, disabled, or distracted pedestrians stranded when the light changes. We introduce No Pedestrian Left Behind (NPLB), a real-time adaptive traffic signal system that monitors VRUs in crosswalks and automatically extends signal timing when needed. We evaluated five state-of-the-art object detection models on the BGVP dataset, with YOLOv12 achieving the highest mean Average Precision at 50% (mAP@0.5) of 0.756. NPLB integrates our fine-tuned YOLOv12 with ByteTrack multi-object tracking and an adaptive controller that extends pedestrian phases when remaining time falls below a critical threshold. Through 10,000 Monte Carlo simulations, we demonstrate that NPLB improves VRU safety by 71.4%, reducing stranding rates from 9.10% to 2.60%, while requiring signal extensions in only 12.1% of crossing cycles.
CYApr 28
Hands-on PDC in Undergraduate Computing EducationHala ElAarag, Anas Gamal Aly
Parallel and Distributed Computing (PDC) is a critical yet conceptually challenging area of the undergraduate computer science curriculum. While students often encounter these concepts in theory, few gain exposure to experience in real high-performance computing (HPC) environments. Research shows that when students are engaged in project-based learning they retain knowledge more effectively. They also develop a deeper understanding of concepts taught in the classroom. This paper presents a practical assignment in which students engage directly with the University of Florida's HiPerGator supercomputer to implement and benchmark matrix multiplication using Python and C (via POSIX threads and OpenMP). Students navigate batch scheduling, core allocation, and performance tuning, experiences that are rarely accessible at the undergraduate level. We describe the assignment in detail and provide a three-year evaluation across multiple course offerings, highlighting how structured access to real HPC infrastructure can deepen student understanding of parallelism and multithreading.