39.3AIApr 28Code
Agentic Architect: An Agentic AI Framework for Architecture Design Exploration and OptimizationAlexander Blasberg, Vasilis Kypriotis, Dimitrios Skarlatos
Rapid advances in Large Language Models (LLMs) create new opportunities by enabling efficient exploration of broad, complex design spaces. This is particularly valuable in computer architecture, where performance depends on microarchitectural designs and policies drawn from vast combinatorial spaces. We introduce Agentic Architect, an agentic AI framework for computer architecture design exploration and optimization that combines LLM-driven code evolution with cycle-accurate simulation. The human architect specifies the optimization target, seed design, scoring function, simulator interface, and benchmark split, while the LLM explores implementations within these constraints. Across cache replacement, data prefetching, and branch prediction, Agentic Architect matches or exceeds state-of-the-art designs. Our best evolved cache replacement design achieves a 1.062x geomean IPC speedup over LRU, 0.6% over Mockingjay (1.056x). Our evolved branch predictor achieves a 1.100x geomean IPC speedup over Bimodal, 1.5% over its Hashed Perceptron seed (1.085x). Finally, our evolved prefetcher achieves a 1.76x geomean IPC speedup over no prefetching, 17% over its VA/AMPM Lite seed (1.59x) and 21% over SMS (1.55x). Our analysis surfaces several findings about agentic AI-driven microarchitecture design. Across evolved designs, components often correspond to known techniques; the novelty lies in how they are coordinated. The architect's role is shifting, but the human remains central. Seed quality bounds what search can achieve: evolution can refine and extend an existing mechanism, but cannot compensate for a weak foundation. Likewise, objectives, constraints, and prompt guidance affect reliability and generalization. Overall, Agentic Architect is the first end-to-end open-source framework for agentic AI architecture exploration and optimization.
49.1OSApr 20
Equilibria: Fair Multi-Tenant CXL Memory Tiering At ScaleKaiyang Zhao, Neha Gholkar, Hasan Maruf et al.
Memory dominates datacenter system cost and power. Memory expansion via Compute Express Link (CXL) is an effective way to provide additional memory at lower cost and power, but its effective use requires software-level tiering for hyperscaler workloads. Existing tiering solutions, including current Linux support, face fundamental limitations in production deployments. First, they lack multi-tenancy support, failing to handle stacked homogeneous or heterogeneous workloads. Second, limited control-plane flexibility leads to fairness violations and performance variability. Finally, insufficient observability prevents operators from diagnosing performance pathologies at scale. We present Equilibria, an OS framework enabling fair, multi-tenant CXL tiering at datacenter scale. Equilibria provides per-container controls for memory fair-share allocation and fine-grained observability of tiered-memory usage and operations. It further enforces flexible, user-specified fairness policies through regulated promotion and demotion, and mitigates noisy-neighbor interference by suppressing thrashing. Evaluated in a large hyperscaler fleet using production workloads and benchmarks, Equilibria helps workloads meet service level objectives (SLOs) while avoiding performance interference. It improves performance over the state-of-the-art Linux solution, TPP, by up to 52% for production workloads and 1.7x for benchmarks. All Equilibria patches have been released to the Linux community.
CRDec 12, 2025
A Scalable Multi-GPU Framework for Encrypted Large-Model InferenceSiddharth Jayashankar, Joshua Kim, Michael B. Sullivan et al.
Encrypted AI using fully homomorphic encryption (FHE) provides strong privacy guarantees; but its slow performance has limited practical deployment. Recent works proposed ASICs to accelerate FHE, but require expensive advanced manufacturing processes that constrain their accessibility. GPUs are a far more accessible platform, but achieving ASIC-level performance using GPUs has remained elusive. Furthermore, state-of-the-art approaches primarily focus on small models that fit comfortably within a single device. Supporting large models such as LLMs in FHE introduces a dramatic increase in computational complexity that requires optimized GPU kernels, along with managing terabyte-scale memory footprints that far exceed the capacity of a single GPU. This paper presents Cerium, a multi-GPU framework for FHE inference on large models. Cerium integrates a domain-specific language, an optimizing compiler, and a runtime system to automatically generate high-performance GPU kernels, manage terabyte-scale memory footprints, and parallelize computation across multiple GPUs. It introduces new IR constructs, compiler passes, sparse polynomial representations, memory-efficient data layouts, and communication-aware parallelization techniques that together enable encrypted inference for models ranging from small CNNs to Llama3-8B. We build Cerium on NVIDIA GPUs and demonstrate significant performance gains. For small models, Cerium outperforms expert-written hand-optimized GPU libraries by up to 2.25 times. Cerium achieves performance competitive with state-of-the-art FHE ASICs, outright matching prior FHE ASIC CraterLake. It is the first GPU system to execute bootstrapping in under 10 milliseconds, achieving 7.5 milliseconds, and is the first to demonstrate encrypted inference for BERT-Base and Llama3-8B in 8 seconds and 134 seconds, respectively.
CVNov 13, 2024
High-resolution optical and acoustic remote sensing datasets of the Puck Lagoon, Southern BalticŁukasz Janowski, Dimitrios Skarlatos, Panagiotis Agrafiotis et al.
The very shallow marine basin of Puck Lagoon in the southern Baltic Sea, on the Northern coast of Poland, hosts valuable benthic habitats and cultural heritage sites. These include, among others, protected Zostera marina meadows, one of the Baltic's major medieval harbours, a ship graveyard, and likely other submerged features that are yet to be discovered. Prior to this project, no comprehensive high-resolution remote sensing data were available for this area. This article describes the first Digital Elevation Models (DEMs) derived from a combination of airborne bathymetric LiDAR, multibeam echosounder, airborne photogrammetry and satellite imagery. These datasets also include multibeam echosounder backscatter and LiDAR intensity, allowing determination of the character and properties of the seafloor. Combined, these datasets are a vital resource for assessing and understanding seafloor morphology, benthic habitats, cultural heritage, and submerged landscapes. Given the significance of Puck Lagoon's hydrographical, ecological, geological, and archaeological environs, the high-resolution bathymetry, acquired by our project, can provide the foundation for sustainable management and informed decision-making for this area of interest.
CVMay 24, 2024
MagicBathyNet: A Multimodal Remote Sensing Dataset for Bathymetry Prediction and Pixel-based Classification in Shallow WatersPanagiotis Agrafiotis, Łukasz Janowski, Dimitrios Skarlatos et al.
Accurate, detailed, and high-frequent bathymetry, coupled with complex semantic content, is crucial for the undermapped shallow seabed areas facing intense climatological and anthropogenic pressures. Current methods exploiting remote sensing images to derive bathymetry or seabed classes mainly exploit non-open data. This lack of openly accessible benchmark archives prevents the wider use of deep learning methods in such applications. To address this issue, in this paper we present the MagicBathyNet, which is a benchmark dataset made up of image patches of Sentinel2, SPOT-6 and aerial imagery, bathymetry in raster format and annotations of seabed classes. MagicBathyNet is then exploited to benchmark state-of-the-art methods in learning-based bathymetry and pixel-based classification. Dataset, pre-trained weights, and code are publicly available at www.magicbathy.eu/magicbathynet.html.
OSApr 21, 2025
LithOS: An Operating System for Efficient Machine Learning on GPUsPatrick H. Coppock, Brian Zhang, Eliot H. Solomon et al.
The surging demand for GPUs in datacenters for machine learning (ML) has made efficient GPU utilization crucial. However, meeting the diverse needs of ML models while optimizing resource usage is challenging. To enable transparent, fine-grained GPU management that maximizes utilization and energy efficiency while maintaining strong isolation, an operating system (OS) approach is needed. This paper introduces LithOS, a first step toward a GPU OS. LithOS includes the following new abstractions and mechanisms for efficient GPU resource management: (i) a novel TPC Scheduler that supports spatial scheduling at the granularity of individual TPCs, unlocking efficient TPC stealing between workloads; (ii) transparent kernel atomization to reduce head-of-line blocking and enable dynamic resource reallocation mid-execution; (iii) a lightweight hardware right-sizing mechanism that determines the minimal TPC resources needed per atom; and (iv) a transparent power management mechanism that reduces power consumption based on in-flight work behavior. We implement LithOS in Rust and evaluate its performance across extensive ML environments, comparing it to state-of-the-art solutions from NVIDIA and prior research. For inference stacking, LithOS reduces tail latencies by 13x compared to MPS; compared to the best SotA, it reduces tail latencies by 3x while improving aggregate throughput by 1.6x. In hybrid inference-training stacking, LithOS reduces tail latencies by 4.7x compared to MPS; compared to the best SotA, it reduces tail latencies 1.18x while improving aggregate throughput by 1.35x. Finally, for a modest performance hit under 4%, LithOS's right-sizing provides a quarter of GPU capacity savings on average, while for a 7% hit, its power management yields a quarter of a GPU's energy savings. Overall, LithOS increases GPU efficiency, establishing a foundation for future OS research on GPUs.
85.7DCApr 6
The Energy Cost of Execution-Idle in GPU ClustersYiran Lei, Jared Fernandez, Vasilis Kypriotis et al.
GPUs are becoming a major contributor to data center power, yet unlike CPUs, they can remain at high power even when visible activity is near zero. We call this state execution-idle. Using per-second telemetry from a large academic AI cluster, we characterize execution-idle as a recurring low-activity yet high-power state in real deployments. Across diverse workloads and multiple GPU generations, it accounts for 19.7% of in-execution time and 10.7% of energy. This suggests a need to both reduce the cost of execution-idle and reduce exposure to it. We therefore build two prototypes: one uses automatic downscaling during execution-idle, and the other uses load imbalance to reduce exposure, both with performance trade-offs. These findings suggest that future energy-efficient GPU systems should treat execution-idle as a first-class operating state.
HCOct 14, 2020
Underwater Augmented Reality for improving the diving experience in submerged archaeological sitesFabio Bruno, Loris Barbieri, Marino Mangeruga et al.
The Mediterranean Sea has a vast maritime heritage which exploitation is made difficult because of the many limitations imposed by the submerged environment. Archaeological diving tours, in fact, suffer from the impossibility to provide underwater an exhaustive explanation of the submerged remains. Furthermore, low visibility conditions, due to water turbidity and biological colonization, sometimes make very confusing for tourists to find their way around in the underwater archaeological site. To this end, the paper investigates the feasibility and potentials of the underwater Augmented Reality (UWAR) technologies developed in the iMARECulture project for improving the experience of the divers that visit the Underwater Archaeological Park of Baiae (Naples). In particular, the paper presents two UWAR technologies that adopt hybrid tracking techniques to perform an augmented visualization of the actual conditions and of a hypothetical 3D reconstruction of the archaeological remains as appeared in the past. The first one integrates a marker-based tracking with inertial sensors, while the second one adopts a markerless approach that integrates acoustic localization and visual-inertial odometry. The experimentations show that the proposed UWAR technologies could contribute to have a better comprehension of the underwater site and its archaeological remains.
CVOct 2, 2020
Image-based underwater 3D reconstruction for Cultural Heritage: from image collection to 3D. Critical steps and considerationsDimitrios Skarlatos, Panagiotis Agrafiotis
Underwater Cultural Heritage (CH) sites are widely spread; from ruins in coastlines up to shipwrecks in deep. The documentation and preservation of this heritage is an obligation of the mankind, dictated also by the international treaties like the Convention on the Protection of the Underwater Cultural Her-itage which fosters the use of "non-destructive techniques and survey meth-ods in preference over the recovery of objects". However, submerged CH lacks in protection and monitoring in regards to the land CH and nowadays recording and documenting, for digital preservation as well as dissemination through VR to wide public, is of most importance. At the same time, it is most difficult to document it, due to inherent restrictions posed by the environ-ment. In order to create high detailed textured 3D models, optical sensors and photogrammetric techniques seems to be the best solution. This chapter dis-cusses critical aspects of all phases of image based underwater 3D reconstruc-tion process, from data acquisition and data preparation using colour restora-tion and colour enhancement algorithms to Structure from Motion (SfM) and Multi-View Stereo (MVS) techniques to produce an accurate, precise and complete 3D model for a number of applications.
CVFeb 27, 2019
Shallow Water Bathymetry Mapping from UAV Imagery based on Machine LearningPanagiotis Agrafiotis, Dimitrios Skarlatos, Andreas Georgopoulos et al.
The determination of accurate bathymetric information is a key element for near offshore activities, hydrological studies such as coastal engineering applications, sedimentary processes, hydrographic surveying as well as archaeological mapping and biological research. UAV imagery processed with Structure from Motion (SfM) and Multi View Stereo (MVS) techniques can provide a low-cost alternative to established shallow seabed mapping techniques offering as well the important visual information. Nevertheless, water refraction poses significant challenges on depth determination. Till now, this problem has been addressed through customized image-based refraction correction algorithms or by modifying the collinearity equation. In this paper, in order to overcome the water refraction errors, we employ machine learning tools that are able to learn the systematic underestimation of the estimated depths. In the proposed approach, based on known depth observations from bathymetric LiDAR surveys, an SVR model was developed able to estimate more accurately the real depths of point clouds derived from SfM-MVS procedures. Experimental results over two test sites along with the performed quantitative validation indicated the high potential of the developed approach.