59.4 · DC · May 12
Collaborative Processing for Multi-Tenant Inference on Memory-Constrained Edge TPUs
Nathan Ng, Walid A. Hanafy, Prashanthi Kadambi et al.
IoT applications increasingly rely on on-device AI accelerators to ensure high performance, especially in low-connectivity and safety-critical scenarios. However, the limited on-chip memory of these accelerators forces inference runtimes to swap model segments between host and accelerator memory, incurring significant swapping overheads. While collaborative processing by partitioning model execution across CPU and accelerator resources can reduce accelerator memory pressure and execution overhead, naive partitioning may worsen end-to-end latency by either shifting excessive computation to the CPU or failing to sufficiently reduce swapping, a problem that is further exacerbated in multi-tenant and dynamic environments. To address these issues, we present SwapLess, a system for adaptive, multi-tenant TPU-CPU collaborative inference on memory-constrained Edge TPUs. SwapLess utilizes an analytic queueing model that captures partition-dependent CPU/TPU service times as well as inter- and intra-model swapping overheads across different workload mixes and request rates. Using this model, SwapLess continuously adjusts both the partition point and CPU core allocation online to minimize end-to-end response time with low decision overhead. An implementation on Edge TPU-equipped platforms demonstrates that SwapLess reduces mean latency by up to 63.8% for single-tenant workloads and up to 77.4% for multi-tenant workloads relative to the default Edge TPU compiler.
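The trade-off described above (too much work on the CPU versus too much swapping on the TPU) can be illustrated with a toy queueing calculation. This is a minimal sketch, not the SwapLess model: it assumes each of the CPU and TPU behaves as an independent M/M/1 queue, and the candidate partition names, service times and swap times are hypothetical values chosen only for illustration.

```python
import math


def mm1_response_time(service_time, arrival_rate):
    """Mean response time of an M/M/1 queue (assumed model); inf if overloaded."""
    if service_time <= 0:
        return 0.0  # nothing runs on this resource
    service_rate = 1.0 / service_time
    if arrival_rate >= service_rate:
        return math.inf  # the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)


def best_partition(candidates, arrival_rate):
    """Pick the candidate with the lowest estimated end-to-end latency.

    candidates: list of (name, cpu_time, tpu_time, swap_time), all in seconds.
    Swap time is charged to the TPU stage in this simplified view.
    """
    best = None
    for name, cpu_time, tpu_time, swap_time in candidates:
        latency = (mm1_response_time(cpu_time, arrival_rate)
                   + mm1_response_time(tpu_time + swap_time, arrival_rate))
        if best is None or latency < best[1]:
            best = (name, latency)
    return best


# Hypothetical partition points for one model (seconds per request).
CANDIDATES = [
    ("all_tpu", 0.000, 0.010, 0.030),     # no CPU work, heavy swapping
    ("split_mid", 0.008, 0.007, 0.005),   # some CPU work, little swapping
    ("mostly_cpu", 0.040, 0.002, 0.000),  # no swapping, slow CPU stage
]

if __name__ == "__main__":
    print(best_partition(CANDIDATES, arrival_rate=10.0))
```

With these made-up numbers the middle split wins at 10 requests per second, because both extremes push one resource close to saturation. Re-evaluating the same function as the request rate changes is the general flavour of an online adjustment.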
69.1 · QUANT-PH · Apr 17
Quantum Integrated High-Performance Computing: Foundations, Architectural Elements and Future Directions
Suman Raj, Siva Sai, Yogesh Simmhan et al.
High-performance computing (HPC) has evolved over decades through multiple architectural transitions, from vector supercomputers to massively parallel CPU clusters and GPU-accelerated systems, continuously expanding the frontier of scientific discovery. With the emergence of quantum processing units (QPUs) as practical computational accelerators, a new opportunity arises to further extend this trajectory by integrating quantum and classical computing paradigms. This paper presents Quantum Integrated High-Performance Computing (QHPC), a visionary architectural framework that unifies CPUs, GPUs, FPGAs, and QPUs as first-class heterogeneous resources. We propose a layered system design comprising unified resource management, quantum-aware scheduling, hybrid workflow orchestration, middleware and programming abstraction, interconnect technologies, and a tiered execution model enabling seamless workload partitioning across classical and quantum backends. A central aspect of our vision is a strong user requests abstraction layer that exposes heterogeneous resources through a unified job submission interface, similar in spirit to existing schedulers such as Slurm, allowing users to describe workloads in a consistent template independent of underlying compute type or location. Drawing insights from prior accelerator integration eras, we outline how QHPC can support emerging workloads in quantum chemistry, materials discovery, combinatorial optimization, and climate modeling. We conclude by highlighting open challenges in building scalable, reliable, and programmable quantum-classical infrastructures that seamlessly connect global users to heterogeneous compute resources for future quantum-classical HPC ecosystems.
SE · Jul 16, 2024
Building AI Agents for Autonomous Clouds: Challenges and Design Principles
Manish Shetty, Yinfang Chen, Gagan Somashekar et al.
The rapid growth in the use of Large Language Models (LLMs) and AI Agents as part of software development and deployment is revolutionizing the information technology landscape. While code generation receives significant attention, a higher-impact application lies in using AI agents for operational resilience of cloud services, which currently require significant human effort and domain knowledge. There is growing interest in AI for IT Operations (AIOps), which aims to automate complex operational tasks, like fault localization and root cause analysis, thereby reducing human intervention and customer impact. However, achieving the vision of autonomous and self-healing clouds through AIOps is hampered by the lack of standardized frameworks for building, evaluating, and improving AIOps agents. This vision paper lays the groundwork for such a framework by first framing the requirements and then discussing design decisions that satisfy them. We also propose AIOpsLab, a prototype implementation leveraging an agent-cloud interface that orchestrates an application, injects real-time faults using chaos engineering, and interfaces with an agent to localize and resolve the faults. We report promising results and lay the groundwork to build a modular and robust framework for building, evaluating, and improving agents for autonomous clouds.
SY · Mar 3, 2015
An Interoperable Realization of Smart Cities with Plug and Play based Device Management
Prasant Misra, Vasanth Rajaraman, Kumaresh Dhotrad et al.
The primary problem with Internet of Things (IoT) solutions for smart cities is the lack of interoperability at various levels, most predominantly at the device level. While there exists a multitude of platforms from multiple manufacturers, the existing ecosystem remains highly closed. In this paper, we propose SNaaS, or Sensor/Network as a Service: a service layer that enables the creation of the plug-n-play infrastructure, across platforms from multiple vendors, necessary for interoperability and successful deployment of large-scale, city-wide systems. To correctly position the new service layer, we present a high-level reference IoT architecture for smart city implementations, and follow it with the workflow details of SNaaS along with preliminary microbenchmarks.
4.9 · DC · Apr 25
Characterizing FaaS Workflows on Public Clouds: The Good, the Bad and the Ugly
Varad Kulkarni, Nikhil Reddy, Tuhin Khare et al.
Function-as-a-service (FaaS) is a popular serverless computing paradigm for developing event-driven functions that elastically scale on public clouds. FaaS workflows, such as AWS Step Functions and Azure Durable Functions, are composed from FaaS functions, like AWS Lambda and Azure Functions, to build practical applications. However, the complex interactions between functions in the workflow and the limited visibility into the internals of proprietary FaaS platforms are major impediments to gaining a deeper understanding of FaaS workflow platforms. While several works characterize FaaS platforms to derive such insights, there is a lack of a principled and rigorous study for FaaS workflow platforms, which have unique scaling, performance and costing behavior influenced by the platform design, dataflow and workloads. In this article, we perform extensive evaluations of three popular FaaS workflow platforms from AWS and Azure, running 25 micro-benchmark and application workflows over 132k invocations. Our detailed analysis confirms some conventional wisdom but also uncovers unique insights on the function execution, workflow orchestration, inter-function interactions, cold-start scaling and monetary costs. Our observations help developers better configure and program these platforms, set performance and scalability expectations, and identify research gaps on enhancing the platforms.
59.6 · RO · Mar 15
AeroGen: Agentic Drone Autonomy through Single-Shot Structured Prompting & Drone SDK
Kautuk Astu, Yogesh Simmhan
Designing correct UAV autonomy programs is challenging due to joint navigation, sensing and analytics requirements. While LLMs can generate code, their reliability for safety-critical UAVs remains uncertain. This paper presents AeroGen, an open-loop framework that enables consistently correct single-shot AI-generated drone control programs through structured guardrail prompting and integration with the AeroDaaS drone SDK. AeroGen encodes API descriptions, flight constraints and operational world rules directly into the system context prompt, enabling generic LLMs to produce constraint-aware code from user prompts, with minimal example code. We evaluate AeroGen across a diverse benchmark of 20 navigation tasks and 5 drone missions on urban, farm and inspection environments, using both imperative and declarative user prompts. AeroGen generates about 40 lines of AeroDaaS Python code in about 20s per mission, in both real-world and simulated settings, showing that structured prompting with a well-defined SDK improves robustness, correctness and deployability of LLM-generated drone autonomy programs.
89.2 · QUANT-PH · Mar 14
Folding-Free Zero-Noise Extrapolation by Layout-induced Noise Diversity
Debarthi Pal, Yogesh Simmhan
Near term quantum processors operate in a noise dominated regime, motivating error mitigation techniques that recover accurate expectation values without full fault tolerance. Zero Noise Extrapolation (ZNE) is a widely used but biased error mitigation method that lacks rigorous error bounds. Its effective application requires nontrivial technical choices, most notably the selection of noise scaling factors and extrapolation models, making ZNE sensitive to user expertise and often necessitating costly trial and error procedures. Here, we introduce Folding Free Zero Noise Extrapolation (FF-ZNE), a method that removes the need for noise factor selection by achieving effective noise amplification without circuit folding. FF-ZNE exploits isomorphic hardware layouts with distinct native noise profiles, such that executing a fixed circuit across these layouts induces controllable variations in the effective noise strength. Under a depolarizing noise model, we analytically show that the resulting extrapolation admits a fixed linear form, eliminating extrapolator choice and enabling a seamless, user independent mitigation procedure. We further propose two algorithms that identify sets of isomorphic hardware layouts on which a given circuit yields sufficiently distinct expectation values to enable reliable zero noise extrapolation. Experiments on a 133 qubit IBM Quantum device demonstrate that FF-ZNE yields mitigated expectation values with average deviations of ~6% and 4.5% for up to 50 qubit EfficientSU2 (sparse) and Hamiltonian simulation (dense) circuits, respectively. The method is thus scalable and applicable to a broad range of circuits. By eliminating noise factor and extrapolator selection, FF-ZNE transforms zero noise extrapolation from a technique requiring expert tuning into a practical, scalable, and broadly accessible error mitigation method for current quantum hardware.
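The fixed linear form mentioned above can be illustrated with a generic linear zero-noise extrapolation: fit a straight line to expectation values measured at different effective noise strengths and read off the intercept at zero noise. This is a minimal sketch of ordinary least-squares extrapolation, not the FF-ZNE algorithm itself; how a noise strength is assigned to each hardware layout is not described here, so the `noise_strengths` input is an assumption.

```python
def linear_zero_noise_estimate(noise_strengths, expectation_values):
    """Least-squares line through (noise, expectation) points; returns the
    intercept, i.e. the extrapolated expectation value at zero noise."""
    n = len(noise_strengths)
    if n < 2 or n != len(expectation_values):
        raise ValueError("need at least two matching (noise, value) pairs")
    mean_x = sum(noise_strengths) / n
    mean_y = sum(expectation_values) / n
    sxx = sum((x - mean_x) ** 2 for x in noise_strengths)
    if sxx == 0:
        # Identical noise strengths carry no slope information.
        raise ValueError("noise strengths must be distinct")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(noise_strengths, expectation_values))
    slope = sxy / sxx
    return mean_y - slope * mean_x


if __name__ == "__main__":
    # Hypothetical measurements from three layouts with different noise levels.
    print(linear_zero_noise_estimate([0.1, 0.2, 0.3], [0.8, 0.6, 0.4]))
```

The check for distinct noise strengths mirrors the requirement in the abstract that the chosen layouts yield sufficiently distinct expectation values; without that spread the fit is ill-conditioned.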
CV · Nov 4, 2025
The Urban Vision Hackathon Dataset and Models: Towards Image Annotations and Accurate Vision Models for Indian Traffic
Akash Sharma, Chinmay Mhatre, Sankalp Gawali et al.
This report describes the UVH-26 dataset, the first public release by AIM@IISc of a large-scale dataset of annotated traffic-camera images from India. The dataset comprises 26,646 high-resolution (1080p) images sampled from 2,800 of Bengaluru's Safe-City CCTV cameras over a 4-week period, and subsequently annotated through a crowdsourced hackathon involving 565 college students from across India. In total, 1.8 million bounding boxes were labeled across 14 vehicle classes specific to India: Cycle, 2-Wheeler (Motorcycle), 3-Wheeler (Auto-rickshaw), LCV (Light Commercial Vehicles), Van, Tempo-traveller, Hatchback, Sedan, SUV, MUV, Mini-bus, Bus, Truck and Other. Of these, 283k-316k consensus ground truth bounding boxes and labels were derived for distinct objects in the 26k images using Majority Voting and STAPLE algorithms. Further, we train multiple contemporary detectors, including YOLO11-S/X, RT-DETR-S/X, and DAMO-YOLO-T/L using these datasets, and report accuracy based on mAP50, mAP75 and mAP50:95. Models trained on UVH-26 achieve 8.4-31.5% improvements in mAP50:95 over equivalent baseline models trained on the COCO dataset, with RT-DETR-X showing the best performance at 0.67 (mAP50:95) as compared to 0.40 for COCO-trained weights for common classes (Car, Bus, and Truck). This demonstrates the benefits of domain-specific training data for Indian traffic scenarios. The release package provides the 26k images with consensus annotations based on Majority Voting (UVH-26-MV) and STAPLE (UVH-26-ST) and the 6 fine-tuned YOLO and DETR models on each of these datasets. By capturing the heterogeneity of Indian urban mobility directly from operational traffic-camera streams, UVH-26 addresses a critical gap in existing global benchmarks, and offers a foundation for advancing detection, classification, and deployment of intelligent transportation systems in emerging nations with complex traffic conditions.
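The Majority Voting consensus mentioned above can be sketched for the class label of a single object. This is a minimal illustration only: it assumes the annotators' boxes for the same object have already been matched (the matching step is not described here), and the `min_votes` threshold and the tie handling are hypothetical choices rather than the dataset's documented procedure.

```python
from collections import Counter


def majority_label(labels, min_votes=2):
    """Return the most frequent label, or None on a tie or too few votes.

    labels: class names given by different annotators for one matched object.
    min_votes: hypothetical minimum agreement needed to accept a label.
    """
    if not labels:
        return None
    ranked = Counter(labels).most_common()
    top_label, top_count = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == top_count
    if tied or top_count < min_votes:
        return None  # no consensus; leave the object unlabeled
    return top_label


if __name__ == "__main__":
    print(majority_label(["Sedan", "Sedan", "SUV"]))
    print(majority_label(["Sedan", "SUV"]))
```

Returning `None` when annotators disagree evenly keeps ambiguous objects out of the consensus set instead of picking a label arbitrarily.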
56.8 · CV · Apr 27 · Code
BMD-45: A Large-Scale CCTV Vehicle Detection Dataset for Urban Traffic in Developing Cities
Akash Sharma, Chinmay Mhatre, Sankalp Gawali et al.
Robust vehicle detection from fixed CCTV cameras is critical for Intelligent Transportation Systems. Yet existing benchmarks predominantly feature relatively homogeneous, highly organized traffic patterns captured from ego-centric driving perspectives or controlled aerial views. This regional and sensor view bias creates a significant gap. Models trained on datasets such as UA-DETRAC and COCO struggle to generalize to the dense, heterogeneous, disorganized traffic conditions observed in rapidly developing urban centers in emerging economies. To address this limitation, we introduce BMD-45, a large-scale dataset comprising 480K bounding boxes annotated over 45K images captured from over 3.6K operational Safe City CCTV cameras. BMD-45 contains 14 fine-grained vehicle categories, including region-specific modes such as auto-rickshaws and tempo travellers, which are not present in existing benchmarks. The dataset captures real-world deployment challenges, including extreme viewpoint variation, occlusion, and vehicle density. We establish comprehensive baselines using state-of-the-art detectors and reveal a striking domain gap: models fine-tuned on UA-DETRAC achieve only 33.6% mAP@0.50:0.95, compared to 83.8% when trained in-domain on BMD-45, representing a 2.5x improvement that persists even when accounting for novel vehicle classes. This performance gap underscores the critical need for geographically diverse traffic benchmarks and establishes BMD-45 as a baseline for developing robust perception systems in underrepresented urban environments worldwide. The dataset is available at: https://huggingface.co/datasets/iisc-aim/BMD-45.
33.8 · DC · May 10
ATLAS: Efficient Out-of-Core Inference for Billion-Scale Graph Neural Networks
Pranjal Naman, Yogesh Simmhan
Graph Neural Network (GNN) inference on billion-scale graphs is critical for domains like fintech and recommendation systems. Full-graph inference on these large graphs can be challenging due to high communication costs in distributed settings and high I/O costs in disk-backed Out-of-Core (OOC) settings. Existing OOC systems, operating across disk and memory, primarily focus on GNN training and perform poorly for full-graph inference due to massive read amplification, irregular I/O, and memory pressure. We present ATLAS, a disk-based GNN inference framework that enables efficient full-graph, layer-wise inference on graphs whose topologies, features and intermediate embeddings exceed the available memory on single machines. ATLAS replaces gather-based execution with a broadcast-based model that enables sequential, single-pass streaming reads of features and embeddings per layer. A tiered memory-disk hierarchy with minimum-pending-message eviction, graph reordering and a GPU-accelerated pipeline sustains high throughput within 128 GiB RAM and 2 TiB SSD. Across out-of-core graphs with up to 4B edges and 550 GiB features and multiple GNN architectures, ATLAS improves end-to-end inference time by ≈12–30× over State-of-the-Art (SOTA) OOC baselines on a single workstation, while remaining within ≈5% when features fit in memory.
59.2 · ET · May 8
Per-Phase Fidelity Attribution for Quantum Compilers using HBR Decomposition
Chandrachud Pati, Yogesh Simmhan
Quantum compilers sit between an algorithm's theoretical promise and what executes on physical hardware. Existing benchmarks report aggregate post-transpilation metrics but cannot attribute where fidelity is lost within the compilation pipeline. We present HBR decomposition, a per-phase fidelity attribution model that quantifies relative fidelity loss across High-level structural decomposition (H), Basis translation (B), and Routing (R). We evaluate three production SDKs (Qiskit, PennyLane, TKET) across eight algorithms on two backend topologies: IBM Heron (heavy-hex) and IonQ Forte (all-to-all). The dominant compiler bottleneck is strongly circuit-class dependent: Routing accounts for up to 60% of relative fidelity loss in search-class circuits, while synthesis dominates Hamiltonian simulation workloads. Early synthesis choices amplify or compress downstream routing overhead depending on circuit connectivity. SDK rankings at diagnostic optimization level (opt=0) reverse at production levels (opt=2) for deep circuits, showing that stagewise diagnostics and production results answer different questions. HBR correctly predicts SDK rank ordering across noisy simulations (8 circuits x 3 SDKs x 2 tiers) and real IBM Fez hardware executions, revealing stage-specific bottlenecks that are not observable through aggregate compiler benchmarks.
AI · Jan 12, 2025
AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds
Yinfang Chen, Manish Shetty, Gagan Somashekar et al.
AI for IT Operations (AIOps) aims to automate complex operational tasks, such as fault localization and root cause analysis, to reduce human workload and minimize customer impact. While traditional DevOps tools and AIOps algorithms often focus on addressing isolated operational tasks, recent advances in Large Language Models (LLMs) and AI agents are revolutionizing AIOps by enabling end-to-end and multitask automation. This paper envisions a future where AI agents autonomously manage operational tasks throughout the entire incident lifecycle, leading to self-healing cloud systems, a paradigm we term AgentOps. Realizing this vision requires a comprehensive framework to guide the design, development, and evaluation of these agents. To this end, we present AIOpsLab, a framework that not only deploys microservice cloud environments, injects faults, generates workloads, and exports telemetry data but also orchestrates these components and provides interfaces for interacting with and evaluating agents. We discuss the key requirements for such a holistic framework and demonstrate how AIOpsLab can facilitate the evaluation of next-generation AIOps agents. Through evaluations of state-of-the-art LLM agents within the benchmark created by AIOpsLab, we provide insights into their capabilities and limitations in handling complex operational tasks in cloud environments.
10.8 · DC · Apr 29
End-to-End and Phase-Level Performance Optimization for Hyperledger Fabric
Pavan Sollu, Aniruddha Mukherjee, Divya Pulivarthi et al.
Hyperledger Fabric (HLF) is a modular, permissioned blockchain widely adopted in enterprise settings. Enhancing its throughput and latency remains challenging, as optimization decisions made in one phase of the transaction lifecycle can adversely affect other phases. In this work, we present a systematic, phase-level and end-to-end study of HLF optimizations along three fronts, combining production-grade testbed experiments with calibrated SimPy simulations. First, we introduce two novel optimization techniques that target commit-phase bottlenecks: block-level pipelining and strategic waiting. In pipelining, we overlap validation and private-data acquisition of successive blocks with state-consistency checks and ledger updates, improving commit throughput by up to 1.9x. Strategic waiting coordinates commit progress by temporarily pausing fast leaders and boosting laggers to sustain endorsement parallelism, yielding up to 1.2x higher throughput. Second, we conduct micro-benchmarking of three configuration levers: private-data dissemination, block-size selection, and endorsement peer selection. Our results reveal that: (i) Relaxed quorums for private-data dissemination significantly reduce latency in both endorsement and commit phases; (ii) Under light workloads, smaller blocks yield lower end-to-end latency, whereas, under heavy workloads, larger blocks are necessary to improve throughput and reduce latency; and (iii) Relaxed leader selection dramatically reduces dropped transactions and boosts endorsement throughput, with a modest increase in MVCC invalidations. Finally, we analyze the interplay among private-data dissemination, VSCC parallelization, and pipelined commits. Interestingly, the throughput gains over a serial commit path are maximized at a moderate level of parallelization. Together, our findings provide phase-aware and protocol-level refinements for optimizing HLF.
74.0 · QUANT-PH · Apr 27
Noise-aware selection of circuit cutting strategies under hardware noise non-uniformity
Debarthi Pal, Ritajit Majumdar, Padmanabha Venkatagiri Seshadri et al.
Noise in contemporary quantum hardware is highly non-uniform across qubits and couplers, giving rise to localized low-noise "islands" within otherwise noisy device topologies. As quantum workloads scale, executions are increasingly forced to traverse high-noise regions, degrading algorithmic fidelity. Circuit cutting provides a route to circumvent such regions by decomposing large circuits into smaller subcircuits, but its practicality is limited by exponential sampling overhead and the lack of systematic guidance on how cut strategies should align with heterogeneous hardware noise. In this work, we present a hardware-noise-aware circuit cutting framework that explicitly exploits the spatial non-uniformity of noise in quantum devices. Rather than proposing a new cut-finding algorithm, we formalize the problem of device-constraint selection under realistic hardware noise and show that this choice critically determines both execution overhead and effective noise. Using a unified gate- and wire-cutting formulation, we demonstrate that small, hardware-informed relaxations in the device constraint yield exponential reductions in execution overhead while preserving alignment with low-noise hardware regions. Across representative workloads, our method achieves an average reduction in the number of circuit executions ranging from 5-54x for 20-qubit circuits, and enables tractable circuit cutting for 50-qubit circuits and application-level benchmarks where conventional strategies incur prohibitive overhead. These results establish noise-aware device-constraint selection as a necessary ingredient for making circuit cutting resource-efficient and practically deployable on contemporary quantum hardware.
DC · Aug 11, 2025
Optimizing Federated Learning for Scalable Power-demand Forecasting in Microgrids
Roopkatha Banerjee, Sampath Koti, Gyanendra Singh et al.
Real-time monitoring of power consumption in cities and micro-grids through the Internet of Things (IoT) can help forecast future demand and optimize grid operations. But moving all consumer-level usage data to the cloud for predictions and analysis at fine time scales can expose activity patterns. Federated Learning (FL) is a privacy-sensitive collaborative DNN training approach that retains data on edge devices, trains the models on private data locally, and aggregates the local models in the cloud. But key challenges exist: (i) clients can have non-independent and identically distributed (non-IID) data, and (ii) the learning should be computationally cheap while scaling to 1000s of (unseen) clients. In this paper, we develop and evaluate several optimizations to FL training across edge and cloud for time-series demand forecasting in micro-grids and city-scale utilities using DNNs to achieve a high prediction accuracy while minimizing the training cost. We showcase the benefit of using exponentially weighted loss while training and show that it further improves the prediction of the final model. Finally, we evaluate these strategies by validating over 1000s of clients for three states in the US from the OpenEIA corpus, and performing FL both in a pseudo-distributed setting and a Pi edge cluster. The results highlight the benefits of the proposed methods over baselines like ARIMA and DNNs trained for individual consumers, which are not scalable.
RO · Jun 17, 2025
Towards Perception-based Collision Avoidance for UAVs when Guiding the Visually Impaired
Suman Raj, Swapnil Padhi, Ruchi Bhoot et al.
Autonomous navigation by drones using onboard sensors combined with machine learning and computer vision algorithms is impacting a number of domains, including agriculture, logistics, and disaster management. In this paper, we examine the use of drones for assisting visually impaired people (VIPs) in navigating through outdoor urban environments. Specifically, we present a perception-based path planning system for local planning around the neighborhood of the VIP, integrated with a global planner based on GPS and maps for coarse planning. We represent the problem using a geometric formulation and propose a multi-DNN-based framework for obstacle avoidance of the UAV as well as the VIP. Our evaluations, conducted on a drone-human system in a university campus environment, verify the feasibility of our algorithms in three scenarios: when the VIP walks on a footpath, near parked vehicles, and in a crowded street.
RO · Mar 31, 2025
NeoARCADE: Robust Calibration for Distance Estimation to Support Assistive Drones for the Visually Impaired
Suman Raj, Bhavani A Madhabhavi, Madhav Kumar et al.
Autonomous navigation by drones using onboard sensors, combined with deep learning and computer vision algorithms, is impacting a number of domains. We examine the use of drones to autonomously follow and assist Visually Impaired People (VIPs) in navigating urban environments. Estimating the absolute distance between the drone and the VIP, and to nearby objects, is essential to design obstacle avoidance algorithms. Here, we present NeoARCADE (Neo), which uses depth maps over monocular video feeds, common in consumer drones, to estimate absolute distances to the VIP and obstacles. Neo proposes a robust calibration technique based on depth score normalization and coefficient estimations to translate relative distances from the depth map to absolute ones. It further develops a dynamic recalibration method that can adapt to changing scenarios. We also develop two baseline models, Regression and Geometric, and compare Neo with SOTA depth map approaches and the baselines. We provide detailed evaluations to validate their robustness and generalizability for distance estimation to VIPs and other obstacles in diverse and dynamic conditions, using datasets collected in a campus environment. Neo predicts distances to the VIP with an error <30cm, and to different obstacles like cars and bicycles within a maximum error of 60cm, which are better than the baselines. Neo also clearly outperforms SOTA depth map methods, reporting errors up to 5.3-14.6x lower.
CY · Oct 23, 2025
Towards AI Agents for Course Instruction in Higher Education: Early Experiences from the Field
Yogesh Simmhan, Varad Kulkarni
This article presents early findings from designing, deploying and evaluating an AI-based educational agent as the primary instructor in a graduate-level Cloud Computing course at IISc. We detail the design of a Large Language Model (LLM)-driven Instructor Agent, and introduce a pedagogical framework that integrates the Instructor Agent into the course workflow for actively interacting with the students for content delivery, supplemented by the human instructor to offer the course structure and undertake question-answer sessions. We also propose an analytical framework that evaluates the Agent-Student interaction transcripts using interpretable engagement metrics of topic coverage, topic depth and turn-level elaboration. We report early experiences on how students interact with the Agent to explore concepts, clarify doubts and sustain inquiry-driven dialogue during live classroom sessions. We also report a preliminary analysis of our evaluation metrics applied across two successive instructional modules that reveals patterns of engagement evolution, transitioning from broad conceptual exploration to deeper, focused inquiry. These demonstrate how structured integration of conversational AI agents can foster reflective learning, offer a reproducible methodology for studying engagement in authentic classroom settings, and support scalable, high-quality higher education.
LG · Jul 15, 2025
D3FL: Data Distribution and Detrending for Robust Federated Learning in Non-linear Time-series Data
Harsha Varun Marisetty, Manik Gupta, Yogesh Simmhan
With advancements in computing and communication technologies, the Internet of Things (IoT) has seen significant growth. IoT devices typically collect data from various sensors, such as temperature, humidity, and energy meters. Much of this data is temporal in nature. Traditionally, data from IoT devices is centralized for analysis, but this approach introduces delays and increased communication costs. Federated learning (FL) has emerged as an effective alternative, allowing for model training across distributed devices without the need to centralize data. In many applications, such as smart home energy and environmental monitoring, the data collected by IoT devices across different locations can exhibit significant variation in trends and seasonal patterns. Accurately forecasting such non-stationary, non-linear time-series data is crucial for applications like energy consumption estimation and weather forecasting. However, these data variations can severely impact prediction accuracy. The key contributions of this paper are: (1) Investigating how non-linear, non-stationary time-series data distributions, like generalized extreme value (gen-extreme) and log-normal distributions, affect FL performance. (2) Analyzing how different detrending techniques for non-linear time-series data influence the forecasting model's performance in an FL setup. We generated several synthetic time-series datasets using non-linear data distributions and trained an LSTM-based forecasting model using both centralized and FL approaches. Additionally, we evaluated the impact of detrending on real-world datasets with non-linear time-series data distributions. Our experimental results show that: (1) FL performs worse than centralized approaches when dealing with non-linear data distributions. (2) The use of appropriate detrending techniques improves FL performance, reducing loss across different data distributions.
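Detrending, as discussed above, can be illustrated with two textbook techniques: first-order differencing and subtraction of a least-squares linear trend. This is a minimal sketch of those generic methods; the abstract does not name the detrending techniques that were evaluated, so the choice of these two is an assumption made only for illustration.

```python
def difference(series):
    """First-order differencing: consecutive changes instead of raw levels."""
    return [b - a for a, b in zip(series, series[1:])]


def remove_linear_trend(series):
    """Subtract the least-squares straight line fitted against the time index."""
    n = len(series)
    if n < 2:
        return list(series)  # too short to fit a line
    mean_t = (n - 1) / 2.0
    mean_y = sum(series) / n
    stt = sum((t - mean_t) ** 2 for t in range(n))
    sty = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(series))
    slope = sty / stt
    intercept = mean_y - slope * mean_t
    return [y - (intercept + slope * t) for t, y in enumerate(series)]


if __name__ == "__main__":
    readings = [2 * t + 1 for t in range(5)]  # hypothetical sensor readings
    print(difference(readings))
    print(remove_linear_trend(readings))
```

In a federated setting each client could apply such a transform locally before training, so that differences in local trends do not dominate the shared model; whether that matches the setup studied in the paper is not stated in the abstract.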
DC · Jun 14, 2025
Optimizing Federated Learning using Remote Embeddings for Graph Neural Networks
Pranjal Naman, Yogesh Simmhan
Graph Neural Networks (GNNs) have experienced rapid advancements in recent years due to their ability to learn meaningful representations from graph data structures. Federated Learning (FL) has emerged as a viable machine learning approach for training a shared model on decentralized data, addressing privacy concerns while leveraging parallelism. Existing methods that address the unique requirements of federated GNN training using remote embeddings to enhance convergence accuracy are limited by their diminished performance due to large communication costs with a shared embedding server. In this paper, we present OpES, an optimized federated GNN training framework that uses remote neighbourhood pruning, and overlaps pushing of embeddings to the server with local training to reduce the network costs and training time. The modest drop in per-round accuracy due to pre-emptive push of embeddings is outstripped by the reduction in per-round training time for large and dense graphs like Reddit and Products, converging up to ≈2× faster than the state-of-the-art technique using an embedding server and giving up to 20% better accuracy than vanilla federated GNN learning.
LG May 31, 2023
Adaptive Self-Distillation for Minimizing Client Drift in Heterogeneous Federated Learning
M Yashwanth, Gaurav Kumar Nayak, Arya Singh et al.
Federated Learning (FL) is a machine learning paradigm that enables clients to jointly train a global model by aggregating the locally trained models without sharing any local training data. In practice, there can often be substantial heterogeneity (e.g., class imbalance) across the local data distributions observed by each of these clients. Under such non-iid label distributions across clients, FL suffers from the 'client-drift' problem where every client drifts to its own local optimum. This results in slower convergence and poor performance of the aggregated model. To address this limitation, we propose a novel regularization technique based on adaptive self-distillation (ASD) for training models on the client side. Our regularization scheme adaptively adjusts to each client's training data based on the global model's prediction entropy and the client-data label distribution. We show in this paper that our proposed regularization (ASD) can be easily integrated atop existing, state-of-the-art FL algorithms, leading to a further boost in the performance of these off-the-shelf methods. We theoretically explain how incorporating the ASD regularizer leads to a reduction in client-drift and empirically justify the generalization ability of the trained model. We demonstrate the efficacy of our approach through extensive experiments on multiple real-world benchmarks and show substantial gains in performance when the proposed regularizer is combined with popular FL methods.
RO Sep 14, 2021
CORNET 2.0: A Co-Simulation Middleware for Robot Networks
Srikrishna Acharya, Bharadwaj Amrutur, Mukunda Bharatheesha et al.
We present a networked co-simulation framework for multi-robot system applications. Designing such complex systems effectively requires a simulation framework that captures both physical interactions and communication aspects. This is necessary to co-design the multi-robot system's autonomy logic and communication protocols. The proposed framework extends existing tools to simulate the robots' autonomy and network-related aspects. We have used Gazebo with ROS/ROS2 to develop the autonomy logic for robots and mininet-WiFi as the network simulator to capture the cyber-physical systems properties of the multi-robot system. This framework addresses the need to seamlessly integrate the two simulation environments by synchronizing mobility and time, allowing for easy migration of the algorithms to real platforms. The framework supports container-based virtualization and extends a generic robotic framework by decoupling the data plane and control plane.
RO Feb 6, 2021
Heuristic Algorithms for Co-scheduling of Edge Analytics and Routes for UAV Fleet Missions
Aakash Khochare, Yogesh Simmhan, Francesco Betti Sorbelli et al.
Unmanned Aerial Vehicles (UAVs) or drones are increasingly used for urban applications like traffic monitoring and construction surveys. Autonomous navigation allows drones to visit waypoints and accomplish activities as part of their mission. A common activity is to hover and observe a location using on-board cameras. Advances in Deep Neural Networks (DNNs) allow such videos to be analyzed for automated decision making. UAVs also host edge computing capability for on-board inferencing by such DNNs. To this end, for a fleet of drones, we propose a novel Mission Scheduling Problem (MSP) that co-schedules the flight routes to visit and record video at waypoints, and their subsequent on-board edge analytics. The proposed schedule maximizes the utility from the activities while meeting activity deadlines as well as energy and computing constraints. We first prove that MSP is NP-hard and then optimally solve it by formulating a mixed integer linear programming (MILP) problem. Next, we design two efficient heuristic algorithms, JSC and VRC, that provide fast sub-optimal solutions. Evaluation of these three schedulers using real drone traces demonstrates utility-runtime trade-offs under diverse workloads.
HC Jul 2, 2014
Towards a Practical Architecture for India Centric Internet of Things
Prasant Misra, Yogesh Simmhan, Jay Warrior
An effective architecture for the Internet of Things (IoT), particularly for an emerging nation like India with limited technology penetration at the national scale, should be based on tangible technology advances in the present, practical application scenarios of social and entrepreneurial value, and ubiquitous capabilities that make the realization of IoT affordable and sustainable. Humans, data, communication and devices play key roles in the IoT ecosystem that we perceive. In a push towards this sustainable and practical IoT Architecture for India, we synthesize ten design paradigms to consider.
LG Jun 2, 2014
Holistic Measures for Evaluating Prediction Models in Smart Grids
Saima Aman, Yogesh Simmhan, Viktor K. Prasanna
The performance of prediction models is often based on "abstract metrics" that estimate the model's ability to limit residual errors between the observed and predicted values. However, meaningful evaluation and selection of prediction models for end-user domains requires holistic and application-sensitive performance measures. Inspired by energy consumption prediction models used in the emerging "big data" domain of Smart Power Grids, we propose a suite of performance measures to rationally compare models along the dimensions of scale independence, reliability, volatility and cost. We include both application-independent and application-dependent measures, the latter parameterized to allow customization by domain experts to fit their scenario. While our measures are generalizable to other domains, we offer an empirical analysis using real energy use data for three Smart Grid applications: planning, customer education and demand response, which are relevant for energy sustainability. Our results underscore the value of the proposed measures in offering deeper insight into models' behavior and their impact on real applications, benefiting both data mining researchers and practitioners.
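To make the notion of a scale-independent error measure concrete, the sketch below computes two standard ones: the coefficient of variation of the RMSE (CVRMSE) and the mean absolute percentage error (MAPE). The abstract does not define the proposed measures, so these two are stand-ins chosen for illustration, and the function names `cvrmse` and `mape` are hypothetical.

```python
import math
from typing import Sequence


def _check(observed: Sequence[float], predicted: Sequence[float]) -> int:
    """Validate that both series are non-empty and equally long."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    return len(observed)


def cvrmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """RMSE divided by the mean of the observed values."""
    n = _check(observed, predicted)
    mean_obs = sum(observed) / n
    if mean_obs == 0:
        raise ValueError("mean of observed values is zero")
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return rmse / mean_obs


def mape(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of |observed - predicted| / |observed|. Undefined if any observed value is 0."""
    n = _check(observed, predicted)
    if any(o == 0 for o in observed):
        raise ValueError("MAPE is undefined when an observed value is zero")
    return sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n


if __name__ == "__main__":
    # Hypothetical consumption values for one small and one large customer.
    small_obs, small_pred = [10.0, 20.0, 30.0], [12.0, 18.0, 33.0]
    large_obs = [v * 1000 for v in small_obs]
    large_pred = [v * 1000 for v in small_pred]
    print(cvrmse(small_obs, small_pred), cvrmse(large_obs, large_pred))
    print(mape(small_obs, small_pred), mape(large_obs, large_pred))
```

Because both measures normalize the error by the magnitude of the observed values, scaling the observed and predicted series by the same factor leaves them unchanged, which is what allows models to be compared across consumers of very different sizes.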