31.2NIMay 19
SKYLINK: Scalable and Resilient Link Management in LEO Satellite NetworkWanja de Sombre, Arash Asadi, Debopam Bhattacherjee et al.
The rapid growth of space-based services has established LEO satellite networks as a promising option for global broadband connectivity. Next-generation LEO networks leverage inter-satellite links (ISLs) to provide faster and more reliable communications compared to traditional bent-pipe architectures, even in remote regions. However, the high mobility of satellites, dynamic traffic patterns, and potential link failures pose significant challenges for efficient and resilient routing. To address these challenges, we model the LEO satellite network as a time-varying graph comprising a constellation of satellites and ground stations. Our objective is to minimize a weighted sum of average delay and packet drop rate. Each satellite independently decides how to distribute its incoming traffic to neighboring nodes in real time. Given the infeasibility of finding optimal solutions at scale, due to the exponential growth of routing options and uncertainties in link capacities, we propose SKYLINK, a novel fully distributed learning strategy for link management in LEO satellite networks. SKYLINK enables each satellite to adapt to the time-varying network conditions, ensuring real-time responsiveness, scalability to millions of users, and resilience to network failures, while maintaining low communication overhead and computational complexity. To support the evaluation of SKYLINK at global scale, we develop a new simulator for large-scale LEO satellite networks. For 25.4 million users, SKYLINK reduces the weighted sum of average delay and drop rate by 29% compared to the bent-pipe approach, and by 92% compared to Dijkstra. It lowers drop rates by 95% relative to k-shortest paths, 99% relative to Dijkstra, and 74% compared to the bent-pipe baseline, while achieving up to 46% higher throughput. At the same time, SKYLINK maintains constant computational complexity with respect to constellation size.
ROJul 21, 2023
BatMobility: Towards Flying Without Seeing for Autonomous DronesEmerson Sie, Zikun Liu, Deepak Vasisht
Unmanned aerial vehicles (UAVs) rely on optical sensors such as cameras and lidar for autonomous operation. However, such optical sensors are error-prone in bad lighting, inclement weather conditions including fog and smoke, and around textureless or transparent surfaces. In this paper, we ask: is it possible to fly UAVs without relying on optical sensors, i.e., can UAVs fly without seeing? We present BatMobility, a lightweight mmWave radar-only perception system for UAVs that eliminates the need for optical sensors. BatMobility enables two core functionalities for UAVs -- radio flow estimation (a novel FMCW radar-based alternative for optical flow based on surface-parallel doppler shift) and radar-based collision avoidance. We build BatMobility using commodity sensors and deploy it as a real-time system on a small off-the-shelf quadcopter running an unmodified flight controller. Our evaluation shows that BatMobility achieves comparable or better performance than commercial-grade optical sensors across a wide range of scenarios.
RONov 19, 2023
Radarize: Enhancing Radar SLAM with Generalizable Doppler-Based OdometryEmerson Sie, Xinyu Wu, Heyu Guo et al.
Millimeter-wave (mmWave) radar is increasingly being considered as an alternative to optical sensors for robotic primitives like simultaneous localization and mapping (SLAM). While mmWave radar overcomes some limitations of optical sensors, such as occlusions, poor lighting conditions, and privacy concerns, it also faces unique challenges, such as missed obstacles due to specular reflections or fake objects due to multipath. To address these challenges, we propose Radarize, a self-contained SLAM pipeline that uses only a commodity single-chip mmWave radar. Our radar-native approach uses techniques such as Doppler shift-based odometry and multipath artifact suppression to improve performance. We evaluate our method on a large dataset of 146 trajectories spanning 4 buildings and mounted on 3 different platforms, totaling approximately 4.7 Km of travel distance. Our results show that our method outperforms state-of-the-art radar and radar-inertial approaches by approximately 5x in terms of odometry and 8x in terms of end-to-end SLAM, as measured by absolute trajectory error (ATE), without the need for additional sensors such as IMUs or wheel encoders.
RONov 16, 2022
RF-Annotate: Automatic RF-Supervised Image Annotation of Common Objects in ContextEmerson Sie, Deepak Vasisht
Wireless tags are increasingly used to track and identify common items of interest such as retail goods, food, medicine, clothing, books, documents, keys, equipment, and more. At the same time, there is a need for labelled visual data featuring such items for the purpose of training object detection and recognition models for robots operating in homes, warehouses, stores, libraries, pharmacies, and so on. In this paper, we ask: can we leverage the tracking and identification capabilities of such tags as a basis for a large-scale automatic image annotation system for robotic perception tasks? We present RF-Annotate, a pipeline for autonomous pixel-wise image annotation which enables robots to collect labelled visual data of objects of interest as they encounter them within their environment. Our pipeline uses unmodified commodity RFID readers and RGB-D cameras, and exploits arbitrary small-scale motions afforded by mobile robotic platforms to spatially map RFIDs to corresponding objects in the scene. Our only assumption is that the objects of interest within the environment are pre-tagged with inexpensive battery-free RFIDs costing 3-15 cents each. We demonstrate the efficacy of our pipeline on several RGB-D sequences of tabletop scenes featuring common objects in a variety of indoor environments.
11.7NIApr 22
StarLoc: Pinpointing Transmitting LEO Satellites from a Single Passive ArrayIshani Janveja, Jida Zhang, Emerson Sie et al.
This paper focuses on 3D localization of transmitting satellites in low Earth orbits (LEO). 3D localization of transmitters in low orbits is an important emerging problem for many applications such as spectrum management, orbit determination, and backup for GPS failures in orbit. We present StarLoc -- a system to geolocate transmitters in space using a combination of orbital modeling and a new interferometric 3D angle-of-arrival estimation technique. StarLoc's design relies on a unique insight -- the motion of satellites is governed by orbital dynamics and is therefore along a 2D manifold in a 3D space. This reduces the degrees of freedom in satellite motion and allows us to 3D-locate and track a satellite with just three antennas in a 2D plane. We evaluate the system using signal transmissions from 81 Starlink satellites. Our results show that StarLoc can estimate the 3D-angle of a satellite within 0.7 degrees and the orbital range within 5 km. Our dataset and implementation are available at: https://connectedsystemslab.github.io/starloc.
32.6NIMay 6
SILC: Lookahead Caching for Short-form Video Delivery SystemsMaleeha Masood, Shreya Kannan, Om Chabra et al.
Short video platforms like TikTok, Instagram Reels, and YouTube Shorts have gained immense popularity in the last few years and are responsible for a large and growing fraction of Internet traffic. We identify two unique opportunities for improving short video delivery using their existing interactions with content delivery networks (CDNs). First, short videos use a push-based recommendation system, where the user is presented a sequence of videos recommended by the algorithm rather than user explicitly picking content to watch (e.g., in YouTube). Such push-based short video systems offer a unique opportunity for system design by providing visibility into upcoming requests. Second, the popularity of these videos follows a highly skewed Pareto distribution, leading to geographical and temporal overlap amongst videos being served. We leverage these opportunities to build SILC - a lookahead-aware caching system, aimed at (i) reducing CDN cache miss rates, as well as (ii) reducing midgress bandwidth between the CDN and the origin server. Our evaluation of SILC uses traces that we collect from real users, through (i) an in-person user study, and (ii) a data donation program involving 100 TikTok users across the world. Using a combination of these traces, we simulate traffic from 10,000 simultaneous users. Our evaluation shows that, compared to 10 state-of-the-art heuristic and learning-based cache eviction policies, SILC reduces a CDN's midgress costs by 11.1% to 111%.
DCNov 20, 2024
Transforming the Hybrid Cloud for Emerging AI WorkloadsDeming Chen, Alaa Youssef, Ruchi Pendse et al.
This white paper, developed through close collaboration between IBM Research and UIUC researchers within the IIDAI Institute, envisions transforming hybrid cloud systems to meet the growing complexity of AI workloads through innovative, full-stack co-design approaches, emphasizing usability, manageability, affordability, adaptability, efficiency, and scalability. By integrating cutting-edge technologies such as generative and agentic AI, cross-layer automation and optimization, unified control plane, and composable and adaptive system architecture, the proposed framework addresses critical challenges in energy efficiency, performance, and cost-effectiveness. Incorporating quantum computing as it matures will enable quantum-accelerated simulations for materials science, climate modeling, and other high-impact domains. Collaborative efforts between academia and industry are central to this vision, driving advancements in foundation models for material design and climate solutions, scalable multimodal data processing, and enhanced physics-based AI emulators for applications like weather forecasting and carbon sequestration. Research priorities include advancing AI agentic systems, LLM as an Abstraction (LLMaaA), AI model optimization and unified abstractions across heterogeneous infrastructure, end-to-end edge-cloud transformation, efficient programming model, middleware and platform, secure infrastructure, application-adaptive cloud systems, and new quantum-classical collaborative workflows. These ideas and solutions encompass both theoretical and practical research questions, requiring coordinated input and support from the research community. This joint initiative aims to establish hybrid clouds as secure, efficient, and sustainable platforms, fostering breakthroughs in AI-driven applications and scientific discovery across academia, industry, and society.
RONov 6, 2024
Fed-EC: Bandwidth-Efficient Clustering-Based Federated Learning For Autonomous Visual Robot NavigationShreya Gummadi, Mateus V. Gasparino, Deepak Vasisht et al.
Centralized learning requires data to be aggregated at a central server, which poses significant challenges in terms of data privacy and bandwidth consumption. Federated learning presents a compelling alternative, however, vanilla federated learning methods deployed in robotics aim to learn a single global model across robots that works ideally for all. But in practice one model may not be well suited for robots deployed in various environments. This paper proposes Federated-EmbedCluster (Fed-EC), a clustering-based federated learning framework that is deployed with vision based autonomous robot navigation in diverse outdoor environments. The framework addresses the key federated learning challenge of deteriorating model performance of a single global model due to the presence of non-IID data across real-world robots. Extensive real-world experiments validate that Fed-EC reduces the communication size by 23x for each robot while matching the performance of centralized learning for goal-oriented navigation and outperforms local learning. Fed-EC can transfer previously learnt models to new robots that join the cluster.
CVMay 2, 2024
S4: Self-Supervised Sensing Across the SpectrumJayanth Shenoy, Xingjian Davis Zhang, Shlok Mehrotra et al.
Satellite image time series (SITS) segmentation is crucial for many applications like environmental monitoring, land cover mapping and agricultural crop type classification. However, training models for SITS segmentation remains a challenging task due to the lack of abundant training data, which requires fine grained annotation. We propose S4 a new self-supervised pre-training approach that significantly reduces the requirement for labeled training data by utilizing two new insights: (a) Satellites capture images in different parts of the spectrum such as radio frequencies, and visible frequencies. (b) Satellite imagery is geo-registered allowing for fine-grained spatial alignment. We use these insights to formulate pre-training tasks in S4. We also curate m2s2-SITS, a large-scale dataset of unlabeled, spatially-aligned, multi-modal and geographic specific SITS that serves as representative pre-training data for S4. Finally, we evaluate S4 on multiple SITS segmentation datasets and demonstrate its efficacy against competing baselines while using limited labeled data.
CVMar 17, 2025
Towards Scalable Foundation Model for Multi-modal and Hyperspectral Geospatial DataHaozhe Si, Yuxuan Wan, Minh Do et al.
Geospatial raster data, such as that collected by satellite-based imaging systems at different times and spectral bands, hold immense potential for enabling a wide range of high-impact applications. This potential stems from the rich information that is spatially and temporally contextualized across multiple channels and sensing modalities. Recent work has adapted existing self-supervised learning approaches for such geospatial data. However, they fall short of scalable model architectures, leading to inflexibility and computational inefficiencies when faced with an increasing number of channels and modalities. To address these limitations, we introduce Low-rank Efficient Spatial-Spectral Vision Transformer with three key innovations: i) the LESS Attention Block that approximates high-dimensional spatial-spectral attention through Kronecker's product of the low-dimensional spatial and spectral attention components; ii) the Continuous Positional-Channel Embedding Layer that preserves both the continuity and physical characteristics of each spatial-spectral patch; and iii) the Perception Field Mask that exploits local spatial dependencies by constraining attention to neighboring patches. To evaluate the proposed innovations, we construct GFM-Bench, which serves as a comprehensive benchmark for such geospatial raster data. We pretrain LESS ViT using a Hyperspectral Masked Autoencoder framework with integrated positional and channel masking strategies. Experimental results demonstrate that our proposed method achieves competitive performance against state-of-the-art multi-modal geospatial foundation models while outperforming them on cross-satellite generalization tasks with higher computational efficiency. The flexibility and extensibility of our framework make it a promising direction for future geospatial data analysis tasks that involve a wide range of modalities and channels.
HCSep 17, 2025
AquaVLM: Improving Underwater Situation Awareness with Mobile Vision Language ModelsBeitong Tian, Lingzhi Zhao, Bo Chen et al.
Underwater activities like scuba diving enable millions annually to explore marine environments for recreation and scientific research. Maintaining situational awareness and effective communication are essential for diver safety. Traditional underwater communication systems are often bulky and expensive, limiting their accessibility to divers of all levels. While recent systems leverage lightweight smartphones and support text messaging, the messages are predefined and thus restrict context-specific communication. In this paper, we present AquaVLM, a tap-and-send underwater communication system that automatically generates context-aware messages and transmits them using ubiquitous smartphones. Our system features a mobile vision-language model (VLM) fine-tuned on an auto-generated underwater conversation dataset and employs a hierarchical message generation pipeline. We co-design the VLM and transmission, incorporating error-resilient fine-tuning to improve the system's robustness to transmission errors. We develop a VR simulator to enable users to experience AquaVLM in a realistic underwater environment and create a fully functional prototype on the iOS platform for real-world experiments. Both subjective and objective evaluations validate the effectiveness of AquaVLM and highlight its potential for personal underwater communication as well as broader mobile VLM applications.
LGApr 16, 2025
Support is All You Need for Certified VAE TrainingChangming Xu, Debangshu Banerjee, Deepak Vasisht et al.
Variational Autoencoders (VAEs) have become increasingly popular and deployed in safety-critical applications. In such applications, we want to give certified probabilistic guarantees on performance under adversarial attacks. We propose a novel method, CIVET, for certified training of VAEs. CIVET depends on the key insight that we can bound worst-case VAE error by bounding the error on carefully chosen support sets at the latent layer. We show this point mathematically and present a novel training algorithm utilizing this insight. We show in an extensive evaluation across different datasets (in both the wireless and vision application areas), architectures, and perturbation magnitudes that our method outperforms SOTA methods achieving good standard performance with strong robustness guarantees.
NISep 10, 2021
No Size Fits All: Automated Radio Configuration for LPWANsZerina Kapetanovic, Deepak Vasisht, Tusher Chakraborty et al.
Low power long-range networks like LoRa have become increasingly mainstream for Internet of Things deployments. Given the versatility of applications that these protocols enable, they support many data rates and bandwidths. Yet, for a given network that supports hundreds of devices over multiple miles, the network operator typically needs to specify the same configuration or among a small subset of configurations for all the client devices to communicate with the gateway. This one-size-fits-all approach is highly inefficient in large networks. We propose an alternative approach -- we allow network devices to transmit at any data rate they choose. The gateway uses the first few symbols in the preamble to classify the correct data rate, switches its configuration, and then decodes the data. Our design leverages the inherent asymmetry in outdoor IoT deployments where the clients are power-starved and resource-constrained, but the gateway is not. Our gateway design, Proteus, runs a neural network architecture and is backward compatible with existing LoRa protocols. Our experiments reveal that Proteus can identify the correct configuration with over 97% accuracy in both indoor and outdoor deployments. Our network architecture leads to a 3.8 to 11 times increase in throughput for our LoRa testbed.