Omid Abari

h-index: 24

4 papers

5 citations

Novelty: 53%

AI Score: 47

Ranked #33,914 of 194,257 authors (top 17%); #12,079 in CV (top 20%)

4 Papers

4.4 · CV · Apr 29

Camera-RFID Fusion for Robust Asset Tracking in Forested Environments

John Hateley, Sriram Narasimhan, Omid Abari

Passive RFID tags offer a cost-effective and scalable solution for tracking numerous deployed assets. However, in forested environments, signal attenuation and multipath effects generally limit RFID spatial accuracy to the meter level. Conversely, while cameras employing stereo vision can achieve centimeter-level precision, relying solely on computer vision fails to resolve issues arising from spatial association ambiguity and partial occlusions in dense settings. Fusing these modalities allows systems to harness the high-accuracy benefits of vision while retaining the robust, non-line-of-sight identification advantages of RFID. Yet, a primary challenge in achieving this, which is the central focus of this paper, lies in accurately associating the disparate trajectories generated by these two sensors. To overcome this limitation, we introduce a novel camera-RFID fusion framework that integrates depth and object information with advanced trajectory-matching algorithms. By successfully bridging the meter-to-centimeter accuracy gap, the proposed approach helps achieve reliable tag localization even when assets temporarily leave the camera's field of view. To the best of our knowledge, this represents the first application of camera-RFID fusion for asset tracking in natural forested environments.
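The abstract does not specify its trajectory-matching algorithm, so the following is only a minimal sketch of the association step it describes. It assumes time-aligned 2D tracks and scores each pairing by mean point-to-point distance. The names `mean_distance` and `associate`, the track labels, and the coordinates are all hypothetical.

```python
import math
from itertools import permutations

def mean_distance(track_a, track_b):
    """Average Euclidean distance between time-aligned samples of two tracks."""
    pairs = list(zip(track_a, track_b))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def associate(rfid_tracks, camera_tracks):
    """Brute-force the tag-to-object assignment with the lowest total cost.

    Assumes at least as many camera tracks as RFID tracks; suitable only
    for a handful of tracks because it enumerates every permutation.
    """
    tags = list(rfid_tracks)
    best_cost, best = math.inf, {}
    for perm in permutations(camera_tracks, len(tags)):
        cost = sum(mean_distance(rfid_tracks[t], camera_tracks[o])
                   for t, o in zip(tags, perm))
        if cost < best_cost:
            best_cost, best = cost, dict(zip(tags, perm))
    return best

# Hypothetical data: coarse (meter-level) RFID tracks, precise camera tracks.
rfid_tracks = {
    "tag_A": [(0.5, 0.2), (1.4, 0.1), (2.6, -0.3)],
    "tag_B": [(0.2, 5.5), (0.9, 4.6), (2.3, 3.2)],
}
camera_tracks = {
    "obj_1": [(0.0, 5.0), (1.0, 4.0), (2.0, 3.0)],
    "obj_2": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
}
matches = associate(rfid_tracks, camera_tracks)
```

Once a tag is bound to a camera track, the camera's centimeter-level position can stand in for the coarse RFID estimate while the object is visible. Beyond a few tracks, a dedicated assignment solver would be a more practical choice than enumeration.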

7.1 · LG · Dec 17, 2025

Tracking Wildfire Assets with Commodity RFID and Gaussian Process Modeling

John Hateley, Sriram Narasimhan, Omid Abari

This paper presents a novel, cost-effective, and scalable approach to tracking numerous assets distributed in forested environments using commodity Radio Frequency Identification (RFID), targeting wildfire response applications. Commodity RFID systems suffer from poor tag localization when dispersed in forested environments due to signal attenuation, multipath effects, and environmental variability. Current methods that address this issue via fingerprinting rely on dispersing tags at known locations a priori. In this paper, we address the case when it is not possible to tag known locations and show that it is possible to localize tags to accuracies comparable to global positioning systems (GPS) without such a constraint. For this, we propose using Gaussian processes to model various environments solely based on RF signal response signatures, without the aid of additional sensors such as GPS or cameras, and matching an unknown RF signature to the closest entry in a model dictionary. We utilize a new weighted log-likelihood method to associate an unknown environment with the closest environment in a dictionary of previously modeled environments, which is a crucial step in being able to use our approach. Our results show that it is possible to achieve localization accuracies of the order of GPS with passive commodity RFID. The approach allows dozens of wildfire assets within the vicinity of mobile readers to be tracked simultaneously, does not require known positions to be tagged a priori, and achieves localization at a fraction of the cost of GPS.
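The dictionary-matching step can be illustrated with a short sketch. It assumes each previously modeled environment is summarized by predictive means and variances at a few sample points, as a fitted Gaussian process would provide. The weighting scheme, the environment names, and the RSSI values are hypothetical, since the abstract does not give the paper's actual weights or models.

```python
import math

def weighted_log_likelihood(observed, means, variances, weights):
    """Weighted sum of per-sample Gaussian log-densities."""
    total = 0.0
    for y, mu, var, w in zip(observed, means, variances, weights):
        log_density = -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (y - mu) ** 2 / var
        total += w * log_density
    return total

def closest_environment(observed, weights, dictionary):
    """Return the dictionary key whose model best explains the observations."""
    return max(
        dictionary,
        key=lambda name: weighted_log_likelihood(
            observed, dictionary[name][0], dictionary[name][1], weights
        ),
    )

# Hypothetical model dictionary: (predictive means in dBm, predictive variances).
dictionary = {
    "open_field": ([-50.0, -56.0, -60.0], [4.0, 4.0, 4.0]),
    "dense_forest": ([-58.0, -67.0, -75.0], [9.0, 9.0, 9.0]),
}
observed = [-57.0, -66.0, -73.0]   # RSSI readings from the unknown environment
weights = [1.0, 1.0, 0.5]          # assumed: down-weight the least reliable sample
best = closest_environment(observed, weights, dictionary)
```

Here the readings sit far closer to the `dense_forest` means, so that entry scores the higher weighted log-likelihood and would be the model used for localization.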

7.4 · NI · May 26

mmDiff: A Noise-Robust Differentiable Ray-Tracing Framework for mmWave Scene Calibration and Channel Prediction

Haofan Lu, Yadi Cao, Wanghao Yi et al.

3D reconstruction techniques such as LiDAR scanning and photogrammetry have made it practical to build detailed geometric models of real-world environments. Such reconstructed models can potentially serve as the foundation for wireless digital twins and support network planning and optimization. The core challenge is that reconstructed models inevitably contain geometric artifacts such as holes and noisy surfaces, and wireless simulation is highly sensitive to such noise. To solve this problem, we propose a differentiable directional scattering model to approximate the noise-sensitive specular reflection. This approximation smoothly distributes reflected power among nearby ray directions, making the simulator inherently robust to local geometric artifacts in the reconstructed model. We prove mathematically that this approximation preserves asymptotic path-gain accuracy. Building on this idea, we propose mmDiff, an end-to-end differentiable framework for calibrating material properties from sparse mmWave measurements and predicting mmWave channels. We evaluate mmDiff on both real-world and synthetic datasets, and demonstrate its superior performance over prior methods using pure specular reflection in noisy reconstructed geometry.
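To make the idea of spreading reflected power over nearby directions concrete, here is a minimal 2D sketch. It uses a cosine-power lobe centered on the specular direction, which is an assumed stand-in and not necessarily the scattering model proposed in the paper. The function name `scattering_weights` and the parameter `alpha` are hypothetical.

```python
import math

def scattering_weights(specular_angle, directions, alpha):
    """Fraction of reflected power assigned to each candidate direction.

    Angles are in radians. Larger `alpha` narrows the lobe, so the result
    approaches pure specular reflection. Assumes at least one direction is
    not exactly opposite the specular direction, so the lobe sum is nonzero.
    """
    lobe = [((1.0 + math.cos(d - specular_angle)) / 2.0) ** alpha for d in directions]
    total = sum(lobe)
    return [value / total for value in lobe]

# Candidate outgoing directions every 15 degrees; specular direction at 60 degrees.
directions = [math.radians(a) for a in range(0, 181, 15)]
specular = math.radians(60)
weights = scattering_weights(specular, directions, alpha=20)
peak_index = weights.index(max(weights))
```

Because the weights vary smoothly with the specular angle, a small perturbation of the surface normal shifts power gradually between neighboring directions instead of switching a single reflected ray on or off.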

13.1 · CV · Nov 3, 2025

OmniVLA: Physically-Grounded Multimodal VLA with Unified Multi-Sensor Perception for Robotic Manipulation

Heyu Guo, Shanmu Wang, Ruichun Ma et al.

Vision-language-action (VLA) models have shown strong generalization for robotic action prediction through large-scale vision-language pretraining. However, most existing models rely solely on RGB cameras, limiting their perception and, consequently, manipulation capabilities. We present OmniVLA, an omni-modality VLA model that integrates novel sensing modalities for physically-grounded spatial intelligence beyond RGB perception. The core of our approach is the sensor-masked image, a unified representation that overlays spatially grounded and physically meaningful masks onto the RGB images, derived from sensors including an infrared camera, a mmWave radar, and a microphone array. This image-native unification keeps sensor input close to RGB statistics to facilitate training, provides a uniform interface across sensor hardware, and enables data-efficient learning with lightweight per-sensor projectors. Built on this, we present a multisensory vision-language-action model architecture and train the model based on an RGB-pretrained VLA backbone. We evaluate OmniVLA on challenging real-world tasks where sensor-modality perception guides the robotic manipulation. OmniVLA achieves an average task success rate of 84%, significantly outperforming the RGB-only and raw-sensor-input baseline models by 59% and 28%, respectively, while also showing higher learning efficiency and stronger generalization capability.
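The sensor-masked image can be pictured as a mask overlaid on RGB pixels. The sketch below uses plain alpha blending on a nested-list image, which is an assumption made for illustration. The abstract does not state the actual blending rule or how the per-sensor projectors produce the masks, and `overlay_mask`, the mask, and the color are hypothetical.

```python
def overlay_mask(rgb, mask, color, alpha=0.5):
    """Blend `color` into `rgb` wherever `mask` is truthy; copy other pixels unchanged."""
    out = []
    for rgb_row, mask_row in zip(rgb, mask):
        out_row = []
        for pixel, hit in zip(rgb_row, mask_row):
            if hit:
                pixel = tuple(
                    int(round((1 - alpha) * c + alpha * m))
                    for c, m in zip(pixel, color)
                )
            out_row.append(pixel)
        out.append(out_row)
    return out

# A 2x2 RGB image and a hypothetical binary mask derived from a radar detection.
rgb = [[(100, 100, 100), (100, 100, 100)],
       [(20, 40, 60), (20, 40, 60)]]
radar_mask = [[1, 0],
              [0, 1]]
masked = overlay_mask(rgb, radar_mask, color=(200, 0, 0))
```

The output stays an ordinary RGB image, which is the property the abstract relies on: an RGB-pretrained backbone can consume it without a new input format for each sensor.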