CV · LG · NI · Mar 29

Tracking without Seeing: Geospatial Inference using Encrypted Traffic from Distributed Nodes

arXiv:2603.27811 · 29.01 citations · h-index: 7
Predicted impact: top 80% in CV · last 90 days · Originality: Highly original
AI Analysis

This work enables privacy-preserving geospatial inference from encrypted traffic, expanding sensing capabilities for scenarios where raw video is inaccessible.

GraySense performs geospatial object tracking using only encrypted packet-level information from wireless video cameras. It achieves a tracking error of 2.33 meters without raw signal access, which is within the dimensions of the tracked objects (4.61 m x 1.93 m).

Accurate observation of dynamic environments traditionally relies on synthesizing raw, signal-level information from multiple distributed sensors. This work investigates an alternative approach: performing geospatial inference using only encrypted packet-level information, without access to the raw sensory data. We further explore how this indirect information can be fused with directly available sensory data to extend overall inference capabilities. We introduce GraySense, a learning-based framework that performs geospatial object tracking by analyzing features of encrypted wireless video transmission traffic, such as packet sizes, from cameras with inaccessible streams. GraySense leverages the inherent relationship between scene dynamics and transmitted packet sizes to infer object motion. The framework consists of two stages: (1) a Packet Grouping module that identifies frame boundaries and estimates frame sizes from encrypted network traffic, and (2) a Tracker module, based on a Transformer encoder with a recurrent state, which fuses indirect packet-based inputs with optional direct camera-based inputs to estimate the object's position. Extensive experiments with realistic videos from the CARLA simulator and emulated networks under varying conditions show that GraySense achieves a tracking error of 2.33 meters (Euclidean distance) without raw signal access, within the dimensions of the tracked objects (4.61 m x 1.93 m). To our knowledge, this capability has not been previously demonstrated, and it expands the use of latent signals for sensing.
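
To make the first stage more concrete, here is a minimal Python sketch of grouping packets into per-frame size estimates. It is not the paper's Packet Grouping module, which is learning-based; it swaps in a simple inter-arrival gap heuristic purely for illustration. The function name `group_packets`, the `(timestamp, size)` packet representation, and the `gap_threshold` parameter are all assumptions introduced here, not details from the source.

```python
from typing import List, Tuple

# Hypothetical packet representation (an assumption, not from the paper):
# (arrival time in seconds, observed encrypted packet size in bytes).
Packet = Tuple[float, int]


def group_packets(packets: List[Packet],
                  gap_threshold: float = 0.01) -> List[Tuple[float, int]]:
    """Split a packet trace into bursts separated by idle gaps.

    Each burst is treated as one video frame. Returns a list of
    (burst start time, total bytes in burst) pairs. This gap heuristic
    is a stand-in for a learned frame-boundary detector.
    """
    frames: List[Tuple[float, int]] = []
    start_time = None   # arrival time of the first packet in the current burst
    last_time = None    # arrival time of the previously seen packet
    total = 0           # bytes accumulated in the current burst

    for t, size in sorted(packets):
        # A gap longer than the threshold closes the current burst.
        if last_time is not None and t - last_time > gap_threshold:
            frames.append((start_time, total))
            start_time, total = None, 0
        if start_time is None:
            start_time = t
        total += size
        last_time = t

    # Flush the final burst, if any packets were seen.
    if start_time is not None:
        frames.append((start_time, total))
    return frames


if __name__ == "__main__":
    trace = [(0.000, 1200), (0.001, 1200), (0.002, 300),
             (0.033, 1200), (0.034, 200),
             (0.066, 900)]
    for start, size in group_packets(trace):
        print(f"frame at {start:.3f}s: {size} bytes")
```

Under this assumption, larger estimated frame sizes would loosely correspond to more scene motion, which is the kind of signal the abstract says the Tracker module consumes. A fixed gap threshold would likely fail under jitter or packet reordering, which is presumably one reason a learned grouping stage is used instead.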