Marco Gruteser

CV
h-index38
16papers
8,894citations
Novelty49%
AI Score50

16 Papers

LGJun 7, 2022
Algorithms for bounding contribution for histogram estimation under user-level privacy

Yuhan Liu, Ananda Theertha Suresh, Wennan Zhu et al.

We study the problem of histogram estimation under user-level differential privacy, where the goal is to preserve the privacy of all entries of any single user. We consider the heterogeneous scenario where the quantity of data can be different for each user. In this scenario, the amount of noise injected into the histogram to obtain differential privacy is proportional to the maximum user contribution, which can be amplified by few outliers. One approach to circumvent this would be to bound (or limit) the contribution of each user to the histogram. However, if users are limited to small contributions, a significant amount of data will be discarded. In this work, we propose algorithms to choose the best user contribution bound for histogram estimation under both bounded and unbounded domain settings. When the size of the domain is bounded, we propose a user contribution bounding strategy that almost achieves a two-approximation with respect to the best contribution bound in hindsight. For unbounded domain histogram estimation, we propose an algorithm that is logarithmic-approximation with respect to the best contribution bound in hindsight. This result holds without any distribution assumptions on the data. Experiments on both real and synthetic datasets verify our theoretical findings and demonstrate the effectiveness of our algorithms. We also show that clipping bias introduced by bounding user contribution may be reduced under mild distribution assumptions, which can be of independent interest.

CVNov 22, 2022
ViFi-Loc: Multi-modal Pedestrian Localization using GAN with Camera-Phone Correspondences

Hansi Liu, Kristin Dana, Marco Gruteser et al.

In Smart City and Vehicle-to-Everything (V2X) systems, acquiring pedestrians' accurate locations is crucial to traffic safety. Current systems adopt cameras and wireless sensors to detect and estimate people's locations via sensor fusion. Standard fusion algorithms, however, become inapplicable when multi-modal data is not associated. For example, pedestrians are out of the camera field of view, or data from camera modality is missing. To address this challenge and produce more accurate location estimations for pedestrians, we propose a Generative Adversarial Network (GAN) architecture. During training, it learns the underlying linkage between pedestrians' camera-phone data correspondences. During inference, it generates refined position estimations based only on pedestrians' phone data that consists of GPS, IMU and FTM. Results show that our GAN produces 3D coordinates at 1 to 2 meter localization error across 5 different outdoor scenes. We further show that the proposed model supports self-learning. The generated coordinates can be associated with pedestrian's bounding box coordinates to obtain additional camera-phone data correspondences. This allows automatic data collection during inference. After fine-tuning on the expanded dataset, localization accuracy is improved by up to 26%.

CVOct 4, 2023
ViFiT: Reconstructing Vision Trajectories from IMU and Wi-Fi Fine Time Measurements

Bryan Bo Cao, Abrar Alali, Hansi Liu et al.

Tracking subjects in videos is one of the most widely used functions in camera-based IoT applications such as security surveillance, smart city traffic safety enhancement, vehicle to pedestrian communication and so on. In the computer vision domain, tracking is usually achieved by first detecting subjects with bounding boxes, then associating detected bounding boxes across video frames. For many IoT systems, images captured by cameras are usually sent over the network to be processed at a different site that has more powerful computing resources than edge devices. However, sending entire frames through the network causes significant bandwidth consumption that may exceed the system bandwidth constraints. To tackle this problem, we propose ViFiT, a transformer-based model that reconstructs vision bounding box trajectories from phone data (IMU and Fine Time Measurements). It leverages a transformer ability of better modeling long-term time series data. ViFiT is evaluated on Vi-Fi Dataset, a large-scale multimodal dataset in 5 diverse real world scenes, including indoor and outdoor environments. To fill the gap of proper metrics of jointly capturing the system characteristics of both tracking quality and video bandwidth reduction, we propose a novel evaluation framework dubbed Minimum Required Frames (MRF) and Minimum Required Frames Ratio (MRFR). ViFiT achieves an MRFR of 0.65 that outperforms the state-of-the-art approach for cross-modal reconstruction in LSTM Encoder-Decoder architecture X-Translator of 0.98, resulting in a high frame reduction rate as 97.76%.

CVOct 11, 2022
ViFiCon: Vision and Wireless Association Via Self-Supervised Contrastive Learning

Nicholas Meegan, Hansi Liu, Bryan Cao et al.

We introduce ViFiCon, a self-supervised contrastive learning scheme which uses synchronized information across vision and wireless modalities to perform cross-modal association. Specifically, the system uses pedestrian data collected from RGB-D camera footage as well as WiFi Fine Time Measurements (FTM) from a user's smartphone device. We represent the temporal sequence by stacking multi-person depth data spatially within a banded image. Depth data from RGB-D (vision domain) is inherently linked with an observable pedestrian, but FTM data (wireless domain) is associated only to a smartphone on the network. To formulate the cross-modal association problem as self-supervised, the network learns a scene-wide synchronization of the two modalities as a pretext task, and then uses that learned representation for the downstream task of associating individual bounding boxes to specific smartphones, i.e. associating vision and wireless information. We use a pre-trained region proposal model on the camera footage and then feed the extrapolated bounding box information into a dual-branch convolutional neural network along with the FTM data. We show that compared to fully supervised SoTA models, ViFiCon achieves high performance vision-to-wireless association, finding which bounding box corresponds to which smartphone device, without hand-labeled association examples for training data.

94.7CRMar 16
Personalizing Agent Privacy Decisions via Logical Entailment

James Flemings, Ren Yi, Octavian Suciu et al.

Personal large language model (LLM) agents increasingly perform tasks that require access to user data, raising concerns about appropriate data disclosure. We show that relying solely on LLMs to make data-sharing decisions is insufficient. Prompting LLMs with general privacy norms fails to capture individual users' privacy preferences, while providing prior user data-sharing decisions through in-context learning (ICL) leads to unreliable and opaque reasoning. To address these limitations, we propose ARIEL (Agentic Reasoning with Individualized Entailment Logic), a framework that combines LLMs with rule-based logic to enable structured, personalized privacy reasoning. The core mechanism of ARIEL determines whether a user's prior decision on a data-sharing request $\textit{logically entails}$ the same decision for a new request. Experimental evaluations using advanced models and public datasets show that ARIEL reduces the F1 error rate for appropriate judgments by $\textbf{40.6%}$ compared to standard ICL-based reasoning, indicating that ARIEL is effective at correctly judging requests where the user would approve data sharing. These results demonstrate that integrating LLMs with logical entailment provides an effective and interpretable approach for automating personalized privacy decisions.

98.4CRMar 20
Text-Based Personas for Simulating User Privacy Decisions

Kassem Fawaz, Ren Yi, Octavian Suciu et al.

The ability to simulate human privacy decisions has significant implications for aligning autonomous agents with individual intent and conducting cost-effective, large-scale privacy-centric user studies. Prior approaches prompt Large Language Models (LLMs) with natural language user statements, data-sharing histories, or demographic attributes to simulate privacy decisions. These approaches, however, fail to balance individual-level accuracy, prompt usability, token efficiency, and population-level representation. We present Narriva, an approach that generates text-based synthetic privacy personas to address these shortcomings. Narriva grounds persona generation in prior user privacy decisions, such as those from large-scale survey datasets, rather than purely relying on demographic stereotypes. It compresses this data into concise, human-readable summaries structured by established privacy theories. Through benchmarking across five diverse datasets, we analyze the characteristics of Narriva's synthetic personas in modeling both individual and population-level privacy preferences. We find that grounding personas in past privacy behaviors achieves up to 88% predictive accuracy (significantly outperforming a non-personalized LLM baseline), and yields an 80-95% reduction in prompt tokens compared to in-context learning with raw examples. Finally, we demonstrate that personas synthesized from a single survey can reproduce the aggregate privacy behaviors and statistical distributions (TVComplement up to 0.85) of entirely different studies.

CRMay 8, 2024
AirGapAgent: Protecting Privacy-Conscious Conversational Agents

Eugene Bagdasarian, Ren Yi, Sahra Ghalebikesabi et al.

The growing use of large language model (LLM)-based conversational agents to manage sensitive user data raises significant privacy concerns. While these agents excel at understanding and acting on context, this capability can be exploited by malicious actors. We introduce a novel threat model where adversarial third-party apps manipulate the context of interaction to trick LLM-based agents into revealing private information not relevant to the task at hand. Grounded in the framework of contextual integrity, we introduce AirGapAgent, a privacy-conscious agent designed to prevent unintended data leakage by restricting the agent's access to only the data necessary for a specific task. Extensive experiments using Gemini, GPT, and Mistral models as agents validate our approach's effectiveness in mitigating this form of context hijacking while maintaining core agent functionality. For example, we show that a single-query context hijacking attack on a Gemini Ultra agent reduces its ability to protect user data from 94% to 45%, while an AirGapAgent achieves 97% protection, rendering the same attack ineffective.

CRApr 16, 2024
Confidential Federated Computations

Hubert Eichner, Daniel Ramage, Kallista Bonawitz et al.

Federated Learning and Analytics (FLA) have seen widespread adoption by technology platforms for processing sensitive on-device data. However, basic FLA systems have privacy limitations: they do not necessarily require anonymization mechanisms like differential privacy (DP), and provide limited protections against a potentially malicious service provider. Adding DP to a basic FLA system currently requires either adding excessive noise to each device's updates, or assuming an honest service provider that correctly implements the mechanism and only uses the privatized outputs. Secure multiparty computation (SMPC) -based oblivious aggregations can limit the service provider's access to individual user updates and improve DP tradeoffs, but the tradeoffs are still suboptimal, and they suffer from scalability challenges and susceptibility to Sybil attacks. This paper introduces a novel system architecture that leverages trusted execution environments (TEEs) and open-sourcing to both ensure confidentiality of server-side computations and provide externally verifiable privacy properties, bolstering the robustness and trustworthiness of private federated computations.

AIJun 13, 2025
Privacy Reasoning in Ambiguous Contexts

Ren Yi, Octavian Suciu, Adria Gascon et al.

We study the ability of language models to reason about appropriate information disclosure - a central aspect of the evolving field of agentic privacy. Whereas previous works have focused on evaluating a model's ability to align with human decisions, we examine the role of ambiguity and missing context on model performance when making information-sharing decisions. We identify context ambiguity as a crucial barrier for high performance in privacy assessments. By designing Camber, a framework for context disambiguation, we show that model-generated decision rationales can reveal ambiguities and that systematically disambiguating context based on these rationales leads to significant accuracy improvements (up to 13.3\% in precision and up to 22.3\% in recall) as well as reductions in prompt sensitivity. Overall, our results indicate that approaches for context disambiguation are a promising way forward to enhance agentic privacy reasoning.

CRNov 3, 2021
Towards Sparse Federated Analytics: Location Heatmaps under Distributed Differential Privacy with Secure Aggregation

Eugene Bagdasaryan, Peter Kairouz, Stefan Mellem et al.

We design a scalable algorithm to privately generate location heatmaps over decentralized data from millions of user devices. It aims to ensure differential privacy before data becomes visible to a service provider while maintaining high data accuracy and minimizing resource consumption on users' devices. To achieve this, we revisit distributed differential privacy based on recent results in secure multiparty computation, and we design a scalable and adaptive distributed differential privacy approach for location analytics. Evaluation on public location datasets shows that this approach successfully generates metropolitan-scale heatmaps from millions of user samples with a worst-case client communication overhead that is significantly smaller than existing state-of-the-art private protocols of similar accuracy.

CVApr 6, 2020
LaNet: Real-time Lane Identification by Learning Road SurfaceCharacteristics from Accelerometer Data

Madhumitha Harishankar, Jun Han, Sai Vineeth Kalluru Srinivas et al.

The resolution of GPS measurements, especially in urban areas, is insufficient for identifying a vehicle's lane. In this work, we develop a deep LSTM neural network model LaNet that determines the lane vehicles are on by periodically classifying accelerometer samples collected by vehicles as they drive in real time. Our key finding is that even adjacent patches of road surfaces contain characteristics that are sufficiently unique to differentiate between lanes, i.e., roads inherently exhibit differing bumps, cracks, potholes, and surface unevenness. Cars can capture this road surface information as they drive using inexpensive, easy-to-install accelerometers that increasingly come fitted in cars and can be accessed via the CAN-bus. We collect an aggregate of 60 km driving data and synthesize more based on this that capture factors such as variable driving speed, vehicle suspensions, and accelerometer noise. Our formulated LSTM-based deep learning model, LaNet, learns lane-specific sequences of road surface events (bumps, cracks etc.) and yields 100% lane classification accuracy with 200 meters of driving data, achieving over 90% with just 100 m (correspondingly to roughly one minute of driving). We design the LaNet model to be practical for use in real-time lane classification and show with extensive experiments that LaNet yields high classification accuracy even on smooth roads, on large multi-lane roads, and on drives with frequent lane changes. Since different road surfaces have different inherent characteristics or entropy, we excavate our neural network model and discover a mechanism to easily characterize the achievable classification accuracies in a road over various driving distances by training the model just once. We present LaNet as a low-cost, easily deployable and highly accurate way to achieve fine-grained lane identification.

LGDec 10, 2019
Advances and Open Problems in Federated Learning

Peter Kairouz, H. Brendan McMahan, Brendan Avent et al.

Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.

DCNov 30, 2019
Federated Learning with Autotuned Communication-Efficient Secure Aggregation

Keith Bonawitz, Fariborz Salehi, Jakub Konečný et al.

Federated Learning enables mobile devices to collaboratively learn a shared inference model while keeping all the training data on a user's device, decoupling the ability to do machine learning from the need to store the data in the cloud. Existing work on federated learning with limited communication demonstrates how random rotation can enable users' model updates to be quantized much more efficiently, reducing the communication cost between users and the server. Meanwhile, secure aggregation enables the server to learn an aggregate of at least a threshold number of device's model contributions without observing any individual device's contribution in unaggregated form. In this paper, we highlight some of the challenges of setting the parameters for secure aggregation to achieve communication efficiency, especially in the context of the aggressively quantized inputs enabled by random rotation. We then develop a recipe for auto-tuning communication-efficient secure aggregation, based on specific properties of random rotation and secure aggregation -- namely, the predictable distribution of vector entries post-rotation and the modular wrapping inherent in secure aggregation. We present both theoretical results and initial experiments.

CVNov 1, 2017
Recognizing Textures with Mobile Cameras for Pedestrian Safety Applications

Shubham Jain, Marco Gruteser

As smartphone rooted distractions become commonplace, the lack of compelling safety measures has led to a rise in the number of injuries to distracted walkers. Various solutions address this problem by sensing a pedestrian's walking environment. Existing camera-based approaches have been largely limited to obstacle detection and other forms of object detection. Instead, we present TerraFirma, an approach that performs material recognition on the pedestrian's walking surface. We explore, first, how well commercial off-the-shelf smartphone cameras can learn texture to distinguish among paving materials in uncontrolled outdoor urban settings. Second, we aim at identifying when a distracted user is about to enter the street, which can be used to support safety functions such as warning the user to be cautious. To this end, we gather a unique dataset of street/sidewalk imagery from a pedestrian's perspective, that spans major cities like New York, Paris, and London. We demonstrate that modern phone cameras can be enabled to distinguish materials of walking surfaces in urban areas with more than 90% accuracy, and accurately identify when pedestrians transition from sidewalk to street.

CVApr 6, 2016
Reading Between the Pixels: Photographic Steganography for Camera Display Messaging

Eric Wengrowski, Kristin Dana, Marco Gruteser et al.

We exploit human color metamers to send light-modulated messages less visible to the human eye, but recoverable by cameras. These messages are a key component to camera-display messaging, such as handheld smartphones capturing information from electronic signage. Each color pixel in the display image is modified by a particular color gradient vector. The challenge is to find the color gradient that maximizes camera response, while minimizing human response. The mismatch in human spectral and camera sensitivity curves creates an opportunity for hidden messaging. Our approach does not require knowledge of these sensitivity curves, instead we employ a data-driven method. We learn an ellipsoidal partitioning of the six-dimensional space of colors and color gradients. This partitioning creates metamer sets defined by the base color at the display pixel and the color gradient direction for message encoding. We sample from the resulting metamer sets to find color steps for each base color to embed a binary message into an arbitrary image with reduced visible artifacts. Unlike previous methods that rely on visually obtrusive intensity modulation, we embed with color so that the message is more hidden. Ordinary displays and cameras are used without the need for expensive LEDs or high speed devices. The primary contribution of this work is a framework to map the pixels in an arbitrary image to a metamer pair for steganographic photo messaging.

CVJan 8, 2015
Optimal Radiometric Calibration for Camera-Display Communication

Wenjia Yuan, Eric Wengrowski, Kristin J. Dana et al.

We present a novel method for communicating between a camera and display by embedding and recovering hidden and dynamic information within a displayed image. A handheld camera pointed at the display can receive not only the display image, but also the underlying message. These active scenes are fundamentally different from traditional passive scenes like QR codes because image formation is based on display emittance, not surface reflectance. Detecting and decoding the message requires careful photometric modeling for computational message recovery. Unlike standard watermarking and steganography methods that lie outside the domain of computer vision, our message recovery algorithm uses illumination to optically communicate hidden messages in real world scenes. The key innovation of our approach is an algorithm that performs simultaneous radiometric calibration and message recovery in one convex optimization problem. By modeling the photometry of the system using a camera-display transfer function (CDTF), we derive a physics-based kernel function for support vector machine classification. We demonstrate that our method of optimal online radiometric calibration (OORC) leads to an efficient and robust algorithm for computational messaging between nine commercial cameras and displays.