Priyanka Mary Mammen

h-index5

6papers

419citations

Novelty28%

AI Score25

Ranked #165,927 of 194,257 authors (top 85%)#28,162 in CL (top 92%)

6 Papers

2.3SPOct 24, 2022

SleepMore: Inferring Sleep Duration at Scale via Multi-Device WiFi Sensing

Camellia Zakaria, Gizem Yilmaz, Priyanka Mammen et al.

The availability of commercial wearable trackers equipped with features to monitor sleep duration and quality has enabled more useful sleep health monitoring applications and analyses. However, much research has reported the challenge of long-term user retention in sleep monitoring through these modalities. Since modern Internet users own multiple mobile devices, our work explores the possibility of employing ubiquitous mobile devices and passive WiFi sensing techniques to predict sleep duration as the fundamental measure for complementing long-term sleep monitoring initiatives. In this paper, we propose SleepMore, an accurate and easy-to-deploy sleep-tracking approach based on machine learning over the user's WiFi network activity. It first employs a semi-personalized random forest model with an infinitesimal jackknife variance estimation method to classify a user's network activity behavior into sleep and awake states per minute granularity. Through a moving average technique, the system uses these state sequences to estimate the user's nocturnal sleep period and its uncertainty rate. Uncertainty quantification enables SleepMore to overcome the impact of noisy WiFi data that can yield large prediction errors. We validate SleepMore using data from a month-long user study involving 46 college students and draw comparisons with the Oura Ring wearable. Beyond the college campus, we evaluate SleepMore on non-student users of different housing profiles. Our results demonstrate that SleepMore produces statistically indistinguishable sleep statistics from the Oura ring baseline for predictions made within a 5% uncertainty rate. These errors range between 15-28 minutes for determining sleep time and 7-29 minutes for determining wake time, proving statistically significant improvements over prior work. Our in-depth analysis explains the sources of errors.

0.9CLSep 11, 2023

Detecting Natural Language Biases with Prompt-based Learning

Md Abdul Aowal, Maliha T Islam, Priyanka Mary Mammen et al.

In this project, we want to explore the newly emerging field of prompt engineering and apply it to the downstream task of detecting LM biases. More concretely, we explore how to design prompts that can indicate 4 different types of biases: (1) gender, (2) race, (3) sexual orientation, and (4) religion-based. Within our project, we experiment with different manually crafted prompts that can draw out the subtle biases that may be present in the language model. We apply these prompts to multiple variations of popular and well-recognized models: BERT, RoBERTa, and T5 to evaluate their biases. We provide a comparative analysis of these models and assess them using a two-fold method: use human judgment to decide whether model predictions are biased and utilize model-level judgment (through further prompts) to understand if a model can self-diagnose the biases of its own prediction.

21.5CLApr 18, 2024Code

Introducing v0.5 of the AI Safety Benchmark from MLCommons

Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed et al. · deepmind, oxford

This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English), and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark. We plan to release version 1.0 of the AI Safety Benchmark by the end of 2024. The v1.0 benchmark will provide meaningful insights into the safety of AI systems. However, the v0.5 benchmark should not be used to assess the safety of AI systems. We have sought to fully document the limitations, flaws, and challenges of v0.5. This release of v0.5 of the AI Safety Benchmark includes (1) a principled approach to specifying and constructing the benchmark, which comprises use cases, types of systems under test (SUTs), language and context, personas, tests, and test items; (2) a taxonomy of 13 hazard categories with definitions and subcategories; (3) tests for seven of the hazard categories, each comprising a unique set of test items, i.e., prompts. There are 43,090 test items in total, which we created with templates; (4) a grading system for AI systems against the benchmark; (5) an openly available platform, and downloadable tool, called ModelBench that can be used to evaluate the safety of AI systems on the benchmark; (6) an example evaluation report which benchmarks the performance of over a dozen openly available chat-tuned language models; (7) a test specification for the benchmark.

2.3SPFeb 7, 2021

WiSleep: Inferring Sleep Duration at Scale Using Passive WiFi Sensing

Priyanka Mary Mammen, Camellia Zakaria, Tergel Molom-Ochir et al.

Sleep deprivation is a public health concern that significantly impacts one's well-being and performance. Sleep is an intimate experience, and state-of-the-art sleep monitoring solutions are highly-personalized to individual users. With a motivation to expand sleep monitoring capabilities at a large scale and contribute sleep data to public health understanding, we present Wisleep, a system for inferring sleep duration using smartphone network connections that are passively sensed from WiFi infrastructure. We propose an unsupervised ensemble model of Bayesian change point detection, validating it over a user study among 20 students living in campus dormitories and a private home. Our results find Wisleep outperforming prior techniques for users with irregular sleep patterns while yielding an average 88.50% accuracy within 60 minutes sleep time error and 39 minutes wake-up time error. This is comparable to client-side methods, albeit utilizing coarse-grained information. Additionally, we utilize our approach to predict sleep and wake-up times from a user study of more than 1000 student users, demonstrating results similar to prior findings on students' sleep patterns. Finally, we show that Wisleep can process data from twenty thousand users on a single commodity server, allowing it to scale to large campus populations with low server requirements.

27.4LGJan 14, 2021

Federated Learning: Opportunities and Challenges

Priyanka Mary Mammen

Federated Learning (FL) is a concept first introduced by Google in 2016, in which multiple devices collaboratively learn a machine learning model without sharing their private data under the supervision of a central server. This offers ample opportunities in critical domains such as healthcare, finance etc, where it is risky to share private user information to other organisations or devices. While FL appears to be a promising Machine Learning (ML) technique to keep the local data private, it is also vulnerable to attacks like other ML models. Given the growing interest in the FL domain, this report discusses the opportunities and challenges in federated learning.

3.4LGOct 19, 2019

Explainable AI: Deep Reinforcement Learning Agents for Residential Demand Side Cost Savings in Smart Grids

Hareesh Kumar, Priyanka Mary Mammen, Krithi Ramamritham

Motivated by recent advancements in Deep Reinforcement Learning (RL), we have developed an RL agent to manage the operation of storage devices in a household and is designed to maximize demand-side cost savings. The proposed technique is data-driven, and the RL agent learns from scratch how to efficiently use the energy storage device given variable tariff structures. In most of the studies, the RL agent is considered as a black box, and how the agent has learned is often ignored. We explain the learning progression of the RL agent, and the strategies it follows based on the capacity of the storage device.