SD · Nov 4, 2022
Real-Time Target Sound Extraction
Bandhav Veluri, Justin Chan, Malek Itani et al.
We present the first neural network model to achieve real-time and streaming target sound extraction. To accomplish this, we propose Waveformer, an encoder-decoder architecture with a stack of dilated causal convolution layers as the encoder, and a transformer decoder layer as the decoder. This hybrid architecture uses dilated causal convolutions for processing large receptive fields in a computationally efficient manner while also leveraging the generalization performance of transformer-based architectures. Our evaluations show as much as 2.2-3.3 dB improvement in SI-SNRi compared to the prior models for this task while having a 1.2-4x smaller model size and a 1.5-2x lower runtime. We provide code, dataset, and audio samples: https://waveformer.cs.washington.edu/.
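To make the "large receptive fields in a computationally efficient manner" point concrete, here is a minimal pure-Python sketch of a stack of dilated causal convolutions. It is an illustration only, not the Waveformer implementation: the kernel size of 3, the ten layers with dilations 1, 2, 4, ..., 512, the shared weights, and all function names are assumptions chosen for the example.

```python
def causal_dilated_conv1d(x, weights, dilation):
    """y[t] = sum_k weights[k] * x[t - k * dilation]; samples before t = 0 count as zero."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # causal: only current and past samples are read
                acc += w * x[idx]
        y.append(acc)
    return y


def run_stack(x, weights, dilations):
    """Apply one causal layer per dilation, feeding each output into the next layer."""
    for d in dilations:
        x = causal_dilated_conv1d(x, weights, d)
    return x


def receptive_field(kernel_size, dilations):
    """Number of input samples that can influence a single output sample."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical hyperparameters, not taken from the paper.
KERNEL = [0.5, 0.3, 0.2]
DILATIONS = [2 ** i for i in range(10)]  # 1, 2, 4, ..., 512

if __name__ == "__main__":
    print(receptive_field(len(KERNEL), DILATIONS))  # 2047 samples from only 10 layers
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth while the per-sample cost grows only linearly, and because no layer reads future samples, the stack can run in a streaming fashion.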
SD · Nov 1, 2023
Semantic Hearing: Programming Acoustic Scenes with Binaural Hearables
Bandhav Veluri, Malek Itani, Justin Chan et al.
Imagine being able to listen to the birds chirping in a park without hearing the chatter from other hikers, or being able to block out traffic noise on a busy street while still being able to hear emergency sirens and car honks. We introduce semantic hearing, a novel capability for hearable devices that enables them to, in real-time, focus on, or ignore, specific sounds from real-world environments, while also preserving the spatial cues. To achieve this, we make two technical contributions: 1) we present the first neural network that can achieve binaural target sound extraction in the presence of interfering sounds and background noise, and 2) we design a training methodology that allows our system to generalize to real-world use. Results show that our system can operate with 20 sound classes and that our transformer-based network has a runtime of 6.56 ms on a connected smartphone. In-the-wild evaluation with participants in previously unseen indoor and outdoor scenarios shows that our proof-of-concept system can extract the target sounds and generalize to preserve the spatial cues in its binaural output. Project page with code: https://semantichearing.cs.washington.edu
HC · May 25
WeeCare: Towards Handheld Bladder Fullness Sensing with a Conformable Pad
Zhikai Qin, Siqi Zhang, Junyi Zhu et al.
Patients with bladder dysfunction often lose the sensation of bladder fullness and cannot void naturally, forcing reliance on fixed-schedule catheterization that is uncomfortable and risks complications. We present WeeCare, a handheld conformable pad with fabric electrodes for on-demand bladder fullness sensing using electrical impedance tomography (EIT). The central challenge is that repeated removal and reattachment can introduce variation in electrode position and contact quality. We assess WeeCare along three axes: in-silico simulations characterizing electrode layout and noise robustness, in-vitro phantom experiments across urine salinities and filling levels, and an in-vivo human measurement for bladder fullness sensing, voiding, and filling dynamics.
MED-PH · Nov 17, 2025
Contactless Monitoring of Muscle Vibrations During Exercise with a Chaos-Inspired Radar
Jiangyifei Zhu, Yuzhe Wang, Tao Qiang et al.
In this paper, our goal is to enable quantitative feedback on muscle fatigue during exercise to optimize exercise effectiveness while minimizing injury risk. We seek to capture fatigue by monitoring the surface vibrations that muscle exertion induces. Muscle vibrations are unique as they arise from the asynchronous firing of motor units, producing surface micro-displacements that are broadband, nonlinear, and seemingly stochastic. Accurately sensing these noise-like signals requires new algorithmic strategies that can uncover their underlying structure. We present GigaFlex, the first contactless system that measures muscle vibrations using mmWave radar to infer muscle force and detect fatigue. GigaFlex draws on algorithmic foundations from chaos theory to model the deterministic patterns of muscle vibrations and extend them to the radar domain. Specifically, we design a radar processing architecture that systematically infuses principles from chaos theory and nonlinear dynamics throughout the sensing pipeline, spanning localization, segmentation, and learning, to estimate muscle forces during static and dynamic weight-bearing exercises. Across a 23-participant study, GigaFlex estimates maximum voluntary isometric contraction (MVIC) with a root mean square error (RMSE) of 5.9%, and detects one to three Repetitions in Reserve (RIR), a key quantitative muscle fatigue metric, with an AUC of 0.83 to 0.86, performing comparably to a contact-based IMU baseline. Our system can enable timely feedback that can help prevent fatigue-induced injury, and opens new opportunities for physiological sensing of complex, non-periodic biosignals.
HC · Apr 14
GlintMarkers: Spatial Perception on XR Eyewear using Corneal Reflections
Seungjoo Lee, Vimal Mollyn, Chris Harrison et al.
We present GlintMarkers, the first system to perform gaze-driven spatial perception using the inward-facing cameras on XR eyewear. Our key observation is that the cornea acts as a mirror that encodes both gaze direction and visual information about the environment in a small, low-contrast reflection. To extract spatial and semantic information from this reflection despite the camera's limited pixel budget, we present a passive retroreflective marker design that concentrates reflected near-infrared light onto the cornea, producing bright glint patterns. We develop a custom Perspective-n-Point (PnP) estimation framework adapted to corneal imaging and perform orientation and distance estimation of tagged objects, as well as unique object identification.
HC · Apr 8
LubDubDecoder: Bringing Micro-Mechanical Cardiac Monitoring to Hearables
Siqi Zhang, Xiyuxing Zhang, Duc Vu et al.
We present LubDubDecoder, a system that enables fine-grained monitoring of micro-cardiac vibrations associated with the opening and closing of heart valves across a range of hearables. Our system transforms the built-in speaker, the only transducer common to all hearables, into an acoustic sensor that captures the coarse "lub-dub" heart sounds, leverages their shared temporal and spectral structure to reconstruct the subtle seismocardiography (SCG) and gyrocardiography (GCG) waveforms, and extracts the timing of key micro-cardiac events. In an IRB-approved feasibility study with 25 users, our system achieves correlations of 0.88-0.95 compared to chest-mounted reference measurements in within-user and cross-user evaluations, and generalizes to unseen hearables using a zero-effort adaptation scheme with a correlation of 0.91. Our system is robust across remounting sessions and music playback.
AS · Apr 7
Active noise cancellation on open-ear smart glasses
Kuang Yuan, Freddy Yifei Liu, Tong Xiao et al.
Smart glasses are becoming an increasingly prevalent wearable platform, with audio as a key interaction modality. However, hearing in noisy environments remains challenging because smart glasses are equipped with open-ear speakers that do not seal the ear canal. Furthermore, the open-ear design is incompatible with conventional active noise cancellation (ANC) techniques, which rely on an error microphone inside or at the entrance of the ear canal to measure the residual sound heard after cancellation. Here we present the first real-time ANC system for open-ear smart glasses that suppresses environmental noise using only microphones and miniaturized open-ear speakers embedded in the glasses frame. Our low-latency computational pipeline estimates the noise at the ear from an array of eight microphones distributed around the glasses frame and generates an anti-noise signal in real-time to cancel environmental noise. We develop a custom glasses prototype and evaluate it in a user study across 8 environments under mobility in the 100-1000 Hz frequency range, where environmental noise is concentrated. We achieve a mean noise reduction of 9.6 dB without any calibration, and 11.2 dB with a brief user-specific calibration.
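A short worked example shows why a low-latency pipeline matters for anti-noise generation. The sketch below is an idealized single-tone model, not the paper's system: it assumes the anti-noise has exactly the right amplitude and is wrong only by a time delay. The function names, the 48 kHz sampling rate, and the 100 µs delay are all assumptions made for illustration. For a tone at frequency f and delay τ, the phase error is φ = 2πfτ, the residual amplitude is 2|sin(φ/2)|, and the noise reduction is -20·log10(2|sin(φ/2)|) dB.

```python
import math


def noise_reduction_db(freq_hz, delay_s, fs=48_000, duration_s=1.0):
    """Simulate a pure tone plus a delayed, inverted copy; return the power reduction in dB."""
    n = int(fs * duration_s)
    noise_power = 0.0
    residual_power = 0.0
    for i in range(n):
        t = i / fs
        noise = math.sin(2 * math.pi * freq_hz * t)
        anti_noise = -math.sin(2 * math.pi * freq_hz * (t - delay_s))  # late by delay_s
        noise_power += noise ** 2
        residual_power += (noise + anti_noise) ** 2
    if residual_power == 0.0:
        return math.inf  # perfect cancellation
    return 10 * math.log10(noise_power / residual_power)


def closed_form_db(freq_hz, delay_s):
    """Analytic reduction for a pure tone; requires a nonzero phase error."""
    phase_error = 2 * math.pi * freq_hz * delay_s
    return -20 * math.log10(2 * abs(math.sin(phase_error / 2)))


if __name__ == "__main__":
    for f in (100, 500, 1000):
        print(f, round(noise_reduction_db(f, 100e-6), 2), round(closed_form_db(f, 100e-6), 2))
```

Under these assumptions, a fixed 100 µs timing error leaves roughly 10 dB of reduction at 500 Hz and noticeably less at 1000 Hz, since the same delay is a larger fraction of a shorter period.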
HC · Apr 8
DropleX: Liquid sensing on tablet touchscreens
Siqi Zhang, Mayank Goel, Justin Chan
We present DropleX, the first system that enables liquid sensing using the capacitive touchscreen of commodity tablets. DropleX detects microliter-scale liquid samples, and performs non-invasive, through-container measurements for liquid analysis. These capabilities are made possible by a physics-informed mechanism that disables the touchscreen's built-in adaptive filters, originally designed to reject the effects of liquid drops such as rain, without any hardware modifications. We model the touchscreen's sensing capabilities, limits, and non-idealities to inform the design of a signal processing and learning-based pipeline for liquid sensing. Under controlled laboratory conditions, our system achieves 89-99% accuracy in detecting microliter-scale adulteration in soda, wine, and milk, 94-96% accuracy in threshold detection of trace chemical concentrations, and 86-96% accuracy in through-container adulterant detection. These exploratory results demonstrate the potential of repurposing commodity touchscreens as a liquid characterization platform for laboratory settings, food and beverage testing, and chemical analysis applications.
SD · Apr 15, 2025
SonicSieve: Bringing Directional Speech Extraction to Smartphones Using Acoustic Microstructures
Kuang Yuan, Yifeng Wang, Xiyuxing Zhang et al. · cmu
Imagine placing your smartphone on a table in a noisy restaurant and clearly capturing the voices of friends seated around you, or recording a lecturer's voice with clarity in a reverberant auditorium. We introduce SonicSieve, the first intelligent directional speech extraction system for smartphones using a bio-inspired acoustic microstructure. Our passive design embeds directional cues onto incoming speech without any additional electronics. It attaches to the in-line mic of low-cost wired earphones which can be attached to smartphones. We present an end-to-end neural network that processes the raw audio mixtures in real-time on mobile devices. Our results show that SonicSieve achieves a signal quality improvement of 5.0 dB when focusing on a 30° angular region. Additionally, the performance of our system based on only two microphones exceeds that of conventional 5-microphone arrays.
DB · Mar 1, 2025
Semantic Integrity Constraints: Declarative Guardrails for AI-Augmented Data Processing Systems
Alexander W. Lee, Justin Chan, Michael Fu et al.
AI-augmented data processing systems (DPSs) integrate large language models (LLMs) into query pipelines, allowing powerful semantic operations on structured and unstructured data. However, the reliability (a.k.a. trust) of these systems is fundamentally challenged by the potential for LLMs to produce errors, limiting their adoption in critical domains. To help address this reliability bottleneck, we introduce semantic integrity constraints (SICs), a declarative abstraction for specifying and enforcing correctness conditions over LLM outputs in semantic queries. SICs generalize traditional database integrity constraints to semantic settings, supporting common types of constraints, such as grounding, soundness, and exclusion, with both reactive and proactive enforcement strategies. We argue that SICs provide a foundation for building reliable and auditable AI-augmented data systems. Specifically, we present a system design for integrating SICs into query planning and runtime execution and discuss its realization in AI-augmented DPSs. To guide and evaluate our vision, we outline several design goals (covering criteria around expressiveness, runtime semantics, integration, performance, and enterprise-scale applicability) and discuss how our framework addresses each, along with open research challenges.
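The idea of declaring constraints over LLM outputs and enforcing them reactively can be sketched in a few lines. Everything below is hypothetical and is not the paper's API: the `Constraint` class, the `grounded` and `excludes` helpers, the string-matching interpretation of "grounding" and "exclusion", and the canned `stub_llm` that stands in for a real model call are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Constraint:
    name: str
    check: Callable[[str, str], bool]  # (source_text, llm_output) -> holds?


def grounded() -> Constraint:
    """Assumed reading of 'grounding': the output must appear verbatim in the source."""
    return Constraint(
        "grounding",
        lambda src, out: bool(out.strip()) and out.lower() in src.lower(),
    )


def excludes(banned: Iterable[str]) -> Constraint:
    """Assumed reading of 'exclusion': the output must not contain any banned term."""
    banned_lower = [b.lower() for b in banned]
    return Constraint(
        "exclusion",
        lambda src, out: not any(b in out.lower() for b in banned_lower),
    )


def violations(constraints: List[Constraint], source: str, output: str) -> List[str]:
    """Names of the constraints that the output violates."""
    return [c.name for c in constraints if not c.check(source, output)]


def enforce_reactively(generate, source, constraints, max_attempts=3) -> Optional[str]:
    """Reactive enforcement: validate after generation and retry on violation."""
    for attempt in range(max_attempts):
        output = generate(source, attempt)
        if not violations(constraints, source, output):
            return output
    return None  # no compliant output within the retry budget


# Canned stand-in for an LLM call (hypothetical).
CANNED = ["Globex Inc", "Acme Corp (confidential)", "Acme Corp"]


def stub_llm(source: str, attempt: int) -> str:
    return CANNED[min(attempt, len(CANNED) - 1)]


if __name__ == "__main__":
    SOURCE = "Invoice 1042 was issued to Acme Corp on 3 March."
    rules = [grounded(), excludes(["confidential"])]
    print(enforce_reactively(stub_llm, SOURCE, rules))
```

A proactive strategy would instead shape the generation step itself (for example, by restricting the candidate answers up front) so that violations are less likely to occur; that variant is not shown here.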
CR · Apr 7, 2020
PACT: Privacy Sensitive Protocols and Mechanisms for Mobile Contact Tracing
Justin Chan, Dean Foster, Shyam Gollakota et al.
The global health threat from COVID-19 has been controlled in a number of instances by large-scale testing and contact tracing efforts. We created this document to suggest three functionalities on how we might best harness computing technologies to support the goals of public health organizations in minimizing morbidity and mortality associated with the spread of COVID-19, while protecting the civil liberties of individuals. In particular, this work advocates for a third-party-free approach to assisted mobile contact tracing, because such an approach mitigates the security and privacy risks of requiring a trusted third party. We also explicitly consider the inferential risks involved in any contact tracing system, where any alert to a user could itself give rise to de-anonymizing information. More generally, we hope to participate in bringing together colleagues in industry, academia, and civil society to discuss and converge on ideas around a critical issue rising with attempts to mitigate the COVID-19 pandemic.
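One common way to do contact tracing without a trusted third party is for phones to broadcast short-lived pseudonyms derived from a secret seed, and for exposure matching to happen locally on each phone. The sketch below illustrates that general pattern only. It is not PACT's specification: the abstract does not describe the construction, so the HMAC-SHA256 derivation, the 16-byte pseudonym length, the interval numbering, and the `Phone` class are all assumptions.

```python
import hashlib
import hmac
import os


def pseudonym(seed: bytes, interval: int) -> bytes:
    """Derive the short-lived broadcast ID for one time interval from a secret seed."""
    msg = interval.to_bytes(8, "big")
    return hmac.new(seed, msg, hashlib.sha256).digest()[:16]


class Phone:
    def __init__(self):
        self.seed = os.urandom(32)  # never leaves the phone unless the user opts to publish
        self.heard = set()          # pseudonyms received from nearby phones

    def broadcast(self, interval: int) -> bytes:
        return pseudonym(self.seed, interval)

    def hear(self, pid: bytes) -> None:
        self.heard.add(pid)

    def exposed(self, published_seeds, intervals) -> bool:
        """Recompute pseudonyms from published seeds and compare them locally."""
        return any(
            pseudonym(seed, i) in self.heard
            for seed in published_seeds
            for i in intervals
        )


if __name__ == "__main__":
    alice, bob, carol = Phone(), Phone(), Phone()
    bob.hear(alice.broadcast(5))        # Bob was near Alice during interval 5
    published = [alice.seed]            # Alice later tests positive and publishes her seed
    print(bob.exposed(published, range(10)), carol.exposed(published, range(10)))
```

In this pattern no server learns who met whom, since the comparison runs on each phone. The inferential risk the abstract mentions still applies: a user who receives an alert may be able to work out which contact triggered it.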
IV · Sep 16, 2019
Identifying Pediatric Vascular Anomalies With Deep Learning
Justin Chan, Sharat Raju, Randall Bly et al.
Vascular anomalies, more colloquially known as birthmarks, affect up to 1 in 10 infants. Though many of these lesions self-resolve, some types can result in medical complications or disfigurement without proper diagnosis or management. Accurately diagnosing vascular anomalies is challenging for pediatricians and primary care physicians due to subtle visual differences and similarity to other pediatric dermatologic conditions. This can result in delayed or incorrect referrals for treatment. To address this problem, we developed a convolutional neural network (CNN) to automatically classify images of vascular anomalies and other pediatric skin conditions to aid physicians with diagnosis. We constructed a dataset of 21,681 clinical images, including data collected between 2002 and 2018 at Seattle Children's Hospital as well as five dermatologist-curated online repositories, and built a taxonomy over vascular anomalies and other common pediatric skin lesions. The CNN achieved an average AUC of 0.9731 when ten-fold cross-validation was performed across a taxonomy of 12 classes. The classifier's average AUC and weighted F1 score were 0.9889 and 0.9732, respectively, when evaluated on a previously unseen test set of six of these classes. Further, when used as an aid by pediatricians (n = 7), the classifier increased their average visual diagnostic accuracy from 73.10% to 91.67%. The classifier runs in real-time on a smartphone and has the potential to improve diagnosis of these conditions, particularly in resource-limited areas.
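For readers unfamiliar with the two reported metrics, the following pure-Python sketch shows how AUC and weighted F1 are commonly computed. It uses tiny made-up inputs, not the paper's data, and the function names are illustrative. How the paper averages AUC across classes is not stated in the abstract; a typical choice, assumed here, is to compute a binary one-vs-rest AUC per class and average the results.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's share of the true labels."""
    total = 0.0
    for c in set(y_true):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += (tp + fn) * f1  # tp + fn is the class support
    return total / len(y_true)


if __name__ == "__main__":
    print(binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))       # 0.75
    print(round(weighted_f1([0, 0, 1, 1, 2], [0, 1, 1, 1, 2]), 4))  # 0.7867
```

Weighting by support means that frequent classes dominate the F1 score, which is worth keeping in mind when class sizes in a clinical dataset are uneven.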