SDMar 20, 2022
A Study on Robustness to Perturbations for Representations of Environmental SoundSangeeta Srivastava, Ho-Hsiang Wu, Joao Rulff et al. · meta-ai
Audio applications involving environmental sound analysis increasingly use general-purpose audio representations, also known as embeddings, for transfer learning. Recently, Holistic Evaluation of Audio Representations (HEAR) evaluated twenty-nine embedding models on nineteen diverse tasks. However, the evaluation's effectiveness depends on the variation already captured within a given dataset. Therefore, for a given data domain, it is unclear how the representations would be affected by the variations caused by myriad microphones' range and acoustic conditions -- commonly known as channel effects. We aim to extend HEAR to evaluate invariance to channel effects in this work. To accomplish this, we imitate channel effects by injecting perturbations to the audio signal and measure the shift in the new (perturbed) embeddings with three distance measures, making the evaluation domain-dependent but not task-dependent. Combined with the downstream performance, it helps us make a more informed prediction of how robust the embeddings are to the channel effects. We evaluate two embeddings -- YAMNet, and OpenL3 on monophonic (UrbanSound8K) and polyphonic (SONYC-UST) urban datasets. We show that one distance measure does not suffice in such task-independent evaluation. Although Fréchet Audio Distance (FAD) correlates with the trend of the performance drop in the downstream task most accurately, we show that we need to study FAD in conjunction with the other distances to get a clear understanding of the overall effect of the perturbation. In terms of the embedding performance, we find OpenL3 to be more robust than YAMNet, which aligns with the HEAR evaluation.
LGAug 18, 2023
Learning from A Single Graph is All You Need for Near-Shortest Path Routing in Wireless NetworksYung-Fu Chen, Sen Lin, Anish Arora
We propose a learning algorithm for local routing policies that needs only a few data samples obtained from a single graph while generalizing to all random graphs in a standard model of wireless networks. We thus solve the all-pairs near-shortest path problem by training deep neural networks (DNNs) that efficiently and scalably learn routing policies that are local, i.e., they only consider node states and the states of neighboring nodes. Remarkably, one of these DNNs we train learns a policy that exactly matches the performance of greedy forwarding; another generally outperforms greedy forwarding. Our algorithm design exploits network domain knowledge in several ways: First, in the selection of input features and, second, in the selection of a ``seed graph'' and subsamples from its shortest paths. The leverage of domain knowledge provides theoretical explainability of why the seed graph and node subsampling suffice for learning that is efficient, scalable, and generalizable. Simulation-based results on uniform random graphs with diverse sizes and densities empirically corroborate that using samples generated from a few routing paths in a modest-sized seed graph quickly learns a model that is generalizable across (almost) all random graphs in the wireless network model.
LGNov 14, 2025
MoCap2Radar: A Spatiotemporal Transformer for Synthesizing Micro-Doppler Radar Signatures from Motion CaptureKevin Chen, Kenneth W. Parker, Anish Arora
We present a pure machine learning process for synthesizing radar spectrograms from Motion-Capture (MoCap) data. We formulate MoCap-to-spectrogram translation as a windowed sequence-to-sequence task using a transformer-based model that jointly captures spatial relations among MoCap markers and temporal dynamics across frames. Real-world experiments show that the proposed approach produces visually and quantitatively plausible doppler radar spectrograms and achieves good generalizability. Ablation experiments show that the learned model includes both the ability to convert multi-part motion into doppler signatures and an understanding of the spatial relations between different parts of the human body. The result is an interesting example of using transformers for time-series signal processing. It is especially applicable to edge computing and Internet of Things (IoT) radars. It also suggests the ability to augment scarce radar datasets using more abundant MoCap data for training higher-level applications. Finally, it requires far less computation than physics-based methods for generating radar data.
LGApr 19
What Physics do Data-Driven MoCap-to-Radar Models Learn?Kevin Chen, Kenneth W. Parker, Anish Arora
Data-driven MoCap-to-radar models generate plausible micro-Doppler spectrograms, but do they actually learn the underlying physics? We introduce a physics-based interpretability framework to answer this question via two proposed complementary metrics: one measures alignment between model predictions and the physics-derived Doppler frequency, while the other tests whether predictions preserve the velocity-frequency relationship under velocity intervention. Both metrics require only MoCap input and model predictions, without access to measured radar data. Experiments across several model architectures reveal that low reconstruction error does not guarantee physical consistency: some, but not all, models achieve low error yet perform poorly on the two physics-based metrics. Further analysis shows that temporal attention is critical for transformer-based models to learn the underlying physics.
LGMar 16
Bridging Local and Global Knowledge: Cascaded Mixture-of-Experts Learning for Near-Shortest Path RoutingYung-Fu Chen, Anish Arora
While deep learning models that leverage local features have demonstrated significant potential for near-optimal routing in dense Euclidean graphs, they struggle to generalize well in sparse networks where topological irregularities require broader structural awareness. To address this limitation, we train a Cascaded Mixture of Experts (Ca-MoE) to solve the all-pairs near-shortest path (APNSP) routing problem. Our Ca-MoE is a modular two-tier architecture that supports the decision-making for forwarder selection with lower-tier experts relying on local features and upper-tier experts relying on global features. It performs adaptive inference wherein the upper-tier experts are triggered only when the lower-tier ones do not suffice to achieve adequate decision quality. Computational efficiency is thus achieved by escalating model capacity only when necessitated by topological complexity, and parameter redundancy is avoided. Furthermore, we incorporate an online meta-learning strategy that facilitates independent expert fine-tuning and utilizes a stability-focused update mechanism to prevent catastrophic forgetting as new graph environments are encountered. Experimental evaluations demonstrate that Ca-MoE routing improves accuracy by up to 29.1% in sparse networks compared to single-expert baselines and maintains performance within 1%-6% of the theoretical upper bound across diverse graph densities.
LGSep 8, 2025
Knowledge-Guided Machine Learning for Stabilizing Near-Shortest Path RoutingYung-Fu Chen, Sen Lin, Anish Arora
We propose a simple algorithm that needs only a few data samples from a single graph for learning local routing policies that generalize across a rich class of geometric random graphs in Euclidean metric spaces. We thus solve the all-pairs near-shortest path problem by training deep neural networks (DNNs) that let each graph node efficiently and scalably route (i.e., forward) packets by considering only the node's state and the state of the neighboring nodes. Our algorithm design exploits network domain knowledge in the selection of input features and design of the policy function for learning an approximately optimal policy. Domain knowledge also provides theoretical assurance that the choice of a ``seed graph'' and its node data sampling suffices for generalizable learning. Remarkably, one of these DNNs we train -- using distance-to-destination as the only input feature -- learns a policy that exactly matches the well-known Greedy Forwarding policy, which forwards packets to the neighbor with the shortest distance to the destination. We also learn a new policy, which we call GreedyTensile routing -- using both distance-to-destination and node stretch as the input features -- that almost always outperforms greedy forwarding. We demonstrate the explainability and ultra-low latency run-time operation of Greedy Tensile routing by symbolically interpreting its DNN in low-complexity terms of two linear actions.
NIJun 24, 2025
MILAAP: Mobile Link Allocation via Attention-based PredictionYung-Fu Chen, Anish Arora
Channel hopping (CS) communication systems must adapt to interference changes in the wireless network and to node mobility for maintaining throughput efficiency. Optimal scheduling requires up-to-date network state information (i.e., of channel occupancy) to select non-overlapping channels for links in interference regions. However, state sharing among nodes introduces significant communication overhead, especially as network size or node mobility scale, thereby decreasing throughput efficiency of already capacity-limited networks. In this paper, we eschew state sharing while adapting the CS schedule based on a learning-based channel occupancy prediction. We propose the MiLAAP attention-based prediction framework for machine learning models of spectral, spatial, and temporal dependencies among network nodes. MiLAAP uses a self-attention mechanism that lets each node capture the temporospectral CS pattern in its interference region and accordingly predict the channel occupancy state within that region. Notably, the prediction relies only on locally and passively observed channel activities, and thus introduces no communication overhead. To deal with node mobility, MiLAAP also uses a multi-head self-attention mechanism that lets each node locally capture the spatiotemporal dependencies on other network nodes that can interfere with it and accordingly predict the motion trajectory of those nodes. Detecting nodes that enter or move outside the interference region is used to further improve the prediction accuracy of channel occupancy. We show that for dynamic networks that use local CS sequences to support relatively long-lived flow traffics, the channel state prediction accuracy of MiLAAP is remarkably ~100% across different node mobility patterns and it achieves zero-shot generalizability across different periods of CS sequences.
AIJun 12, 2025
MUDAS: Mote-scale Unsupervised Domain Adaptation in Multi-label Sound ClassificationJihoon Yun, Chengzhang Li, Dhrubojyoti Roy et al.
Unsupervised Domain Adaptation (UDA) is essential for adapting machine learning models to new, unlabeled environments where data distribution shifts can degrade performance. Existing UDA algorithms are designed for single-label tasks and rely on significant computational resources, limiting their use in multi-label scenarios and in resource-constrained IoT devices. Overcoming these limitations is particularly challenging in contexts such as urban sound classification, where overlapping sounds and varying acoustics require robust, adaptive multi-label capabilities on low-power, on-device systems. To address these limitations, we introduce Mote-scale Unsupervised Domain Adaptation for Sounds (MUDAS), a UDA framework developed for multi-label sound classification in resource-constrained IoT settings. MUDAS efficiently adapts models by selectively retraining the classifier in situ using high-confidence data, minimizing computational and memory requirements to suit on-device deployment. Additionally, MUDAS incorporates class-specific adaptive thresholds to generate reliable pseudo-labels and applies diversity regularization to improve multi-label classification accuracy. In evaluations on the SONYC Urban Sound Tagging (SONYC-UST) dataset recorded at various New York City locations, MUDAS demonstrates notable improvements in classification accuracy over existing UDA algorithms, achieving good performance in a resource-constrained IoT setting.
LGFeb 12, 2022
Physics-Guided Problem Decomposition for Scaling Deep Learning of High-dimensional Eigen-Solvers: The Case of Schrödinger's EquationSangeeta Srivastava, Samuel Olin, Viktor Podolskiy et al.
Given their ability to effectively learn non-linear mappings and perform fast inference, deep neural networks (NNs) have been proposed as a viable alternative to traditional simulation-driven approaches for solving high-dimensional eigenvalue equations (HDEs), which are the foundation for many scientific applications. Unfortunately, for the learned models in these scientific applications to achieve generalization, a large, diverse, and preferably annotated dataset is typically needed and is computationally expensive to obtain. Furthermore, the learned models tend to be memory- and compute-intensive primarily due to the size of the output layer. While generalization, especially extrapolation, with scarce data has been attempted by imposing physical constraints in the form of physics loss, the problem of model scalability has remained. In this paper, we alleviate the compute bottleneck in the output layer by using physics knowledge to decompose the complex regression task of predicting the high-dimensional eigenvectors into multiple simpler sub-tasks, each of which are learned by a simple "expert" network. We call the resulting architecture of specialized experts Physics-Guided Mixture-of-Experts (PG-MoE). We demonstrate the efficacy of such physics-guided problem decomposition for the case of the Schrödinger's Equation in Quantum Mechanics. Our proposed PG-MoE model predicts the ground-state solution, i.e., the eigenvector that corresponds to the smallest possible eigenvalue. The model is 150x smaller than the network trained to learn the complex task while being competitive in generalization. To improve the generalization of the PG-MoE, we also employ a physics-guided loss function based on variational energy, which by quantum mechanics principles is minimized iff the output is the ground-state solution.
SPSep 6, 2019
One Size Does Not Fit All: Multi-Scale, Cascaded RNNs for Radar ClassificationDhrubojyoti Roy, Sangeeta Srivastava, Aditya Kusupati et al.
Edge sensing with micro-power pulse-Doppler radars is an emergent domain in monitoring and surveillance with several smart city applications. Existing solutions for the clutter versus multi-source radar classification task are limited in terms of either accuracy or efficiency, and in some cases, struggle with a trade-off between false alarms and recall of sources. We find that this problem can be resolved by learning the classifier across multiple time-scales. We propose a multi-scale, cascaded recurrent neural network architecture, MSC-RNN, comprised of an efficient multi-instance learning (MIL) Recurrent Neural Network (RNN) for clutter discrimination at a lower tier, and a more complex RNN classifier for source classification at the upper tier. By controlling the invocation of the upper RNN with the help of the lower tier conditionally, MSC-RNN achieves an overall accuracy of 0.972. Our approach holistically improves the accuracy and per-class recalls over ML models suitable for radar inferencing. Notably, we outperform cross-domain handcrafted feature engineering with time-domain deep feature learning, while also being up to $\sim$3$\times$ more efficient than a competitive solution.
SDMay 2, 2018
SONYC: A System for the Monitoring, Analysis and Mitigation of Urban Noise PollutionJuan Pablo Bello, Claudio Silva, Oded Nov et al.
We present the Sounds of New York City (SONYC) project, a smart cities initiative focused on developing a cyber-physical system for the monitoring, analysis and mitigation of urban noise pollution. Noise pollution is one of the topmost quality of life issues for urban residents in the U.S. with proven effects on health, education, the economy, and the environment. Yet, most cities lack the resources to continuously monitor noise and understand the contribution of individual sources, the tools to analyze patterns of noise pollution at city-scale, and the means to empower city agencies to take effective, data-driven action for noise mitigation. The SONYC project advances novel technological and socio-technical solutions that help address these needs. SONYC includes a distributed network of both sensors and people for large-scale noise monitoring. The sensors use low-cost, low-power technology, and cutting-edge machine listening techniques, to produce calibrated acoustic measurements and recognize individual sound sources in real time. Citizen science methods are used to help urban residents connect to city agencies and each other, understand their noise footprint, and facilitate reporting and self-regulation. Crucially, SONYC utilizes big data solutions to analyze, retrieve and visualize information from sensors and citizens, creating a comprehensive acoustic model of the city that can be used to identify significant patterns of noise pollution. These data can be used to drive the strategic application of noise code enforcement by city agencies to optimize the reduction of noise pollution. The entire system, integrating cyber, physical and social infrastructure, forms a closed loop of continuous sensing, analysis and actuation on the environment. SONYC provides a blueprint for the mitigation of noise pollution that can potentially be applied to other cities in the US and abroad.
CRJun 25, 2014
Extract Secrets from Wireless Channel: A New Shape-based ApproachYue Qiao, Kannan Srinivasan, Anish Arora
Existing secret key extraction techniques use quantization to map wireless channel amplitudes to secret bits. This pa- per shows that such techniques are highly prone to environ- ment and local noise effects: They have very high mismatch rates between the two nodes that measure the channel be- tween them. This paper advocates using the shape of the channel instead of the size (or amplitude) of the channel. It shows that this new paradigm shift is significantly ro- bust against environmental and local noises. We refer to this shape-based technique as Puzzle. Implementation in a software-defined radio (SDR) platform demonstrates that Puzzle has a 63% reduction in bit mismatch rate than the state-of-art frequency domain approach (CSI-2bit). Exper- iments also show that unlike the state-of-the-art received signal strength (RSS)-based methods like ASBG, Puzzle is robust against an attack in which an eavesdropper can pre- dict the secret bits using planned movements.