Andrew C. Singer

AS
h-index9
21papers
289citations
Novelty43%
AI Score52

21 Papers

SYMar 19, 2012
Linear MMSE-Optimal Turbo Equalization Using Context Trees

Nargiz Kalantarova, Kyeongyeon Kim, Suleyman S. Kozat et al.

Formulations of the turbo equalization approach to iterative equalization and decoding vary greatly when channel knowledge is either partially or completely unknown. Maximum aposteriori probability (MAP) and minimum mean square error (MMSE) approaches leverage channel knowledge to make explicit use of soft information (priors over the transmitted data bits) in a manner that is distinctly nonlinear, appearing either in a trellis formulation (MAP) or inside an inverted matrix (MMSE). To date, nearly all adaptive turbo equalization methods either estimate the channel or use a direct adaptation equalizer in which estimates of the transmitted data are formed from an expressly linear function of the received data and soft information, with this latter formulation being most common. We study a class of direct adaptation turbo equalizers that are both adaptive and nonlinear functions of the soft information from the decoder. We introduce piecewise linear models based on context trees that can adaptively approximate the nonlinear dependence of the equalizer on the soft information such that it can choose both the partition regions as well as the locally linear equalizer coefficients in each region independently, with computational complexity that remains of the order of a traditional direct adaptive linear equalizer. This approach is guaranteed to asymptotically achieve the performance of the best piecewise linear equalizer and we quantify the MSE performance of the resulting algorithm and the convergence of its MSE to that of the linear minimum MSE estimator as the depth of the context tree and the data length increase.

SYMar 19, 2012
Low Complexity Turbo-Equalization: A Clustering Approach

Kyeongyeon Kim, Jun Won Choi, Suleyman S. Kozat et al.

We introduce a low complexity approach to iterative equalization and decoding, or "turbo equalization", that uses clustered models to better match the nonlinear relationship that exists between likelihood information from a channel decoder and the symbol estimates that arise in soft-input channel equalization. The introduced clustered turbo equalizer uses piecewise linear models to capture the nonlinear dependency of the linear minimum mean square error (MMSE) symbol estimate on the symbol likelihoods produced by the channel decoder and maintains a computational complexity that is only linear in the channel memory. By partitioning the space of likelihood information from the decoder, based on either hard or soft clustering, and using locally-linear adaptive equalizers within each clustered region, the performance gap between the linear MMSE equalizer and low-complexity, LMS-based linear turbo equalizers can be dramatically narrowed.

SYMay 5
Adaptive Diagonal Loading for Norm Constrained Beamforming

Manan Mittal, Ryan M. Corey, John R. Buck et al.

Reliable adaptive beamforming is critical for large microphone arrays operating in highly dynamic acoustic environments. In scenarios characterized by fast-moving talkers and interferers, the available sample support for estimating the spatial correlation matrix is often snapshot-deficient. This deficiency, coupled with array imperfections, degrades the White Noise Gain (WNG), leading to severe target signal cancellation. To ensure stable and robust beamforming, we propose a novel adaptive diagonal loading method that guarantees the WNG remains strictly within specified bounds. By leveraging the Kantorovich inequality, we map the desired WNG to a strict upper bound on the condition number of the correlation matrix. Furthermore, we present three estimation techniques for the adaptive loading level, ranging from trace-based bounding to exact eigenvalue decomposition, offering scalable computational complexities of $\mathcal{O}(M)$, $\mathcal{O}(M^2)$, and $\mathcal{O}(M^3)$. Our approach demonstrates highly stable beamforming under fast-changing interference.

SPMay 24
Time Segmented Beamforming via Dynamic Programming: Theory and Implementation

Manan Mittal, Ryan M. Corey, Diego Cuji et al.

In dynamic acoustic environments with time-varying interferers, effective beamforming requires identifying stationary regions over time. The Capon beamformer, a whitened matched filter constrained to maintain unity gain in the desired direction, theoretically relies on the instantaneous ensemble covariance matrix. Practical implementations rely on the batch Capon (or Sample Matrix Inversion), which estimates the sample covariance matrix (SCM) by averaging over a block of snapshots. This practical approach implicitly assumes that the data within the batch window is stationary and can be coherently combined. In non-stationary settings, a batch approach that averages over fixed or excessively long windows fails, as moving interferers smear the SCM and degrade the beamformer's nulling capabilities. To address this, this paper introduces a temporally segmented distortionless response beamformer. Inspired by the segmented least squares method, which fits piecewise polynomials to data while penalizing excessive segmentation to prevent overfitting, the framework extends practical Capon beamforming by incorporating data-driven temporal segmentation. This formulation minimizes output power while dynamically adapting the SCM estimation windows to local stationarity, offering a principled approach to tracking time-varying interferers.

AIAug 20, 2023
Unsupervised Opinion Aggregation -- A Statistical Perspective

Noyan C. Sevuktekin, Andrew C. Singer

Complex decision-making systems rarely have direct access to the current state of the world and they instead rely on opinions to form an understanding of what the ground truth could be. Even in problems where experts provide opinions without any intention to manipulate the decision maker, it is challenging to decide which expert's opinion is more reliable -- a challenge that is further amplified when decision-maker has limited, delayed, or no access to the ground truth after the fact. This paper explores a statistical approach to infer the competence of each expert based on their opinions without any need for the ground truth. Echoing the logic behind what is commonly referred to as \textit{the wisdom of crowds}, we propose measuring the competence of each expert by their likeliness to agree with their peers. We further show that the more reliable an expert is the more likely it is that they agree with their peers. We leverage this fact to propose a completely unsupervised version of the naïve Bayes classifier and show that the proposed technique is asymptotically optimal for a large class of problems. In addition to aggregating a large block of opinions, we further apply our technique for online opinion aggregation and for decision-making based on a limited the number of opinions.

ASMay 15
A Survey of Advancing Audio Super-Resolution and Bandwidth Extension from Discriminative to Generative Models

Ningyuan Yang, Yize Li, Diego A. Cuji et al.

Audio super-resolution (SR), also referred to as bandwidth extension (BWE), aims to reconstruct high-fidelity signals from low-resolution (LR) or band-limited (BL) observations, an inherently ill-posed task due to the ambiguity of missing high-frequency (HF) content. This survey provides a comprehensive overview of the field, with a particular focus on the paradigm shift from discriminative mapping to modern generative modeling. We first review early discriminative deep neural network (DNN) models, which formulate BWE/SR as a deterministic mapping problem and are prone to regression-to-the-mean effects and spectral over-smoothing. We then systematically review generative approaches, including autoregressive (AR) models, variational autoencoders (VAEs), generative adversarial networks (GANs), diffusion and score-based models, flow-based methods, and Schrödinger bridges. Across these approaches, we examine key design aspects, including representation domain, architecture, conditioning mechanisms, and trade-offs among reconstruction fidelity, perceptual quality, robustness, and computational efficiency. Furthermore, we discuss emerging directions involving large language models (LLMs) and multimodal foundation models, and highlight open challenges in perceptual evaluation, phase modeling, and real-world generalization. By providing a structured taxonomy and unified perspective, this survey establishes a comprehensive foundation and offers a practical roadmap for advancing BWE/SR from deterministic point estimation toward distribution-aware generative modeling.

SPMay 11
Adaptive Diagonal Loading using Krylov Subspaces for Robust Beamforming

Manan Mittal, Ryan M. Corey, John R. Buck et al.

Reliable adaptive beamforming is critical for large microphone arrays operating in highly dynamic acoustic environments. In scenarios characterized by fast-moving talkers and interferers, the available sample support for estimating the spatial correlation matrix is often snapshot-deficient. This deficiency degrades the White Noise Gain (WNG), leading to severe target signal cancellation. To ensure stable and robust beamforming, we previously proposed an adaptive diagonal loading method that leverages the Kantorovich inequality to guarantee the WNG remains strictly within specified bounds. However, accurately determining the smallest necessary loading level requires calculating the extreme eigenvalues of the spatial correlation matrix, a computationally expensive $\mathcal{O}(M^3)$ operation for large arrays. In this paper, we introduce a highly efficient $\mathcal{O}(kM^2)$ estimation technique using Lanczos iterations to build a small Krylov subspace. By projecting the correlation matrix onto a tridiagonal matrix of dimension $k \ll M$, we extract Ritz values that rapidly converge to the exact extreme eigenvalues. Our evaluations demonstrate that this Lanczos-accelerated approach achieves performance identical to exact Eigenvalue Decomposition (EVD), ensuring optimal interference suppression and strict WNG adherence at a fraction of the computational cost.

SDMay 8
Online Segmented Beamforming via Dynamic Programming

Manan Mittal, Ryan M. Corey, Diego Cuji et al.

In dynamic acoustic environments characterized by time-varying interferers and moving sources, effective beamforming requires accurately identifying stationary regions over time. Traditional Capon beamformers rely on the instantaneous ensemble covariance matrix, which is inaccessible in practice. Practical implementations overcome this by estimating the sample covariance matrix (SCM) through averaging over a block of temporal samples. However, in non-stationary settings, a naive batch approach fails. Moving interferers smear the SCM, causing the beamformer to place nulls in outdated locations while failing to track newly active interferers, thereby degrading its nulling capabilities. To address this fundamental limitation, an Online Segmented Beamformer is proposed. This algorithm incorporates data-driven temporal segmentation to causally minimize output power while dynamically adapting the SCM estimation windows to local stationarity. By framing the problem through the lens of dynamic programming, the proposed method tracks abrupt environmental changes and resets covariance estimates in real-time. We validate the performance of this framework in a complex, reverberant simulated acoustic environment and in highly reverberant real world experiments, demonstrating its superiority over fixed-window adaptive methods.

SDMar 30, 2025
Joint Source-Environment Adaptation for Deep Learning-Based Underwater Acoustic Source Ranging

Dariush Kari, Andrew C. Singer

In this paper, we propose a method to adapt a pre-trained deep-learning-based model for underwater acoustic localization to a new environment. We use unsupervised domain adaptation to improve the generalization performance of the model, i.e., using an unsupervised loss, fine-tune the pre-trained network parameters without access to any labels of the target environment or any data used to pre-train the model. This method improves the pre-trained model prediction by coupling that with an almost independent estimation based on the received signal energy (that depends on the source). We show the effectiveness of this approach on Bellhop generated data in an environment similar to that of the SWellEx-96 experiment contaminated with real ocean noise from the KAM11 experiment.

SDMar 30, 2025
Mismatch-Robust Underwater Acoustic Localization Using A Differentiable Modular Forward Model

Dariush Kari, Yongjie Zhuang, Andrew C. Singer

In this paper, we study the underwater acoustic localization in the presence of environmental mismatch. Especially, we exploit a pre-trained neural network for the acoustic wave propagation in a gradient-based optimization framework to estimate the source location. To alleviate the effect of mismatch between the training data and the test data, we simultaneously optimize over the network weights at the inference time, and provide conditions under which this method is effective. Moreover, we introduce a physics-inspired modularity in the forward model that enables us to learn the path lengths of the multipath structure in an end-to-end training manner without access to the specific path labels. We investigate the validity of the assumptions in a simple yet illustrative environment model.

SDMar 30, 2025
Joint Source-Environment Adaptation of Data-Driven Underwater Acoustic Source Ranging Based on Model Uncertainty

Dariush Kari, Hari Vishnu, Andrew C. Singer

Adapting pre-trained deep learning models to new and unknown environments remains a major challenge in underwater acoustic localization. We show that although the performance of pre-trained models suffers from mismatch between the training and test data, they generally exhibit a higher uncertainty in environments where there is more mismatch. Additionally, in the presence of environmental mismatch, spurious peaks can appear in the output of classification-based localization approaches, which inspires us to define and use a method to quantify the "implied uncertainty" based on the number of model output peaks. Leveraging this notion of implied uncertainty, we partition the test samples into sets with more certain and less certain samples, and implement a method to adapt the model to new environments by using the certain samples to improve the labeling for uncertain samples, which helps to adapt the model. Thus, using this efficient method for model uncertainty quantification, we showcase an innovative approach to adapt a pre-trained model to unseen underwater environments at test time. This eliminates the need for labeled data from the target environment or the original training data. This adaptation is enhanced by integrating an independent estimate based on the received signal energy. We validate the approach extensively using real experimental data, as well as synthetic data consisting of model-generated signals with real ocean noise. The results demonstrate significant improvements in model prediction accuracy, underscoring the potential of the method to enhance underwater acoustic localization in diverse, noisy, and unknown environments.

SPApr 2, 2021
Blind Exploration and Exploitation of Stochastic Experts

Noyan C. Sevuktekin, Andrew C. Singer

We present blind exploration and exploitation (BEE) algorithms for identifying the most reliable stochastic expert based on formulations that employ posterior sampling, upper-confidence bounds, empirical Kullback-Leibler divergence, and minmax methods for the stochastic multi-armed bandit problem. Joint sampling and consultation of experts whose opinions depend on the hidden and random state of the world becomes challenging in the unsupervised, or blind, framework as feedback from the true state is not available. We propose an empirically realizable measure of expert competence that can be inferred instantaneously using only the opinions of other experts. This measure preserves the ordering of true competences and thus enables joint sampling and consultation of stochastic experts based on their opinions on dynamically changing tasks. Statistics derived from the proposed measure is instantaneously available allowing both blind exploration-exploitation and unsupervised opinion aggregation. We discuss how the lack of supervision affects the asymptotic regret of BEE architectures that rely on UCB1, KL-UCB, MOSS, IMED, and Thompson sampling. We demonstrate the performance of different BEE algorithms empirically and compare them to their standard, or supervised, counterparts.

ASDec 7, 2020
Modeling the effects of dynamic range compression on signals in noise

Ryan M. Corey, Andrew C. Singer

Hearing aids use dynamic range compression (DRC), a form of automatic gain control, to make quiet sounds louder and loud sounds quieter. Compression can improve listening comfort, but it can also cause distortion in noisy environments. It has been widely reported that DRC performs poorly in noise, but there has been little mathematical analysis of these distortion effects. This work introduces a mathematical model to study the behavior of DRC in noise. Using statistical assumptions about the signal envelopes, we define an effective compression function that models the compression applied to one signal in the presence of another. This framework is used to prove results about DRC that have been previously observed experimentally: that when DRC is applied to a mixture of signals, uncorrelated signal envelopes become negatively correlated; that the effective compression applied to each sound in a mixture is weaker than it would have been for the signal alone; and that compression can reduce the long-term signal-to-noise ratio in certain conditions. These theoretical results are supported by software experiments using recorded speech signals.

ASAug 11, 2020
Acoustic effects of medical, cloth, and transparent face masks on speech signals

Ryan M. Corey, Uriah Jones, Andrew C. Singer

Face masks muffle speech and make communication more difficult, especially for people with hearing loss. This study examines the acoustic attenuation caused by different face masks, including medical, cloth, and transparent masks, using a head-shaped loudspeaker and a live human talker. The results suggest that all masks attenuate frequencies above 1 kHz, that attenuation is greatest in front of the talker, and that there is substantial variation between mask types, especially cloth masks with different materials and weaves. Transparent masks have poor acoustic performance compared to both medical and cloth masks. Most masks have little effect on lapel microphones, suggesting that existing sound reinforcement and assistive listening systems may be effective for verbal communication with masks.

ASApr 24, 2020
Binaural Audio Source Remixing with Microphone Array Listening Devices

Ryan M. Corey, Andrew C. Singer

Augmented listening devices, such as hearing aids and augmented reality headsets, enhance human perception by changing the sounds that we hear. Microphone arrays can improve the performance of listening systems in noisy environments, but most array-based listening systems are designed to isolate a single sound source from a mixture. This work considers a source-remixing filter that alters the relative level of each source independently. Remixing rather than separating sounds can help to improve perceptual transparency: it causes less distortion to the signal spectrum and especially to the interaural cues that humans use to localize sounds in space.

ASDec 10, 2019
Motion-Tolerant Beamforming with Deformable Microphone Arrays

Ryan M. Corey, Andrew C. Singer

Microphone arrays are usually assumed to have rigid geometries: the microphones may move with respect to the sound field but remain fixed relative to each other. However, many useful arrays, such as those in wearable devices, have sensors that can move relative to each other. We compare two approaches to beamforming with deformable microphone arrays: first, by explicitly tracking the geometry of the array as it changes over time, and second, by designing a time-invariant beamformer based on the second-order statistics of the moving array. The time-invariant approach is shown to be appropriate when the motion of the array is small relative to the acoustic wavelengths of interest. The performance of the proposed beamforming system is demonstrated using a wearable microphone array on a moving human listener in a cocktail-party scenario.

ASDec 10, 2019
Cooperative Audio Source Separation and Enhancement Using Distributed Microphone Arrays and Wearable Devices

Ryan M. Corey, Matthew D. Skarha, Andrew C. Singer

Augmented listening devices such as hearing aids often perform poorly in noisy and reverberant environments with many competing sound sources. Large distributed microphone arrays can improve performance, but data from remote microphones often cannot be used for delay-constrained real-time processing. We present a cooperative audio source separation and enhancement system that leverages wearable listening devices and other microphone arrays spread around a room. The full distributed array is used to separate sound sources and estimate their statistics. Each listening device uses these statistics to design real-time binaural audio enhancement filters using its own local microphones. The system is demonstrated experimentally using 10 speech sources and 160 microphones in a large, reverberant room.

ASMar 5, 2019
Acoustic Impulse Responses for Wearable Audio Devices

Ryan M. Corey, Naoki Tsuda, Andrew C. Singer

We present an open-access dataset of over 8000 acoustic impulse from 160 microphones spread across the body and affixed to wearable accessories. The data can be used to evaluate audio capture and array processing systems using wearable devices such as hearing aids, headphones, eyeglasses, jewelry, and clothing. We analyze the acoustic transfer functions of different parts of the body, measure the effects of clothing worn over microphones, compare measurements from a live human subject to those from a mannequin, and simulate the noise-reduction performance of several beamformers. The results suggest that arrays of microphones spread across the body are more effective than those confined to a single device.

ASJul 31, 2018
Speech Separation Using Partially Asynchronous Microphone Arrays Without Resampling

Ryan M. Corey, Andrew C. Singer

We consider the problem of separating speech sources captured by multiple spatially separated devices, each of which has multiple microphones and samples its signals at a slightly different rate. Most asynchronous array processing methods rely on sample rate offset estimation and resampling, but these offsets can be difficult to estimate if the sources or microphones are moving. We propose a source separation method that does not require offset estimation or signal resampling. Instead, we divide the distributed array into several synchronous subarrays. All arrays are used jointly to estimate the time-varying signal statistics, and those statistics are used to design separate time-varying spatial filters in each array. We demonstrate the method for speech mixtures recorded on both stationary and moving microphone arrays.

ASJul 31, 2018
Delay-Performance Tradeoffs in Causal Microphone Array Processing

Ryan M. Corey, Naoki Tsuda, Andrew C. Singer

In real-time listening enhancement applications, such as hearing aid signal processing, sounds must be processed with no more than a few milliseconds of delay to sound natural to the listener. Listening devices can achieve better performance with lower delay by using microphone arrays to filter acoustic signals in both space and time. Here, we analyze the tradeoff between delay and squared-error performance of causal multichannel Wiener filters for microphone array noise reduction. We compute exact expressions for the delay-error curves in two special cases and present experimental results from real-world microphone array recordings. We find that delay-performance characteristics are determined by both the spatial and temporal correlation structures of the signals.

LGMay 19, 2017
EE-Grad: Exploration and Exploitation for Cost-Efficient Mini-Batch SGD

Mehmet A. Donmez, Maxim Raginsky, Andrew C. Singer

We present a generic framework for trading off fidelity and cost in computing stochastic gradients when the costs of acquiring stochastic gradients of different quality are not known a priori. We consider a mini-batch oracle that distributes a limited query budget over a number of stochastic gradients and aggregates them to estimate the true gradient. Since the optimal mini-batch size depends on the unknown cost-fidelity function, we propose an algorithm, {\it EE-Grad}, that sequentially explores the performance of mini-batch oracles and exploits the accumulated knowledge to estimate the one achieving the best performance in terms of cost-efficiency. We provide performance guarantees for EE-Grad with respect to the optimal mini-batch oracle, and illustrate these results in the case of strongly convex objectives. We also provide a simple numerical example that corroborates our theoretical findings.