Martin Bouchard

AS
h-index: 35
5 papers
55 citations
Novelty: 47%
AI Score: 46

5 Papers

22.0 · AS · May 13
LMU-Based Sequential Learning and Posterior Ensemble Fusion for Cross-Domain Infant Cry Classification

Niloofar Jazaeri, Hilmi R. Dajani, Marco Janeczek et al.

Decoding the causes of infant cries remains challenging for healthcare monitoring because of short, nonstationary signals, limited annotations, and strong domain shifts across infants and datasets. We propose a compact acoustic framework that fuses mel-frequency cepstral coefficients (MFCCs), short-time Fourier transform (STFT) features, and fundamental-frequency (F0) contours within a multi-branch convolutional neural network (CNN) encoder, and models temporal dynamics with an enhanced Legendre Memory Unit (LMU). Compared with LSTMs, the LMU backbone provides stable sequence modeling with substantially fewer recurrent parameters, supporting efficient deployment. To improve cross-dataset generalization, we introduce calibrated posterior ensemble fusion with entropy-gated weighting, which preserves domain-specific expertise while mitigating dataset bias. Experiments on Baby2020 and Baby Crying, using leakage-aware splits, demonstrate improved macro-F1 under cross-domain evaluation as well as real-time feasibility for on-device monitoring.
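The abstract does not spell out the fusion rule, so the following is only a minimal sketch of one plausible form of calibrated, entropy-gated posterior fusion. It assumes per-model temperature scaling as the calibration step and a softmax over negative entropies as the gate; the function names and the `gate_tau` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; temperature > 1 softens over-confident posteriors."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of each posterior along the last axis."""
    return -(p * np.log(p + eps)).sum(axis=-1)

def entropy_gated_fusion(logits_per_model, temperatures, gate_tau=1.0):
    """Fuse per-model class posteriors, weighting low-entropy (confident) models more.

    logits_per_model: array of shape (K, C) for K models and C classes.
    temperatures: K calibration temperatures (assumed fitted on held-out data).
    gate_tau: hypothetical sharpness of the entropy gate.
    """
    # Calibrate each model's posterior with its own temperature.
    posteriors = np.stack([softmax(l, t) for l, t in zip(logits_per_model, temperatures)])
    h = entropy(posteriors)                   # shape (K,)
    gate = softmax(-h, temperature=gate_tau)  # lower entropy -> larger weight
    fused = (gate[:, None] * posteriors).sum(axis=0)
    return fused, gate

# Usage: a confident model (A) and an almost-uniform model (B) over 3 cry classes.
logits = np.array([[4.0, 0.0, 0.0],
                   [0.1, 0.0, 0.0]])
fused, gate = entropy_gated_fusion(logits, temperatures=[1.5, 1.5])
```

Because the gate is itself a softmax, the weights sum to one and the fused vector remains a valid probability distribution.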

CV · Mar 29, 2023
T-FFTRadNet: Object Detection with Swin Vision Transformers from Raw ADC Radar Signals

James Giroux, Martin Bouchard, Robert Laganiere

Object detection using Frequency Modulated Continuous Wave (FMCW) radar is becoming increasingly popular in autonomous systems. Radar does not share a key drawback of other emission-based sensors such as LiDAR: the degradation or loss of return signals in weather conditions such as rain or snow. However, radar has traits that make it unsuitable for standard emission-based deep learning representations such as point clouds. Radar point clouds tend to be sparse, so extracting information from them is inefficient. To overcome this, more traditional digital signal processing pipelines were adapted to form inputs residing directly in the frequency domain via Fast Fourier Transforms. Commonly, three transformations were used to form Range-Azimuth-Doppler cubes on which deep learning algorithms could perform object detection. This approach also has drawbacks, namely the pre-processing cost of performing multiple Fourier Transforms and normalization. We explore operating on raw radar inputs from analog-to-digital converters by using complex transformation layers. Moreover, we introduce hierarchical Swin Vision Transformers to radar object detection and show that they can operate on inputs with varying levels of pre-processing and with different radar configurations (i.e., relatively low and high numbers of transmitters and receivers), while obtaining results on par with or better than the state of the art.
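One way to see why a learnable complex layer can replace FFT pre-processing is that a DFT is just a complex matrix multiplication. The sketch below is an illustration under that assumption, not the paper's layer. It defines a hypothetical `ComplexLinear` layer, initializes its weights to the DFT matrix, and checks that it reproduces a range FFT on simulated complex (IQ) ADC samples. The 64-sample chirps and the beat tone at bin 5 are made-up values.

```python
import numpy as np

def dft_matrix(n):
    """n x n DFT matrix with W[k, m] = exp(-2j * pi * k * m / n)."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return np.exp(-2j * np.pi * k * m / n)

class ComplexLinear:
    """Minimal complex-valued linear layer y = x @ W.T (no bias).

    With W initialized to the DFT matrix, the layer computes an FFT along the
    last axis. In a trained network W would then be updated by gradient descent.
    """
    def __init__(self, n, init_dft=True, seed=0):
        if init_dft:
            self.weight = dft_matrix(n)
        else:
            rng = np.random.default_rng(seed)
            self.weight = (rng.standard_normal((n, n))
                           + 1j * rng.standard_normal((n, n))) / np.sqrt(n)

    def __call__(self, x):
        return x @ self.weight.T

# Simulated raw ADC data: 8 chirps x 64 fast-time samples of a beat tone at bin 5.
num_chirps, num_samples, beat_bin = 8, 64, 5
m = np.arange(num_samples)
adc = np.tile(np.exp(2j * np.pi * beat_bin * m / num_samples), (num_chirps, 1))

range_layer = ComplexLinear(num_samples)
range_profile = range_layer(adc)  # shape (8, 64); peak at the target's range bin
```

Stacking a second such layer along the chirp axis would play the role of the Doppler FFT in the same way.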

43.0 · AS · Apr 29
Multi-Speaker DOA Estimation in Binaural Hearing Aids using Deep Learning and Speaker Count Fusion

Farnaz Jazaeri, Homayoun Kamkar-Parsi, François Grondin et al.

Direction-of-arrival (DOA) estimation is crucial for extracting a target speaker's voice in binaural hearing aids operating in noisy, multi-speaker environments. Among the solutions developed for this task, a deep learning convolutional recurrent neural network (CRNN) model that leverages spectral phase differences and magnitude ratios between microphone signals is a popular option. In this paper, we explore adding source-count information for multi-source DOA estimation. Dual-task training, which jointly performs multi-source DOA estimation and source counting, is considered first. We then consider using the source count as an auxiliary feature in a standalone DOA estimation system, where the number of active sources (0, 1, or 2+) is integrated into the CRNN architecture through early, mid, and late fusion strategies. Experiments are performed using real binaural recordings. Results show that dual-task training does not improve DOA estimation performance, although it benefits source-count prediction. However, a ground-truth (oracle) source count used as an auxiliary feature significantly enhances standalone DOA estimation performance, with late fusion yielding up to 14% higher average F1-scores than the baseline CRNN. This highlights the potential of source-count estimation for robust DOA estimation in binaural hearing aids.
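The following minimal sketch shows how a source-count class (0, 1, or 2+) could enter a model as an auxiliary feature through early and late fusion. It assumes a one-hot count encoding, concatenation as the fusion operator, and a per-direction sigmoid output. All names, the feature sizes, and the 72-direction grid are hypothetical and do not describe the paper's CRNN. Mid fusion would concatenate the same one-hot at an intermediate layer.

```python
import numpy as np

NUM_COUNT_CLASSES = 3  # 0, 1, or 2+ active sources

def encode_source_count(count):
    """One-hot encode an active-source count, mapping any count >= 2 to the '2+' class."""
    onehot = np.zeros(NUM_COUNT_CLASSES)
    onehot[min(int(count), NUM_COUNT_CLASSES - 1)] = 1.0
    return onehot

def early_fusion(frame_features, count):
    """Append the count one-hot to every time frame of the input features.

    frame_features: (T, F) per-frame features, e.g. phase differences and magnitude ratios.
    """
    tiled = np.tile(encode_source_count(count), (frame_features.shape[0], 1))
    return np.concatenate([frame_features, tiled], axis=1)

def late_fusion(embedding, count, weight, bias):
    """Concatenate the count one-hot to the final embedding, then apply a DOA output layer.

    embedding: (D,) output of the recurrent layers; weight: (num_directions, D + 3).
    Returns per-direction activity probabilities via an element-wise sigmoid.
    """
    z = np.concatenate([embedding, encode_source_count(count)])
    return 1.0 / (1.0 + np.exp(-(weight @ z + bias)))

# Usage with random placeholder tensors (no trained weights involved).
rng = np.random.default_rng(0)
fused_input = early_fusion(np.zeros((10, 8)), count=5)  # count 5 falls in the '2+' class
doa_probs = late_fusion(rng.standard_normal(16), 1,
                        rng.standard_normal((72, 19)), np.zeros(72))
```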

SP · Sep 11, 2025
mRadNet: A Compact Radar Object Detector with MetaFormer

Huaiyu Chen, Fahed Hassanat, Robert Laganiere et al.

Frequency-modulated continuous wave radars have gained increasing popularity in the automotive industry. Their robustness against adverse weather conditions makes them a suitable choice for radar object detection in advanced driver assistance systems. These real-time embedded systems impose requirements on model compactness and efficiency that have been largely overlooked in previous work. In this work, we propose mRadNet, a novel radar object detection model designed with compactness in mind. mRadNet employs a U-Net style architecture with MetaFormer blocks, in which separable convolution and attention token mixers are used to capture both local and global features effectively. More efficient token embedding and merging strategies are introduced to further support the lightweight design. The performance of mRadNet is validated on the CRUW dataset, where it improves on state-of-the-art performance with the fewest parameters and FLOPs.
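Two ideas behind the compactness claim can be illustrated briefly. First, a generic MetaFormer block is a residual token mixer followed by a residual channel MLP, and the mixer can be swapped out. Second, a depthwise separable convolution needs far fewer weights than a standard convolution. The sketch below is a simplified illustration, not mRadNet itself. It omits biases and learnable normalization parameters, uses ReLU where many MetaFormer variants use GELU, and uses hypothetical helper names and channel sizes.

```python
import numpy as np

def conv_params(k, c_in, c_out):
    """Weight count of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Depthwise k x k conv (one filter per channel) followed by a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out

def layer_norm(x, eps=1e-6):
    """Normalize each token (row) over its channels; learnable affine omitted for brevity."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def metaformer_block(x, token_mixer, w1, w2):
    """Generic MetaFormer block on tokens x of shape (N, C).

    token_mixer: any function (N, C) -> (N, C), e.g. attention or a separable conv.
    w1: (C, H) and w2: (H, C) channel-MLP weights.
    """
    x = x + token_mixer(layer_norm(x))        # token-mixing sub-block with residual
    h = np.maximum(layer_norm(x) @ w1, 0.0)  # channel MLP hidden layer (ReLU)
    return x + h @ w2                         # channel-MLP sub-block with residual

# Usage: 3 x 3 kernels with 64 input and 64 output channels.
standard = conv_params(3, 64, 64)            # 36864 weights
separable = separable_conv_params(3, 64, 64) # 576 + 4096 = 4672 weights
tokens = np.random.default_rng(0).standard_normal((16, 8))
out = metaformer_block(tokens, lambda t: np.zeros_like(t), np.ones((8, 32)), np.zeros((32, 8)))
```

With a zero mixer and a zero output projection, the block reduces to the identity. This is a quick sanity check that both residual paths are wired correctly.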

AS · Nov 3, 2018
A Robust Target Linearly Constrained Minimum Variance Beamformer With Spatial Cues Preservation for Binaural Hearing Aids

Hala As'ad, Martin Bouchard, Homayoun Kamkar-Parsi

In this paper, a binaural beamforming algorithm for hearing aid applications is introduced. The beamforming algorithm is designed to be robust to some error in the estimate of the target speaker direction. The algorithm has two main components: a robust target linearly constrained minimum variance (TLCMV) algorithm based on imposing two constraints around the estimated direction of the target signal, and a post-processor that helps preserve binaural cues. The robust TLCMV provides a good level of noise reduction and a low level of target distortion under realistic conditions. The post-processor enhances the beamformer's ability to preserve the binaural cues of both diffuse-like background noise and directional interferers (competing speakers), while maintaining a good level of noise reduction. The introduced algorithm requires neither knowledge nor estimation of the directional interferers' directions or of the second-order statistics of noise-only components. It does require an estimate of the target speaker direction, but it is designed to be robust to some deviation from that estimate. Comprehensive evaluations against recently proposed state-of-the-art methods are performed under complex, realistic acoustic scenarios generated in both anechoic and mildly reverberant environments, considering a mismatch between the estimated and true source directions of arrival. A mismatch between the anechoic propagation models used to design the beamformers and the mildly reverberant propagation models used to generate the simulated directional signals is also considered. The results illustrate the robustness of the proposed algorithm to such mismatches.
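The core of an LCMV beamformer is the closed-form solution $w = R^{-1}C\,(C^H R^{-1} C)^{-1} f$, which minimizes the output power $w^H R w$ subject to $C^H w = f$. The sketch below applies it at a single frequency. It assumes the two robustness constraints are unit responses at $\hat\theta \pm \delta$ and that $R$ is the covariance of the noisy microphone signals. The free-field four-microphone line array, the 1 kHz frequency, and all angles are hypothetical stand-ins for the measured head-related transfer functions a real binaural device would use. The binaural-cue post-processor is not sketched.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def steering_vector(theta_deg, mic_x, freq):
    """Far-field, free-field steering vector for mics on a line (hypothetical geometry)."""
    delays = mic_x * np.sin(np.deg2rad(theta_deg)) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq * delays)

def lcmv_weights(R, C, f):
    """LCMV solution w = R^-1 C (C^H R^-1 C)^-1 f: minimizes w^H R w s.t. C^H w = f."""
    Rinv_C = np.linalg.solve(R, C)
    return Rinv_C @ np.linalg.solve(C.conj().T @ Rinv_C, f)

# Hypothetical setup: two mics per ear, target estimated at 30 deg (true 32 deg).
mic_x = np.array([-0.085, -0.075, 0.075, 0.085])  # metres
freq, theta_hat, delta = 1000.0, 30.0, 5.0

# Two unit-response constraints bracketing the estimated target direction.
C = np.stack([steering_vector(theta_hat - delta, mic_x, freq),
              steering_vector(theta_hat + delta, mic_x, freq)], axis=1)
f = np.ones(2)

# Covariance of the noisy inputs: target, a competing talker at -40 deg, sensor noise.
a_tgt = steering_vector(32.0, mic_x, freq)
a_int = steering_vector(-40.0, mic_x, freq)
R = (np.outer(a_tgt, a_tgt.conj()) + 10.0 * np.outer(a_int, a_int.conj())
     + 0.1 * np.eye(len(mic_x)))

w = lcmv_weights(R, C, f)
```

Any other weight vector that satisfies the same constraints, such as the minimum-norm solution $C(C^H C)^{-1} f$, yields output power no lower than $w$. This is the property that the constrained minimization guarantees.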