AS | May 27, 2025 | Code
PSRB: A Comprehensive Benchmark for Evaluating Persian ASR Systems
Nima Sedghiyeh, Sara Sadeghi, Reza Khodadadi et al.
Although Automatic Speech Recognition (ASR) systems have become an integral part of modern technology, their evaluation remains challenging, particularly for low-resource languages such as Persian. This paper introduces the Persian Speech Recognition Benchmark (PSRB), a comprehensive benchmark designed to address this gap by incorporating diverse linguistic and acoustic conditions. We evaluate ten ASR systems, including state-of-the-art commercial and open-source models, to examine performance variations and inherent biases. Additionally, we conduct an in-depth analysis of Persian ASR transcriptions, identifying key error types and proposing a novel metric that weights substitution errors. This metric makes evaluation more robust by reducing the impact of minor and partial errors, thereby improving the precision of performance assessment. Our findings indicate that while ASR models generally perform well on standard Persian, they struggle with regional accents, children's speech, and specific linguistic challenges. These results highlight the need for fine-tuning and for diverse, representative training datasets to mitigate biases and improve overall ASR performance. PSRB provides a valuable resource for advancing ASR research in Persian and serves as a framework for developing benchmarks in other low-resource languages. A subset of the PSRB dataset is publicly available at https://huggingface.co/datasets/PartAI/PSRB.
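The abstract does not say how substitution errors are weighted, so the following is only a minimal sketch of one possible scheme, not the PSRB metric itself. It assumes that a substituted word is charged its normalized character-level edit distance to the reference word (a value in [0, 1]), while insertions and deletions keep the usual cost of 1. The function names `char_distance`, `substitution_weight`, and `weighted_wer` are hypothetical.

```python
def char_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def substitution_weight(ref_word: str, hyp_word: str) -> float:
    """Assumed weighting: fraction of characters that differ, in [0, 1]."""
    if ref_word == hyp_word:
        return 0.0
    return char_distance(ref_word, hyp_word) / max(len(ref_word), len(hyp_word))


def weighted_wer(reference: str, hypothesis: str) -> float:
    """Word error rate in which substitutions cost their character-level weight."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = [float(j) for j in range(len(hyp) + 1)]
    for i, r in enumerate(ref, start=1):
        curr = [float(i)]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1.0,       # word deleted
                            curr[j - 1] + 1.0,   # word inserted
                            prev[j - 1] + substitution_weight(r, h)))
        prev = curr
    return prev[-1] / len(ref)
```

Under this assumption, a near miss such as "cab" for "cat" in a three-word reference contributes one third of a word error rather than a full one, which illustrates how a weighted metric can reduce the impact of minor and partial errors.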
SP | Dec 16, 2018
Deep UL2DL: Channel Knowledge Transfer from Uplink to Downlink
Mohammad Sadegh Safari, Vahid Pourahmadi, Shabnam Sodagari
Knowledge of the channel state information (CSI) at the transmitter side is one of the primary sources of information that can be used for the efficient allocation of wireless resources. Obtaining downlink (DL) CSI from uplink (UL) CSI in Frequency Division Duplexing (FDD) systems is not as straightforward as in Time Division Duplexing (TDD) systems. Therefore, users usually feed the DL-CSI back to the transmitter. To remove the need for feedback (and thus reduce signaling overhead), we propose to use two recent deep neural network structures, i.e., convolutional neural networks and generative adversarial networks (GANs), to infer the DL-CSI by observing the UL-CSI. The core idea of our data-driven scheme is to exploit the fact that both DL and UL channels share the same propagation environment. As such, we extracted the environment information from the UL channel response into a latent domain and then transferred the derived environment information from the latent domain to predict the DL channel. To avoid an incorrectly specified latent domain and oversimplified assumptions, we did not use any specific parametric model; instead, we used data-driven approaches to discover the underlying structure of the data without any prior model assumptions. To overcome the challenge of capturing the UL-DL joint distribution, we used a mean square error-based variant of the GAN structure with improved convergence properties called boundary equilibrium GAN (BEGAN). For training and testing, we used data simulated with the Extended Vehicular-A (EVA) and Extended Typical Urban (ETU) channel models. Simulation results verified that our methods can accurately infer and predict the downlink CSI from the uplink CSI for different multipath environments in FDD communications.
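For readers unfamiliar with BEGAN, the balancing rule that gives it its convergence behaviour can be sketched in a few lines. This follows the standard BEGAN formulation, in which the discriminator is an autoencoder and both losses are reconstruction errors. The abstract does not describe how the networks are conditioned on the UL-CSI, so that part is omitted here. The helper names `mse` and `began_step` are hypothetical, and the default hyperparameters are illustrative assumptions.

```python
def mse(x, x_rec):
    """Mean squared reconstruction error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, x_rec)) / len(x)


def began_step(loss_real, loss_fake, k, gamma=0.5, lambda_k=0.001):
    """One BEGAN bookkeeping step.

    loss_real: autoencoder reconstruction error on a real sample
    loss_fake: autoencoder reconstruction error on a generated sample
    k:         current balance term, kept in [0, 1]
    gamma:     target ratio loss_fake / loss_real (assumed value)
    lambda_k:  step size for updating k (assumed value)
    """
    d_loss = loss_real - k * loss_fake   # discriminator objective
    g_loss = loss_fake                   # generator objective
    balance = gamma * loss_real - loss_fake
    k_next = min(max(k + lambda_k * balance, 0.0), 1.0)
    convergence = loss_real + abs(balance)  # global convergence measure
    return d_loss, g_loss, k_next, convergence
```

In a training loop, `loss_real` and `loss_fake` would come from the discriminator's reconstructions of true and generated DL channel samples, and `convergence` can be logged to monitor training progress.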