AS · Jan 18, 2025
GEC-RAG: Improving Generative Error Correction via Retrieval-Augmented Generation for Automatic Speech Recognition Systems
Amin Robatian, Mohammad Hajipour, Mohammad Reza Peyghan et al.
Automatic Speech Recognition (ASR) systems have demonstrated remarkable performance across various applications. However, limited data and the unique language features of specific domains, such as low-resource languages, significantly degrade their performance and lead to higher Word Error Rates (WER). In this study, we propose Generative Error Correction via Retrieval-Augmented Generation (GEC-RAG), a novel approach designed to improve ASR accuracy for low-resource domains such as Persian. Our approach treats the ASR system as a black box, a common practice with cloud-based services, and applies Retrieval-Augmented Generation (RAG) within an In-Context Learning (ICL) scheme to enhance the quality of ASR predictions. By constructing a knowledge base that pairs ASR predictions (1-best and 5-best hypotheses) with their corresponding ground truths, GEC-RAG retrieves examples that are lexically similar to the ASR transcription using the Term Frequency-Inverse Document Frequency (TF-IDF) measure. This process supplies the generative Large Language Model (LLM) with the system's relevant error patterns alongside the ASR transcription, enabling targeted corrections. Our results demonstrate that this strategy significantly reduces WER in Persian and highlight its potential for domain adaptation and low-resource scenarios. This research underscores the effectiveness of RAG in enhancing ASR systems without requiring direct model modification or fine-tuning, making the approach adaptable to any domain by simply updating the transcription knowledge base with domain-specific data.
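The retrieval step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the knowledge base holds only 1-best hypotheses (the paper also uses 5-best lists), the sentences are made-up English stand-ins for Persian transcripts, the smoothed IDF formula is one common variant, and the names `KNOWLEDGE_BASE`, `retrieve`, and `build_prompt`, as well as the prompt wording, are hypothetical. The LLM call itself is omitted.

```python
import math
from collections import Counter

# Hypothetical knowledge base of (ASR 1-best hypothesis, ground-truth transcript) pairs.
KNOWLEDGE_BASE = [
    ("the whether is nice today", "the weather is nice today"),
    ("i want to by a ticket", "i want to buy a ticket"),
    ("please close the door", "please close the door"),
]


def tokenize(text):
    return text.lower().split()


def build_idf(docs):
    """Smoothed inverse document frequency over the tokenized hypotheses (one common variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms never seen in the knowledge base get no weight."""
    tf = Counter(tokens)
    return {t: (c / len(tokens)) * idf[t] for t, c in tf.items() if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, kb, k=2):
    """Return the k (hypothesis, reference) pairs whose hypotheses are most similar to query."""
    docs = [tokenize(hyp) for hyp, _ in kb]
    idf = build_idf(docs)
    q_vec = tfidf_vector(tokenize(query), idf)
    scored = [(cosine(q_vec, tfidf_vector(d, idf)), pair) for d, pair in zip(docs, kb)]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [pair for _, pair in scored[:k]]


def build_prompt(query, examples):
    """Hypothetical ICL prompt: retrieved error/correction pairs, then the new transcription."""
    parts = ["Correct the ASR transcription using these examples of past errors:"]
    for hyp, ref in examples:
        parts.append(f"ASR: {hyp}\nCorrect: {ref}")
    parts.append(f"ASR: {query}\nCorrect:")
    return "\n\n".join(parts)


query = "the whether is cold"
print(build_prompt(query, retrieve(query, KNOWLEDGE_BASE, k=2)))
```

Because the ASR system is treated as a black box, adapting this sketch to a new domain only means appending (hypothesis, reference) pairs to `KNOWLEDGE_BASE`.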
CR · Oct 18, 2025
A Versatile Framework for Designing Group-Sparse Adversarial Attacks
Alireza Heshmati, Saman Soleimani Roudi, Sajjad Amini et al.
Existing adversarial attacks often neglect perturbation sparsity, limiting their ability to model structural changes and to explain how deep neural networks (DNNs) process meaningful input patterns. We propose ATOS (Attack Through Overlapping Sparsity), a differentiable optimization framework that generates structured, sparse adversarial perturbations in element-wise, pixel-wise, and group-wise forms. For white-box attacks on image classifiers, we introduce the Overlapping Smoothed L0 (OSL0) function, which promotes convergence to a stationary point while encouraging sparse, structured perturbations. By grouping channels and adjacent pixels, ATOS improves interpretability and helps identify robust versus non-robust features. We approximate the L-infinity gradient using the logarithm of the sum of exponentials of the absolute values (log-sum-exp) to tightly control perturbation magnitude. On CIFAR-10 and ImageNet, ATOS achieves a 100% attack success rate while producing significantly sparser and more structurally coherent perturbations than prior methods. The structured group-wise attack highlights critical regions from the network's perspective, providing counterfactual explanations by replacing class-defining regions with robust features from the target class.
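The log-sum-exp approximation of the L-infinity norm mentioned above can be illustrated with a short Python sketch. This is the generic smooth-max construction, not the ATOS implementation and not the OSL0 function; the temperature parameter `t` and the function names are assumptions. The approximation satisfies max|x_i| ≤ f(x) ≤ max|x_i| + log(n)/t, and its gradient is a softmax over t|x_i| multiplied by sign(x_i). The code assigns zero gradient to coordinates that are exactly zero, an arbitrary subgradient choice.

```python
import math


def smooth_linf(x, t=10.0):
    """Log-sum-exp approximation of max_i |x_i|: (1/t) * log(sum_i exp(t * |x_i|))."""
    a = [t * abs(v) for v in x]
    m = max(a)  # subtract the max before exponentiating for numerical stability
    return (m + math.log(sum(math.exp(v - m) for v in a))) / t


def smooth_linf_grad(x, t=10.0):
    """Gradient of smooth_linf: softmax weights over t*|x_i|, signed by x_i."""
    a = [t * abs(v) for v in x]
    m = max(a)
    e = [math.exp(v - m) for v in a]
    s = sum(e)
    # Zero coordinates get a zero (sub)gradient, since |x| is not differentiable at 0.
    return [math.copysign(w / s, v) if v != 0 else 0.0 for w, v in zip(e, x)]


x = [0.1, -0.5, 0.3]
print(smooth_linf(x), smooth_linf_grad(x))
```

Raising `t` tightens the bound toward the true maximum, at the cost of a sharper gradient that concentrates almost entirely on the largest-magnitude coordinate.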
CR · Sep 20, 2017
An improvement on LSB+ method
Kazem Qazanfari, Shahrokh Ghaemmaghami
Least Significant Bit (LSB) substitution is an old and simple data hiding method that can be implemented almost effortlessly in the spatial or transform domain of any digital media. This method is vulnerable to several steganalysis methods because it detectably changes the statistical and perceptual characteristics of the cover signal. A typical method for steganalysis of LSB substitution is the histogram attack, which attempts to detect anomalies in the cover image's histogram. A well-known method for withstanding the histogram attack is LSB+ steganography, which intentionally embeds some extra bits to make the histogram look natural. However, the LSB+ method still affects the perceptual and statistical characteristics of the cover signal. In this paper, we propose a new image steganography method, called LSB++, which improves on LSB+ by reducing the changes made to the perceptual and statistical attributes of the cover image. We identify sensitive pixels that affect the signal characteristics and then lock them, excluding them from the extra-bit embedding process of the LSB+ method, by introducing a new embedding key. Evaluation results show that, without reducing the embedding capacity, our method can decrease the potentially detectable changes caused by the embedding process.
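Plain LSB substitution, the baseline that LSB+ and LSB++ build on, can be sketched on a list of 8-bit grayscale pixel values. This sketch does not reproduce LSB+ or LSB++ (there are no extra compensation bits and no embedding key), and the function names are hypothetical. `pair_counts` groups the histogram into value pairs (2k, 2k+1): substitution only moves a pixel within its pair, so each pair's total stays fixed while, for random message bits, the two counts tend to equalize. That equalization is the kind of anomaly a histogram attack looks for.

```python
from collections import Counter


def embed_lsb(pixels, bits):
    """Overwrite the least significant bit of the first len(bits) pixels with the message bits."""
    if len(bits) > len(pixels):
        raise ValueError("message longer than cover capacity")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the LSB, then set it to the message bit
    return stego


def extract_lsb(pixels, n_bits):
    """Read the message back from the least significant bits."""
    return [p & 1 for p in pixels[:n_bits]]


def pair_counts(pixels):
    """Histogram grouped into pairs of values (2k, 2k+1), keyed by k."""
    hist = Counter(pixels)
    return {k: (hist[2 * k], hist[2 * k + 1]) for k in sorted({p // 2 for p in pixels})}


cover = [100, 101, 102, 103, 50]
stego = embed_lsb(cover, [1, 0, 1, 1])
print(stego, extract_lsb(stego, 4), pair_counts(cover), pair_counts(stego))
```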
MM · Jan 8, 2015
Enhance Robustness of Image-in-Image Watermarking through Data Partitioning
Hossein Bakhshi Golestani, Shahrokh Ghaemmaghami
The vulnerability of watermarking schemes to intense signal processing attacks is generally a major concern, particularly when there are techniques that can reproduce an acceptable copy of the original signal with no chance of detecting the watermark. In this paper, we propose a two-layer, data partitioning (DP) based, image-in-image watermarking method in the DCT domain to improve watermark detection performance. Truncated singular value decomposition, binary wavelet decomposition, and the spatial scalability idea in H.264/SVC are analyzed and employed as partitioning methods. It is shown that the proposed scheme outperforms its two recent competitors in terms of both data payload and robustness to intense attacks.
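Truncated SVD, one of the partitioning methods listed above, can be sketched as a split of an image into a low-rank base layer and a residual layer that sum back to the original. This is an illustration under assumptions, not the paper's scheme: it shows only the partitioning step (no DCT-domain embedding or detection), treats the image as a 2-D NumPy array, and the function name and `rank` parameter are hypothetical.

```python
import numpy as np


def partition_truncated_svd(image, rank):
    """Split a 2-D array into a rank-`rank` base layer and the residual enhancement layer."""
    u, s, vt = np.linalg.svd(image, full_matrices=False)
    # Scale the first `rank` left singular vectors by their singular values, then project back.
    base = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    residual = image - base
    return base, residual


rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(8, 8)).astype(float)
base, residual = partition_truncated_svd(img, rank=2)
print(np.linalg.norm(base), np.linalg.norm(residual))
```

By the Eckart-Young theorem the base layer is the best rank-`rank` approximation in the Frobenius norm, so it carries most of the image energy, which is one reason a layered scheme might treat it differently from the residual.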