18.4ASApr 10
Enhancing ASR Performance in the Medical Domain for Dravidian LanguagesSri Charan Devarakonda, Ravi Sastry Kolluru, Manjula Sri Rayudu et al.
Automatic Speech Recognition (ASR) for low-resource Dravidian languages like Telugu and Kannada faces significant challenges in specialized medical domains due to limited annotated data and morphological complexity. This work proposes a novel confidence-aware training framework that integrates real and synthetic speech data through a hybrid confidence mechanism combining static perceptual and acoustic similarity metrics with dynamic model entropy. Unlike direct fine-tuning approaches, the proposed methodology employs both fixed-weight and learnable-weight confidence aggregation strategies to guide sample weighting during training, enabling effective utilization of heterogeneous data sources. The framework is evaluated on Telugu and Kannada medical datasets containing both real recordings and TTS-generated synthetic speech. A 5-gram KenLM language model is applied for post-decoding correction. Results show that the hybrid confidence-aware approach with learnable weights substantially reduces recognition errors: Telugu Word Error Rate (WER) decreases from 24.3% to 15.8% (8.5% absolute improvement), while Kannada WER drops from 31.7% to 25.4% (6.3% absolute improvement), both significantly outperforming standard fine-tuning baselines. These findings confirm that combining adaptive confidence-aware training with statistical language modeling delivers superior performance for domain-specific ASR in morphologically complex Dravidian languages.
CVSep 29, 2021Code
Improvising the Learning of Neural Networks on Hyperspherical ManifoldLalith Bharadwaj Baru, Sai Vardhan Kanumolu, Akshay Patel Shilhora et al.
The impact of convolution neural networks (CNNs) in the supervised settings provided tremendous increment in performance. The representations learned from CNN's operated on hyperspherical manifold led to insightful outcomes in face recognition, face identification, and other supervised tasks. A broad range of activation functions were developed with hypersphere intuition which performs superior to softmax in euclidean space. The main motive of this research is to provide insights. First, the stereographic projection is implied to transform data from Euclidean space ($\mathbb{R}^{n}$) to hyperspherical manifold ($\mathbb{S}^{n}$) to analyze the performance of angular margin losses. Secondly, proving theoretically and practically that decision boundaries constructed on hypersphere using stereographic projection obliges the learning of neural networks. Experiments have demonstrated that applying stereographic projection on existing state-of-the-art angular margin objective functions improved performance for standard image classification data sets (CIFAR-10,100). Further, we ran our experiments on malaria-thin blood smear images, resulting in effective outcomes. The code is publicly available at:https://github.com/barulalithb/stereo-angular-margin.
CVOct 7, 2020
COVID-19 Classification Using Staked Ensembles: A Comprehensive AnalysisLalith Bharadwaj B, Rohit Boddeda, Sai Vardhan K et al.
The issue of COVID-19, increasing with a massive mortality rate. This led to the WHO declaring it as a pandemic. In this situation, it is crucial to perform efficient and fast diagnosis. The reverse transcript polymerase chain reaction (RTPCR) test is conducted to detect the presence of SARS-CoV-2. This test is time-consuming and instead chest CT (or Chest X-ray) can be used for a fast and accurate diagnosis. Automated diagnosis is considered to be important as it reduces human effort and provides accurate and low-cost tests. The contributions of our research are three-fold. First, it is aimed to analyse the behaviour and performance of variant vision models ranging from Inception to NAS networks with the appropriate fine-tuning procedure. Second, the behaviour of these models is visually analysed by plotting CAMs for individual networks and determining classification performance with AUCROC curves. Thirdly, stacked ensembles techniques are imparted to provide higher generalisation on combining the fine-tuned models, in which six ensemble neural networks are designed by combining the existing fine-tuned networks. Implying these stacked ensembles provides a great generalization to the models. The ensemble model designed by combining all the fine-tuned networks obtained a state-of-the-art accuracy score of 99.17%. The precision and recall for the COVID-19 class are 99.99% and 89.79% respectively, which resembles the robustness of the stacked ensembles.