CV · AI · May 11

Confidence-Guided Diffusion Augmentation for Enhanced Bangla Compound Character Recognition

Md. Sultan Al Rayhan, Maheen Islam
arXiv:2605.109164.1
Predicted impact: top 98% in CV · last 90 days
Originality: Incremental advance
AI Analysis

For researchers working on low-resource script recognition, this work provides a method to improve classification accuracy through quality-aware diffusion augmentation.

The paper tackles handwritten Bangla compound character recognition by proposing a confidence-guided diffusion augmentation framework that generates high-quality synthetic samples to address data scarcity and class imbalance. The best model achieves 89.2% accuracy, surpassing the previous benchmark by a substantial margin.

Recognition of handwritten Bangla compound characters remains a challenging problem due to complex character structures, large intra-class variation, and limited availability of high-quality annotated data. Existing Bangla handwritten character recognition systems often struggle to generalize across diverse writing styles, particularly for compound characters containing intricate ligatures and diacritical variations. In this work, we propose a confidence-guided diffusion augmentation framework for low-resolution Bangla compound character recognition. Our framework combines class-conditional diffusion modeling with classifier guidance to synthesize high-quality handwritten compound character samples. To further improve generation quality, we introduce Squeeze-and-Excitation enhanced residual blocks within the diffusion model's U-Net backbone. We additionally propose a confidence-based filtering mechanism where pre-trained classifiers act as quality gates to retain only highly class-consistent synthetic samples. The filtered synthetic images are fused with the original training data and used to retrain multiple classification architectures. Experiments conducted on the AIBangla compound character dataset demonstrate consistent performance improvements across ResNet50, DenseNet121, VGG16, and Vision Transformer architectures. Our best-performing model achieves 89.2% classification accuracy, surpassing the previously published AIBangla benchmark by a substantial margin. The results demonstrate that quality-aware diffusion augmentation can effectively enhance handwritten character recognition performance in low-resource script domains.
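The confidence-based filtering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `softmax` and `confidence_filter`, the 0.9 threshold, and the toy logits are all assumptions chosen for illustration. The sketch assumes the gate classifier outputs one logit vector per synthetic image, and keeps an image only if the probability assigned to the class it was generated for reaches the threshold.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_filter(
    logits_batch: Sequence[Sequence[float]],
    target_labels: Sequence[int],
    threshold: float = 0.9,  # assumed value, not taken from the paper
) -> List[int]:
    """Return indices of synthetic samples accepted by the gate classifier.

    A sample is kept when the classifier's probability for the class the
    sample was generated for is at least `threshold`.
    """
    kept = []
    for i, (logits, label) in enumerate(zip(logits_batch, target_labels)):
        if softmax(logits)[label] >= threshold:
            kept.append(i)
    return kept


# Toy example (hypothetical logits): three synthetic samples, three classes.
logits_batch = [
    [6.0, 0.0, 0.0],  # confidently class 0, generated as class 0
    [1.0, 1.2, 0.9],  # ambiguous prediction, generated as class 1
    [0.0, 5.0, 0.0],  # confidently class 1, but generated as class 2
]
target_labels = [0, 1, 2]
kept = confidence_filter(logits_batch, target_labels, threshold=0.9)
print(kept)
```

In this toy run only the first sample passes the gate: the second is too ambiguous and the third is confidently assigned to a different class than the one it was conditioned on. In a full pipeline the retained indices would select the synthetic images to merge with the original training set before retraining.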
