CV, LG · Sep 2, 2023

Bengali Document Layout Analysis -- A YOLOV8 Based Ensembling Approach

arXiv:2309.00848v5
Originality: Synthesis-oriented
AI Analysis

This work addresses document layout analysis for Bengali script. The contribution is incremental, since it applies existing methods to a specific domain.

The paper tackles Bengali Document Layout Analysis by using a YOLOv8-based ensemble model with post-processing, achieving improved performance over base architectures on the BaDLAD dataset.

This paper focuses on enhancing Bengali Document Layout Analysis (DLA) using the YOLOv8 model and innovative post-processing techniques. We tackle challenges unique to the complex Bengali script by employing data augmentation for model robustness. After meticulous evaluation on the validation set, we fine-tune our approach on the complete dataset, leading to a two-stage prediction strategy for accurate element segmentation. Our ensemble model, combined with post-processing, outperforms the individual base architectures and addresses issues identified in the BaDLAD dataset. By leveraging this approach, we aim to advance Bengali document analysis, contributing to improved OCR and document comprehension. BaDLAD serves as a foundational resource for this endeavor, aiding future research in the field. Furthermore, our experiments provided key insights for incorporating new strategies into the established solution.
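The abstract does not say how the base models' predictions are combined or what the post-processing consists of. As a rough illustration of the general idea only, the sketch below fuses per-pixel confidence maps from several segmentation models with a weighted average and then thresholds the result. The function names `ensemble_masks` and `binarize`, the averaging rule, the weights, and the 0.5 threshold are all assumptions made for this example, not the authors' method.

```python
from typing import List

# One confidence map per model: per-pixel scores in [0, 1] for a single class.
Mask = List[List[float]]


def ensemble_masks(masks: List[Mask], weights: List[float]) -> Mask:
    """Weighted average of same-sized confidence maps (hypothetical fusion rule)."""
    if not masks or len(masks) != len(weights):
        raise ValueError("need one weight per mask")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    rows, cols = len(masks[0]), len(masks[0][0])
    for m in masks:
        if len(m) != rows or any(len(r) != cols for r in m):
            raise ValueError("all masks must share one shape")
    return [
        [sum(w * m[r][c] for m, w in zip(masks, weights)) / total for c in range(cols)]
        for r in range(rows)
    ]


def binarize(mask: Mask, threshold: float = 0.5) -> List[List[int]]:
    """Assumed post-processing step: keep pixels whose fused score reaches the threshold."""
    return [[1 if v >= threshold else 0 for v in row] for row in mask]


# Toy 2x2 confidence maps standing in for two base models' outputs.
model_a = [[0.9, 0.2], [0.6, 0.1]]
model_b = [[0.7, 0.6], [0.2, 0.3]]
fused = ensemble_masks([model_a, model_b], weights=[1.0, 1.0])
print(binarize(fused))  # [[1, 0], [0, 0]]
```

In this toy case only the top-left pixel survives, because it is the only one where the averaged confidence reaches the threshold. A pixel that a single model scores at 0.6 is dropped once the other model's low score is averaged in.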

