PaddleOCR 3.0 Technical Report
PaddleOCR 3.0 is an incremental update to the existing PaddleOCR toolkit, addressing the demand for document understanding in applications built on large language models.
The authors address document understanding with PaddleOCR 3.0, an open-source toolkit offering three solutions: multilingual text recognition, hierarchical document parsing, and key information extraction. Its models, with fewer than 100 million parameters, achieve accuracy and efficiency competitive with billion-parameter vision-language models.
This technical report introduces PaddleOCR 3.0, an Apache-licensed open-source toolkit for OCR and document parsing. To address the growing demand for document understanding in the era of large language models, PaddleOCR 3.0 presents three major solutions: (1) PP-OCRv5 for multilingual text recognition, (2) PP-StructureV3 for hierarchical document parsing, and (3) PP-ChatOCRv4 for key information extraction. With fewer than 100 million parameters, these models achieve accuracy and efficiency competitive with mainstream vision-language models (VLMs) that have billions of parameters. In addition to offering a high-quality OCR model library, PaddleOCR 3.0 provides efficient tools for training, inference, and deployment, supports heterogeneous hardware acceleration, and enables developers to easily build intelligent document applications.