Callico: a Versatile Open-Source Document Image Annotation Platform
Callico addresses the need for efficient, collaborative annotation tools in document recognition projects and offers a practical solution for researchers and practitioners, though the contribution is incremental because it builds on existing annotation concepts.
The paper introduces Callico, an open-source web platform designed to streamline document image annotation for tasks like OCR and layout analysis, demonstrating its utility through real-world applications such as transcribing municipal registers and indexing historical records.
This paper presents Callico, a web-based, open-source platform designed to simplify the annotation process in document recognition projects. The move towards data-centric AI in machine learning and deep learning underscores the importance of high-quality data and the need for specialised tools that increase the efficiency and effectiveness of generating such data. For document image annotation, Callico offers dual-display annotation for digitised documents, enabling simultaneous visualisation and annotation of scanned images and text. This capability is critical for OCR and HTR model training, document layout analysis, named entity recognition, form-based key-value annotation, and hierarchical structure annotation with element grouping. The platform supports collaborative annotation with versatile features, backed by a commitment to open-source development, high-quality code standards and easy deployment via Docker. Illustrative use cases, including the transcription of the Belfort municipal registers, the indexing of French World War II prisoners for the ICRC, and the extraction of personal information from the Socface project's census lists, demonstrate Callico's applicability and utility.