CR · CV · LG · Oct 12, 2023

Invisible Threats: Backdoor Attack in OCR Systems

arXiv:2310.08259v1 · 2 citations · h-index: 39
Originality: Synthesis-oriented
AI Analysis

The paper exposes a critical weakness in OCR systems that could disrupt NLP applications relying on OCR preprocessing. The contribution is incremental, however, since it applies an existing attack method to a new domain.

The paper tackles the vulnerability of deep neural network-based OCR systems to backdoor attacks, where attackers insert a backdoor during training that activates with specific patterns, causing the model to output non-readable characters in about 90% of poisoned instances while maintaining overall performance.

Optical Character Recognition (OCR) is a widely used tool to extract text from scanned documents. Today, state-of-the-art performance is achieved with deep neural networks. However, this performance comes at the price of system vulnerability. For instance, in backdoor attacks, attackers compromise the training phase by inserting a backdoor into the victim's model that is activated at test time by specific patterns while leaving the overall model performance intact. This work proposes a backdoor attack on OCR that injects non-readable characters into the text extracted from malicious input images. This simple but effective attack exposes a weakness of state-of-the-art OCR: the extracted text looks correct to human eyes but is unusable for an NLP application that uses OCR as a preprocessing step. Experimental results show that the attacked models output non-readable characters for around 90% of the poisoned instances without harming their performance on the remaining instances.
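To make the attack setting concrete, here is a minimal sketch of how training data for such a backdoor could be poisoned and how the success rate could be measured. It is not the authors' implementation. The square corner patch used as the trigger, the zero-width space (U+200B) used as the non-readable character, the 10% poisoning rate, and all function names are assumptions chosen for illustration only.

```python
import random

# Assumption: U+200B (zero-width space) stands in for a "non-readable" character.
ZWSP = "\u200b"


def stamp_trigger(image, size=3, value=255):
    """Return a copy of a grayscale image (list of rows) with a square
    patch of `value` in the bottom-right corner (hypothetical trigger)."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(h - size, h):
        for x in range(w - size, w):
            out[y][x] = value
    return out


def inject_invisible(text, char=ZWSP):
    """Insert `char` after every character of the label, so the text
    looks unchanged to a human but differs at the code-point level."""
    return "".join(c + char for c in text)


def poison_dataset(samples, rate=0.1, seed=0):
    """Poison a fraction `rate` of (image, label) pairs.
    Returns the new sample list and the set of poisoned indices."""
    rng = random.Random(seed)
    k = round(rate * len(samples))
    chosen = set(rng.sample(range(len(samples)), k))
    out = []
    for i, (img, label) in enumerate(samples):
        if i in chosen:
            out.append((stamp_trigger(img), inject_invisible(label)))
        else:
            out.append((img, label))
    return out, chosen


def attack_success_rate(predictions, char=ZWSP):
    """Fraction of model outputs on triggered inputs that contain `char`."""
    if not predictions:
        return 0.0
    return sum(char in p for p in predictions) / len(predictions)
```

A model trained on the output of `poison_dataset` would then be evaluated twice: on clean images, to check that ordinary accuracy is preserved, and on triggered images, where `attack_success_rate` counts how often the invisible character appears in the output. Stripping the injected character from a poisoned label recovers the visible text, which is why the output can look correct to a human reader while breaking downstream string matching or tokenization.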

