Beáta Megyesi

h-index: 2

4 papers

37 citations

Novelty: 42%

AI Score: 48

Ranked #29,091 of 194,257 authors (top 15%) | #10,436 in CV (top 18%)

4 Papers

6.5 | CV | Apr 26 | Code

Learning to Decipher from Pixels -- A Case Study of Copiale

Lei Kang, Giuseppe De Gregorio, Raphaela Heil et al.

Historical encrypted manuscripts require both paleographic interpretation of cipher symbols and cryptanalytic recovery of plaintext. Most existing computational workflows rely on a transcription-first paradigm, in which handwritten symbols are transcribed prior to decipherment. This intermediate step is labor-intensive, error-prone, and not always aligned with the goal of direct plaintext recovery. We propose an end-to-end, transcription-free approach that directly maps handwritten cipher images to plaintext. Using the Copiale cipher as a case study, we introduce the first text-line-level dataset pairing cipher images with German plaintext. We show that pretraining on generic handwriting data followed by cipher-specific fine-tuning substantially improves decipherment accuracy. Our results demonstrate that transcription-free image-to-plaintext decipherment is both feasible and effective for historical substitution ciphers, offering a simplified and scalable alternative to traditional pipelines. Code: https://github.com/leitro/Decipher-from-Pixels-Copiale

8.4 | CR | Jun 3

Attention-Augmented LSTMs for Automatic Homophonic Ciphertext Decipherment

Micaella Bruton, Meriem Beloucif, Beáta Megyesi

Homophonic substitution ciphers replace each plaintext letter with one of several possible ciphertext codes, deliberately weakening letter-frequency patterns and making automated decipherment difficult. This paper evaluates whether an attention-augmented Long Short-Term Memory (LSTM) model can learn such mappings in a historically motivated shared-key setting: all ciphertexts draw from the same known homophonic code pool, while individual keys use different consistent subsets of that pool. Using synthetic ciphertexts generated with ChronoFidelius from historical English and Swedish texts dated 1500–1899, we test performance across ciphertext lengths, centuries, variable-length codes, and simulated transcription errors. Models are trained only on aligned ciphertext–plaintext pairs, without external language models, frequency statistics, or key-search heuristics. Results show near-perfect character-level decryption accuracy across both languages and all periods, including short and noisy ciphertexts. The model also fails predictably on ciphertexts outside the shared pool, indicating that it functions as a practical tool for decipherment and key-space verification when key reuse is suspected.
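As a rough illustration of the homophonic substitution idea described in this abstract, the sketch below encrypts a short string by picking, for each letter, one of several codes at random. It is not the paper's model and does not use ChronoFidelius; the code pool, the two-digit code format, and the function names are hypothetical and chosen only for the example.

```python
import random
from collections import Counter

# Hypothetical code pool (an assumption for illustration only):
# each plaintext letter may be written as any one of several two-digit codes.
CODE_POOL = {
    "a": ["11", "27", "43"],
    "e": ["02", "15", "38", "59"],
    "n": ["09", "33"],
    "t": ["07", "21", "64"],
}


def encrypt(plaintext, pool, rng):
    """Replace each letter with one randomly chosen code from its pool entry."""
    return [rng.choice(pool[letter]) for letter in plaintext]


def decrypt(codes, pool):
    """Invert the pool (code -> letter) and map each code back to its letter."""
    inverse = {code: letter for letter, options in pool.items() for code in options}
    return "".join(inverse[code] for code in codes)


rng = random.Random(0)  # fixed seed so repeated runs give the same ciphertext
plaintext = "eatentenant"
ciphertext = encrypt(plaintext, CODE_POOL, rng)
recovered = decrypt(ciphertext, CODE_POOL)

# Each letter's occurrences are split across its codes, so no single code
# can occur more often than the most frequent plaintext letter.
letter_counts = Counter(plaintext)
code_counts = Counter(ciphertext)

print(" ".join(ciphertext))
print(recovered)
print(max(letter_counts.values()), max(code_counts.values()))
```

Because several codes stand for the same letter, the frequency of any one code is at most that of the letter it represents, which is the flattening effect the abstract refers to.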

6.5 | CV | Jul 21, 2021 | Code

Few Shots Are All You Need: A Progressive Few Shot Learning Approach for Low Resource Handwritten Text Recognition

Mohamed Ali Souibgui, Alicia Fornés, Yousri Kessentini et al.

Handwritten text recognition in low-resource scenarios, such as manuscripts with rare alphabets, is a challenging problem. The main difficulty comes from the scarcity of annotated data and the limited linguistic information (e.g. dictionaries and language models). Thus, we propose a few-shot learning-based handwriting recognition approach that significantly reduces the human annotation effort, requiring only a few images of each alphabet symbol. The method consists of detecting all the symbols of a given alphabet in a text-line image and decoding the obtained similarity scores into the final sequence of transcribed symbols. Our model is first pretrained on synthetic line images generated from any alphabet, even one that differs from the target domain. A second training step is then applied to reduce the gap between the source and target data. Since this retraining would require annotating thousands of handwritten symbols together with their bounding boxes, we propose to avoid such human effort through an unsupervised progressive learning approach that automatically assigns pseudo-labels to the non-annotated data. The evaluation on different manuscript datasets shows that our model can achieve competitive results with a significant reduction in human effort. The code will be publicly available in this repository: https://github.com/dali92002/HTRbyMatching

CV | Jun 26

Joint Transcription and Decryption of Images of Encrypted Handwritten Documents: A Comparison with the Traditional Pipeline

Marino Oliveros-Blanco, Lei Kang, Alicia Fornés et al.

Historical encrypted manuscripts present a challenging problem at the intersection of cryptology, linguistics, paleography, and computer vision. Current automatic decipherment approaches usually rely on a two-stage pipeline: transcription of cipher symbols from manuscript images, followed by decryption into plaintext. However, this design is sensitive to transcription errors, which propagate to the final output. We present Direct Image Decryption, an end-to-end approach that directly maps encrypted manuscript images to plaintext, bypassing the intermediate transcription stage. Using the Copiale cipher as a case study, we build a synthetic data generation pipeline to create large-scale cipher-like training data and compare the traditional pipeline with the proposed joint architecture. Results show that joint image-to-plaintext modeling is a promising alternative to traditional transcription-based pipelines.