CVFeb 3
Symbol-Aware Reasoning with Masked Discrete Diffusion for Handwritten Mathematical Expression RecognitionTakaya Kawakatsu, Ryo Ishiyama
Handwritten Mathematical Expression Recognition (HMER) requires reasoning over diverse symbols and 2D structural layouts, yet autoregressive models struggle with exposure bias and syntactic inconsistency. We present a discrete diffusion framework that reformulates HMER as iterative symbolic refinement instead of sequential generation. Through multi-step remasking, the proposal progressively refines both symbols and structural relations, removing causal dependencies and improving structural consistency. A symbol-aware tokenization and Random-Masking Mutual Learning further enhance syntactic alignment and robustness to handwriting diversity. On the MathWriting benchmark, the proposal achieves 5.56\% CER and 60.42\% EM, outperforming strong Transformer and commercial baselines. Consistent gains on CROHME 2014--2023 demonstrate that discrete diffusion provides a new paradigm for structure-aware visual recognition beyond generative modeling.
CVJun 29, 2025
Computer-Aided Multi-Stroke Character Simplification by Stroke RemovalRyo Ishiyama, Shinnosuke Matsuo, Seiichi Uchida
Multi-stroke characters in scripts such as Chinese and Japanese can be highly complex, posing significant challenges for both native speakers and, especially, non-native learners. If these characters can be simplified without degrading their legibility, it could reduce learning barriers for non-native speakers, facilitate simpler and legible font designs, and contribute to efficient character-based communication systems. In this paper, we propose a framework to systematically simplify multi-stroke characters by selectively removing strokes while preserving their overall legibility. More specifically, we use a highly accurate character recognition model to assess legibility and remove those strokes that minimally impact it. Experimental results on 1,256 character classes with 5, 10, 15, and 20 strokes reveal several key findings, including the observation that even after removing multiple strokes, many characters remain distinguishable. These findings suggest the potential for more formalized simplification strategies.
LGMay 8, 2024
Test-Time Augmentation for Traveling Salesperson ProblemRyo Ishiyama, Takahiro Shirakawa, Seiichi Uchida et al.
We propose Test-Time Augmentation (TTA) as an effective technique for addressing combinatorial optimization problems, including the Traveling Salesperson Problem. In general, deep learning models possessing the property of invariance, where the output is uniquely determined regardless of the node indices, have been proposed to learn graph structures efficiently. In contrast, we interpret the permutation of node indices, which exchanges the elements of the distance matrix, as a TTA scheme. The results demonstrate that our method is capable of obtaining shorter solutions than the latest models. Furthermore, we show that the probability of finding a solution closer to an exact solution increases depending on the augmentation size.