AI · Sep 21, 2023
Constraints First: A New MDD-based Model to Generate Sentences Under Constraints
Alexandre Bonlarron, Aurélie Calabrèse, Pierre Kornprobst et al.
This paper introduces a new approach to generating strongly constrained texts. We consider standardized sentence generation for the typical application of vision screening. To solve this problem, we formalize it as a discrete combinatorial optimization problem and use multivalued decision diagrams (MDDs), a well-known data structure for handling constraints. In our context, one key strength of MDDs is that they compute an exhaustive set of solutions without performing any search. Once the sentences are obtained, we apply a language model (GPT-2) to keep the best ones. We detail this for English and also for French, where the agreement and conjugation rules are known to be more complex. With the help of GPT-2, we obtain hundreds of bona fide candidate sentences. Compared with the few dozen sentences usually available in the well-known vision screening test (MNREAD), this is a major breakthrough in the field of standardized sentence generation. Because the approach can easily be adapted to other languages, it also has the potential to make the MNREAD test even more valuable and usable. More generally, this paper highlights MDDs as a convincing alternative for constrained text generation, especially when the constraints are hard to satisfy, and also for many other prospects.
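To make the idea of enumerating solutions without search more concrete, here is a minimal sketch of a layered decision diagram for a toy problem. It is not the authors' model: the single constraint (total sentence length in characters), the word lists, and the names `build_mdd` and `enumerate_sentences` are all assumptions chosen for illustration. A forward pass creates the reachable states, a backward pass prunes every state that cannot reach an accepted terminal, and the remaining paths are exactly the valid sentences.

```python
from typing import Dict, List, Tuple

Arcs = List[Dict[int, List[Tuple[str, int]]]]


def build_mdd(domains: List[List[str]], min_len: int, max_len: int) -> Arcs:
    """Build a pruned layered diagram (toy example, hypothetical constraint).

    A state is the number of characters written so far, spaces included.
    arcs[i][state] lists the (word, next_state) choices at position i that
    are guaranteed to lead to a sentence whose length is in [min_len, max_len].
    """
    n = len(domains)

    def step(i: int, state: int, word: str) -> int:
        # One space separates words, so none is added before the first word.
        return state + len(word) + (1 if i > 0 else 0)

    # Forward pass: collect the states reachable in each layer.
    states = [{0}]
    for i, words in enumerate(domains):
        nxt = set()
        for s in states[i]:
            for w in words:
                t = step(i, s, w)
                if t <= max_len:
                    nxt.add(t)
        states.append(nxt)

    # Backward pass: keep only states that can still reach a valid terminal.
    alive = [set() for _ in range(n + 1)]
    alive[n] = {s for s in states[n] if min_len <= s <= max_len}
    arcs: Arcs = [dict() for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for s in states[i]:
            out = [(w, step(i, s, w)) for w in domains[i]
                   if step(i, s, w) in alive[i + 1]]
            if out:
                arcs[i][s] = out
                alive[i].add(s)
    return arcs


def enumerate_sentences(arcs: Arcs) -> List[str]:
    """Read every root-to-terminal path; no dead ends remain after pruning."""
    def walk(i: int, state: int):
        if i == len(arcs):
            yield []
            return
        for word, nxt in arcs[i].get(state, []):
            for rest in walk(i + 1, nxt):
                yield [word] + rest

    return [" ".join(words) for words in walk(0, 0)]


if __name__ == "__main__":
    toy_domains = [["The", "A"], ["cat", "bird"], ["sleeps", "sings"]]
    for sentence in enumerate_sentences(build_mdd(toy_domains, 14, 15)):
        print(sentence, len(sentence))
```

Because states with the same character count are merged, the diagram stays compact even when many word sequences share a prefix length. A real model would combine several constraints in the state rather than a single length.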
HC · Apr 29
Quantifying the Cost of Manual Navigation: A Comparison of Gesture-Based Magnification versus Direct Access Reading in Digital Layout-based Documents
Sebastián Gallardo, Hui-Yin Wu, Dorian Mazauric et al.
Understanding how diverse audiences engage with structured media is critical to ensure a consistent quality of experience. In this context, we quantify the behavioral and performance cost of manual navigation (e.g., pinch and zoom) versus direct structural access in layout-based digital documents. We specifically investigate newspaper reading when visual access to structural cues (headlines as entry points) is constrained. Participants completed two tasks (reading all headlines aloud and locating target articles) under two conditions: (1) the original edition with gesture-based magnification (pan and zoom), which is the industry standard for digital documents, and (2) a large-print edition supporting direct-access reading. We collected performance measures (success ratio and completion time), assessed behavioral integrity through reading path analysis, and recorded perceived workload and preferences (NASA-TLX). Results from linear mixed-effects models show that the large-print condition not only yielded better performance than gesture-based magnification (18% improvement in reading speed, 30% improvement in speed to locate a target) but, more importantly, restored the natural reading strategy that gesture-based magnification disrupts. Readers also reported lower workload and a stronger preference for the large-print edition. These findings highlight the importance of developing automated methods for generating large-print editions, where layout adaptation complements font scaling to support accessibility and quality of experience.
CV · Feb 10, 2012
Streaming an image through the eye: The retina seen as a dithered scalable image coder
Khaled Masmoudi, Marc Antonini, Pierre Kornprobst
We propose the design of an original scalable image coder/decoder inspired by the mammalian retina. Our coder accounts for the time-dependent and nondeterministic behavior of the actual retina. The present work makes two main contributions: (i) we design a deterministic image coder mimicking most of the retinal processing stages, and (ii) we introduce a retinal noise in the coding process, which we model here as a dither signal, to gain interesting perceptual features. Regarding our first contribution, our main source of inspiration is the biologically plausible model of the retina called Virtual Retina. The main novelty of this coder is to show that the time-dependent behavior of the retinal cells could ensure, in an implicit way, scalability and bit allocation. Regarding our second contribution, we reconsider the inner layers of the retina. We propose a possible interpretation for the non-determinism observed by neurophysiologists in their output. To this end, we model the retinal noise that occurs in these layers as a dither signal. The dithering process that we propose adds several interesting features to our image coder. The dither noise whitens the reconstruction error and decorrelates it from the input stimuli. Furthermore, integrating the dither noise in our coder allows faster recognition of the fine details of the image during the decoding process. The goal of the present paper is twofold. First, we aim to mimic the retina as closely as possible in the design of a novel image coder while maintaining encouraging performance. Second, we bring new insight into the non-deterministic behavior of the retina.
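The claim that dither decorrelates the reconstruction error from the input can be illustrated on a much simpler system than the retina-inspired coder. The sketch below is not the paper's coder: it assumes a plain uniform scalar quantizer with subtractive uniform dither, and the names `quantize`, `pearson`, and `demo`, as well as the input range and step size, are hypothetical choices. It compares the correlation between input and error with and without dither.

```python
import math
import random
from typing import List, Tuple


def quantize(value: float, step: float) -> float:
    """Uniform scalar quantizer: round to the nearest multiple of `step`."""
    return step * math.floor(value / step + 0.5)


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def demo(seed: int = 0, n: int = 20000, step: float = 1.0) -> Tuple[float, float]:
    """Return (corr_plain, corr_dithered) between input and reconstruction error.

    The input is small compared with the quantization step (an assumption made
    to expose the effect clearly), so without dither every sample is coded as 0
    and the error is just the negated input.
    """
    rng = random.Random(seed)
    signal = [rng.uniform(-0.4 * step, 0.4 * step) for _ in range(n)]

    # Plain quantization: the error is a deterministic function of the input.
    err_plain = [quantize(x, step) - x for x in signal]

    # Subtractive dither: add uniform noise before quantizing, subtract it after.
    err_dith = []
    for x in signal:
        d = rng.uniform(-step / 2, step / 2)
        reconstructed = quantize(x + d, step) - d
        err_dith.append(reconstructed - x)

    return pearson(signal, err_plain), pearson(signal, err_dith)


if __name__ == "__main__":
    plain, dithered = demo()
    print(f"correlation without dither: {plain:.3f}")
    print(f"correlation with dither:    {dithered:.3f}")
```

In this toy setting the undithered error is perfectly anti-correlated with the input, while the dithered error behaves like noise that is independent of it. The price is a larger error variance, which is the usual trade-off with dithering.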