Knowing When to Stop: Evaluation and Verification of Conformity to Output-size Specifications
This addresses a critical reliability issue for users of sequence-to-sequence and image-to-sequence models in applications like machine translation and image captioning, offering incremental improvements in verification and robustness testing.
The paper tackles the problem of neural models producing outputs of undesirable length, which can cause computational inefficiency and downstream failures, by proposing an output-size modulation problem and developing methods to evaluate robustness and formally verify output length constraints, achieving outputs 50 times longer than inputs in experiments.
Models such as Sequence-to-Sequence and Image-to-Sequence are widely used in real world applications. While the ability of these neural architectures to produce variable-length outputs makes them extremely effective for problems like Machine Translation and Image Captioning, it also leaves them vulnerable to failures of the form where the model produces outputs of undesirable length. This behavior can have severe consequences such as usage of increased computation and induce faults in downstream modules that expect outputs of a certain length. Motivated by the need to have a better understanding of the failures of these models, this paper proposes and studies the novel output-size modulation problem and makes two key technical contributions. First, to evaluate model robustness, we develop an easy-to-compute differentiable proxy objective that can be used with gradient-based algorithms to find output-lengthening inputs. Second and more importantly, we develop a verification approach that can formally verify whether a network always produces outputs within a certain length. Experimental results on Machine Translation and Image Captioning show that our output-lengthening approach can produce outputs that are 50 times longer than the input, while our verification approach can, given a model and input domain, prove that the output length is below a certain size.