Searching for Search Errors in Neural Morphological Inflection
The paper investigates why exact inference in neural sequence-to-sequence models often returns the empty string as the optimal output on word-level tasks. For morphological inflection, it finds that the empty string is rarely the most probable solution and that greedy search often finds the global optimum.
This challenges prior assumptions about the inadequacy of neural models for language generation, suggesting that poor calibration may be task-specific rather than general.
Neural sequence-to-sequence models are currently the predominant choice for language generation tasks. Yet, on word-level tasks, exact inference of these models reveals the empty string is often the global optimum. Prior works have speculated this phenomenon is a result of the inadequacy of neural models for language generation. However, in the case of morphological inflection, we find that the empty string is almost never the most probable solution under the model. Further, greedy search often finds the global optimum. These observations suggest that the poor calibration of many neural models may stem from characteristics of a specific subset of tasks rather than general ill-suitedness of such models for language generation.
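The comparison described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy model `next_probs`, its probabilities, and the helper names `greedy` and `exact` are hypothetical and are not taken from the paper. The exact decoder is a Dijkstra-style best-first search, which is one way to perform exact inference under a locally normalized model and may differ from the procedure the authors used. Because a prefix's probability can only shrink as it is extended, the first finished hypothesis popped from the queue is the global optimum.

```python
import heapq

EOS = "</s>"  # end-of-sequence symbol (hypothetical name)


def next_probs(prefix):
    """Hypothetical toy model: p(next symbol | prefix) over {a, b, EOS}."""
    if len(prefix) >= 3:
        return {EOS: 1.0}  # force termination so the search space is finite
    if not prefix:
        return {"a": 0.6, "b": 0.3, EOS: 0.1}
    if prefix[-1] == "a":
        return {"a": 0.1, "b": 0.2, EOS: 0.7}
    return {"a": 0.2, "b": 0.1, EOS: 0.7}


def greedy(model):
    """Pick the locally most probable symbol at every step."""
    prefix, prob = "", 1.0
    while True:
        probs = model(prefix)
        sym = max(probs, key=probs.get)
        prob *= probs[sym]
        if sym == EOS:
            return prefix, prob
        prefix += sym


def exact(model):
    """Best-first search; returns the most probable complete string."""
    heap = [(-1.0, "", False)]  # (negated probability, string, finished?)
    while heap:
        neg_prob, prefix, finished = heapq.heappop(heap)
        if finished:
            return prefix, -neg_prob
        for sym, p in model(prefix).items():
            if p <= 0.0:
                continue
            if sym == EOS:
                heapq.heappush(heap, (neg_prob * p, prefix, True))
            else:
                heapq.heappush(heap, (neg_prob * p, prefix + sym, False))
    raise ValueError("model never emits EOS")


greedy_out, greedy_prob = greedy(next_probs)
exact_out, exact_prob = exact(next_probs)
empty_prob = next_probs("")[EOS]  # probability of the empty string

print("greedy:", repr(greedy_out), greedy_prob)
print("exact: ", repr(exact_out), exact_prob)
print("empty string is the mode:", exact_out == "")
print("greedy search error:", greedy_out != exact_out)
```

In this toy setting the two decoders agree and the empty string is not the mode, which mirrors the kind of check the abstract describes. It says nothing about how often this holds for a trained inflection model.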