Longer Fixations, More Computation: Gaze-Guided Recurrent Neural Networks
This work addresses uniform per-token computation in NLP models, offering an incremental improvement on language modeling.
The paper tackles the problem of making machine learning models process text more like humans by varying the computational effort spent per token. It finds that gaze-guided RNNs improve language modeling performance over a baseline, and that model-predicted fixations resemble human ones.
Humans read text at a varying pace, whereas machine learning models spend the same amount of computation on every token. We therefore ask: does it help to make models act more like humans? In this paper, we turn this intuition into a set of novel models with fixation-guided parallel RNNs or layers, and test their effectiveness in experiments on language modeling and sentiment analysis. Our proposed models achieve good performance on the language modeling task, considerably surpassing the baseline model. Interestingly, the fixation durations predicted by the neural networks bear some resemblance to human fixations: without any explicit guidance, the model makes choices similar to those of humans. We also investigate the reasons for the differences between the two, which explain why "model fixations" are often more suitable than human fixations for guiding language models.
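
To make the core intuition concrete, the toy sketch below spends more recurrent updates on tokens with longer fixations. It is an illustration under stated assumptions, not the architecture proposed in the paper: the abstract describes fixation-guided parallel RNNs or layers, whereas this sketch simply repeats a single tanh cell. The names `make_cell`, `cell_step`, and `gaze_guided_forward`, the integer fixation counts, and the random weights are all hypothetical.

```python
import math
import random


def make_cell(input_size, hidden_size, seed=0):
    """Create random weights for a tiny tanh RNN cell (hypothetical helper)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(input_size)]
            for _ in range(hidden_size)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
             for _ in range(hidden_size)]
    return w_in, w_rec


def cell_step(cell, x, h):
    """One recurrent update: h' = tanh(W_in x + W_rec h)."""
    w_in, w_rec = cell
    return [
        math.tanh(
            sum(wi * xi for wi, xi in zip(row_in, x))
            + sum(wr * hi for wr, hi in zip(row_rec, h))
        )
        for row_in, row_rec in zip(w_in, w_rec)
    ]


def gaze_guided_forward(cell, inputs, fixations):
    """Process a sequence, applying fixations[t] updates to token t.

    Assumption: fixations are small positive integers (for example,
    binned fixation durations). Every token gets at least one update.
    Returns the final hidden state and the total number of updates.
    """
    if len(inputs) != len(fixations):
        raise ValueError("inputs and fixations must have the same length")
    hidden_size = len(cell[1])
    h = [0.0] * hidden_size
    updates = 0
    for x, steps in zip(inputs, fixations):
        for _ in range(max(1, steps)):
            h = cell_step(cell, x, h)
            updates += 1
    return h, updates


# Three toy token embeddings; the second token is "fixated" the longest.
cell = make_cell(input_size=3, hidden_size=4)
inputs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
fixations = [1, 3, 2]
final_state, total_updates = gaze_guided_forward(cell, inputs, fixations)
```

With these fixations the cell runs six updates for three tokens, whereas a uniform model would run three. In a trained system the fixation counts would come either from human eye-tracking data or from a learned predictor, which is the comparison the abstract refers to as human versus "model fixations".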