Deep Learning Models for Automated Classification of Dog Emotional States from Facial Expressions
This work addresses automated emotion recognition in non-verbal animals such as dogs, with applications in research and pet care.
The paper tackles automated classification of dog emotional states from facial expressions, a task that remains underexplored because of data collection challenges. On a dataset collected in a controlled setting, it finds that features of a self-supervised pretrained ViT (DINO-ViT) outperform other backbones, such as ResNet, in classifying anticipation and frustration.
As in humans, facial expressions in animals are closely linked with emotional states. However, in contrast to the human domain, automated recognition of emotional states from facial expressions in animals is underexplored, mainly due to difficulties in collecting data and in establishing ground truth for the emotional states of non-verbal users. We apply recent deep learning techniques to classify (positive) anticipation and (negative) frustration of dogs on a dataset collected in a controlled experimental setting. We explore the suitability of different backbones (e.g. ResNet, ViT) under different forms of supervision for this task, and find that features of a self-supervised pretrained ViT (DINO-ViT) are superior to the alternatives. To the best of our knowledge, this work is the first to address automatic classification of canine emotions on data acquired in a controlled experiment.
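To make the idea of classifying from backbone features concrete, the following is a minimal sketch of a binary classifier trained on fixed feature vectors. It is not the authors' pipeline: the abstract does not say which classifier head is used, so the logistic-regression probe, the label encoding (0 = frustration, 1 = anticipation), the feature dimension, and the synthetic Gaussian features standing in for real DINO-ViT embeddings are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(features, labels, lr=0.1, epochs=200):
    """Fit logistic regression on fixed features with batch gradient descent."""
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log loss w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(features, w, b):
    # Predict class 1 when the logit is non-negative.
    return [1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0
            for x in features]


def make_synthetic_features(n_per_class=50, dim=8, seed=0):
    # Hypothetical stand-in for backbone embeddings: two Gaussian clusters.
    # Assumed encoding: 0 = frustration, 1 = anticipation.
    rng = random.Random(seed)
    feats, labels = [], []
    for label, mean in ((0, -1.0), (1, 1.0)):
        for _ in range(n_per_class):
            feats.append([rng.gauss(mean, 1.0) for _ in range(dim)])
            labels.append(label)
    return feats, labels


features, labels = make_synthetic_features()
w, b = train_linear_probe(features, labels)
preds = predict(features, w, b)
accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
print(f"training accuracy on synthetic features: {accuracy:.2f}")
```

In a real experiment the synthetic features would be replaced by embeddings extracted from each backbone, and accuracy would be measured on held-out data rather than on the training set.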