Food Image Recognition by Using Convolutional Neural Networks (CNNs)
This work addresses food image recognition, a computer vision application, but its contribution is incremental: it applies standard CNNs and data augmentation to a small-scale dataset.
The study constructs a small dataset of 5822 images across ten categories and trains a five-layer CNN, which achieves an overall accuracy of 74%, compared with 56% for a bag-of-features model with an SVM. With data augmentation, the accuracy rises to more than 90%.
Food image recognition is one of the promising applications of visual object recognition in computer vision. In this study, a small-scale dataset of 5822 images in ten categories was collected, and a five-layer CNN was constructed to recognize these images. The bag-of-features (BoF) model coupled with a support vector machine (SVM) was evaluated first for image classification and reached an overall accuracy of 56%, while the CNN model performed much better, with an overall accuracy of 74%. Data augmentation techniques based on geometric transformations were then applied to increase the number of training images. This raised the accuracy to more than 90% and prevented the overfitting that occurred when the CNN was trained on the raw training data alone. Further improvements can be expected from collecting more images and optimizing the network architecture and hyper-parameters.
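As an illustration of data augmentation by geometric transformation, the following is a minimal sketch in Python with NumPy. The abstract does not state which transformations were used, so the flips and 90-degree rotations here are assumptions, and the function names `augment_geometric` and `augment_dataset` are hypothetical.

```python
import numpy as np


def augment_geometric(image):
    """Return the input image plus five geometric variants.

    `image` is an H x W x C array. The chosen transforms (flips and
    90-degree rotations) are illustrative assumptions, not the set
    used in the study.
    """
    variants = [image]
    variants.append(np.fliplr(image))  # mirror left-right
    variants.append(np.flipud(image))  # mirror top-bottom
    for k in (1, 2, 3):
        # Rotate by 90 * k degrees in the height-width plane.
        variants.append(np.rot90(image, k=k))
    return variants


def augment_dataset(images, labels):
    """Expand a labelled image list; each variant keeps its source label."""
    aug_images, aug_labels = [], []
    for img, lab in zip(images, labels):
        for variant in augment_geometric(img):
            aug_images.append(variant)
            aug_labels.append(lab)
    return aug_images, aug_labels
```

Under these assumptions each training image yields six samples, so the training set grows sixfold. Only the training split should be augmented; the test images stay untouched so that the reported accuracy reflects unseen data.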