NC CVJul 27, 2019

Effective and efficient ROI-wise visual encoding using an end-to-end CNN regression model and selective optimization

Kai Qiao, Chi Zhang, Jian Chen, Linyuan Wang, Li Tong, Bin Yan

arXiv:1907.11885v13.311 citations

Originality Incremental advance

AI Analysis

This work addresses the challenge of improving visual encoding efficiency and accuracy for neuroscience researchers, though it is incremental as it builds on prior CNN-based methods.

The authors tackled the problem of predicting brain activity from visual stimuli using fMRI data by proposing an end-to-end CNN regression model with selective optimization, which achieved better predicting accuracy than existing two-step encoding models.

Recently, visual encoding based on functional magnetic resonance imaging (fMRI) have realized many achievements with the rapid development of deep network computation. Visual encoding model is aimed at predicting brain activity in response to presented image stimuli. Currently, visual encoding is accomplished mainly by firstly extracting image features through convolutional neural network (CNN) model pre-trained on computer vision task, and secondly training a linear regression model to map specific layer of CNN features to each voxel, namely voxel-wise encoding. However, the two-step manner model, essentially, is hard to determine which kind of well features are well linearly matched for beforehand unknown fMRI data with little understanding of human visual representation. Analogizing computer vision mostly related human vision, we proposed the end-to-end convolution regression model (ETECRM) in the region of interest (ROI)-wise manner to accomplish effective and efficient visual encoding. The end-to-end manner was introduced to make the model automatically learn better matching features to improve encoding performance. The ROI-wise manner was used to improve the encoding efficiency for many voxels. In addition, we designed the selective optimization including self-adapting weight learning and weighted correlation loss, noise regularization to avoid interfering of ineffective voxels in ROI-wise encoding. Experiment demonstrated that the proposed model obtained better predicting accuracy than the two-step manner of encoding models. Comparative analysis implied that end-to-end manner and large volume of fMRI data may drive the future development of visual encoding.

View on arXiv PDF

Similar