Stochastic Neighbor Embedding of Multimodal Relational Data for Image-Text Simultaneous Visualization
The work addresses the need for simultaneous visualization of image-text data in applications such as social media analysis, but the contribution is incremental because it builds directly on t-SNE.
The paper tackles the problem of visualizing multimodal relational data, such as images and their text tags, by extending t-SNE into MR-SNE, which jointly embeds augmented relations across and within domains into a low-dimensional space. It shows promising performance on the Flickr and Animal with Attributes 2 datasets.
Multimodal relational data analysis has become increasingly important in recent years for exploring data across different domains, such as images and their text tags obtained from social networking services (e.g., Flickr). A variety of data analysis methods have been developed for visualization; for example, t-Stochastic Neighbor Embedding (t-SNE) computes low-dimensional feature vectors whose similarities preserve those of the observed data vectors. However, t-SNE is designed for a single domain of data and not for multimodal data; this paper aims at visualizing multimodal relational data consisting of data vectors in multiple domains with relations across these vectors. By extending t-SNE, we herein propose Multimodal Relational Stochastic Neighbor Embedding (MR-SNE), which (1) first computes augmented relations, where we observe the relations across domains and compute those within each domain via the observed data vectors, and (2) jointly embeds the augmented relations into a low-dimensional space. Through visualization of the Flickr and Animal with Attributes 2 datasets, the proposed MR-SNE is compared with other graph embedding-based approaches; MR-SNE demonstrates promising performance.
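The two steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a fixed-bandwidth Gaussian kernel for the within-domain similarities (t-SNE normally calibrates the bandwidth per point via perplexity), a hypothetical `cross_weight` parameter that scales the observed cross-domain relations, and plain gradient descent on the standard t-SNE KL objective without momentum or early exaggeration. All function and parameter names are illustrative.

```python
import numpy as np


def gaussian_affinity(X, bandwidth=1.0):
    """Within-domain similarities from data vectors.

    Assumption: a fixed-bandwidth Gaussian kernel stands in for the
    perplexity-calibrated similarities used by t-SNE.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    A = np.exp(-d2 / (2.0 * bandwidth ** 2))
    np.fill_diagonal(A, 0.0)  # no self-similarity
    return A


def augmented_relations(X_img, X_txt, W_cross, cross_weight=1.0):
    """Step 1: combine computed within-domain relations with the observed
    cross-domain relations W_cross (shape: n_img x n_txt) into one
    symmetric matrix, normalized to sum to 1.

    cross_weight is a hypothetical balancing parameter, not from the paper.
    """
    A_img = gaussian_affinity(X_img)
    A_txt = gaussian_affinity(X_txt)
    top = np.hstack([A_img, cross_weight * W_cross])
    bottom = np.hstack([cross_weight * W_cross.T, A_txt])
    P = np.vstack([top, bottom])
    return P / P.sum()


def joint_embed(P, dim=2, n_iter=300, lr=100.0, seed=0):
    """Step 2: embed all nodes (images and tags) in one space by gradient
    descent on the t-SNE objective KL(P || Q) with a Student-t kernel."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    Y = 1e-2 * rng.standard_normal((n, dim))
    for _ in range(n_iter):
        sq = np.sum(Y ** 2, axis=1)
        d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T, 0.0)
        num = 1.0 / (1.0 + d2)  # unnormalized Student-t similarities
        np.fill_diagonal(num, 0.0)
        Q = num / num.sum()
        M = (P - Q) * num
        # grad_i = 4 * sum_j M_ij * (y_i - y_j)
        grad = 4.0 * (np.diag(M.sum(axis=1)) - M) @ Y
        Y -= lr * grad
    return Y
```

With image features `X_img`, tag features `X_txt`, and a binary image-tag matrix `W_cross`, calling `joint_embed(augmented_relations(X_img, X_txt, W_cross))` returns one coordinate row per image followed by one per tag, so both modalities can be drawn in the same scatter plot.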