Yuichiro Minato

CV · Mar 21, 2024

Application of Tensorized Neural Networks for Cloud Classification

Alifu Xiafukaiti, Devanshu Garg, Aruto Hosaka et al.

Convolutional neural networks (CNNs) are widely used in fields such as weather forecasting, computer vision, autonomous driving, and medical image analysis because of their ability to extract spatial information, share parameters, and learn local features. However, the practical implementation and commercialization of CNNs in these domains are hindered by challenges related to model size, overfitting, and computational time. To address these limitations, our study proposes tensorizing the dense layers of the CNN to reduce model size and computational time. Additionally, we incorporate attention layers into the CNN and train it using contrastive self-supervised learning to classify cloud information, which is crucial for accurate weather forecasting. We elucidate the key characteristics of tensorized neural networks (TNNs), including the data compression rate, accuracy, and computational speed. The results show how these properties of TNNs change with the batch size.
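As a rough illustration of what tensorizing a dense layer can look like, the sketch below replaces one weight matrix with two small tensor cores joined by a bond index. The abstract does not say which decomposition the authors use, so the two-core layout, the function names `tt_dense_forward` and `tt_to_dense`, and all shapes here are assumptions made for illustration only.

```python
import numpy as np


def tt_dense_forward(x, core1, core2):
    """Apply a two-core tensorized dense layer (hypothetical layout).

    x:     (batch, m1 * m2) input activations
    core1: (m1, n1, r) first core, r is the bond dimension
    core2: (r, m2, n2) second core
    Returns an array of shape (batch, n1 * n2).
    """
    batch = x.shape[0]
    m1, n1, _ = core1.shape
    _, m2, n2 = core2.shape
    x = x.reshape(batch, m1, m2)
    # Contract the input indices (i, j) and the bond index r in one step.
    y = np.einsum("bij,ipr,rjq->bpq", x, core1, core2)
    return y.reshape(batch, n1 * n2)


def tt_to_dense(core1, core2):
    """Rebuild the equivalent dense weight matrix, for checking only."""
    m1, n1, _ = core1.shape
    _, m2, n2 = core2.shape
    w = np.einsum("ipr,rjq->ijpq", core1, core2)
    return w.reshape(m1 * m2, n1 * n2)


def compression_rate(m1, m2, n1, n2, r):
    """Ratio of dense parameter count to tensorized parameter count."""
    dense = (m1 * m2) * (n1 * n2)
    tensorized = m1 * n1 * r + r * m2 * n2
    return dense / tensorized
```

With these assumed shapes, a 256-to-64 dense layer (m1 = m2 = 16, n1 = n2 = 8) at bond dimension r = 4 stores 1,024 parameters instead of 16,384, a compression rate of 16. A larger bond dimension buys back expressiveness at the cost of compression.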

DIS-NN · Dec 16, 2021

Explainable Natural Language Processing with Matrix Product States

Jirawat Tangpanitanon, Chanatip Mangkang, Pradeep Bhadola et al.

Despite the empirical successes of recurrent neural networks (RNNs) in natural language processing (NLP), theoretical understanding of RNNs is still limited due to their intrinsically complex non-linear computations. We systematically analyze the behavior of RNNs in a ubiquitous NLP task, the sentiment analysis of movie reviews, via the mapping between a class of RNNs called recurrent arithmetic circuits (RACs) and a matrix product state (MPS). Using the von Neumann entanglement entropy (EE) as a proxy for information propagation, we show that single-layer RACs possess a maximum information propagation capacity, reflected by the saturation of the EE. Enlarging the bond dimension beyond the EE saturation threshold does not increase model prediction accuracy, so a minimal model that best estimates the data statistics can be inferred. Although the saturated EE is smaller than the maximum EE allowed by the area law, our minimal model still achieves ~99% training accuracy on realistic sentiment analysis data sets. Thus, low EE does not argue against the adoption of single-layer RACs for NLP. Contrary to a common belief that long-range information propagation is the main source of RNNs' successes, we show that single-layer RACs harness high expressiveness from the subtle interplay between information propagation and the word vector embeddings. Our work sheds light on the phenomenology of learning in RACs, and more generally on the explainability of RNNs for NLP, using tools from many-body quantum physics.
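For readers unfamiliar with the quantity, the von Neumann entanglement entropy across a cut can be computed from the singular values of the state reshaped into a matrix. The sketch below does this for a generic state vector; it is not the authors' RAC-to-MPS pipeline, and the function name `entanglement_entropy` and its arguments are assumptions made for illustration.

```python
import numpy as np


def entanglement_entropy(psi, dim_left):
    """Von Neumann entanglement entropy across a left/right bipartition.

    psi:      state vector of length dim_left * dim_right
    dim_left: Hilbert-space dimension of the left block
    Returns S = -sum_k p_k log p_k (natural log), where p_k are the
    normalized squared singular values (Schmidt weights).
    """
    mat = np.asarray(psi).reshape(dim_left, -1)
    s = np.linalg.svd(mat, compute_uv=False)
    p = s**2
    p = p / p.sum()       # normalize in case psi is not unit norm
    p = p[p > 1e-12]      # drop numerical zeros before taking the log
    return float(-(p * np.log(p)).sum())
```

For an MPS, the number of nonzero Schmidt weights across a bond is at most the bond dimension chi, so the entropy across that bond cannot exceed log(chi). That bound is why an EE that stops growing as chi increases suggests the extra bond dimension is not being used.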
