Shuhei Nitta

LG
3papers
29citations
Novelty63%
AI Score28

3 Papers

LGNov 15, 2023
Scalable Federated Learning for Clients with Different Input Image Sizes and Numbers of Output Categories

Shuhei Nitta, Taiji Suzuki, Albert Rodríguez Mulet et al.

Federated learning is a privacy-preserving training method which consists of training from a plurality of clients but without sharing their confidential data. However, previous work on federated learning do not explore suitable neural network architectures for clients with different input images sizes and different numbers of output categories. In this paper, we propose an effective federated learning method named ScalableFL, where the depths and widths of the local models for each client are adjusted according to the clients' input image size and the numbers of output categories. In addition, we provide a new bound for the generalization gap of federated learning. In particular, this bound helps to explain the effectiveness of our scalable neural network approach. We demonstrate the effectiveness of ScalableFL in several heterogeneous client settings for both image classification and object detection tasks.

LGOct 29, 2019
Decomposable-Net: Scalable Low-Rank Compression for Neural Networks

Atsushi Yaguchi, Taiji Suzuki, Shuhei Nitta et al.

Compressing DNNs is important for the real-world applications operating on resource-constrained devices. However, we typically observe drastic performance deterioration when changing model size after training is completed. Therefore, retraining is required to resume the performance of the compressed models suitable for different devices. In this paper, we propose Decomposable-Net (the network decomposable in any size), which allows flexible changes to model size without retraining. We decompose weight matrices in the DNNs via singular value decomposition and adjust ranks according to the target model size. Unlike the existing low-rank compression methods that specialize the model to a fixed size, we propose a novel backpropagation scheme that jointly minimizes losses for both of full- and low-rank networks. This enables not only to maintain the performance of a full-rank network {\it without retraining} but also to improve low-rank networks in multiple sizes. Additionally, we introduce a simple criterion for rank selection that effectively suppresses approximation error. In experiments on the ImageNet classification task, Decomposable-Net yields superior accuracy in a wide range of model sizes. In particular, Decomposable-Net achieves the top-1 accuracy of $73.2\%$ with $0.27\times$MACs with ResNet-50, compared to Tucker decomposition ($67.4\% / 0.30\times$), Trained Rank Pruning ($70.6\% / 0.28\times$), and universally slimmable networks ($71.4\% / 0.26\times$).

LGDec 19, 2018
Adam Induces Implicit Weight Sparsity in Rectifier Neural Networks

Atsushi Yaguchi, Taiji Suzuki, Wataru Asano et al.

In recent years, deep neural networks (DNNs) have been applied to various machine leaning tasks, including image recognition, speech recognition, and machine translation. However, large DNN models are needed to achieve state-of-the-art performance, exceeding the capabilities of edge devices. Model reduction is thus needed for practical use. In this paper, we point out that deep learning automatically induces group sparsity of weights, in which all weights connected to an output channel (node) are zero, when training DNNs under the following three conditions: (1) rectified-linear-unit (ReLU) activations, (2) an $L_2$-regularized objective function, and (3) the Adam optimizer. Next, we analyze this behavior both theoretically and experimentally, and propose a simple model reduction method: eliminate the zero weights after training the DNN. In experiments on MNIST and CIFAR-10 datasets, we demonstrate the sparsity with various training setups. Finally, we show that our method can efficiently reduce the model size and performs well relative to methods that use a sparsity-inducing regularizer.