Masahiro Morikura

6papers

639citations

Novelty51%

AI Score26

Ranked #163,160 of 201,326 authors (top 81%)#35,967 in LG (top 85%)

6 Papers

NIApr 1, 2021

Decentralized and Model-Free Federated Learning: Consensus-Based Distillation in Function Space

Akihito Taya, Takayuki Nishio, Masahiro Morikura et al.

This paper proposes a fully decentralized federated learning (FL) scheme for Internet of Everything (IoE) devices that are connected via multi-hop networks. Because FL algorithms hardly converge the parameters of machine learning (ML) models, this paper focuses on the convergence of ML models in function spaces. Considering that the representative loss functions of ML tasks e.g, mean squared error (MSE) and Kullback-Leibler (KL) divergence, are convex functionals, algorithms that directly update functions in function spaces could converge to the optimal solution. The key concept of this paper is to tailor a consensus-based optimization algorithm to work in the function space and achieve the global optimum in a distributed manner. This paper first analyzes the convergence of the proposed algorithm in a function space, which is referred to as a meta-algorithm, and shows that the spectral graph theory can be applied to the function space in a manner similar to that of numerical vectors. Then, consensus-based multi-hop federated distillation (CMFD) is developed for a neural network (NN) to implement the meta-algorithm. CMFD leverages knowledge distillation to realize function aggregation among adjacent devices without parameter averaging. An advantage of CMFD is that it works even with different NN models among the distributed learners. Although CMFD does not perfectly reflect the behavior of the meta-algorithm, the discussion of the meta-algorithm's convergence property promotes an intuitive understanding of CMFD, and simulation evaluations show that NN models converge using CMFD for several tasks. The simulation results also show that CMFD achieves higher accuracy than parameter aggregation for weakly connected networks, and CMFD is more stable than parameter aggregation methods.

LGFeb 16, 2021

Zero-Shot Adaptation for mmWave Beam-Tracking on Overhead Messenger Wires through Robust Adversarial Reinforcement Learning

Masao Shinzaki, Yusuke Koda, Koji Yamamoto et al.

Millimeter wave (mmWave) beam-tracking based on machine learning enables the development of accurate tracking policies while obviating the need to periodically solve beam-optimization problems. However, its applicability is still arguable when training-test gaps exist in terms of environmental parameters that affect the node dynamics. From this skeptical point of view, the contribution of this study is twofold. First, by considering an example scenario, we confirm that the training-test gap adversely affects the beam-tracking performance. More specifically, we consider nodes placed on overhead messenger wires, where the node dynamics are affected by several environmental parameters, e.g, the wire mass and tension. Although these are particular scenarios, they yield insight into the validation of the training-test gap problems. Second, we demonstrate the feasibility of \textit{zero-shot adaptation} as a solution, where a learning agent adapts to environmental parameters unseen during training. This is achieved by leveraging a robust adversarial reinforcement learning (RARL) technique, where such training-and-test gaps are regarded as disturbances by adversaries that are jointly trained with a legitimate beam-tracking agent. Numerical evaluations demonstrate that the beam-tracking policy learned via RARL can be applied to a wide range of environmental parameters without severely degrading the received power.

DCAug 14, 2020

Distillation-Based Semi-Supervised Federated Learning for Communication-Efficient Collaborative Training with Non-IID Private Data

Sohei Itahara, Takayuki Nishio, Yusuke Koda et al.

This study develops a federated learning (FL) framework overcoming largely incremental communication costs due to model sizes in typical frameworks without compromising model performance. To this end, based on the idea of leveraging an unlabeled open dataset, we propose a distillation-based semi-supervised FL (DS-FL) algorithm that exchanges the outputs of local models among mobile devices, instead of model parameter exchange employed by the typical frameworks. In DS-FL, the communication cost depends only on the output dimensions of the models and does not scale up according to the model size. The exchanged model outputs are used to label each sample of the open dataset, which creates an additionally labeled dataset. Based on the new dataset, local models are further trained, and model performance is enhanced owing to the data augmentation effect. We further highlight that in DS-FL, the heterogeneity of the devices' dataset leads to ambiguous of each data sample and lowing of the training convergence. To prevent this, we propose entropy reduction averaging, where the aggregated model outputs are intentionally sharpened. Moreover, extensive experiments show that DS-FL reduces communication costs up to 99% relative to those of the FL benchmark while achieving similar or higher classification accuracy.

NIApr 21, 2020

Lottery Hypothesis based Unsupervised Pre-training for Model Compression in Federated Learning

Sohei Itahara, Takayuki Nishio, Masahiro Morikura et al.

Federated learning (FL) enables a neural network (NN) to be trained using privacy-sensitive data on mobile devices while retaining all the data on their local storages. However, FL asks the mobile devices to perform heavy communication and computation tasks, i.e., devices are requested to upload and download large-volume NN models and train them. This paper proposes a novel unsupervised pre-training method adapted for FL, which aims to reduce both the communication and computation costs through model compression. Since the communication and computation costs are highly dependent on the volume of NN models, reducing the volume without decreasing model performance can reduce these costs. The proposed pre-training method leverages unlabeled data, which is expected to be obtained from the Internet or data repository much more easily than labeled data. The key idea of the proposed method is to obtain a ``good'' subnetwork from the original NN using the unlabeled data based on the lottery hypothesis. The proposed method trains an original model using a denoising auto encoder with the unlabeled data and then prunes small-magnitude parameters of the original model to generate a small but good subnetwork. The proposed method is evaluated using an image classification task. The results show that the proposed method requires 35\% less traffic and computation time than previous methods when achieving a certain test accuracy.

LGMay 17, 2019

Hybrid-FL for Wireless Networks: Cooperative Learning Mechanism Using Non-IID Data

Naoya Yoshida, Takayuki Nishio, Masahiro Morikura et al.

This paper proposes a cooperative mechanism for mitigating the performance degradation due to non-independent-and-identically-distributed (non-IID) data in collaborative machine learning (ML), namely federated learning (FL), which trains an ML model using the rich data and computational resources of mobile clients without gathering their data to central systems. The data of mobile clients is typically non-IID owing to diversity among mobile clients' interests and usage, and FL with non-IID data could degrade the model performance. Therefore, to mitigate the degradation induced by non-IID data, we assume that a limited number (e.g., less than 1%) of clients allow their data to be uploaded to a server, and we propose a hybrid learning mechanism referred to as Hybrid-FL, wherein the server updates the model using the data gathered from the clients and aggregates the model with the models trained by clients. The Hybrid-FL solves both client- and data-selection problems via heuristic algorithms, which try to select the optimal sets of clients who train models with their own data, clients who upload their data to the server, and data uploaded to the server. The algorithms increase the number of clients participating in FL and make more data gather in the server IID, thereby improving the prediction accuracy of the aggregated model. Evaluations, which consist of network simulations and ML experiments, demonstrate that the proposed scheme achieves a 13.5% higher classification accuracy than those of the previously proposed schemes for the non-IID case.

SPMay 17, 2019

Deep Reinforcement Learning-Based Channel Allocation for Wireless LANs with Graph Convolutional Networks

Kota Nakashima, Shotaro Kamiya, Kazuki Ohtsu et al.

Last year, IEEE 802.11 Extremely High Throughput Study Group (EHT Study Group) was established to initiate discussions on new IEEE 802.11 features. Coordinated control methods of the access points (APs) in the wireless local area networks (WLANs) are discussed in EHT Study Group. The present study proposes a deep reinforcement learning-based channel allocation scheme using graph convolutional networks (GCNs). As a deep reinforcement learning method, we use a well-known method double deep Q-network. In densely deployed WLANs, the number of the available topologies of APs is extremely high, and thus we extract the features of the topological structures based on GCNs. We apply GCNs to a contention graph where APs within their carrier sensing ranges are connected to extract the features of carrier sensing relationships. Additionally, to improve the learning speed especially in an early stage of learning, we employ a game theory-based method to collect the training data independently of the neural network model. The simulation results indicate that the proposed method can appropriately control the channels when compared to extant methods.