Tadashi Wadayama

h-index: 17

11 papers

149 citations

Novelty: 49%

AI Score: 43

Ranked #56,129 of 194,257 authors (top 29%); #146 in IT (top 19%)

11 Papers

1.8 LG Dec 23, 2022

Deep Unfolding-based Weighted Averaging for Federated Learning in Heterogeneous Environments

Ayano Nakai-Kasai, Tadashi Wadayama

Federated learning is a collaborative model training method that iterates model updates by multiple clients and aggregation of the updates by a central server. Device and statistical heterogeneity of participating clients causes significant performance degradation, so an appropriate aggregation weight should be assigned to each client in the aggregation phase of the server. To adjust the aggregation weights, this paper employs deep unfolding, a parameter tuning method that leverages both the learning capability of deep learning from training data and domain knowledge. This enables us to directly incorporate the heterogeneity of the environment of interest into the tuning of the aggregation weights. The proposed approach can be combined with various federated learning algorithms. The results of numerical experiments indicate that the proposed method achieves a higher test accuracy on unknown class-balanced data than conventional heuristic weighting methods. The proposed method can handle large-scale learning models with the aid of pretrained models, so that it can perform practical real-world tasks. The convergence rate of federated learning algorithms with the proposed method is also provided in this paper.
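The server-side aggregation step that the abstract refers to can be illustrated with a minimal sketch. It assumes each client model is a flat list of floats and that the aggregation weights are given; the function name `aggregate` and the normalization of the weights are illustrative assumptions, and the deep-unfolding procedure that tunes the weights is not shown.

```python
def aggregate(client_models, weights):
    """Weighted average of client parameter vectors (hypothetical helper).

    client_models: list of equal-length lists of floats, one per client.
    weights: non-negative aggregation weights, one per client.
    The weights are normalized here so that they sum to one (an assumption).
    """
    if len(client_models) != len(weights):
        raise ValueError("one weight is required per client")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    dim = len(client_models[0])
    aggregated = [0.0] * dim
    for model, weight in zip(client_models, weights):
        for i, value in enumerate(model):
            aggregated[i] += (weight / total) * value
    return aggregated
```

With uniform weights this reduces to a plain average; giving a client a larger weight pulls the aggregated model toward that client's parameters.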

5.1 LG May 1

Federated Learning with Hypergradient-based Online Update of Aggregation Weights

Ayano Nakai-Kasai, Tadashi Wadayama

Federated learning using mobile and Internet of Things devices requires not only the ability to handle heterogeneity of clients' data distributions but also high adaptability to varying communication environments. We propose FedHAW (Federated Learning with Hypergradient-based update of Aggregation Weights), which implements online updates of aggregation weights. FedHAW updates the aggregation weights using the hypergradient, the gradient of the objective function with respect to the weights, which can be calculated with low computational overhead. Simulation results show that the proposed method possesses high generalization performance in heterogeneous environments and high robustness to communication errors.
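The notion of a hypergradient with respect to aggregation weights can be sketched as follows. This is an illustration of the chain rule only, not the FedHAW algorithm itself: it assumes the aggregated model is the weighted sum w = Σ_k a_k w_k, so that the derivative of the objective with respect to a_k is the inner product of the objective's gradient at w with the client model w_k. All function names, the plain gradient-descent update, and the learning rate are assumptions.

```python
def aggregate(client_models, weights):
    """Aggregated model w = sum_k a_k * w_k (no normalization, by assumption)."""
    dim = len(client_models[0])
    return [sum(a * m[i] for a, m in zip(weights, client_models))
            for i in range(dim)]

def hypergradient(client_models, weights, loss_grad):
    """Gradient of the objective with respect to each aggregation weight.

    loss_grad: callable returning the gradient of the objective at a model.
    By the chain rule, dF/da_k = <grad F(w), w_k>.
    """
    w = aggregate(client_models, weights)
    g = loss_grad(w)
    return [sum(gi * mi for gi, mi in zip(g, m)) for m in client_models]

def update_weights(client_models, weights, loss_grad, lr):
    """One gradient-descent step on the aggregation weights (illustrative)."""
    h = hypergradient(client_models, weights, loss_grad)
    return [a - lr * hk for a, hk in zip(weights, h)]
```

Each hypergradient entry costs one inner product per client on top of a single gradient evaluation, which is consistent with the low overhead the abstract mentions.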

4.1 LG May 22, 2025 Code

Multi-Output Gaussian Processes for Graph-Structured Data

Ayano Nakai-Kasai, Tadashi Wadayama

Graph-structured data is data associated with a graph structure in which vertices and edges describe some kind of data correlation. This paper proposes a regression method for graph-structured data, based on multi-output Gaussian processes (MOGP), that captures both the correlation between vertices and the correlation between associated data. The proposed formulation is built on the definition of MOGP. This allows it to be applied to a wide range of data configurations and scenarios. Moreover, it has high expressive capability due to its flexibility in kernel design. It includes existing Gaussian process methods for graph-structured data as special cases and makes it possible to remove the restrictions on data configurations, model selection, and inference scenarios in the existing methods. The performance of extensions achievable by the proposed formulation is evaluated through computer experiments with synthetic and real data.

2.3 LG Oct 26, 2020 Code

Convergence Acceleration via Chebyshev Step: Plausible Interpretation of Deep-Unfolded Gradient Descent

Satoshi Takabe, Tadashi Wadayama

Deep unfolding is a promising deep-learning technique whose network architecture is based on expanding the recursive structure of existing iterative algorithms. Although convergence acceleration is a remarkable advantage of deep unfolding, its theoretical aspects have not been revealed yet. The first half of this study details the theoretical analysis of the convergence acceleration in deep-unfolded gradient descent (DUGD) whose trainable parameters are step sizes. We propose a plausible interpretation of the learned step-size parameters in DUGD by introducing the principle of Chebyshev steps derived from Chebyshev polynomials. The use of Chebyshev steps in gradient descent (GD) enables us to bound the spectral radius of a matrix governing the convergence speed of GD, leading to a tight upper bound on the convergence rate. The convergence rate of GD using Chebyshev steps is shown to be asymptotically optimal, although it has no momentum terms. We also show that Chebyshev steps numerically explain the learned step-size parameters in DUGD well. In the second half of the study, Chebyshev-periodical successive over-relaxation (Chebyshev-PSOR) is proposed for accelerating linear/nonlinear fixed-point iterations. Theoretical analysis and numerical experiments indicate that Chebyshev-PSOR exhibits significantly faster convergence for various examples such as the Jacobi method and proximal gradient methods.

3.3 IT Apr 20, 2020

Deep Unfolded Multicast Beamforming

Satoshi Takabe, Tadashi Wadayama

Multicast beamforming is a promising technique for multicast communication. Providing an efficient and powerful beamforming design algorithm is a crucial issue because multicast beamforming problems such as the max-min-fair problem are NP-hard in general. Recently, deep learning-based approaches have been proposed for beamforming design. Although these approaches using deep neural networks exhibit a reasonable performance gain compared with conventional optimization-based algorithms, their scalability is an emerging problem for large systems in which beamforming design becomes a more demanding task. In this paper, we propose a novel deep unfolded trainable beamforming design with high scalability and efficiency. The algorithm is designed by expanding the recursive structure of an existing algorithm based on projections onto convex sets and embedding a constant number of trainable parameters into the expanded network, which leads to a scalable and stable training process. Numerical results show that the proposed algorithm can accelerate its convergence speed by using unsupervised learning, which is a challenging training process for deep unfolding.

7.2 LG Jan 15, 2020

Theoretical Interpretation of Learned Step Size in Deep-Unfolded Gradient Descent

Satoshi Takabe, Tadashi Wadayama

Deep unfolding is a promising deep-learning technique in which an iterative algorithm is unrolled into a deep network architecture with trainable parameters. In the case of gradient descent algorithms, as a result of the training process, one often observes acceleration of the convergence speed with learned non-constant step-size parameters whose behavior is neither intuitive nor interpretable from conventional theory. In this paper, we provide a theoretical interpretation of the learned step size of deep-unfolded gradient descent (DUGD). We first prove that the training process of DUGD reduces not only the mean squared error loss but also the spectral radius related to the convergence rate. Next, we show that minimizing the upper bound of the spectral radius naturally leads to the Chebyshev step, which is a sequence of step sizes based on Chebyshev polynomials. The numerical experiments confirm that the Chebyshev steps qualitatively reproduce the learned step-size parameters in DUGD, which provides a plausible interpretation of the learned parameters. Additionally, we show that the Chebyshev steps achieve the lower bound of the convergence rate for the first-order method in a specific limit without learning parameters or momentum terms.
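The idea of a step-size sequence based on Chebyshev polynomials can be sketched as follows. The sketch assumes a quadratic objective whose Hessian eigenvalues lie in [lam_min, lam_max] with lam_min > 0, and takes the step sizes as reciprocals of the Chebyshev nodes on that interval; the exact indexing and normalization used in the paper may differ. The function names and the grid-based worst-case check are illustrative assumptions.

```python
import math

def chebyshev_steps(lam_min, lam_max, T):
    """Step sizes given by reciprocals of the T Chebyshev nodes on [lam_min, lam_max]."""
    center = (lam_max + lam_min) / 2.0
    radius = (lam_max - lam_min) / 2.0
    return [1.0 / (center + radius * math.cos((2 * t + 1) * math.pi / (2 * T)))
            for t in range(T)]

def worst_case_contraction(steps, lam_min, lam_max, grid=1001):
    """Largest |prod_t (1 - step_t * lam)| over a grid of eigenvalues.

    For gradient descent on a quadratic, this product is the factor by which
    the error component along an eigenvector with eigenvalue lam is scaled
    after applying all the steps once.
    """
    worst = 0.0
    for k in range(grid):
        lam = lam_min + (lam_max - lam_min) * k / (grid - 1)
        factor = 1.0
        for step in steps:
            factor *= 1.0 - step * lam
        worst = max(worst, abs(factor))
    return worst
```

Comparing `worst_case_contraction(chebyshev_steps(1.0, 10.0, 4), 1.0, 10.0)` against the same quantity for four constant steps of size 2 / (lam_min + lam_max) shows the smaller worst-case factor obtained with the non-constant sequence.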

2.3 IT Oct 23, 2019

Trainable Projected Gradient Detector for Sparsely Spread Code Division Multiple Access

Satoshi Takabe, Yuki Yamauchi, Tadashi Wadayama

Sparsely spread code division multiple access (SCDMA) is a promising non-orthogonal multiple access technique for future wireless communications. In this paper, we propose a novel trainable multiuser detector called the sparse trainable projected gradient (STPG) detector, which is based on the notion of deep unfolding. In the STPG detector, trainable parameters are embedded in a projected gradient descent algorithm, which can be trained by standard deep learning techniques such as back propagation and stochastic gradient descent. Advantages of the detector are its low computational cost and small number of trainable parameters, which enable us to treat massive SCDMA systems. In particular, its computational cost is smaller than that of a conventional belief propagation (BP) detector, while the STPG detector exhibits nearly the same detection performance as a BP detector. We also propose a scalable joint learning of signature sequences and the STPG detector for signature design. Numerical results show that the joint learning improves multiuser detection performance, particularly in the low SNR regime.

5.1 IT Apr 16, 2019

Complex Trainable ISTA for Linear and Nonlinear Inverse Problems

Satoshi Takabe, Tadashi Wadayama, Yonina C. Eldar

Complex-field signal recovery problems from noisy linear/nonlinear measurements appear in many areas of signal processing and wireless communications. In this paper, we propose a trainable iterative signal recovery algorithm named complex-field TISTA (C-TISTA) which treats complex-field nonlinear inverse problems. C-TISTA is based on the concept of deep unfolding and consists of a gradient descent step with the Wirtinger derivatives followed by a shrinkage step with a trainable complex-valued shrinkage function. Importantly, it contains a small number of trainable parameters so that its training process can be executed efficiently. Numerical results indicate that C-TISTA shows remarkable signal recovery performance compared with existing algorithms.

11.3 IT Dec 25, 2018

Trainable Projected Gradient Detector for Massive Overloaded MIMO Channels: Data-driven Tuning Approach

Satoshi Takabe, Masayuki Imanishi, Tadashi Wadayama et al.

This paper presents a deep learning-aided iterative detection algorithm for massive overloaded multiple-input multiple-output (MIMO) systems where the number of transmit antennas $n$ is larger than that of receive antennas $m$. Since the proposed algorithm is based on the projected gradient descent method with trainable parameters, it is named the trainable projected gradient-detector (TPG-detector). The trainable internal parameters, such as the step-size parameter, can be optimized with standard deep learning techniques, i.e., the back propagation and stochastic gradient descent algorithms. This approach is referred to as data-driven tuning, and ensures fast convergence during parameter estimation in the proposed scheme. The TPG-detector mainly consists of matrix-vector product operations whose computational cost is proportional to $m n$ for each iteration. In addition, the number of trainable parameters in the TPG-detector is independent of the number of antennas. These features of the TPG-detector result in a fast and stable training process and reasonable scalability for large systems. Numerical simulations show that the proposed detector achieves a comparable detection performance to those of existing algorithms for massive overloaded MIMO channels, e.g., the state-of-the-art IW-SOAV detector, with a lower computation cost.
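The structure described above, a gradient step followed by a projection, with a per-iteration cost dominated by matrix-vector products, can be sketched as follows. This is a simplified illustration and not the TPG-detector itself: it assumes real-valued signals with entries in {-1, +1}, uses the transpose of the channel matrix in the gradient step, uses tanh as a soft projection, and takes the per-iteration parameters `gammas` and `thetas` as given rather than trained. All names are hypothetical.

```python
import math

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def projected_gradient_detect(H, y, gammas, thetas):
    """Iterate a gradient step and a soft projection onto [-1, 1].

    H: m x n channel matrix, y: received vector of length m.
    gammas, thetas: per-iteration step sizes and projection softness values
    (these would be the trainable parameters; here they are fixed inputs).
    """
    Ht = transpose(H)
    s = [0.0] * len(H[0])
    for gamma, theta in zip(gammas, thetas):
        residual = [yi - hi for yi, hi in zip(y, matvec(H, s))]
        grad = matvec(Ht, residual)          # second m*n-cost product
        r = [si + gamma * gi for si, gi in zip(s, grad)]
        s = [math.tanh(ri / abs(theta)) for ri in r]
    return s
```

Each iteration performs two matrix-vector products, so its cost scales with m n, and the number of parameters (one step size and one softness value per iteration) does not depend on the number of antennas.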

10.8 IT Jun 28, 2018

Deep Learning-Aided Projected Gradient Detector for Massive Overloaded MIMO Channels

Satoshi Takabe, Masayuki Imanishi, Tadashi Wadayama et al.

The paper presents a deep learning-aided iterative detection algorithm for massive overloaded MIMO systems. Since the proposed algorithm is based on the projected gradient descent method with trainable parameters, it is named the trainable projected gradient detector (TPG-detector). The trainable internal parameters can be optimized with standard deep learning techniques such as back propagation and stochastic gradient descent algorithms. This approach, referred to as data-driven tuning, brings notable advantages to the proposed scheme, such as fast convergence. The numerical experiments show that the TPG-detector achieves detection performance comparable to that of the known algorithms for massive overloaded MIMO channels at a lower computational cost.

1.2 IT Oct 29, 2016

Sparse Signal Recovery for Binary Compressed Sensing by Majority Voting Neural Networks

Daisuke Ito, Tadashi Wadayama

In this paper, we propose majority voting neural networks for sparse signal recovery in binary compressed sensing. The majority voting neural network is composed of several independently trained feedforward neural networks employing the sigmoid function as an activation function. Our empirical study shows that the choice of loss function used in the training process is of prime importance. We found a loss function suitable for sparse signal recovery, which includes a cross entropy-like term and an $L_1$ regularization term. From the experimental results, we observed that the majority voting neural network achieves excellent recovery performance, approaching the optimal performance as the number of component nets grows. The simple architecture of the majority voting neural networks would be beneficial for both software and hardware implementations.
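The combining step suggested by the name can be illustrated with a minimal sketch. It assumes each component network has already produced a binary estimate of the sparse signal and that the final estimate is a per-position majority vote; the function name and the tie-breaking rule (ties resolve to 0) are assumptions, and the component networks themselves are not shown.

```python
def majority_vote(predictions):
    """Per-position majority vote over binary estimates.

    predictions: list of equal-length lists of 0/1 values, one per component
    network. A position is set to 1 only if strictly more than half of the
    networks predict 1 (so ties resolve to 0, by assumption).
    """
    n_nets = len(predictions)
    length = len(predictions[0])
    return [1 if 2 * sum(p[i] for p in predictions) > n_nets else 0
            for i in range(length)]

# Three component estimates of a length-4 binary signal.
estimate = majority_vote([[1, 0, 0, 1],
                          [1, 1, 0, 0],
                          [0, 0, 0, 1]])
# estimate == [1, 0, 0, 1]
```

Using an odd number of component networks avoids ties altogether.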