Tasuku Soma

7 papers

100 citations

Novelty 42%

AI Score 41

Ranked #88,291 of 205,806 authors (top 43%) · #19,601 in LG (top 46%)

7 Papers

LG · Dec 28, 2022

Near-Optimal Algorithms for Group Distributionally Robust Optimization and Beyond

Tasuku Soma, Khashayar Gatmiry, Sharut Gupta et al.

Distributionally robust optimization (DRO) can improve the robustness and fairness of learning methods. In this paper, we devise stochastic algorithms for a class of DRO problems including group DRO, subpopulation fairness, and empirical conditional value at risk (CVaR) optimization. Our new algorithms achieve faster convergence rates than existing algorithms for multiple DRO settings. We also provide a new information-theoretic lower bound that implies our bounds are tight for group DRO. Empirically, too, our algorithms outperform known methods.
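For orientation, group DRO replaces the average loss by the worst group-average loss, $\min_\theta \max_{q \in \Delta_k} \sum_{i=1}^k q_i \, \mathbb{E}[\ell(\theta; z) \mid \text{group } i]$, where $\Delta_k$ is the probability simplex over $k$ groups. The sketch below is a generic gradient descent-ascent baseline on a toy instance, not the near-optimal stochastic algorithm of the paper. It assumes a scalar parameter, hypothetical group losses $(\theta - c_i)^2$, exact (non-stochastic) losses, and a multiplicative-weights update on $q$; the function name `group_dro_gda` is illustrative.

```python
import math


def group_dro_gda(centers, steps=2000, lr_theta=0.1, lr_q=0.1):
    """Toy group DRO: min over theta of max over q in the simplex of sum_i q_i * (theta - c_i)**2.

    Hypothetical setup: group i has loss (theta - c_i)**2, and exact (deterministic)
    losses and gradients are used instead of stochastic estimates.
    """
    k = len(centers)
    theta = 0.0
    q = [1.0 / k] * k  # start from uniform weights over the groups
    for _ in range(steps):
        losses = [(theta - c) ** 2 for c in centers]
        # Descent direction for theta: gradient of the q-weighted loss.
        grad = sum(qi * 2.0 * (theta - c) for qi, c in zip(q, centers))
        # Ascent on q: multiplicative-weights (exponentiated-gradient) step,
        # which upweights the groups that currently have larger loss.
        w = [qi * math.exp(lr_q * li) for qi, li in zip(q, losses)]
        total = sum(w)
        q = [wi / total for wi in w]
        # Simultaneous update: theta uses the gradient computed with the old q.
        theta -= lr_theta * grad
    worst_group_loss = max((theta - c) ** 2 for c in centers)
    return theta, q, worst_group_loss
```

With groups centered at 0 and 2, the worst-case optimum is theta = 1 with equal group weights, where both groups incur loss 1, and the iteration settles there.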

NA · Jun 28, 2016

Finding a low-rank basis in a matrix subspace

Yuji Nakatsukasa, Tasuku Soma, André Uschmajew

For a given matrix subspace, how can we find a basis that consists of low-rank matrices? This is a generalization of the sparse vector problem. It turns out that when the subspace is spanned by rank-1 matrices, the matrices can be obtained by the tensor CP decomposition. For the higher rank case, the situation is not as straightforward. In this work we present an algorithm based on a greedy process applicable to higher rank problems. Our algorithm first estimates the minimum rank by applying soft singular value thresholding to a nuclear norm relaxation, and then computes a matrix with that rank using the method of alternating projections. We provide local convergence results, and compare our algorithm with several alternative approaches. Applications include data compression beyond the classical truncated SVD, computing accurate eigenvectors of a near-multiple eigenvalue, image separation and graph Laplacian eigenproblems.
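Read from the abstract, the two phases can be summarized as below, with notation that is ours: $\mathcal{M}$ is the given matrix subspace, $\mathcal{P}_{\mathcal{M}}$ the orthogonal projection onto it, $X = U\Sigma V^\top$ an SVD of the current iterate, $\mathcal{S}_\tau$ soft singular value thresholding with threshold $\tau$, and $\mathcal{T}_r$ truncation to the best rank-$r$ approximation. The Frobenius-norm normalization, which keeps the iterates away from the zero matrix, is our assumption about the details.

```latex
\begin{aligned}
&\text{Phase I (rank estimation):}
  && Y = \mathcal{S}_\tau(X) := U \max(\Sigma - \tau I,\, 0)\, V^\top,
  && X \leftarrow \frac{\mathcal{P}_{\mathcal{M}}(Y)}{\|\mathcal{P}_{\mathcal{M}}(Y)\|_F}, \\
&\text{Phase II (alternating projections):}
  && Y = \mathcal{T}_r(X),
  && X \leftarrow \frac{\mathcal{P}_{\mathcal{M}}(Y)}{\|\mathcal{P}_{\mathcal{M}}(Y)\|_F}.
\end{aligned}
```

Presumably the rank estimate $r$ is the number of singular values that survive the thresholding at the end of Phase I, and the greedy process repeats both phases to obtain one low-rank basis element at a time.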

69.6 · OC · Apr 24

Accelerating operator Sinkhorn iteration with overrelaxation

Tasuku Soma, André Uschmajew

We propose accelerated versions of the operator Sinkhorn iteration for operator scaling using successive overrelaxation. We analyze the local convergence rates of these accelerated methods via linearization, which allows us to determine the asymptotically optimal relaxation parameter based on Young's SOR theorem. Using the Hilbert metric on positive definite cones, we also obtain a global convergence result for a geodesic version of overrelaxation in a specific range of relaxation parameters. These techniques generalize corresponding results obtained for matrix scaling by Thibault et al. (Algorithms, 14(5):143, 2021) and Lehmann et al. (Optim. Lett., 16(8):2209–2220, 2022). Numerical experiments demonstrate that the proposed methods outperform the original operator Sinkhorn iteration in certain applications.
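The accelerated methods above act on completely positive maps; the sketch below covers only the classical matrix-scaling special case that the cited matrix-scaling results address. It scales an n x n positive matrix K to doubly stochastic form diag(x) K diag(y) and uses a multiplicative form of overrelaxation, $x \leftarrow x^{1-\omega} \odot (1 \oslash K y)^{\omega}$ (and likewise for y), so that omega = 1 recovers plain Sinkhorn. The function name, the all-ones target marginals, and the stopping rule are assumptions made for illustration.

```python
def sinkhorn_sor(K, omega=1.0, tol=1e-10, max_iter=10_000):
    """Scale a positive square matrix K to doubly stochastic form diag(x) K diag(y).

    omega = 1 is plain Sinkhorn; omega != 1 applies the geometric overrelaxation
    x <- x**(1 - omega) * (1 / (K y))**omega, and the analogous update for y.
    Returns (x, y, number_of_sweeps).
    """
    n = len(K)
    x = [1.0] * n
    y = [1.0] * n

    def marginal_error():
        # Largest deviation of any row or column sum of diag(x) K diag(y) from 1.
        rows = [x[i] * sum(K[i][j] * y[j] for j in range(n)) for i in range(n)]
        cols = [y[j] * sum(K[i][j] * x[i] for i in range(n)) for j in range(n)]
        return max(abs(s - 1.0) for s in rows + cols)

    for it in range(1, max_iter + 1):
        # Row half-step: relaxed update toward the Sinkhorn target 1 / (K y).
        Ky = [sum(K[i][j] * y[j] for j in range(n)) for i in range(n)]
        x = [x[i] ** (1.0 - omega) * (1.0 / Ky[i]) ** omega for i in range(n)]
        # Column half-step uses the freshly updated x (Gauss-Seidel ordering).
        Ktx = [sum(K[i][j] * x[i] for i in range(n)) for j in range(n)]
        y = [y[j] ** (1.0 - omega) * (1.0 / Ktx[j]) ** omega for j in range(n)]
        if marginal_error() < tol:
            return x, y, it
    return x, y, max_iter
```

For K = [[1, 1], [1, 100]], omega = 1.25 reaches the tolerance in noticeably fewer sweeps than omega = 1; the best omega depends on the second singular value of the scaled matrix, which is where Young's SOR theory enters.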

45.3 · OC · Mar 13

Numerically stable variants of overrelaxation for operator Sinkhorn iteration

Henrik Eisenmann, Tasuku Soma, Xun Tang et al.

We consider accelerated versions of the operator Sinkhorn iteration (OSI) for solving scaling problems for completely positive maps. Based on the interpretation of OSI as an alternating fixed-point iteration, it has recently been proposed to achieve acceleration by means of nonlinear successive overrelaxation (SOR), e.g., with respect to geodesics in the Hilbert metric. The direct implementation of the proposed SOR algorithms, however, can be numerically unstable for ill-conditioned instances, limiting the achievable accuracy. Here we derive equivalent versions of OSI with SOR where, similar to the original OSI formulation, scalings are applied on the fly in order to take advantage of preconditioning effects. Numerical experiments confirm that this modification allows for numerically stable SOR acceleration of OSI even in ill-conditioned cases.
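In the matrix-scaling special case, the on-the-fly idea can be illustrated by never storing the scaling vectors at all. Since the current scaled matrix is $A = \operatorname{diag}(x) K \operatorname{diag}(y)$ with row sums $x_i (Ky)_i$, the relaxed row update $x_i \leftarrow x_i^{1-\omega} (Ky)_i^{-\omega}$ equals multiplying row $i$ of $A$ by $(\text{row sum}_i)^{-\omega}$, and the column update works the same way. This is our own illustration of such a reformulation, not the paper's OSI code; the function name and tolerances are assumptions.

```python
def scale_on_the_fly(K, omega=1.0, tol=1e-10, max_iter=10_000):
    """Overrelaxed Sinkhorn that rescales the working matrix in place.

    Equivalent to updating x and y by x <- x**(1 - omega) * (1 / (K y))**omega etc.,
    but the scalings are applied to A = diag(x) K diag(y) directly, so x and y
    (which may become very large or very small) are never formed.
    Returns (A, number_of_sweeps) for a square positive matrix K.
    """
    n = len(K)
    A = [[float(a) for a in row] for row in K]  # working copy, rescaled in place
    for it in range(1, max_iter + 1):
        # Row half-step: multiply row i by (row sum)^(-omega).
        for i in range(n):
            f = sum(A[i]) ** (-omega)
            A[i] = [a * f for a in A[i]]
        # Column half-step: multiply column j by (column sum)^(-omega).
        for j in range(n):
            f = sum(A[i][j] for i in range(n)) ** (-omega)
            for i in range(n):
                A[i][j] *= f
        row_err = max(abs(sum(A[i]) - 1.0) for i in range(n))
        col_err = max(abs(sum(A[i][j] for i in range(n)) - 1.0) for j in range(n))
        if max(row_err, col_err) < tol:
            return A, it
    return A, max_iter
```

For example, `scale_on_the_fly([[1e-6, 1e-6], [1e-6, 1e-4]], omega=1.25)` returns a doubly stochastic matrix whose diagonal entries are 10/11, because scaling preserves the cross ratio (1e-6 * 1e-4) / (1e-6 * 1e-6) = 100.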

LG · Feb 14, 2020

Statistical Learning with Conditional Value at Risk

Tasuku Soma, Yuichi Yoshida

We propose a risk-averse statistical learning framework wherein the performance of a learning algorithm is evaluated by the conditional value-at-risk (CVaR) of losses rather than the expected loss. We devise algorithms based on stochastic gradient descent for this framework. While existing studies of CVaR optimization require direct access to the underlying distribution, our algorithms make a weaker assumption that only i.i.d. samples are given. For convex and Lipschitz loss functions, we show that our algorithm has $O(1/\sqrt{n})$-convergence to the optimal CVaR, where $n$ is the number of samples. For nonconvex and smooth loss functions, we show a generalization bound on CVaR. By conducting numerical experiments on various machine learning tasks, we demonstrate that our algorithms effectively minimize CVaR compared with other baseline algorithms.
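A standard way to make CVaR amenable to SGD is the Rockafellar-Uryasev variational formula $\mathrm{CVaR}_\alpha(L) = \min_{\tau \in \mathbb{R}} \{\tau + \tfrac{1}{\alpha}\mathbb{E}[(L - \tau)_+]\}$, where here $\alpha \in (0, 1]$ is the tail fraction. Because the objective is an expectation, one sample gives an unbiased stochastic subgradient in $(\theta, \tau)$. The abstract does not state the exact formulation or the parametrization of $\alpha$, so treat the convention and the helper names `empirical_cvar` and `cvar_sample_subgradient` as assumptions.

```python
def empirical_cvar(losses, alpha):
    """Empirical CVaR at tail fraction alpha via Rockafellar-Uryasev.

    Minimizes tau + mean((L - tau)_+) / alpha over tau. The objective is convex and
    piecewise linear in tau with kinks at the observed losses, so its minimum is
    attained at one of them.
    """
    n = len(losses)

    def objective(tau):
        return tau + sum(max(l - tau, 0.0) for l in losses) / (alpha * n)

    return min(objective(t) for t in losses)


def cvar_sample_subgradient(loss, loss_grad, tau, alpha):
    """One-sample subgradient of tau + (loss(theta) - tau)_+ / alpha.

    loss_grad is the gradient of the loss w.r.t. theta (a list of floats).
    Returns (subgradient w.r.t. theta, subgradient w.r.t. tau).
    """
    active = 1.0 if loss > tau else 0.0  # sample lies in the tail beyond tau
    return [active * g / alpha for g in loss_grad], 1.0 - active / alpha
```

For the losses 1, 2, ..., 10 and alpha = 0.2, `empirical_cvar` returns 9.5, the mean of the worst 20% of losses (9 and 10); with alpha = 1 it reduces to the plain mean, 5.5.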

LG · Sep 7, 2018

Fast greedy algorithms for dictionary selection with generalized sparsity constraints

Kaito Fujii, Tasuku Soma

In dictionary selection, a small set of atoms is selected from a finite pool of candidates so that the given data points are well approximated by sparse representations over the selected atoms. We propose a novel, efficient greedy algorithm for dictionary selection. Not only does our algorithm run much faster than known methods, but it can also handle more complex sparsity constraints, such as average sparsity. Using numerical experiments, we show that our algorithm outperforms known methods for dictionary selection and achieves performance competitive with dictionary learning algorithms in less running time.
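To make the selection problem concrete, the sketch below is a plain greedy baseline for the simplest case, where each data point is represented by its single best selected atom (sparsity s = 1). It is not the paper's faster algorithm, which also handles generalized constraints such as average sparsity; the function name and toy data are illustrative. In this case the objective, the total explained energy $\sum_t \max_{d \in S} \langle y_t, d\rangle^2 / \|d\|^2$, is a facility-location function, which is monotone submodular, so greedy selection enjoys the classical $(1 - 1/e)$ approximation guarantee.

```python
def greedy_dictionary_selection(candidates, data, k):
    """Greedily pick k candidate atoms maximizing the explained energy of 1-sparse fits.

    Each data point y is approximated by its best selected atom d, which explains
    <y, d>^2 / ||d||^2 of its energy. Returns indices of selected candidates in order.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # gain[i][t]: energy of data point t explained by candidate atom i alone.
    gain = [[dot(y, d) ** 2 / dot(d, d) for y in data] for d in candidates]
    selected = []
    best = [0.0] * len(data)  # best explained energy per data point so far
    for _ in range(k):
        remaining = [i for i in range(len(candidates)) if i not in selected]
        # Add the atom whose inclusion yields the largest total explained energy.
        i_star = max(remaining, key=lambda i: sum(max(b, g) for b, g in zip(best, gain[i])))
        selected.append(i_star)
        best = [max(b, g) for b, g in zip(best, gain[i_star])]
    return selected
```

With candidate atoms e1, e2, e3 and data points (3, 0, 0), (2, 0, 0), (0, 0, 2), (0, 1, 0), the first pick is e1 (explained energy 13) and the second is e3 (total 17 versus 14 for e2).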

ML · Jun 19, 2018

Maximally Invariant Data Perturbation as Explanation

Satoshi Hara, Kouichi Ikeno, Tasuku Soma et al.

While several feature scoring methods have been proposed to explain the output of complex machine learning models, most of them lack formal mathematical definitions. In this study, we propose a novel definition of the feature score using the maximally invariant data perturbation, which is inspired by the idea of adversarial examples. For an adversarial example, one seeks the smallest data perturbation that changes the model's output. In our proposed approach, we consider the opposite: we seek the maximally invariant data perturbation that does not change the model's output. In this way, we can identify important input features as the ones with small allowable data perturbations. To find the maximally invariant data perturbation, we formulate the problem as a linear program. Experiments on image classification with VGG16 show that the proposed method effectively identifies relevant parts of the images.
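The idea can be seen on a toy linear classifier $f(x) = w^\top x + b$ with positive margin. Choosing per-feature allowances $\varepsilon_j \ge 0$ so that every perturbation with $|\delta_j| \le \varepsilon_j$ keeps $f(x + \delta) \ge 0$ amounts to the constraint $\sum_j |w_j| \varepsilon_j \le w^\top x + b$. Maximizing $\sum_j \varepsilon_j$ subject to that constraint and a cap $\varepsilon_j \le U$ is a linear program, here a fractional knapsack that sorting solves exactly. The objective, the cap `upper`, and the function name are our assumptions, not the paper's formulation for deep networks such as VGG16.

```python
def invariant_perturbation_linear(w, b, x, upper=1.0):
    """Largest per-feature allowances eps_j (capped at `upper`) keeping w.(x + d) + b >= 0.

    Solves max sum(eps) s.t. sum(|w_j| * eps_j) <= margin, 0 <= eps_j <= upper.
    Features with small |w_j| are cheapest, so they receive budget first.
    Small allowances mark important features.
    """
    margin = sum(wj * xj for wj, xj in zip(w, x)) + b
    if margin <= 0:
        raise ValueError("expects a point with positive margin")
    budget = margin
    eps = [0.0] * len(w)
    for j in sorted(range(len(w)), key=lambda j: abs(w[j])):
        if w[j] == 0:
            eps[j] = upper  # the output does not depend on this feature at all
            continue
        # Clamp at 0 so floating-point round-off cannot produce a negative allowance.
        eps[j] = max(0.0, min(upper, budget / abs(w[j])))
        budget -= abs(w[j]) * eps[j]
    return eps
```

For w = (2, 0.5, 0), b = 0, x = (1, 1, 1) and upper = 2, the allowances are (0.75, 2, 2): the heavily weighted first feature tolerates the smallest perturbation and is therefore scored as the most important.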