Toshio Endo

3 papers

39 citations

Novelty 37%

AI Score 37

Ranked #115,722 of 201,326 authors (top 57%); #25,630 in LG (top 60%)

3 Papers

DC May 13

SHIRO: Near-Optimal Communication Strategies for Distributed Sparse Matrix Multiplication

Chen Zhuang, Lingqi Zhang, Benjamin Brock et al.

Distributed Sparse Matrix-Matrix Multiplication (SpMM) is a fundamental operation in high-performance computing and deep learning applications. The major performance bottleneck in distributed SpMM is substantial communication overhead, which limits both performance and scalability. In this paper, we identify two key sources of communication inefficiency in distributed SpMM: redundant data transfer due to sparsity unawareness, and suboptimal utilization of hierarchical network topology. To address these, we propose (1) a fine-grained, sparsity-aware communication strategy that reduces communication overhead by exploiting the sparsity pattern of the sparse matrix, and (2) a hierarchical communication strategy that maps the sparsity-aware strategy onto two-tier GPU network architectures, minimizing redundant data movement across slower inter-node links. We implement these optimizations in a comprehensive distributed SpMM framework, SHIRO. Extensive evaluations on real-world datasets show that SHIRO scales strongly up to 128 GPUs, achieving geometric mean SpMM speedups of 221.5×, 56.0×, 23.4×, and 8.8× over four state-of-the-art baselines (CAGNET, SPA, BCL, and CoLa, respectively) at this scale.
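The sparsity-aware idea in the abstract above can be illustrated with a minimal sketch. This is not SHIRO's implementation; it assumes a simple 1D block-row partitioning of both the sparse matrix A and the dense matrix B, and the function name `rows_to_fetch` and the example data are hypothetical. The point it shows: a rank only needs the rows of B whose indices appear as column indices of nonzeros in its local part of A, whereas a sparsity-unaware scheme would fetch every remote row.

```python
def rows_to_fetch(local_nonzeros, rows_per_rank, my_rank):
    """Map each remote rank to the set of B rows this rank actually needs.

    local_nonzeros: (row, col) coordinates of nonzeros in this rank's part of A.
    rows_per_rank:  number of consecutive B rows owned by each rank (assumed
                    1D block-row partitioning; an illustration-only assumption).
    """
    fetch = {}
    for _row, col in local_nonzeros:
        owner = col // rows_per_rank  # rank that owns row `col` of B
        if owner != my_rank:
            fetch.setdefault(owner, set()).add(col)
    return fetch


# Hypothetical example: an 8x8 sparse A split across 2 ranks (4 rows each).
# Rank 0 holds these nonzeros of A.
local_nonzeros = [(0, 1), (1, 5), (2, 5), (3, 7)]
fetch = rows_to_fetch(local_nonzeros, rows_per_rank=4, my_rank=0)

aware_rows = sum(len(rows) for rows in fetch.values())
unaware_rows = 4  # a sparsity-unaware scheme fetches all 4 rows owned by rank 1

print(f"rows to fetch per rank: {fetch}")
print(f"sparsity-aware: {aware_rows} rows, sparsity-unaware: {unaware_rows} rows")
```

In this toy case the sparsity-aware request covers two remote rows instead of four; the savings in a real workload depend entirely on the sparsity pattern.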

LG Mar 27, 2022

mdx: A Cloud Platform for Supporting Data Science and Cross-Disciplinary Research Collaborations

Toyotaro Suzumura, Akiyoshi Sugiki, Hiroyuki Takizawa et al.

The growing amount of data and advances in data science have created a need for a new kind of cloud platform that provides users with flexibility, strong security, and the ability to couple with supercomputers and edge devices through high-performance networks. We have built such a nation-wide cloud platform, called "mdx", to meet this need. The mdx platform's virtualization service, jointly operated by 9 national universities and 2 national research institutes in Japan, launched in 2021, and more features are in development. Currently, mdx is used by researchers in a wide variety of domains, including materials informatics, geo-spatial information science, life science, astronomical science, economics, social science, and computer science. This paper provides an overview of the mdx platform, details the motivation for its development, reports its current status, and outlines its future plans.

LG Jul 11, 2019

Profiling based Out-of-core Hybrid Method for Large Neural Networks

Yuki Ito, Haruki Imai, Tung Le Duc et al.

GPUs are widely used to accelerate deep learning with neural networks (NNs). Because GPU memory capacity is limited, though, it is difficult to implement efficient programs that compute large NNs on a GPU. To compute NNs that exceed GPU memory capacity, existing work has proposed data-swapping and recomputing methods. However, these methods incur performance overhead from data movement or additional computation. To reduce this overhead, it is important to consider the characteristics of each layer, such as its size and recomputation cost. Following this direction, we propose the Profiling based out-of-core Hybrid method (PoocH). PoocH determines which layers to swap or recompute based on runtime profiling. We implemented PoocH by extending the deep learning framework Chainer and evaluated its performance. With PoocH, we successfully computed an NN requiring 50 GB of memory on a single GPU with 16 GB of memory. Compared with in-core cases, performance degradation was 38% on an x86 machine and 28% on a POWER9 machine.