Brian Vaughan

2 Papers

5.8 NE May 9
Drain-Vortex Optimization: A Population-Based Metaheuristic Inspired by Multi-Drain Free-Vortex Flow

Mohsen Omidi, Brian Vaughan

This paper proposes Drain-Vortex Optimization (DVO), a population-based metaheuristic for continuous optimization. DVO models each candidate solution as a particle moving in a multi-drain vortex field. Its update rule decomposes motion into radial attraction toward selected drain centres and tangential rotation governed by a regularized free-vortex law. A three-phase mechanism switches between far-field exploration, spiral inward motion, and localized core exploitation according to the normalized distance to the assigned drain. The method also uses adaptive spiral exploitation, population-level vortex basin assignment, and optional stochastic basin switching to support structured diversity. DVO is evaluated against PSO, GWO, WOA, SCA, AOA, EO, and SVOA using a calibration-validation protocol. CEC 2022 is used only to select the final DVO configuration, while CEC 2017, classical functions, and five constrained engineering design problems are used for out-of-sample validation. On CEC 2017, DVO achieves the best mean $\log_{10}$ error on 34 of 58 cases and the best Friedman average rank (1.67), and is significantly better than every baseline under Holm-corrected Wilcoxon tests. On CEC 2022, DVO obtains the best Friedman rank (2.13) and is significantly better than five of the seven baselines; the differences against PSO and SVOA are not significant. DVO is less competitive on simple scalable classical functions and on small constrained engineering designs, which clarifies its operating regime. The algorithm is implemented in a vectorized GPU form that executes independent runs in parallel.
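The radial/tangential decomposition described in the abstract can be illustrated with a minimal 2-D sketch. This is not the authors' update rule: the regularized profile $v_\theta = \Gamma r / (2\pi (r^2 + \varepsilon^2))$, the linear radial pull, the explicit Euler step, and the names `tangential_speed`, `vortex_step`, `gamma`, `sink`, `eps`, and `dt` are all assumptions chosen for illustration.

```python
import math


def tangential_speed(r, gamma=1.0, eps=0.1):
    """Regularized free-vortex speed (assumed form).

    The ideal free vortex gamma / (2*pi*r) diverges as r -> 0; adding
    eps**2 in the denominator keeps the speed finite at the core.
    """
    return gamma * r / (2.0 * math.pi * (r * r + eps * eps))


def vortex_step(pos, drain, gamma=1.0, sink=0.5, eps=0.1, dt=0.1):
    """Move one particle by one explicit Euler step around a single drain.

    All parameters are hypothetical illustration values, not taken
    from the paper.
    """
    dx, dy = pos[0] - drain[0], pos[1] - drain[1]
    r = math.hypot(dx, dy)
    if r == 0.0:
        return pos  # already at the drain centre; direction is undefined

    # Unit vectors: radial (pointing away from the drain) and
    # tangential (counter-clockwise).
    er = (dx / r, dy / r)
    et = (-er[1], er[0])

    v_r = -sink * r                         # radial attraction (assumed linear)
    v_t = tangential_speed(r, gamma, eps)   # tangential rotation

    return (
        pos[0] + dt * (v_r * er[0] + v_t * et[0]),
        pos[1] + dt * (v_r * er[1] + v_t * et[1]),
    )


if __name__ == "__main__":
    p = (1.0, 0.0)
    for _ in range(5):
        p = vortex_step(p, (0.0, 0.0))
        print(p, math.hypot(*p))
```

With these assumed values the tangential term can dominate the radial pull close to the core under explicit Euler integration, so the step size `dt` matters; the sketch makes no claim about how DVO handles that regime.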

DC Jun 28, 2020
PyTorch Distributed: Experiences on Accelerating Data Parallel Training

Shen Li, Yanli Zhao, Rohan Varma et al.

This paper presents the design, implementation, and evaluation of the PyTorch distributed data parallel module. PyTorch is a widely adopted scientific computing package used in deep learning research and applications. Recent advances in deep learning argue for the value of large datasets and large models, which necessitates the ability to scale out model training to more computational resources. Data parallelism has emerged as a popular solution for distributed training thanks to its straightforward principle and broad applicability. In general, the technique of distributed data parallelism replicates the model on every computational resource to generate gradients independently and then communicates those gradients at each iteration to keep model replicas consistent. Despite the conceptual simplicity of the technique, the subtle dependencies between computation and communication make it non-trivial to optimize the distributed training efficiency. As of v1.5, PyTorch natively provides several techniques to accelerate distributed data parallel training, including bucketing gradients, overlapping computation with communication, and skipping gradient synchronization. Evaluations show that, when configured appropriately, the PyTorch distributed data parallel module attains near-linear scalability using 256 GPUs.
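The core idea in the abstract (replicas compute gradients independently, then average them so every replica stays consistent) can be sketched in plain Python together with a simple form of gradient bucketing. This is a single-process simulation, not PyTorch's implementation: the linear model, the fixed-size buckets, and the helper names `local_gradient`, `make_buckets`, `allreduce_mean`, and `ddp_step` are all assumptions, and the loop over buckets stands in for real collective communication without any overlap with computation.

```python
from typing import List, Tuple

Shard = Tuple[List[List[float]], List[float]]


def local_gradient(w: List[float], xs: List[List[float]], ys: List[float]) -> List[float]:
    """Gradient of the mean squared error of a linear model on one data shard."""
    grad = [0.0] * len(w)
    for x, y in zip(xs, ys):
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * err * xj / len(xs)
    return grad


def make_buckets(num_params: int, bucket_size: int) -> List[List[int]]:
    """Group parameter indices into fixed-size buckets (the last may be smaller)."""
    return [
        list(range(start, min(start + bucket_size, num_params)))
        for start in range(0, num_params, bucket_size)
    ]


def allreduce_mean(per_replica: List[List[float]], bucket: List[int]) -> None:
    """Average one bucket's entries across replicas in place.

    Stand-in for a collective all-reduce; one call per bucket instead
    of one call per parameter.
    """
    n = len(per_replica)
    for j in bucket:
        mean = sum(g[j] for g in per_replica) / n
        for g in per_replica:
            g[j] = mean


def ddp_step(w: List[float], shards: List[Shard], bucket_size: int = 2):
    """Simulate one data-parallel iteration; returns (gradients, number of reduce calls)."""
    grads = [local_gradient(w, xs, ys) for xs, ys in shards]
    num_calls = 0
    for bucket in make_buckets(len(w), bucket_size):
        allreduce_mean(grads, bucket)
        num_calls += 1
    return grads, num_calls


if __name__ == "__main__":
    w = [0.5, -1.0, 2.0]
    shards = [
        ([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]], [1.0, 2.0]),
        ([[2.0, 1.0, 0.0], [1.0, 1.0, 1.0]], [0.0, 3.0]),
    ]
    grads, calls = ddp_step(w, shards)
    print(grads[0], calls)
```

When the shards have equal size, the averaged gradient on every replica matches the gradient computed over the full batch, which is why the replicas remain consistent after each step. Larger buckets mean fewer reduce calls per iteration.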