LG Sep 6, 2025
OptiProxy-NAS: Optimization Proxy based End-to-End Neural Architecture Search
Bo Lyu, Yu Cui, Tuo Shi et al.
Neural architecture search (NAS) is a hard, computationally expensive optimization problem with a discrete, vast, and spiky search space. One key line of research in this area accelerates NAS through proxy evaluations of neural architectures. Unlike the prevalent predictor-based methods that use surrogate models, and unlike differentiable architecture search via supernetworks, we propose an optimization proxy that streamlines NAS into an end-to-end optimization framework, named OptiProxy-NAS. In particular, using a proxy representation, the NAS space is reformulated to be continuous, differentiable, and smooth. As a result, any differentiable optimization method can be applied to the gradient-based search of the relaxed architecture parameters. Our comprehensive experiments on $12$ NAS tasks from $4$ search spaces across three domains (computer vision, natural language processing, and resource-constrained NAS) demonstrate superior search results and efficiency. Further experiments on low-fidelity scenarios verify the method's flexibility.
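The idea of searching over relaxed architecture parameters with gradients can be illustrated with a minimal sketch. It is not the OptiProxy-NAS proxy itself, whose form the abstract does not specify. It assumes a single discrete choice among three candidate operations, a softmax relaxation of that choice, and made-up per-operation losses (`op_losses`); all names here are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def relaxed_loss(alpha, op_losses):
    """Expected loss under the softmax-relaxed choice of operation."""
    probs = softmax(alpha)
    return sum(p * l for p, l in zip(probs, op_losses))

def relaxed_grad(alpha, op_losses):
    """Analytic gradient of relaxed_loss with respect to alpha."""
    probs = softmax(alpha)
    expected = sum(p * l for p, l in zip(probs, op_losses))
    return [p * (l - expected) for p, l in zip(probs, op_losses)]

def search(op_losses, steps=200, lr=1.0):
    """Plain gradient descent on the relaxed architecture parameters."""
    alpha = [0.0] * len(op_losses)
    for _ in range(steps):
        g = relaxed_grad(alpha, op_losses)
        alpha = [a - lr * gi for a, gi in zip(alpha, g)]
    return alpha

# Hypothetical losses for three candidate operations (assumed, not from the paper).
op_losses = [0.9, 0.4, 0.7]
alpha = search(op_losses)
# Discretize: pick the operation with the largest relaxed parameter.
best = max(range(len(alpha)), key=alpha.__getitem__)
```

Because the relaxed objective is differentiable in `alpha`, the plain gradient descent loop could be swapped for any other differentiable optimizer without changing the rest of the sketch.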
LG Nov 15, 2021
AutoGMap: Learning to Map Large-scale Sparse Graphs on Memristive Crossbars
Bo Lyu, Shengbo Wang, Shiping Wen et al.
The sparse representation of graphs has shown great potential for accelerating graph applications (e.g., social networks, knowledge graphs) on traditional computing architectures (CPU, GPU, or TPU). However, the exploration of large-scale sparse graph computing on processing-in-memory (PIM) platforms (typically with memristive crossbars) is still in its infancy. To implement the computation or storage of large-scale or batch graphs on memristive crossbars, a natural assumption is that a large-scale crossbar is required, though it would be used with low utilization. Some recent works question this assumption: to avoid wasting storage and computational resources, they propose fixed-size or progressively scheduled "block partition" schemes. However, these methods are coarse-grained or static, and are not effectively sparsity-aware. This work proposes a dynamic, sparsity-aware method for generating mapping schemes, which models the problem as sequential decision-making and optimizes it with a reinforcement learning (RL) algorithm (REINFORCE). Our generating model (an LSTM combined with the dynamic-fill scheme) achieves remarkable mapping performance on small-scale graph/matrix data (a complete mapping costs 43% of the area of the original matrix) and on two large-scale matrices (costing 22.5% of the area on qh882 and 17.1% on qh1484). Our method may be extended to sparse graph computing on other PIM architectures and is not limited to memristive device-based platforms.
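The area figures quoted above can be read as the total crossbar area used by a mapping, divided by the area of the full matrix. The following sketch shows one way such a ratio could be computed for a block-partition mapping. The block format `(row_start, row_end, col_start, col_end)` with half-open ranges, the helper names, and the toy matrix are all assumptions made for illustration; the sketch does not reproduce AutoGMap's generator or its dynamic-fill scheme.

```python
def area_ratio(matrix, blocks):
    """Total block area divided by the area of the full matrix."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    block_area = sum((r1 - r0) * (c1 - c0) for r0, r1, c0, c1 in blocks)
    return block_area / (n_rows * n_cols)

def is_complete(matrix, blocks):
    """True if every nonzero entry lies inside at least one block."""
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value == 0:
                continue
            covered = any(r0 <= i < r1 and c0 <= j < c1
                          for r0, r1, c0, c1 in blocks)
            if not covered:
                return False
    return True

# Toy 4x4 sparse matrix with two dense 2x2 diagonal blocks (made-up data).
matrix = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical mapping: one small crossbar per diagonal block.
blocks = [(0, 2, 0, 2), (2, 4, 2, 4)]

ratio = area_ratio(matrix, blocks)        # 8 cells out of 16
complete = is_complete(matrix, blocks)    # all nonzeros are covered
```

In a reinforcement learning setting, a quantity of this kind (lower area at complete coverage) is a natural candidate for a reward signal, although the abstract does not state the exact reward used.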
LG Nov 6, 2021
TND-NAS: Towards Non-differentiable Objectives in Progressive Differentiable NAS Framework
Bo Lyu, Shiping Wen
Differentiable architecture search has gradually become the mainstream research topic in the field of Neural Architecture Search (NAS) because of its high efficiency compared with early NAS methods. Recent differentiable NAS work also aims to further improve search performance and reduce GPU-memory consumption. However, these methods are not naturally capable of handling non-differentiable objectives, e.g., energy, resource-constrained efficiency, and other metrics, let alone multi-objective search demands. Research in the multi-objective NAS field targets this, but requires vast computational resources because each candidate architecture is optimized separately. In light of this discrepancy, we propose TND-NAS, which combines the high efficiency of the differentiable NAS framework with the compatibility with non-differentiable metrics found in multi-objective NAS. Under the differentiable NAS framework, with continuous relaxation of the search space, TND-NAS optimizes the architecture parameters in discrete space while progressively shrinking the search space according to the architecture parameters. Our representative experiment takes two objectives (parameters, accuracy) as an example: we obtain a series of high-performance compact architectures on the CIFAR10 (1.09M/3.3%, 2.4M/2.95%, 9.57M/2.54%) and CIFAR100 (2.46M/18.3%, 5.46M/16.73%, 12.88M/15.20%) datasets. Compared with other multi-objective NAS methods, TND-NAS is less time-consuming (1.3 GPU-days on an NVIDIA 1080Ti, 1/6 of that of NSGA-Net) and can be conveniently adapted to real-world NAS scenarios (resource-constrained, platform-specialized).
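The trade-off between the two objectives can be made concrete with a generic Pareto-dominance filter. This is a standard multi-objective tool, not the TND-NAS search procedure. The sketch uses the three CIFAR10 pairs quoted above and assumes the second number in each pair is a test error in percent (lower is better); the fourth candidate is invented for illustration.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every objective and better in one.

    Both objectives are minimized: (parameters in millions, error in percent).
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def pareto_front(points):
    """Keep the points that no other point dominates, preserving order."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]

# CIFAR10 pairs from the abstract, read as (params in M, error in %).
reported = [(1.09, 3.3), (2.4, 2.95), (9.57, 2.54)]
# Hypothetical extra candidate that is larger and less accurate than (2.4, 2.95).
candidates = reported + [(3.0, 3.5)]

front = pareto_front(candidates)
```

Under that reading, the three reported architectures are mutually non-dominated: each one trades more parameters for lower error, so the filter keeps all three and drops only the invented candidate.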