Ali Hassani

h-index8

4papers

681citations

Novelty53%

AI Score41

Ranked #67,423 of 194,257 authors (top 35%)#23,003 in CV (top 39%)

4 Papers

19.7CVApr 23, 2025Code

Generalized Neighborhood Attention: Multi-dimensional Sparse Attention at the Speed of Light

Ali Hassani, Fengzhe Zhou, Aditya Kane et al. · gatech

Many sparse attention mechanisms such as Neighborhood Attention have typically failed to consistently deliver speedup over the self attention baseline. This is largely due to the level of complexity in attention infrastructure, and the rapid evolution of AI hardware architecture. At the same time, many state-of-the-art foundational models, particularly in computer vision, are heavily bound by attention, and need reliable sparsity to escape the O(n^2) complexity. In this paper, we study a class of promising sparse attention mechanisms that focus on locality, and aim to develop a better analytical model of their performance improvements. We first introduce Generalized Neighborhood Attention (GNA), which can describe sliding window, strided sliding window, and blocked attention. We then consider possible design choices in implementing these approaches, and create a simulator that can provide much more realistic speedup upper bounds for any given setting. Finally, we implement GNA on top of a state-of-the-art fused multi-headed attention (FMHA) kernel designed for the NVIDIA Blackwell architecture in CUTLASS. Our implementation can fully realize the maximum speedup theoretically possible in many perfectly block-sparse cases, and achieves an effective utilization of 1.3 petaFLOPs/second in FP16. In addition, we plug various GNA configurations into off-the-shelf generative models, such as Cosmos-7B, HunyuanVideo, and FLUX, and show that it can deliver 28% to 46% end-to-end speedup on B200 without any fine-tuning. We will open source our simulator and Blackwell kernels directly through the NATTEN project.

17.5CVSep 9, 2021Code

ConvMLP: Hierarchical Convolutional MLPs for Vision

Jiachen Li, Ali Hassani, Steven Walton et al.

MLP-based architectures, which consist of a sequence of consecutive multi-layer perceptron blocks, have recently been found to reach comparable results to convolutional and transformer-based methods. However, most adopt spatial MLPs which take fixed dimension inputs, therefore making it difficult to apply them to downstream tasks, such as object detection and semantic segmentation. Moreover, single-stage designs further limit performance in other computer vision tasks and fully connected layers bear heavy computation. To tackle these problems, we propose ConvMLP: a hierarchical Convolutional MLP for visual recognition, which is a light-weight, stage-wise, co-design of convolution layers, and MLPs. In particular, ConvMLP-S achieves 76.8% top-1 accuracy on ImageNet-1k with 9M parameters and 2.4G MACs (15% and 19% of MLP-Mixer-B/16, respectively). Experiments on object detection and semantic segmentation further show that visual representation learned by ConvMLP can be seamlessly transferred and achieve competitive results with fewer parameters. Our code and pre-trained models are publicly available at https://github.com/SHI-Labs/Convolutional-MLPs.

36.9CVApr 12, 2021Code

Escaping the Big Data Paradigm with Compact Transformers

Ali Hassani, Steven Walton, Nikhil Shah et al.

With the rise of Transformers as the standard for language processing, and their advancements in computer vision, there has been a corresponding growth in parameter size and amounts of training data. Many have come to believe that because of this, transformers are not suitable for small sets of data. This trend leads to concerns such as: limited availability of data in certain scientific domains and the exclusion of those with limited resource from research in the field. In this paper, we aim to present an approach for small-scale learning by introducing Compact Transformers. We show for the first time that with the right size, convolutional tokenization, transformers can avoid overfitting and outperform state-of-the-art CNNs on small datasets. Our models are flexible in terms of model size, and can have as little as 0.28M parameters while achieving competitive results. Our best model can reach 98% accuracy when training from scratch on CIFAR-10 with only 3.7M parameters, which is a significant improvement in data-efficiency over previous Transformer based models being over 10x smaller than other transformers and is 15% the size of ResNet50 while achieving similar performance. CCT also outperforms many modern CNN based approaches, and even some recent NAS-based approaches. Additionally, we obtain a new SOTA result on Flowers-102 with 99.76% top-1 accuracy, and improve upon the existing baseline on ImageNet (82.71% accuracy with 29% as many parameters as ViT), as well as NLP tasks. Our simple and compact design for transformers makes them more feasible to study for those with limited computing resources and/or dealing with small datasets, while extending existing research efforts in data efficient transformers. Our code and pre-trained models are publicly available at https://github.com/SHI-Labs/Compact-Transformers.

3.0ROOct 1, 2021

Dynamic Models of Spherical Parallel Robots for Model-Based Control Schemes

Ali Hassani, Abbas Bataleblu, S. A. Khalilpour et al.

In this paper, derivation of different forms of dynamic formulation of spherical parallel robots (SPRs) is investigated. These formulations include the explicit dynamic forms, linear regressor, and Slotine-Li (SL) regressor, which are required for the design and implementation of the vast majority of model-based controllers and dynamic parameters identification schemes. To this end, the implicit dynamic of SPRs is first formulated using the principle of virtual work in task-space, and then by using an extension, their explicit dynamic formulation is derived. The dynamic equation is then analytically reformulated into linear and S-L regression form with respect to the inertial parameters, and by using the Gauss-Jordan procedure, it is reduced to a unique and closed-form structure. Finally, to illustrate the effectiveness of the proposed method, two different SPRs, namely, the ARAS-Diamond, and the 3-RRR, are examined as the case studies. The obtained results are verified by using the MSC-ADAMS software, and are shared to interested audience for public access.