Md Faisal Kabir

1.5CVAug 25, 2023

CS-Mixer: A Cross-Scale Vision MLP Model with Spatial-Channel Mixing

Jonathan Cui, David A. Araujo, Suman Saha et al.

Despite their simpler information fusion designs compared with Vision Transformers and Convolutional Neural Networks, Vision MLP architectures have demonstrated strong performance and high data efficiency in recent research. However, existing works such as CycleMLP and Vision Permutator typically model spatial information in equal-size spatial regions and do not consider cross-scale spatial interactions. Further, their token mixers only model 1- or 2-axis correlations, avoiding 3-axis spatial-channel mixing due to its computational demands. We therefore propose CS-Mixer, a hierarchical Vision MLP that learns dynamic low-rank transformations for spatial-channel mixing through cross-scale local and global aggregation. The proposed methodology achieves competitive results on popular image recognition benchmarks without incurring substantially more compute. Our largest model, CS-Mixer-L, reaches 83.2% top-1 accuracy on ImageNet-1k with 13.7 GFLOPs and 94 M parameters.

3.6CVSep 19, 2025

Explainable Gait Abnormality Detection Using Dual-Dataset CNN-LSTM Models

Parth Agarwal, Sangaa Chatterjee, Md Faisal Kabir et al.

Gait is a key indicator in diagnosing movement disorders, but most models lack interpretability and rely on single datasets. We propose a dual-branch CNN-LSTM framework a 1D branch on joint-based features from GAVD and a 3D branch on silhouettes from OU-MVLP. Interpretability is provided by SHAP (temporal attributions) and Grad-CAM (spatial localization).On held-out sets, the system achieves 98.6% accuracy with strong recall and F1. This approach advances explainable gait analysis across both clinical and biometric domains.

Md Faisal Kabir

2 Papers