CVFeb 11, 2025
A Survey on Mamba Architecture for Vision ApplicationsFady Ibrahim, Guangjun Liu, Guanghui Wang
Transformers have become foundational for visual tasks such as object detection, semantic segmentation, and video understanding, but their quadratic complexity in attention mechanisms presents scalability challenges. To address these limitations, the Mamba architecture utilizes state-space models (SSMs) for linear scalability, efficient processing, and improved contextual awareness. This paper investigates Mamba architecture for visual domain applications and its recent advancements, including Vision Mamba (ViM) and VideoMamba, which introduce bidirectional scanning, selective scanning mechanisms, and spatiotemporal processing to enhance image and video understanding. Architectural innovations like position embeddings, cross-scan modules, and hierarchical designs further optimize the Mamba framework for global and local feature extraction. These advancements position Mamba as a promising architecture in computer vision research and applications.
CVApr 22
Beyond ZOH: Advanced Discretization Strategies for Vision MambaFady Ibrahim, Guangjun Liu, Guanghui Wang
Vision Mamba, as a state space model (SSM), employs a zero-order hold (ZOH) discretization, which assumes that input signals remain constant between sampling instants. This assumption degrades temporal fidelity in dynamic visual environments and constrains the attainable accuracy of modern SSM-based vision models. In this paper, we present a systematic and controlled comparison of six discretization schemes instantiated within the Vision Mamba framework: ZOH, first-order hold (FOH), bilinear/Tustin transform (BIL), polynomial interpolation (POL), higher-order hold (HOH), and the fourth-order Runge-Kutta method (RK4). We evaluate each method on standard visual benchmarks to quantify its influence in image classification, semantic segmentation, and object detection. Our results demonstrate that POL and HOH yield the largest gains in accuracy at the cost of higher training-time computation. In contrast, the BIL provides consistent improvements over ZOH with modest additional overhead, offering the most favorable trade-off between precision and efficiency. These findings elucidate the pivotal role of discretization in SSM-based vision architectures and furnish empirically grounded justification for adopting BIL as the default discretization baseline for state-of-the-art SSM models.
ROJun 13, 2020
Fault Tolerant Free Gait and Footstep Planning for Hexapod Robot Based on Monte-Carlo TreeLiang Ding, Peng Xu, Haibo Gao et al.
Legged robots can pass through complex field environments by selecting gaits and discrete footholds carefully. Traditional methods plan gait and foothold separately and treat them as the single-step optimal process. However, such processing causes its poor passability in a sparse foothold environment. This paper novelly proposes a coordinative planning method for hexapod robots that regards the planning of gait and foothold as a sequence optimization problem with the consideration of dealing with the harshness of the environment as leg fault. The Monte Carlo tree search algorithm(MCTS) is used to optimize the entire sequence. Two methods, FastMCTS, and SlidingMCTS are proposed to solve some defeats of the standard MCTS applicating in the field of legged robot planning. The proposed planning algorithm combines the fault-tolerant gait method to improve the passability of the algorithm. Finally, compared with other planning methods, experiments on terrains with different densities of footholds and artificially-designed challenging terrain are carried out to verify our methods. All results show that the proposed method dramatically improves the hexapod robot's ability to pass through sparse footholds environment.