LG | Nov 10, 2025 | Code
A Closer Look at Knowledge Distillation in Spiking Neural Network Training
Xu Liu, Na Xia, Jinxing Zhou et al.
Spiking Neural Networks (SNNs) have become popular due to their excellent energy efficiency, yet they remain difficult to train effectively. Recent works address this by introducing knowledge distillation (KD) techniques, with pre-trained artificial neural networks (ANNs) used as teachers and the target SNNs as students. This is commonly accomplished through a straightforward element-wise alignment of intermediate features and prediction logits from ANNs and SNNs, often neglecting the intrinsic differences between their architectures. Specifically, an ANN's outputs exhibit a continuous distribution, whereas an SNN's outputs are characterized by sparsity and discreteness. To mitigate this issue, we introduce two KD strategies. First, we propose Saliency-scaled Activation Map Distillation (SAMD), which aligns the spike activation map of the student SNN with the class-aware activation map of the teacher ANN. Rather than performing KD directly on the raw and distinct features of the ANN and SNN, SAMD directs the student to learn from saliency activation maps that exhibit greater semantic and distribution consistency. Second, we propose Noise-smoothed Logits Distillation (NLD), which uses Gaussian noise to smooth the sparse logits of the student SNN, facilitating alignment with the continuous logits of the teacher ANN. Extensive experiments on multiple datasets demonstrate the effectiveness of our methods. Code is available at https://github.com/SinoLeu/CKDSNN.git.
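The idea behind noise-smoothed logit distillation can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the noise scale `sigma`, the temperature, and the choice of a temperature-scaled KL divergence as the distillation loss are all assumptions made for the example.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def noise_smoothed_kd_loss(student_logits, teacher_logits,
                           sigma=0.1, temperature=4.0, rng=None):
    """Hypothetical sketch: add Gaussian noise to the sparse student logits,
    then compare teacher and student with a temperature-scaled KL loss."""
    rng = rng or random.Random(0)
    # Assumed smoothing step: independent zero-mean Gaussian noise per logit.
    smoothed = [s + rng.gauss(0.0, sigma) for s in student_logits]
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(smoothed, temperature)
    # The T^2 factor is the usual KD convention, assumed here.
    return temperature ** 2 * kl_divergence(p_teacher, p_student)


if __name__ == "__main__":
    student = [0.0, 0.0, 1.0, 0.25]   # sparse, discrete-looking SNN output
    teacher = [-1.3, 0.2, 2.7, 0.9]   # continuous ANN output
    print(noise_smoothed_kd_loss(student, teacher))
```

With `sigma=0.0` the function reduces to plain logit distillation, which makes it easy to compare the two variants under the same loss.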
CV | Dec 19, 2024 | Code
Prototypical Calibrating Ambiguous Samples for Micro-Action Recognition
Kun Li, Dan Guo, Guoliang Chen et al.
Micro-Action Recognition (MAR) has gained increasing attention due to its crucial role as a form of non-verbal communication in social interactions, with promising potential for applications in human communication and emotion analysis. However, current approaches often overlook the inherent ambiguity in micro-actions, which arises from the wide category range and subtle visual differences between categories. This oversight hampers the accuracy of micro-action recognition. In this paper, we propose a novel Prototypical Calibrating Ambiguous Network (PCAN) to uncover and mitigate the ambiguity in MAR. First, we employ a hierarchical action-tree to identify ambiguous samples, categorizing them into distinct sets of false negatives and false positives, considering both body- and action-level categories. Second, we implement an ambiguous contrastive refinement module to calibrate these ambiguous samples by regulating the distance between ambiguous samples and their corresponding prototypes. This calibration process aims to pull false negative (FN) samples closer to their respective prototypes and push false positive (FP) samples away from their affiliated prototypes. In addition, we propose a new prototypical diversity amplification loss to strengthen the model's capacity by amplifying the differences between prototypes. Finally, we propose a prototype-guided rectification to rectify predictions by incorporating the representability of prototypes. Extensive experiments conducted on the benchmark dataset demonstrate the superior performance of our method compared to existing approaches. The code is available at https://github.com/kunli-cs/PCAN.
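The pull/push calibration described above can be illustrated with a small sketch. It is not the PCAN implementation: the Euclidean distance, the hinge `margin`, and the function names are assumptions chosen to show the general shape of a loss that pulls false negatives toward a prototype and pushes false positives away from it.

```python
import math


def mean_vector(vectors):
    """Class prototype taken as the mean of its feature vectors (assumed)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def calibration_loss(prototype, false_negatives, false_positives, margin=1.0):
    """Hypothetical loss: FN samples are penalized by their distance to the
    prototype (pull); FP samples are penalized only while they are closer
    than `margin` to it (push)."""
    pull = sum(euclidean(x, prototype) for x in false_negatives)
    push = sum(max(0.0, margin - euclidean(x, prototype))
               for x in false_positives)
    count = len(false_negatives) + len(false_positives)
    return (pull + push) / count if count else 0.0


if __name__ == "__main__":
    proto = mean_vector([[0.0, 0.0], [2.0, 0.0]])        # [1.0, 0.0]
    fn_samples = [[1.0, 0.0]]                            # already at the prototype
    fp_samples = [[1.0, 0.5]]                            # too close, gets pushed
    print(calibration_loss(proto, fn_samples, fp_samples))  # 0.25
```

Minimizing this quantity moves FN features toward the prototype and FP features outside the margin, which is the behavior the abstract describes in words.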
55.6 | IT | May 21
Finite-Aperture Planar Fluid Antenna Array
Zhentian Zhang, Jingyuan Xu, Kai-Kit Wong et al.
Fluid antenna systems (FASs) are emerging as a reconfigurable-aperture technology that expands physical-layer design beyond fixed, rigid antenna geometries. While the fading diversity of FASs -- which exploits spatial channel fluctuations for signal enhancement and interference avoidance -- has been widely studied, the geometry diversity created by reconfigurable port placement remains far less understood, particularly for planar architectures under finite-aperture constraints. This paper develops a systematic analytical framework for finite-aperture planar fluid antenna arrays (FAAs). First, we derive a closed-form characterization of the minimum inter-port distance under uniform random placement over a rectangular aperture and show that it follows a Rayleigh law. Its mean scales as $\mathcal{O}(M^{-1})$, in sharp contrast to the $\mathcal{O}(M^{-2})$ behavior in the linear case, where $M$ denotes the number of candidate ports, revealing a fundamentally more favorable packing geometry in two dimensions. Second, we establish a universal Cramér-Rao bound (CRB) for joint elevation-azimuth estimation, governed by a $2\times 2$ geometric inertia matrix whose determinant and eigenstructure fully capture the role of port placement in estimation precision. We further prove that both the trace and determinant of this matrix are invariant to the azimuth look direction. Third, we uncover an intrinsic precision--ambiguity trade-off: maximizing the geometric determinant to minimize the CRB drives ports toward the aperture boundary, but simultaneously increases sidelobe-induced spatial ambiguity.
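The scaling claim for the minimum inter-port distance can be checked numerically with a small Monte Carlo experiment. The sketch below is only an illustration under assumed settings (unit square and unit segment, the trial counts, and all function names are choices made here, not taken from the paper): if the mean scales as $M^{-1}$ in the plane and $M^{-2}$ on a line, doubling $M$ should roughly halve the planar mean and quarter the linear one.

```python
import math
import random


def min_distance_planar(num_ports, width, height, rng):
    """Minimum pairwise distance of ports placed uniformly in a rectangle."""
    pts = [(rng.uniform(0.0, width), rng.uniform(0.0, height))
           for _ in range(num_ports)]
    best = math.inf
    for i in range(num_ports):
        for j in range(i + 1, num_ports):
            d = math.hypot(pts[i][0] - pts[j][0], pts[i][1] - pts[j][1])
            best = min(best, d)
    return best


def min_distance_linear(num_ports, length, rng):
    """Minimum gap between ports placed uniformly on a line segment."""
    xs = sorted(rng.uniform(0.0, length) for _ in range(num_ports))
    return min(b - a for a, b in zip(xs, xs[1:]))


def mean_over_trials(sampler, trials):
    """Average the value returned by `sampler()` over independent trials."""
    return sum(sampler() for _ in range(trials)) / trials


if __name__ == "__main__":
    rng = random.Random(0)
    planar = {m: mean_over_trials(
        lambda: min_distance_planar(m, 1.0, 1.0, rng), 300) for m in (20, 40)}
    linear = {m: mean_over_trials(
        lambda: min_distance_linear(m, 1.0, rng), 2000) for m in (20, 40)}
    # Ratios near 2 (planar) and near 4 (linear) are consistent with the
    # M^-1 and M^-2 scalings stated in the abstract.
    print("planar ratio:", planar[20] / planar[40])
    print("linear ratio:", linear[20] / linear[40])
```

The experiment only probes the scaling of the mean; it does not test the Rayleigh shape of the distribution, which would need a histogram or a goodness-of-fit check.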
CV | Nov 3, 2025
CenterMamba-SAM: Center-Prioritized Scanning and Temporal Prototypes for Brain Lesion Segmentation
Yu Tian, Zhongheng Yang, Chenshi Liu et al.
Brain lesion segmentation remains challenging due to small, low-contrast lesions, anisotropic sampling, and cross-slice discontinuities. We propose CenterMamba-SAM, an end-to-end framework that freezes a pretrained backbone and trains only lightweight adapters for efficient fine-tuning. At its core is the CenterMamba encoder, which employs a novel 3x3 corner-axis-center short-sequence scanning strategy to enable center-prioritized, axis-reinforced, and diagonally compensated information aggregation. This design enhances sensitivity to weak boundaries and tiny foci while maintaining sparse yet effective feature representation. A memory-driven structural prompt generator maintains a prototype bank across neighboring slices, enabling automatic synthesis of reliable prompts without user interaction, thereby improving inter-slice coherence. The memory-augmented multi-scale decoder integrates memory attention modules at multiple levels, combining deep supervision with progressive refinement to restore fine details while preserving global consistency. Extensive experiments on public benchmarks demonstrate that CenterMamba-SAM achieves state-of-the-art performance in brain lesion segmentation.
RO | Feb 5, 2025
GARAD-SLAM: 3D GAussian splatting for Real-time Anti Dynamic SLAM
Mingrui Li, Weijian Chen, Na Cheng et al.
The 3D Gaussian Splatting (3DGS)-based SLAM system has garnered widespread attention due to its excellent performance in real-time high-fidelity rendering. However, in real-world environments with dynamic objects, existing 3DGS-based SLAM systems often face mapping errors and tracking drift issues. To address these problems, we propose GARAD-SLAM, a real-time 3DGS-based SLAM system tailored for dynamic scenes. In terms of tracking, unlike traditional methods, we directly perform dynamic segmentation on Gaussians and map them back to the front-end to obtain dynamic point labels through a Gaussian pyramid network, achieving precise dynamic removal and robust tracking. For mapping, we impose rendering penalties on dynamically labeled Gaussians, which are updated through the network, to avoid irreversible erroneous removal caused by simple pruning. Our results on real-world datasets demonstrate that our method is competitive in tracking compared to baseline methods, generating fewer artifacts and higher-quality reconstructions in rendering.
CV | Nov 30, 2024
Towards Pixel-Level Prediction for Gaze Following: Benchmark and Approach
Feiyang Liu, Dan Guo, Jingyuan Xu et al.
Following the gaze of other people and analyzing the target they are looking at can help us understand what they are thinking and doing, and predict the actions that may follow. Existing methods for gaze following struggle to perform well in natural scenes with diverse objects, and they focus on gaze points rather than objects, making it difficult to deliver clear semantics and an accurate scope of the targets. To address this shortcoming, we propose a novel gaze target prediction solution named GazeSeg, which fully utilizes the spatial visual field of the person as guiding information and leads to a progressive coarse-to-fine gaze target segmentation and recognition process. Specifically, a prompt-based visual foundation model serves as the encoder, working in conjunction with three distinct decoding modules (FoV perception, heatmap generation, and segmentation) to form the framework for gaze target prediction. Then, with the head bounding box serving as an initial prompt, GazeSeg obtains the FoV map, heatmap, and segmentation map progressively, leading to a unified framework for multiple tasks (e.g. direction estimation, gaze target segmentation, and recognition). To facilitate this research, we construct and release a new dataset, built upon the GazeFollow dataset, comprising 72k images with pixel-level annotations and 270 categories of gaze targets. The quantitative evaluation shows that our approach achieves a Dice score of 0.325 in gaze target segmentation and 71.7% top-5 recognition accuracy. Our approach also outperforms previous state-of-the-art methods, achieving an AUC of 0.953 on the gaze-following task. The dataset and code will be released.
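For readers unfamiliar with the segmentation metric quoted above, the Dice score between a predicted and a ground-truth binary mask can be computed as in the following sketch. The function name and the convention of returning 1.0 when both masks are empty are assumptions made for this example; the evaluation code of the paper may differ.

```python
def dice_score(pred_mask, true_mask):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks
    given as equally sized nested lists of 0/1 values."""
    intersection = 0
    pred_total = 0
    true_total = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            intersection += 1 if (p and t) else 0
            pred_total += 1 if p else 0
            true_total += 1 if t else 0
    if pred_total + true_total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / (pred_total + true_total)


if __name__ == "__main__":
    predicted = [[1, 1],
                 [0, 0]]
    target = [[1, 0],
              [0, 0]]
    print(dice_score(predicted, target))  # 2 * 1 / (2 + 1) = 0.666...
```

A score of 1.0 means the masks coincide exactly and 0.0 means they do not overlap, so the reported 0.325 indicates partial overlap between predicted and annotated gaze targets.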