OCMay 2
The Generalized Matrix Separation Problem: AlgorithmsXuemei Chen, Owen Deen
When given a generalized matrix separation problem, which aims to recover a low rank matrix $L_0$ and a sparse matrix $S_0$ from $M_0=L_0+HS_0$, the work \cite{CW25} proposes a novel convex optimization problem whose objective function is the sum of the $\ell_1$-norm and nuclear norm. In this paper we detail the iterative algorithms and its associated computations for solving this convex optimization problem. We present various efficient implementation strategies, with attention to practical cases where $H$ is circulant, separable, or block structured. Notably, we propose a preconditioning technique that drastically improved the performance of our algorithms in terms of efficiency, accuracy, and robustness. While this paper serves as an illustrative algorithm implementation manual, we also provide theoretical guarantee for our preconditioning strategy. Numerical results demonstrate the effectiveness of the proposed approach.
LGDec 1, 2025
A Fine Evaluation Method for Cube Copying Test for Early Detection of Alzheimer's DiseaseXinyu Jiang, Cuiyun Gao, Wenda Huang et al.
Background: Impairment of visual spatial cognitive function is the most common early clinical manifestation of Alzheimer's Disease (AD). When the Montreal Cognitive Assessment (MoCA) uses the "0/1" binary method ("pass/fail") to evaluate the visual spatial cognitive ability represented by the Cube Copying Test(CCT), the elder with less formal education generally score 0 point, resulting in serious bias in the evaluation results. Therefore, this study proposes a fine evaluation method for CCT based on dynamic handwriting feature extraction of DH-SCSM-BLA. method : The Cogni-CareV3.0 software independently developed by our team was used to collect dynamic handwriting data of CCT. Then, the spatial and motion features of segmented dynamic handwriting were extracted, and feature matrix with unequal dimensions were normalized. Finally, a bidirectional long short-term memory network model combined with attention mechanism (BiLSTM-Attention) was adopted for classification. Result: The experimental results showed that: The proposed method has significant superiority compared to similar studies, with a classification accuracy of 86.69%. The distribution of cube drawing ability scores has significant regularity for three aspects such as MCI patients and healthy control group, age, and levels of education. It was also found that score for each cognitive task including cube drawing ability score is negatively correlated with age. Score for each cognitive task including cube drawing ability score, but positively correlated with levels of education significantly. Conclusion: This study provides a relatively objective and comprehensive evaluation method for early screening and personalized intervention of visual spatial cognitive impairment.
CVSep 23, 2025
MsFIN: Multi-scale Feature Interaction Network for Traffic Accident AnticipationTongshuai Wu, Chao Lu, Ze Song et al.
With the widespread deployment of dashcams and advancements in computer vision, developing accident prediction models from the dashcam perspective has become critical for proactive safety interventions. However, two key challenges persist: modeling feature-level interactions among traffic participants (often occluded in dashcam views) and capturing complex, asynchronous multi-temporal behavioral cues preceding accidents. To deal with these two challenges, a Multi-scale Feature Interaction Network (MsFIN) is proposed for early-stage accident anticipation from dashcam videos. MsFIN has three layers for multi-scale feature aggregation, temporal feature processing and multi-scale feature post fusion, respectively. For multi-scale feature aggregation, a Multi-scale Module is designed to extract scene representations at short-term, mid-term and long-term temporal scales. Meanwhile, the Transformer architecture is leveraged to facilitate comprehensive feature interactions. Temporal feature processing captures the sequential evolution of scene and object features under causal constraints. In the multi-scale feature post fusion stage, the network fuses scene and object features across multiple temporal scales to generate a comprehensive risk representation. Experiments on DAD and DADA datasets show that MsFIN significantly outperforms state-of-the-art models with single-scale feature extraction in both prediction correctness and earliness. Ablation studies validate the effectiveness of each module in MsFIN, highlighting how the network achieves superior performance through multi-scale feature fusion and contextual interaction modeling.
CVJun 9, 2025
SpikeSMOKE: Spiking Neural Networks for Monocular 3D Object Detection with Cross-Scale Gated CodingXuemei Chen, Huamin Wang, Hangchi Shen et al.
Low energy consumption for 3D object detection is an important research area because of the increasing energy consumption with their wide application in fields such as autonomous driving. The spiking neural networks (SNNs) with low-power consumption characteristics can provide a novel solution for this research. Therefore, we apply SNNs to monocular 3D object detection and propose the SpikeSMOKE architecture in this paper, which is a new attempt for low-power monocular 3D object detection. As we all know, discrete signals of SNNs will generate information loss and limit their feature expression ability compared with the artificial neural networks (ANNs).In order to address this issue, inspired by the filtering mechanism of biological neuronal synapses, we propose a cross-scale gated coding mechanism(CSGC), which can enhance feature representation by combining cross-scale fusion of attentional methods and gated filtering mechanisms.In addition, to reduce the computation and increase the speed of training, we present a novel light-weight residual block that can maintain spiking computing paradigm and the highest possible detection performance. Compared to the baseline SpikeSMOKE under the 3D Object Detection, the proposed SpikeSMOKE with CSGC can achieve 11.78 (+2.82, Easy), 10.69 (+3.2, Moderate), and 10.48 (+3.17, Hard) on the KITTI autonomous driving dataset by AP|R11 at 0.7 IoU threshold, respectively. It is important to note that the results of SpikeSMOKE can significantly reduce energy consumption compared to the results on SMOKE. For example,the energy consumption can be reduced by 72.2% on the hard category, while the detection performance is reduced by only 4%. SpikeSMOKE-L (lightweight) can further reduce the amount of parameters by 3 times and computation by 10 times compared to SMOKE.
CVJun 10, 2024
NeuroMoCo: A Neuromorphic Momentum Contrast Learning Method for Spiking Neural NetworksYuqi Ma, Huamin Wang, Hangchi Shen et al.
Recently, brain-inspired spiking neural networks (SNNs) have attracted great research attention owing to their inherent bio-interpretability, event-triggered properties and powerful perception of spatiotemporal information, which is beneficial to handling event-based neuromorphic datasets. In contrast to conventional static image datasets, event-based neuromorphic datasets present heightened complexity in feature extraction due to their distinctive time series and sparsity characteristics, which influences their classification accuracy. To overcome this challenge, a novel approach termed Neuromorphic Momentum Contrast Learning (NeuroMoCo) for SNNs is introduced in this paper by extending the benefits of self-supervised pre-training to SNNs to effectively stimulate their potential. This is the first time that self-supervised learning (SSL) based on momentum contrastive learning is realized in SNNs. In addition, we devise a novel loss function named MixInfoNCE tailored to their temporal characteristics to further increase the classification accuracy of neuromorphic datasets, which is verified through rigorous ablation experiments. Finally, experiments on DVS-CIFAR10, DVS128Gesture and N-Caltech101 have shown that NeuroMoCo of this paper establishes new state-of-the-art (SOTA) benchmarks: 83.6% (Spikformer-2-256), 98.62% (Spikformer-2-256), and 84.4% (SEW-ResNet-18), respectively.