CV · Mar 21

Less is More in Semantic Space: Intrinsic Decoupling via Clifford-M for Fundus Image Classification

arXiv:2603.20806 · 45.4

Predicted impact: top 74% in CV · last 90 days · Originality: Incremental advance

AI Analysis

This work addresses the need for efficient and accurate multi-label fundus diagnosis. It shows that competitive performance can be achieved without explicit frequency engineering, an incremental improvement over existing multi-scale approaches.

The authors propose Clifford-M, a lightweight backbone for multi-label fundus image classification that replaces explicit frequency decomposition with sparse geometric interaction via a Clifford-style rolling product. Without pre-training, it achieves a mean AUC-ROC of 0.8142 and macro-F1 of 0.5481 on ODIR-5K with only 0.85M parameters, outperforming larger CNN baselines, and shows robustness on cross-dataset evaluation (RFMiD).
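The source does not spell out the rolling product, so the following is only a minimal sketch under stated assumptions. It assumes two feature maps `u` and `v` of shape (C, N) (channels by spatial positions) and pairs each channel i with channel i+s (cyclically) for a small fixed set of shifts S. The alignment part is read as the elementwise product u_i v_i, whose channel sum is the ordinary dot product, and the structural-variation part as wedge-like terms u_i v_{i+s} - u_{i+s} v_i. The function name `rolling_geometric_product` and the default `shifts` are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def rolling_geometric_product(u, v, shifts=(1, 2, 4)):
    """Hypothetical sparse 'rolling' geometric product between two feature maps.

    u, v: arrays of shape (C, N) with C channels and N spatial positions.
    Returns an array of shape (1 + len(shifts), C, N):
      [0]   alignment term u_i * v_i (summing over channels gives the dot product)
      [1:]  wedge-like terms u_i * v_{i+s} - u_{i+s} * v_i, one slice per shift s
    """
    if u.shape != v.shape:
        raise ValueError("u and v must have the same shape")
    terms = [u * v]  # symmetric / alignment part, elementwise per channel
    for s in shifts:
        # np.roll(x, -s, axis=0)[i] == x[(i + s) % C]: pair channel i with channel i+s
        v_s = np.roll(v, -s, axis=0)
        u_s = np.roll(u, -s, axis=0)
        # antisymmetric part, analogous to the bivector component for e_i ^ e_{i+s}
        terms.append(u * v_s - u_s * v)
    return np.stack(terms, axis=0)


# Usage: two random 8-channel feature maps over 5 positions
rng = np.random.default_rng(0)
a = rng.standard_normal((8, 5))
b = rng.standard_normal((8, 5))
out = rolling_geometric_product(a, b)  # shape (4, 8, 5)
```

Evaluating only |S| channel pairings per channel keeps the cost at O((1 + |S|)·C·N), linear in the channel count C, whereas forming every pairwise component u_i v_j - u_j v_i would cost O(C²·N). In a full block the stacked terms would presumably be mixed back to C channels by a learned projection, which this sketch omits.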

Multi-label fundus diagnosis requires features that capture both fine-grained lesions and large-scale retinal structure. Many multi-scale medical vision models address this challenge through explicit frequency decomposition, but our ablation studies show that such heuristics provide limited benefit in this setting: replacing the proposed simple dual-resolution stem with Octave Convolution increased parameters by 35% and computation 2.23-fold without improving mean accuracy, while a fixed wavelet-based variant performed substantially worse. Motivated by these findings, we propose Clifford-M, a lightweight backbone that replaces both feed-forward expansion and frequency-splitting modules with sparse geometric interaction. The model is built on a Clifford-style rolling product that jointly captures alignment and structural variation with linear complexity, enabling efficient cross-scale fusion and self-refinement in a compact dual-resolution architecture. Without pre-training, Clifford-M achieves a mean AUC-ROC of 0.8142 and a mean macro-F1 (optimal threshold) of 0.5481 on ODIR-5K using only 0.85M parameters, outperforming substantially larger mid-scale CNN baselines under the same training protocol. When evaluated on RFMiD without fine-tuning, it attains 0.7425 ± 0.0198 macro AUC and 0.7610 ± 0.0344 micro AUC, indicating reasonable robustness to cross-dataset shift. These results suggest that competitive and efficient fundus diagnosis can be achieved without explicit frequency engineering, provided that the core feature interaction is designed to capture multi-scale structure directly.
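The macro and micro AUC figures reported for RFMiD aggregate over labels in different ways. The sketch below assumes the usual definitions: macro AUC is the unweighted mean of per-label ROC AUCs, and micro AUC is a single AUC computed over all (image, label) pairs pooled together. AUC is computed here via its rank interpretation, the probability that a random positive scores above a random negative, with ties counted as one half. The function names are hypothetical and the source does not describe the authors' evaluation code.

```python
import numpy as np


def binary_auc(y_true, y_score):
    """ROC AUC via pairwise comparison (Mann-Whitney form); ties count 0.5."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    if pos.size == 0 or neg.size == 0:
        return float("nan")  # AUC is undefined without both classes
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)


def macro_auc(Y, S):
    """Unweighted mean of per-label AUCs; labels lacking both classes are skipped."""
    aucs = [binary_auc(Y[:, k], S[:, k]) for k in range(Y.shape[1])]
    return float(np.nanmean(aucs))


def micro_auc(Y, S):
    """One AUC over all (sample, label) pairs pooled together."""
    return float(binary_auc(Y.ravel(), S.ravel()))


# Usage: 4 images, 2 labels
Y = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
S = np.array([[0.9, 0.2], [0.1, 0.8], [0.8, 0.1], [0.3, 0.3]])
print(macro_auc(Y, S), micro_auc(Y, S))  # 0.75 0.78125
```

The pairwise comparison is O(P·N) per label, which is fine for illustration; a rank-based or library implementation scales better on full datasets. Macro averaging weights rare and common labels equally, while micro averaging is dominated by frequent labels, which is one reason the two numbers can differ.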
