Can Hu

h-index: 28
Papers: 2

2 Papers

39.6 CV Mar 23
Survey on Remote Sensing Scene Classification: From Traditional Methods to Large Generative AI Models

Qionghao Huang, Can Hu

Remote sensing scene classification has experienced a paradigmatic transformation from traditional handcrafted feature methods to sophisticated artificial intelligence systems that now form the backbone of modern Earth observation applications. This comprehensive survey examines the complete methodological evolution, systematically tracing development from classical texture descriptors and machine learning classifiers through the deep learning revolution to current state-of-the-art foundation models and generative AI approaches. We chronicle the pivotal shift from manual feature engineering to automated hierarchical representation learning via convolutional neural networks, followed by advanced architectures including Vision Transformers, graph neural networks, and hybrid frameworks. The survey provides in-depth coverage of breakthrough developments in self-supervised foundation models and vision-language systems, highlighting exceptional performance in zero-shot and few-shot learning scenarios. Special emphasis is placed on generative AI innovations that tackle persistent challenges through synthetic data generation and advanced feature learning strategies. We analyze contemporary obstacles including annotation costs, multimodal data fusion complexities, interpretability demands, and ethical considerations, alongside current trends in edge computing deployment, federated learning frameworks, and sustainable AI practices. Based on comprehensive analysis of recent advances and gaps, we identify key future research priorities: advancing hyperspectral and multi-temporal analysis capabilities, developing robust cross-domain generalization methods, and establishing standardized evaluation protocols to accelerate scientific progress in remote sensing scene classification systems.

CV Sep 17, 2025
MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook

Peng Xu, Shengwu Xiong, Jiajun Zhang et al.

This paper reviews the MARS2 2025 Challenge on Multimodal Reasoning. We aim to bring together different approaches in multimodal machine learning and LLMs via a large benchmark, and we hope it helps researchers follow the state of the art in this very dynamic area. Meanwhile, a growing number of testbeds have boosted the evolution of general-purpose large language models. Thus, this year's MARS2 focuses on real-world and specialized scenarios to broaden the multimodal reasoning applications of MLLMs. Our organizing team released two tailored datasets, Lens and AdsQA, as test sets, which support general reasoning in 12 daily scenarios and domain-specific reasoning in advertisement videos, respectively. We evaluated 40+ baselines that include both generalist MLLMs and task-specific models, and opened three competition tracks: Visual Grounding in Real-world Scenarios (VG-RS), Visual Question Answering with Spatial Awareness (VQA-SA), and Visual Reasoning in Creative Advertisement Videos (VR-Ads). In total, 76 teams from renowned academic and industrial institutions registered, and 40+ valid submissions (out of 1200+) were included in our ranking lists. Our datasets, code sets (40+ baselines and 15+ participants' methods), and rankings are publicly available on the MARS2 workshop website and our GitHub organization page https://github.com/mars2workshop/, where updates and announcements of upcoming events will continue to be posted.