CV · Sep 25, 2023
AsymFormer: Asymmetrical Cross-Modal Representation Learning for Mobile Platform Real-Time RGB-D Semantic Segmentation
Siqi Du, Weixi Wang, Renzhong Guo et al.
Understanding indoor scenes is crucial for urban studies. Because indoor environments are dynamic, effective semantic segmentation requires both real-time operation and high accuracy. To address this, we propose AsymFormer, a novel network that improves real-time semantic segmentation accuracy using RGB-D multimodal information without substantially increasing network complexity. AsymFormer uses an asymmetrical backbone for multimodal feature extraction, which reduces redundant parameters by optimizing how computational resources are distributed. To fuse the asymmetric multimodal features, a Local Attention-Guided Feature Selection (LAFS) module selectively combines features from the different modalities by leveraging their dependencies. A Cross-Modal Attention-Guided Feature Correlation Embedding (CMA) module then extracts further cross-modal representations. AsymFormer achieves competitive results of 54.1% mIoU on NYUv2 and 49.1% mIoU on SUNRGBD. Notably, it reaches an inference speed of 65 FPS (79 FPS with mixed-precision quantization) on an RTX 3090, showing that AsymFormer strikes a balance between high accuracy and efficiency.
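The abstract does not describe the internals of LAFS. As a rough illustration only, the sketch below shows one generic form of attention-guided feature selection between two modalities. A per-channel sigmoid gate, computed from pooled RGB and depth statistics, decides how much of each modality to keep. The function names, the global-average-pooling step and the fixed (untrained) gate are all assumptions for illustration. This is not AsymFormer's published design.

```python
import math
from typing import List

# A feature map is modelled as channels x flattened spatial positions.
Feature = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_means(feat: Feature) -> List[float]:
    # Global average pooling: one summary value per channel.
    return [sum(ch) / len(ch) for ch in feat]


def gated_fusion(rgb: Feature, depth: Feature) -> Feature:
    """Hypothetical attention-guided selection (not the published LAFS design):
    a per-channel sigmoid gate, computed from pooled RGB and depth statistics,
    decides how much of each modality to keep."""
    if len(rgb) != len(depth):
        raise ValueError("both modalities must have the same number of channels")
    fused: Feature = []
    for rgb_ch, d_ch, r_mean, d_mean in zip(rgb, depth, channel_means(rgb), channel_means(depth)):
        w = sigmoid(r_mean - d_mean)  # weight in (0, 1) given to the RGB channel
        fused.append([w * r + (1.0 - w) * d for r, d in zip(rgb_ch, d_ch)])
    return fused


if __name__ == "__main__":
    rgb = [[1.0, 1.0], [0.0, 0.0]]
    depth = [[0.0, 0.0], [2.0, 2.0]]
    print(gated_fusion(rgb, depth))
```

A trained network would normally replace the raw mean difference with small learnable layers. The fixed rule here only keeps the example dependency-free.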
CV · Oct 7, 2023
Tree-GPT: Modular Large Language Model Expert System for Forest Remote Sensing Image Understanding and Interactive Analysis
Siqi Du, Shengjun Tang, Weixi Wang et al.
This paper introduces Tree-GPT, a novel framework that incorporates Large Language Models (LLMs) into the forestry remote sensing data workflow to make data analysis more efficient. Currently, LLMs cannot extract or comprehend information from images, and they may generate inaccurate text because they lack domain knowledge. Both problems limit their use in forestry data analysis. To address this, we propose Tree-GPT, a modular LLM expert system that integrates image understanding modules, domain knowledge bases, and toolchains. This gives the LLM the ability to comprehend images, acquire accurate knowledge, generate code, and perform data analysis in a local environment. Specifically, the image understanding module extracts structured information from forest remote sensing images. It generates prompts automatically or interactively to guide the Segment Anything Model (SAM) in producing and selecting the best tree segmentation results. The system then calculates tree structural parameters from these results and stores them in a database. When it receives a natural language instruction, the LLM generates code through chain-of-thought reasoning to accomplish the analysis task. An LLM agent then executes the code in a local environment. For ecological parameter calculations, the system retrieves the relevant knowledge from the knowledge base and passes it to the LLM to guide the generation of accurate code. We tested the system on several tasks, including Search, Visualization, and Machine Learning Analysis. The prototype performed well, demonstrating the potential of using LLMs dynamically in forestry research and environmental sciences.
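The step "calculates tree structural parameters based on these results and stores them in a database" can be sketched roughly as follows. The abstract does not name the parameters, the database or the schema. The crown area and equivalent-circle crown diameter, the ground sample distance `gsd_m`, the SQLite table `trees` and the helper names are all assumptions for illustration.

```python
import math
import sqlite3
from typing import Dict, List

# Hypothetical mask format: 1 = tree pixel, 0 = background.
Mask = List[List[int]]


def crown_parameters(mask: Mask, gsd_m: float) -> Dict[str, float]:
    """Crown area (m^2) and equivalent-circle crown diameter (m) from one
    binary segmentation mask; gsd_m is an assumed ground sample distance
    in metres per pixel."""
    pixels = sum(sum(row) for row in mask)
    area = pixels * gsd_m ** 2
    diameter = 2.0 * math.sqrt(area / math.pi)
    return {"crown_area_m2": area, "crown_diameter_m": diameter}


def store_trees(conn: sqlite3.Connection, masks: List[Mask], gsd_m: float) -> None:
    """Write one row per segmented tree into a hypothetical `trees` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trees ("
        "id INTEGER PRIMARY KEY, crown_area_m2 REAL, crown_diameter_m REAL)"
    )
    for tree_id, mask in enumerate(masks, start=1):
        p = crown_parameters(mask, gsd_m)
        conn.execute(
            "INSERT INTO trees (id, crown_area_m2, crown_diameter_m) VALUES (?, ?, ?)",
            (tree_id, p["crown_area_m2"], p["crown_diameter_m"]),
        )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_trees(conn, [[[1, 1], [1, 1]], [[1, 0], [0, 0]]], gsd_m=0.5)
    # The kind of query an LLM-generated analysis script might issue.
    print(conn.execute("SELECT id FROM trees WHERE crown_area_m2 > 0.5").fetchall())
```

Once the parameters are stored in a queryable table, generated analysis code only needs to emit SQL or simple Python over the stored rows. It does not have to reprocess the images.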
CV · Nov 2, 2023
Overhead Line Defect Recognition Based on Unsupervised Semantic Segmentation
Weixi Wang, Xichen Zhong, Xin Li et al.
Overhead line inspection benefits greatly from defect recognition using visible-light imagery. To address the limitations of existing feature extraction techniques and the heavy data dependency of deep learning approaches, this paper introduces a novel defect recognition framework. The framework is built on the Faster R-CNN network and complemented by unsupervised semantic segmentation. The approach first identifies the type and location of the target equipment. It then uses semantic segmentation to separate the device from its background. Finally, it applies similarity measures and logical rules to categorize the type of defect. Experimental results indicate that, when identifying issues in overhead lines, this method focuses on the equipment rather than on the defects themselves. This yields a notable improvement in accuracy and shows strong adaptability. The method thus offers a fresh perspective on automating the inspection of distribution network equipment.
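The final step, "similarity measures and logical rules", might look roughly like the sketch below. The abstract does not specify which similarity measure, rules, thresholds or defect classes are used. Intersection-over-union against a reference mask of an intact device, the two thresholds and the labels `normal`, `missing_component` and `deformation` are hypothetical choices made for illustration.

```python
from typing import Set, Tuple

# A segmented device is represented as a set of (row, col) pixel coordinates.
Pixel = Tuple[int, int]


def iou(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Intersection-over-union of two pixel sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def classify_defect(device: Set[Pixel], reference: Set[Pixel],
                    iou_ok: float = 0.8, area_ratio_missing: float = 0.6) -> str:
    """Hypothetical rule set; thresholds and labels are illustrative only.
    `device` is the segmented equipment, `reference` an intact template
    aligned to the same detection box."""
    if iou(device, reference) >= iou_ok:
        return "normal"
    area_ratio = len(device) / len(reference) if reference else 0.0
    if area_ratio < area_ratio_missing:
        return "missing_component"  # much of the expected device area is absent
    return "deformation"            # area preserved but shape differs


if __name__ == "__main__":
    ref = {(r, c) for r in range(4) for c in range(4)}
    half = {(r, c) for r in range(2) for c in range(4)}
    print(classify_defect(ref, ref), classify_defect(half, ref))
```

In the described pipeline, the Faster R-CNN stage would supply the equipment type (and therefore which reference to compare against) and the location. The segmentation stage would supply the `device` pixels.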
HC · Aug 9, 2015
Preprint Virtual Reality Based GIS Analysis Platform
Weixi Wang, Zhihan Lv, Xiaoming Li et al.
This is the preprint version of our paper at ICONIP2015. The proposed platform supports integrated VRGIS functions, including 3D spatial analysis and 3D visualization of spatial processes, and it serves 3D globe and digital city applications. The platform performs 3D analysis and visualization of massive amounts of information about the city of interest. The amount of information that can be visualized with this platform is overwhelming, and the GIS-based navigational scheme allows great flexibility in accessing the different available data sources.
GR · Apr 6, 2015
Preprint Big City 3D Visual Analysis
Zhihan Lv, Xiaoming Li, Baoyun Zhang et al.
This is the preprint version of our paper at EUROGRAPHICS 2015. We present a big city visual analysis platform based on a Web Virtual Reality Geographical Information System (WEBVRGIS). The platform provides extensive model editing and spatial analysis functions, including terrain analysis, spatial analysis, sunlight analysis, traffic analysis, population analysis, and community analysis.
HC · Apr 4, 2015
WebVRGIS Based City Bigdata 3D Visualization and Analysis
Xiaoming Li, Zhihan Lv, Baoyun Zhang et al.
This paper presents the WEBVRGIS platform, which overlays multiple types of data about Shenzhen on a 3D globe. The amount of information that can be visualized with this platform is overwhelming, and the GIS-based navigational scheme allows great flexibility in accessing the different available data sources. For example, visualizing historical and forecasted passenger volumes at stations could be very helpful when overlaid with other social data.