Chengcheng Chen

CV
h-index12
5papers
27citations
Novelty38%
AI Score32

5 Papers

CVSep 30, 2024Code
EEG Emotion Copilot: Optimizing Lightweight LLMs for Emotional EEG Interpretation with Assisted Medical Record Generation

Hongyu Chen, Weiming Zeng, Chengcheng Chen et al.

In the fields of affective computing (AC) and brain-machine interface (BMI), the analysis of physiological and behavioral signals to discern individual emotional states has emerged as a critical research frontier. While deep learning-based approaches have made notable strides in EEG emotion recognition, particularly in feature extraction and pattern recognition, significant challenges persist in achieving end-to-end emotion computation, including real-time processing, individual adaptation, and seamless user interaction. This paper presents the EEG Emotion Copilot, a system optimizing a lightweight large language model (LLM) with 0.5B parameters operating in a local setting, which first recognizes emotional states directly from EEG signals, subsequently generates personalized diagnostic and treatment suggestions, and finally supports the automation of assisted electronic medical records. Specifically, we demonstrate the critical techniques in the novel data structure of prompt, model pruning and fine-tuning training, and deployment strategies aiming at improving real-time performance and computational efficiency. Extensive experiments show that our optimized lightweight LLM-based copilot achieves an enhanced intuitive interface for participant interaction, superior accuracy of emotion recognition and assisted electronic medical records generation, in comparison to such models with similar scale parameters or large-scale parameters such as 1.5B, 1.8B, 3B and 7B. In summary, through these efforts, the proposed copilot is expected to advance the application of AC in the medical domain, offering innovative solution to mental health monitoring. The codes will be released at https://github.com/NZWANG/EEG_Emotion_Copilot.

CVOct 30, 2024Code
RSNet: A Light Framework for The Detection of SAR Ship Detection

Hongyu Chen, Chengcheng Chen, Fei Wang et al.

Recent advancements in synthetic aperture radar (SAR) ship detection using deep learning have significantly improved accuracy and speed, yet effectively detecting small objects in complex backgrounds with fewer parameters remains a challenge. This letter introduces RSNet, a lightweight framework constructed to enhance ship detection in SAR imagery. To ensure accuracy with fewer parameters, we proposed Waveletpool-ContextGuided (WCG) as its backbone, guiding global context understanding through multi-scale wavelet features for effective detection in complex scenes. Additionally, Waveletpool-StarFusion (WSF) is introduced as the neck, employing a residual wavelet element-wise multiplication structure to achieve higher dimensional nonlinear features without increasing network width. The Lightweight-Shared (LS) module is designed as detect components to achieve efficient detection through lightweight shared convolutional structure and multi-format compatibility. Experiments on the SAR Ship Detection Dataset (SSDD) and High-Resolution SAR Image Dataset (HRSID) demonstrate that RSNet achieves a strong balance between lightweight design and detection performance, surpassing many state-of-the-art detectors, reaching 72.5\% and 67.6\% in \textbf{\(\mathbf{mAP_{.50:.95}}\) }respectively with 1.49M parameters. Our code will be released soon.

CVDec 8, 2024Code
MID: A Comprehensive Shore-Based Dataset for Multi-Scale Dense Ship Occlusion and Interaction Scenarios

Yugang Chang, Hongyu Chen, Fei Wang et al.

This paper introduces the Maritime Ship Navigation Behavior Dataset (MID), designed to address challenges in ship detection within complex maritime environments using Oriented Bounding Boxes (OBB). MID contains 5,673 images with 135,884 finely annotated target instances, supporting both supervised and semi-supervised learning. It features diverse maritime scenarios such as ship encounters under varying weather, docking maneuvers, small target clustering, and partial occlusions, filling critical gaps in datasets like HRSID, SSDD, and NWPU-10. MID's images are sourced from high-definition video clips of real-world navigation across 43 water areas, with varied weather and lighting conditions (e.g., rain, fog). Manually curated annotations enhance the dataset's variety, ensuring its applicability to real-world demands in busy ports and dense maritime regions. This diversity equips models trained on MID to better handle complex, dynamic environments, supporting advancements in maritime situational awareness. To validate MID's utility, we evaluated 10 detection algorithms, providing an in-depth analysis of the dataset, detection results from various models, and a comparative study of baseline algorithms, with a focus on handling occlusions and dense target clusters. The results highlight MID's potential to drive innovation in intelligent maritime traffic monitoring and autonomous navigation systems. The dataset will be made publicly available at https://github.com/VirtualNew/MID_DataSet.

CVNov 3, 2024
A Visual Question Answering Method for SAR Ship: Breaking the Requirement for Multimodal Dataset Construction and Model Fine-Tuning

Fei Wang, Chengcheng Chen, Hongyu Chen et al.

Current visual question answering (VQA) tasks often require constructing multimodal datasets and fine-tuning visual language models, which demands significant time and resources. This has greatly hindered the application of VQA to downstream tasks, such as ship information analysis based on Synthetic Aperture Radar (SAR) imagery. To address this challenge, this letter proposes a novel VQA approach that integrates object detection networks with visual language models, specifically designed for analyzing ships in SAR images. This integration aims to enhance the capabilities of VQA systems, focusing on aspects such as ship location, density, and size analysis, as well as risk behavior detection. Initially, we conducted baseline experiments using YOLO networks on two representative SAR ship detection datasets, SSDD and HRSID, to assess each model's performance in terms of detection accuracy. Based on these results, we selected the optimal model, YOLOv8n, as the most suitable detection network for this task. Subsequently, leveraging the vision-language model Qwen2-VL, we designed and implemented a VQA task specifically for SAR scenes. This task employs the ship location and size information output by the detection network to generate multi-turn dialogues and scene descriptions for SAR imagery. Experimental results indicate that this method not only enables fundamental SAR scene question-answering without the need for additional datasets or fine-tuning but also dynamically adapts to complex, multi-turn dialogue requirements, demonstrating robust semantic understanding and adaptability.

CVMar 11, 2025
Bring Remote Sensing Object Detect Into Nature Language Model: Using SFT Method

Fei Wang, Chengcheng Chen, Hongyu Chen et al.

Recently, large language models (LLMs) and vision-language models (VLMs) have achieved significant success, demonstrating remarkable capabilities in understanding various images and videos, particularly in classification and detection tasks. However, due to the substantial differences between remote sensing images and conventional optical images, these models face considerable challenges in comprehension, especially in detection tasks. Directly prompting VLMs with detection instructions often leads to unsatisfactory results. To address this issue, this letter explores the application of VLMs for object detection in remote sensing images. Specifically, we constructed supervised fine-tuning (SFT) datasets using publicly available remote sensing object detection datasets, including SSDD, HRSID, and NWPU-VHR-10. In these new datasets, we converted annotation information into JSON-compliant natural language descriptions, facilitating more effective understanding and training for the VLM. We then evaluate the detection performance of various fine-tuning strategies for VLMs and derive optimized model weights for object detection in remote sensing images. Finally, we evaluate the model's prior knowledge capabilities using natural language queries. Experimental results demonstrate that, without modifying the model architecture, remote sensing object detection can be effectively achieved using natural language alone. Additionally, the model exhibits the ability to perform certain vision question answering (VQA) tasks. Our datasets and related code will be released soon.