LGSep 23, 2024
Research on Dynamic Data Flow Anomaly Detection based on Machine LearningLiyang Wang, Yu Cheng, Hao Gong et al.
The sophistication and diversity of contemporary cyberattacks have rendered the use of proxies, gateways, firewalls, and encrypted tunnels as a standalone defensive strategy inadequate. Consequently, the proactive identification of data anomalies has emerged as a prominent area of research within the field of data security. The majority of extant studies concentrate on sample equilibrium data, with the consequence that the detection effect is not optimal in the context of unbalanced data. In this study, the unsupervised learning method is employed to identify anomalies in dynamic data flows. Initially, multi-dimensional features are extracted from real-time data, and a clustering algorithm is utilised to analyse the patterns of the data. This enables the potential outliers to be automatically identified. By clustering similar data, the model is able to detect data behaviour that deviates significantly from normal traffic without the need for labelled data. The results of the experiments demonstrate that the proposed method exhibits high accuracy in the detection of anomalies across a range of scenarios. Notably, it demonstrates robust and adaptable performance, particularly in the context of unbalanced data.
IVSep 20, 2024
Deep Learning-Based Channel Squeeze U-Structure for Lung Nodule Detection and SegmentationMingxiu Sui, Jiacheng Hu, Tong Zhou et al.
This paper introduces a novel deep-learning method for the automatic detection and segmentation of lung nodules, aimed at advancing the accuracy of early-stage lung cancer diagnosis. The proposed approach leverages a unique "Channel Squeeze U-Structure" that optimizes feature extraction and information integration across multiple semantic levels of the network. This architecture includes three key modules: shallow information processing, channel residual structure, and channel squeeze integration. These modules enhance the model's ability to detect and segment small, imperceptible, or ground-glass nodules, which are critical for early diagnosis. The method demonstrates superior performance in terms of sensitivity, Dice similarity coefficient, precision, and mean Intersection over Union (IoU). Extensive experiments were conducted on the Lung Image Database Consortium (LIDC) dataset using five-fold cross-validation, showing excellent stability and robustness. The results indicate that this approach holds significant potential for improving computer-aided diagnosis systems, providing reliable support for radiologists in clinical practice and aiding in the early detection of lung cancer, especially in resource-limited settings
IVSep 18, 2024
Axial Attention Transformer Networks: A New Frontier in Breast Cancer DetectionWeijie He, Runyuan Bao, Yiru Cang et al.
This paper delves into the challenges and advancements in the field of medical image segmentation, particularly focusing on breast cancer diagnosis. The authors propose a novel Transformer-based segmentation model that addresses the limitations of traditional convolutional neural networks (CNNs), such as U-Net, in accurately localizing and segmenting small lesions within breast cancer images. The model introduces an axial attention mechanism to enhance the computational efficiency and address the issue of global contextual information that is often overlooked by CNNs. Additionally, the paper discusses improvements tailored to the small dataset challenge, including the incorporation of relative position information and a gated axial attention mechanism to refine the model's focus on relevant features. The proposed model aims to significantly improve the segmentation accuracy of breast cancer images, offering a more efficient and effective tool for computer-aided diagnosis.
CLSep 23, 2024
Improving Academic Skills Assessment with NLP and Ensemble LearningXinyi Huang, Yingyi Wu, Danyang Zhang et al.
This study addresses the critical challenges of assessing foundational academic skills by leveraging advancements in natural language processing (NLP). Traditional assessment methods often struggle to provide timely and comprehensive feedback on key cognitive and linguistic aspects, such as coherence, syntax, and analytical reasoning. Our approach integrates multiple state-of-the-art NLP models, including BERT, RoBERTa, BART, DeBERTa, and T5, within an ensemble learning framework. These models are combined through stacking techniques using LightGBM and Ridge regression to enhance predictive accuracy. The methodology involves detailed data preprocessing, feature extraction, and pseudo-label learning to optimize model performance. By incorporating sophisticated NLP techniques and ensemble learning, this study significantly improves the accuracy and efficiency of assessments, offering a robust solution that surpasses traditional methods and opens new avenues for educational technology research focused on enhancing core academic competencies.
LGNov 25, 2024
Enhancing Few-Shot Learning with Integrated Data and GAN Model ApproachesYinqiu Feng, Aoran Shen, Jiacheng Hu et al.
This paper presents an innovative approach to enhancing few-shot learning by integrating data augmentation with model fine-tuning in a framework designed to tackle the challenges posed by small-sample data. Recognizing the critical limitations of traditional machine learning models that require large datasets-especially in fields such as drug discovery, target recognition, and malicious traffic detection-this study proposes a novel strategy that leverages Generative Adversarial Networks (GANs) and advanced optimization techniques to improve model performance with limited data. Specifically, the paper addresses the noise and bias issues introduced by data augmentation methods, contrasting them with model-based approaches, such as fine-tuning and metric learning, which rely heavily on related datasets. By combining Markov Chain Monte Carlo (MCMC) sampling and discriminative model ensemble strategies within a GAN framework, the proposed model adjusts generative and discriminative distributions to simulate a broader range of relevant data. Furthermore, it employs MHLoss and a reparameterized GAN ensemble to enhance stability and accelerate convergence, ultimately leading to improved classification performance on small-sample images and structured datasets. Results confirm that the MhERGAN algorithm developed in this research is highly effective for few-shot learning, offering a practical solution that bridges data scarcity with high-performing model adaptability and generalization.
CLNov 19, 2024
Graph Neural Network-Based Entity Extraction and Relationship Reasoning in Complex Knowledge GraphsJunliang Du, Guiran Liu, Jia Gao et al.
This study proposed a knowledge graph entity extraction and relationship reasoning algorithm based on a graph neural network, using a graph convolutional network and graph attention network to model the complex structure in the knowledge graph. By building an end-to-end joint model, this paper achieves efficient recognition and reasoning of entities and relationships. In the experiment, this paper compared the model with a variety of deep learning algorithms and verified its superiority through indicators such as AUC, recall rate, precision rate, and F1 value. The experimental results show that the model proposed in this paper performs well in all indicators, especially in complex knowledge graphs, it has stronger generalization ability and stability. This provides strong support for further research on knowledge graphs and also demonstrates the application potential of graph neural networks in entity extraction and relationship reasoning.
CVOct 16, 2024
Optimizing YOLOv5s Object Detection through Knowledge Distillation algorithmGuanming Huang, Aoran Shen, Yuxiang Hu et al.
This paper explores the application of knowledge distillation technology in target detection tasks, especially the impact of different distillation temperatures on the performance of student models. By using YOLOv5l as the teacher network and a smaller YOLOv5s as the student network, we found that with the increase of distillation temperature, the student's detection accuracy gradually improved, and finally achieved mAP50 and mAP50-95 indicators that were better than the original YOLOv5s model at a specific temperature. Experimental results show that appropriate knowledge distillation strategies can not only improve the accuracy of the model but also help improve the reliability and stability of the model in practical applications. This paper also records in detail the accuracy curve and loss function descent curve during the model training process and shows that the model converges to a stable state after 150 training cycles. These findings provide a theoretical basis and technical reference for further optimizing target detection algorithms.
IRFeb 5, 2025
Contrastive Learning for Cold Start Recommendation with Adaptive Feature FusionJiacheng Hu, Tai An, Zidong Yu et al.
This paper proposes a cold start recommendation model that integrates contrastive learning, aiming to solve the problem of performance degradation of recommendation systems in cold start scenarios due to the scarcity of user and item interaction data. The model dynamically adjusts the weights of key features through an adaptive feature selection module and effectively integrates user attributes, item meta-information, and contextual features by combining a multimodal feature fusion mechanism, thereby improving recommendation performance. In addition, the model introduces a contrastive learning mechanism to enhance the robustness and generalization ability of feature representation by constructing positive and negative sample pairs. Experiments are conducted on the MovieLens-1M dataset. The results show that the proposed model significantly outperforms mainstream recommendation methods such as Matrix Factorization, LightGBM, DeepFM, and AutoRec in terms of HR, NDCG, MRR, and Recall, especially in cold start scenarios. Ablation experiments further verify the key role of each module in improving model performance, and the learning rate sensitivity analysis shows that a moderate learning rate is crucial to the optimization effect of the model. This study not only provides a new solution to the cold start problem but also provides an important reference for the application of contrastive learning in recommendation systems. In the future, this model is expected to play a role in a wider range of scenarios, such as real-time recommendation and cross-domain recommendation.
CVNov 12, 2024
Artistic Neural Style Transfer Algorithms with Activation SmoothingXiangtian Li, Han Cao, Zhaoyang Zhang et al.
The works of Gatys et al. demonstrated the capability of Convolutional Neural Networks (CNNs) in creating artistic style images. This process of transferring content images in different styles is called Neural Style Transfer (NST). In this paper, we re-implement image-based NST, fast NST, and arbitrary NST. We also explore to utilize ResNet with activation smoothing in NST. Extensive experimental results demonstrate that smoothing transformation can greatly improve the quality of stylization results.
CVNov 27, 2024
Leveraging Semi-Supervised Learning to Enhance Data Mining for Image Classification under Limited Labeled DataAoran Shen, Minghao Dai, Jiacheng Hu et al.
In the 21st-century information age, with the development of big data technology, effectively extracting valuable information from massive data has become a key issue. Traditional data mining methods are inadequate when faced with large-scale, high-dimensional and complex data. Especially when labeled data is scarce, their performance is greatly limited. This study optimizes data mining algorithms by introducing semi-supervised learning methods, aiming to improve the algorithm's ability to utilize unlabeled data, thereby achieving more accurate data analysis and pattern recognition under limited labeled data conditions. Specifically, we adopt a self-training method and combine it with a convolutional neural network (CNN) for image feature extraction and classification, and continuously improve the model prediction performance through an iterative process. The experimental results demonstrate that the proposed method significantly outperforms traditional machine learning techniques such as Support Vector Machine (SVM), XGBoost, and Multi-Layer Perceptron (MLP) on the CIFAR-10 image classification dataset. Notable improvements were observed in key performance metrics, including accuracy, recall, and F1 score. Furthermore, the robustness and noise-resistance capabilities of the semi-supervised CNN model were validated through experiments under varying noise levels, confirming its practical applicability in real-world scenarios.
CLOct 28, 2024
Deep Learning for Medical Text Processing: BERT Model Fine-Tuning and Comparative StudyJiacheng Hu, Yiru Cang, Guiran Liu et al.
This paper proposes a medical literature summary generation method based on the BERT model to address the challenges brought by the current explosion of medical information. By fine-tuning and optimizing the BERT model, we develop an efficient summary generation system that can quickly extract key information from medical literature and generate coherent, accurate summaries. In the experiment, we compared various models, including Seq-Seq, Attention, Transformer, and BERT, and demonstrated that the improved BERT model offers significant advantages in the Rouge and Recall metrics. Furthermore, the results of this study highlight the potential of knowledge distillation techniques to further enhance model performance. The system has demonstrated strong versatility and efficiency in practical applications, offering a reliable tool for the rapid screening and analysis of medical literature.
LGFeb 5, 2025
A Structured Reasoning Framework for Unbalanced Data Classification Using Probabilistic ModelsJunliang Du, Shiyu Dou, Bohuan Yang et al.
This paper studies a Markov network model for unbalanced data, aiming to solve the problems of classification bias and insufficient minority class recognition ability of traditional machine learning models in environments with uneven class distribution. By constructing joint probability distribution and conditional dependency, the model can achieve global modeling and reasoning optimization of sample categories. The study introduced marginal probability estimation and weighted loss optimization strategies, combined with regularization constraints and structured reasoning methods, effectively improving the generalization ability and robustness of the model. In the experimental stage, a real credit card fraud detection dataset was selected and compared with models such as logistic regression, support vector machine, random forest and XGBoost. The experimental results show that the Markov network performs well in indicators such as weighted accuracy, F1 score, and AUC-ROC, significantly outperforming traditional classification models, demonstrating its strong decision-making ability and applicability in unbalanced data scenarios. Future research can focus on efficient model training, structural optimization, and deep learning integration in large-scale unbalanced data environments and promote its wide application in practical applications such as financial risk control, medical diagnosis, and intelligent monitoring.
IVOct 31, 2024
Deep Learning with HM-VGG: AI Strategies for Multi-modal Image AnalysisJunliang Du, Yiru Cang, Tong Zhou et al.
This study introduces the Hybrid Multi-modal VGG (HM-VGG) model, a cutting-edge deep learning approach for the early diagnosis of glaucoma. The HM-VGG model utilizes an attention mechanism to process Visual Field (VF) data, enabling the extraction of key features that are vital for identifying early signs of glaucoma. Despite the common reliance on large annotated datasets, the HM-VGG model excels in scenarios with limited data, achieving remarkable results with small sample sizes. The model's performance is underscored by its high metrics in Precision, Accuracy, and F1-Score, indicating its potential for real-world application in glaucoma detection. The paper also discusses the challenges associated with ophthalmic image analysis, particularly the difficulty of obtaining large volumes of annotated data. It highlights the importance of moving beyond single-modality data, such as VF or Optical Coherence Tomography (OCT) images alone, to a multimodal approach that can provide a richer, more comprehensive dataset. This integration of different data types is shown to significantly enhance diagnostic accuracy. The HM- VGG model offers a promising tool for doctors, streamlining the diagnostic process and improving patient outcomes. Furthermore, its applicability extends to telemedicine and mobile healthcare, making diagnostic services more accessible. The research presented in this paper is a significant step forward in the field of medical image processing and has profound implications for clinical ophthalmology.
CLDec 11, 2024
Accurate Medical Named Entity Recognition Through Specialized NLP ModelsJiacheng Hu, Runyuan Bao, Yang Lin et al.
This study evaluated the effect of BioBERT in medical text processing for the task of medical named entity recognition. Through comparative experiments with models such as BERT, ClinicalBERT, SciBERT, and BlueBERT, the results showed that BioBERT achieved the best performance in both precision and F1 score, verifying its applicability and superiority in the medical field. BioBERT enhances its ability to understand professional terms and complex medical texts through pre-training on biomedical data, providing a powerful tool for medical information extraction and clinical decision support. The study also explored the privacy and compliance challenges of BioBERT when processing medical data, and proposed future research directions for combining other medical-specific models to improve generalization and robustness. With the development of deep learning technology, the potential of BioBERT in application fields such as intelligent medicine, personalized treatment, and disease prediction will be further expanded. Future research can focus on the real-time and interpretability of the model to promote its widespread application in the medical field.
IRDec 4, 2024
Enhancing Recommendation Systems with GNNs and Addressing Over-SmoothingWenyi Liu, Ziqi Zhang, Xinshi Li et al.
This paper addresses key challenges in enhancing recommendation systems by leveraging Graph Neural Networks (GNNs) and addressing inherent limitations such as over-smoothing, which reduces model effectiveness as network hierarchy deepens. The proposed approach introduces three GNN-based recommendation models, specifically designed to mitigate over-smoothing through innovative mechanisms like residual connections and identity mapping within the aggregation propagation process. These modifications enable more effective information flow across layers, preserving essential user-item interaction details to improve recommendation accuracy. Additionally, the study emphasizes the critical need for interpretability in recommendation systems, aiming to provide transparent and justifiable suggestions tailored to dynamic user preferences. By integrating collaborative filtering with GNN architectures, the proposed models not only enhance predictive accuracy but also align recommendations more closely with individual behaviors, adapting to nuanced shifts in user interests. This work advances the field by tackling both technical and user-centric challenges, contributing to the development of robust and explainable recommendation systems capable of managing the complexity and scale of modern online environments.
LGAug 20, 2025
Structure-Aware Temporal Modeling for Chronic Disease Progression PredictionJiacheng Hu, Bo Zhang, Ting Xu et al.
This study addresses the challenges of symptom evolution complexity and insufficient temporal dependency modeling in Parkinson's disease progression prediction. It proposes a unified prediction framework that integrates structural perception and temporal modeling. The method leverages graph neural networks to model the structural relationships among multimodal clinical symptoms and introduces graph-based representations to capture semantic dependencies between symptoms. It also incorporates a Transformer architecture to model dynamic temporal features during disease progression. To fuse structural and temporal information, a structure-aware gating mechanism is designed to dynamically adjust the fusion weights between structural encodings and temporal features, enhancing the model's ability to identify key progression stages. To improve classification accuracy and stability, the framework includes a multi-component modeling pipeline, consisting of a graph construction module, a temporal encoding module, and a prediction output layer. The model is evaluated on real-world longitudinal Parkinson's disease data. The experiments involve comparisons with mainstream models, sensitivity analysis of hyperparameters, and graph connection density control. Results show that the proposed method outperforms existing approaches in AUC, RMSE, and IPW-F1 metrics. It effectively distinguishes progression stages and improves the model's ability to capture personalized symptom trajectories. The overall framework demonstrates strong generalization and structural scalability, providing reliable support for intelligent modeling of chronic progressive diseases such as Parkinson's disease.
CVDec 22, 2024
Detecting and Classifying Defective Products in Images Using YOLOZhen Qi, Liwei Ding, Xiangtian Li et al.
With the continuous advancement of industrial automation, product quality inspection has become increasingly important in the manufacturing process. Traditional inspection methods, which often rely on manual checks or simple machine vision techniques, suffer from low efficiency and insufficient accuracy. In recent years, deep learning technology, especially the YOLO (You Only Look Once) algorithm, has emerged as a prominent solution in the field of product defect detection due to its efficient real-time detection capabilities and excellent classification performance. This study aims to use the YOLO algorithm to detect and classify defects in product images. By constructing and training a YOLO model, we conducted experiments on multiple industrial product datasets. The results demonstrate that this method can achieve real-time detection while maintaining high detection accuracy, significantly improving the efficiency and accuracy of product quality inspection. This paper further analyzes the advantages and limitations of the YOLO algorithm in practical applications and explores future research directions.
LGMar 3, 2025
A Hybrid CNN-Transformer Model for Heart Disease Prediction Using Life History DataRan Hao, Yanlin Xiang, Junliang Du et al.
This study proposed a hybrid model of a convolutional neural network (CNN) and a Transformer to predict and diagnose heart disease. Based on CNN's strength in detecting local features and the Transformer's high capacity in sensing global relations, the model is able to successfully detect risk factors of heart disease from high-dimensional life history data. Experimental results show that the proposed model outperforms traditional benchmark models like support vector machine (SVM), convolutional neural network (CNN), and long short-term memory network (LSTM) on several measures like accuracy, precision, and recall. This demonstrates its strong ability to deal with multi-dimensional and unstructured data. In order to verify the effectiveness of the model, experiments removing certain parts were carried out, and the results of the experiments showed that it is important to use both CNN and Transformer modules in enhancing the model. This paper also discusses the incorporation of additional features and approaches in future studies to enhance the model's performance and enable it to operate effectively in diverse conditions. This study presents novel insights and methods for predicting heart disease using machine learning, with numerous potential applications especially in personalized medicine and health management.
CVFeb 10, 2025
Multi-Scale Transformer Architecture for Accurate Medical Image ClassificationJiacheng Hu, Yanlin Xiang, Yang Lin et al.
This study introduces an AI-driven skin lesion classification algorithm built on an enhanced Transformer architecture, addressing the challenges of accuracy and robustness in medical image analysis. By integrating a multi-scale feature fusion mechanism and refining the self-attention process, the model effectively extracts both global and local features, enhancing its ability to detect lesions with ambiguous boundaries and intricate structures. Performance evaluation on the ISIC 2017 dataset demonstrates that the improved Transformer surpasses established AI models, including ResNet50, VGG19, ResNext, and Vision Transformer, across key metrics such as accuracy, AUC, F1-Score, and Precision. Grad-CAM visualizations further highlight the interpretability of the model, showcasing strong alignment between the algorithm's focus areas and actual lesion sites. This research underscores the transformative potential of advanced AI models in medical imaging, paving the way for more accurate and reliable diagnostic tools. Future work will explore the scalability of this approach to broader medical imaging tasks and investigate the integration of multimodal data to enhance AI-driven diagnostic frameworks for intelligent healthcare.
CLJan 24, 2025
Dynamic Adaptation of LoRA Fine-Tuning for Efficient and Task-Specific Optimization of Large Language ModelsXiaoxuan Liao, Chihang Wang, Shicheng Zhou et al.
This paper presents a novel methodology of fine-tuning for large language models-dynamic LoRA. Building from the standard Low-Rank Adaptation framework, this methodology further adds dynamic adaptation mechanisms to improve efficiency and performance. The key contribution of dynamic LoRA lies within its adaptive weight allocation mechanism coupled with an input feature-based adaptive strategy. These enhancements allow for a more precise fine-tuning process that is more tailored to specific tasks. Traditional LoRA methods use static adapter settings, not considering the different importance of model layers. In contrast, dynamic LoRA introduces a mechanism that dynamically evaluates the layer's importance during fine-tuning. This evaluation enables the reallocation of adapter parameters to fit the unique demands of each individual task, which leads to better optimization results. Another gain in flexibility arises from the consideration of the input feature distribution, which helps the model generalize better when faced with complicated and diverse datasets. The joint approach boosts not only the performance over each single task but also the generalization ability of the model. The efficiency of the dynamic LoRA was validated in experiments on benchmark datasets, such as GLUE, with surprising results. More specifically, this method achieved 88.1% accuracy with an F1-score of 87.3%. Noticeably, these improvements were made at a slight increase in computational costs: only 0.1% more resources than standard LoRA. This balance between performance and efficiency positions dynamic LoRA as a practical, scalable solution for fine-tuning LLMs, especially in resource-constrained scenarios. To take it a step further, its adaptability makes it a promising foundation for much more advanced applications, including multimodal tasks.
CLDec 25, 2024
Optimizing Large Language Models with an Enhanced LoRA Fine-Tuning Algorithm for Efficiency and Robustness in NLP TasksJiacheng Hu, Xiaoxuan Liao, Jia Gao et al.
This study proposes a large language model optimization method based on the improved LoRA fine-tuning algorithm, aiming to improve the accuracy and computational efficiency of the model in natural language processing tasks. We fine-tune the large language model through a low-rank adaptation strategy, which significantly reduces the consumption of computing resources while maintaining the powerful capabilities of the pre-trained model. The experiment uses the QQP task as the evaluation scenario. The results show that the improved LoRA algorithm shows significant improvements in accuracy, F1 score, and MCC compared with traditional models such as BERT, Roberta, T5, and GPT-4. In particular, in terms of F1 score and MCC, our model shows stronger robustness and discrimination ability, which proves the potential of the improved LoRA algorithm in fine-tuning large-scale pre-trained models. In addition, this paper also discusses the application prospects of the improved LoRA algorithm in other natural language processing tasks, emphasizing its advantages in multi-task learning and scenarios with limited computing resources. Future research can further optimize the LoRA fine-tuning strategy and expand its application in larger-scale pre-trained models to improve the generalization ability and task adaptability of the model.
CVDec 29, 2024
Deep Learning in Image Classification: Evaluating VGG19's Performance on Complex Visual DataWeijie He, Tong Zhou, Yanlin Xiang et al.
This study aims to explore the automatic classification method of pneumonia X-ray images based on VGG19 deep convolutional neural network, and evaluate its application effect in pneumonia diagnosis by comparing with classic models such as SVM, XGBoost, MLP, and ResNet50. The experimental results show that VGG19 performs well in multiple indicators such as accuracy (92%), AUC (0.95), F1 score (0.90) and recall rate (0.87), which is better than other comparison models, especially in image feature extraction and classification accuracy. Although ResNet50 performs well in some indicators, it is slightly inferior to VGG19 in recall rate and F1 score. Traditional machine learning models SVM and XGBoost are obviously limited in image classification tasks, especially in complex medical image analysis tasks, and their performance is relatively mediocre. The research results show that deep learning, especially convolutional neural networks, have significant advantages in medical image classification tasks, especially in pneumonia X-ray image analysis, and can provide efficient and accurate automatic diagnosis support. This research provides strong technical support for the early detection of pneumonia and the development of automated diagnosis systems and also lays the foundation for further promoting the application and development of automated medical image processing technology.
CVMar 28, 2025
A Deep Learning Framework for Boundary-Aware Semantic SegmentationTai An, Weiqiang Huang, Da Xu et al.
As a fundamental task in computer vision, semantic segmentation is widely applied in fields such as autonomous driving, remote sensing image analysis, and medical image processing. In recent years, Transformer-based segmentation methods have demonstrated strong performance in global feature modeling. However, they still struggle with blurred target boundaries and insufficient recognition of small targets. To address these issues, this study proposes a Mask2Former-based semantic segmentation algorithm incorporating a boundary enhancement feature bridging module (BEFBM). The goal is to improve target boundary accuracy and segmentation consistency. Built upon the Mask2Former framework, this method constructs a boundary-aware feature map and introduces a feature bridging mechanism. This enables effective cross-scale feature fusion, enhancing the model's ability to focus on target boundaries. Experiments on the Cityscapes dataset demonstrate that, compared to mainstream segmentation methods, the proposed approach achieves significant improvements in metrics such as mIOU, mDICE, and mRecall. It also exhibits superior boundary retention in complex scenes. Visual analysis further confirms the model's advantages in fine-grained regions. Future research will focus on optimizing computational efficiency and exploring its potential in other high-precision segmentation tasks.
CVMar 17, 2025
Adaptive Transformer Attention and Multi-Scale Fusion for Spine 3D SegmentationYanlin Xiang, Qingyuan He, Ting Xu et al.
This study proposes a 3D semantic segmentation method for the spine based on the improved SwinUNETR to improve segmentation accuracy and robustness. Aiming at the complex anatomical structure of spinal images, this paper introduces a multi-scale fusion mechanism to enhance the feature extraction capability by using information of different scales, thereby improving the recognition accuracy of the model for the target area. In addition, the introduction of the adaptive attention mechanism enables the model to dynamically adjust the attention to the key area, thereby optimizing the boundary segmentation effect. The experimental results show that compared with 3D CNN, 3D U-Net, and 3D U-Net + Transformer, the model of this study has achieved significant improvements in mIoU, mDice, and mAcc indicators, and has better segmentation performance. The ablation experiment further verifies the effectiveness of the proposed improved method, proving that multi-scale fusion and adaptive attention mechanism have a positive effect on the segmentation task. Through the visualization analysis of the inference results, the model can better restore the real anatomical structure of the spinal image. Future research can further optimize the Transformer structure and expand the data scale to improve the generalization ability of the model. This study provides an efficient solution for the task of medical image segmentation, which is of great significance to intelligent medical image analysis.
CVDec 4, 2024
Few-Shot Learning with Adaptive Weight Masking in Conditional GANsJiacheng Hu, Zhen Qi, Jianjun Wei et al.
Deep learning has revolutionized various fields, yet its efficacy is hindered by overfitting and the requirement of extensive annotated data, particularly in few-shot learning scenarios where limited samples are available. This paper introduces a novel approach to few-shot learning by employing a Residual Weight Masking Conditional Generative Adversarial Network (RWM-CGAN) for data augmentation. The proposed model integrates residual units within the generator to enhance network depth and sample quality, coupled with a weight mask regularization technique in the discriminator to improve feature learning from small-sample categories. This method addresses the core issues of robustness and generalization in few-shot learning by providing a controlled and clear augmentation of the sample space. Extensive experiments demonstrate that RWM-CGAN not only expands the sample space effectively but also enriches the diversity and quality of generated samples, leading to significant improvements in detection and classification accuracy on public datasets. The paper contributes to the advancement of few-shot learning by offering a practical solution to the challenges posed by data scarcity and the need for rapid generalization to new tasks or categories.
IVNov 21, 2024
Enhancing Medical Image Segmentation with Deep Learning and Diffusion ModelsHouze Liu, Tong Zhou, Yanlin Xiang et al.
Medical image segmentation is crucial for accurate clinical diagnoses, yet it faces challenges such as low contrast between lesions and normal tissues, unclear boundaries, and high variability across patients. Deep learning has improved segmentation accuracy and efficiency, but it still relies heavily on expert annotations and struggles with the complexities of medical images. The small size of medical image datasets and the high cost of data acquisition further limit the performance of segmentation networks. Diffusion models, with their iterative denoising process, offer a promising alternative for better detail capture in segmentation. However, they face difficulties in accurately segmenting small targets and maintaining the precision of boundary details. This article discusses the importance of medical image segmentation, the limitations of current deep learning approaches, and the potential of diffusion models to address these challenges.