LGAug 22, 2022
Representation Learning of Knowledge Graph for Wireless Communication NetworksShiwen He, Yeyu Ou, Liangpeng Wang et al.
With the application of the fifth-generation wireless communication technologies, more smart terminals are being used and generating huge amounts of data, which has prompted extensive research on how to handle and utilize these wireless data. Researchers currently focus on the research on the upper-layer application data or studying the intelligent transmission methods concerning a specific problem based on a large amount of data generated by the Monte Carlo simulations. This article aims to understand the endogenous relationship of wireless data by constructing a knowledge graph according to the wireless communication protocols, and domain expert knowledge and further investigating the wireless endogenous intelligence. We firstly construct a knowledge graph of the endogenous factors of wireless core network data collected via a 5G/B5G testing network. Then, a novel model based on graph convolutional neural networks is designed to learn the representation of the graph, which is used to classify graph nodes and simulate the relation prediction. The proposed model realizes the automatic nodes classification and network anomaly cause tracing. It is also applied to the public datasets in an unsupervised manner. Finally, the results show that the classification accuracy of the proposed model is better than the existing unsupervised graph neural network models, such as VGAE and ARVGE.
CVDec 19, 2022
DGNet: Distribution Guided Efficient Learning for Oil Spill Image SegmentationFang Chen, Heiko Balzter, Feixiang Zhou et al.
Successful implementation of oil spill segmentation in Synthetic Aperture Radar (SAR) images is vital for marine environmental protection. In this paper, we develop an effective segmentation framework named DGNet, which performs oil spill segmentation by incorporating the intrinsic distribution of backscatter values in SAR images. Specifically, our proposed segmentation network is constructed with two deep neural modules running in an interactive manner, where one is the inference module to achieve latent feature variable inference from SAR images, and the other is the generative module to produce oil spill segmentation maps by drawing the latent feature variables as inputs. Thus, to yield accurate segmentation, we take into account the intrinsic distribution of backscatter values in SAR images and embed it in our segmentation model. The intrinsic distribution originates from SAR imagery, describing the physical characteristics of oil spills. In the training process, the formulated intrinsic distribution guides efficient learning of optimal latent feature variable inference for oil spill segmentation. The efficient learning enables the training of our proposed DGNet with a small amount of image data. This is economically beneficial to oil spill segmentation where the availability of oil spill SAR image data is limited in practice. Additionally, benefiting from optimal latent feature variable inference, our proposed DGNet performs accurate oil spill segmentation. We evaluate the segmentation performance of our proposed DGNet with different metrics, and experimental evaluations demonstrate its effective segmentations.
CVApr 17, 2023
SRCNet: Seminal Representation Collaborative Network for Marine Oil Spill SegmentationFang Chen, Heiko Balzter, Peng Ren et al.
Effective oil spill segmentation in Synthetic Aperture Radar (SAR) images is critical for marine oil pollution cleanup, and proper image representation is helpful for accurate image segmentation. In this paper, we propose an effective oil spill image segmentation network named SRCNet by leveraging SAR image representation and the training for oil spill segmentation simultaneously. Specifically, our proposed segmentation network is constructed with a pair of deep neural nets with the collaboration of the seminal representation that describes SAR images, where one deep neural net is the generative net which strives to produce oil spill segmentation maps, and the other is the discriminative net which trys its best to distinguish between the produced and the true segmentations, and they thus built a two-player game. Particularly, the seminal representation exploited in our proposed SRCNet originates from SAR imagery, modelling with the internal characteristics of SAR images. Thus, in the training process, the collaborated seminal representation empowers the mapped generative net to produce accurate oil spill segmentation maps efficiently with small amount of training data, promoting the discriminative net reaching its optimal solution at a fast speed. Therefore, our proposed SRCNet operates effective oil spill segmentation in an economical and efficient manner. Additionally, to increase the segmentation capability of the proposed segmentation network in terms of accurately delineating oil spill details in SAR images, a regularisation term that penalises the segmentation loss is devised. This encourages our proposed SRCNet for accurately segmenting oil spill areas from SAR images. Empirical experimental evaluations from different metrics validate the effectiveness of our proposed SRCNet for oil spill image segmentation.
ROMar 11
Cybo-Waiter: A Physical Agentic Framework for Humanoid Whole-Body Locomotion-ManipulationPeng Ren, Haoyang Ge, Chuan Qi et al.
Robots are increasingly expected to execute open ended natural language requests in human environments, which demands reliable long horizon execution under partial observability. This is especially challenging for humanoids because locomotion and manipulation are tightly coupled through stance, reachability, and balance. We present a humanoid agent framework that turns VLM plans into verifiable task programs and closes the loop with multi object 3D geometric supervision. A VLM planner compiles each instruction into a typed JSON sequence of subtasks with explicit predicate based preconditions and success conditions. Using SAM3 and RGB-D, we ground all task relevant entities in 3D, estimate object centroids and extents, and evaluate predicates over stable frames to obtain condition level diagnostics. The supervisor uses these diagnostics to verify subtask completion and to provide condition-level feedback for progression and replanning. We execute each subtask by coordinating humanoid locomotion and whole-body manipulation, selecting feasible motion primitives under reachability and balance constraints. Experiments on tabletop manipulation and long horizon humanoid loco manipulation tasks show improved robustness from multi object grounding, temporal stability, and recovery driven replanning.
LGJun 22, 2022
Learning Optimal Treatment Strategies for Sepsis Using Offline Reinforcement Learning in Continuous SpaceZeyu Wang, Huiying Zhao, Peng Ren et al.
Sepsis is a leading cause of death in the ICU. It is a disease requiring complex interventions in a short period of time, but its optimal treatment strategy remains uncertain. Evidence suggests that the practices of currently used treatment strategies are problematic and may cause harm to patients. To address this decision problem, we propose a new medical decision model based on historical data to help clinicians recommend the best reference option for real-time treatment. Our model combines offline reinforcement learning and deep reinforcement learning to solve the problem of traditional reinforcement learning in the medical field due to the inability to interact with the environment, while enabling our model to make decisions in a continuous state-action space. We demonstrate that, on average, the treatments recommended by the model are more valuable and reliable than those recommended by clinicians. In a large validation dataset, we find out that the patients whose actual doses from clinicians matched the decisions made by AI has the lowest mortality rates. Our model provides personalized and clinically interpretable treatment decisions for sepsis to improve patient care.
CVMar 10Code
OilSAM2: Memory-Augmented SAM2 for Scalable SAR Oil Spill DetectionShuaiyu Chen, Ming Yin, Peng Ren et al.
Segmenting oil spills from Synthetic Aperture Radar (SAR) imagery remains challenging due to severe appearance variability, scale heterogeneity, and the absence of temporal continuity in real world monitoring scenarios. While foundation models such as Segment Anything (SAM) enable prompt driven segmentation, existing SAM based approaches operate on single images and cannot effectively reuse information across scenes. Memory augmented variants (e.g., SAM2) further assume temporal coherence, making them prone to semantic drift when applied to unordered SAR image collections. We propose OilSAM2, a memory augmented segmentation framework tailored for unordered SAR oil spill monitoring. OilSAM2 introduces a hierarchical feature aware multi scale memory bank that explicitly models texture, structure, and semantic level representations, enabling robust cross image information reuse. To mitigate memory drift, we further propose a structure semantic consistent memory update strategy that selectively refreshes memory based on semantic discrepancy and structural variation.Experiments on two public SAR oil spill datasets demonstrate that OilSAM2 achieves state of the art segmentation performance, delivering stable and accurate results under noisy SAR monitoring scenarios. The source code is available at https://github.com/Chenshuaiyu1120/OILSAM2.
ROMar 15
CyboRacket: A Perception-to-Action Framework for Humanoid Racket SportsPeng Ren, Chuan Qi, Haoyang Ge et al.
Dynamic ball-interaction tasks remain challenging for robots because they require tight perception-action coupling under limited reaction time. This challenge is especially pronounced in humanoid racket sports, where successful interception depends on accurate visual tracking, trajectory prediction, coordinated stepping, and stable whole-body striking. Existing robotic racket-sport systems often rely on external motion capture for state estimation or on task-specific low-level controllers that must be retrained across tasks and platforms. We present CyboRacket, a hierarchical perception-to-action framework for humanoid racket sports that integrates onboard visual perception, physics-based trajectory prediction, and large-scale pre-trained whole-body control. The framework uses onboard cameras to track the incoming object, predicts its future trajectory, and converts the estimated interception state into target end-effector and base-motion commands for whole-body execution by SONIC on the Unitree G1 humanoid robot. We evaluate the proposed framework in a vision-based humanoid tennis-hitting task. Experimental results demonstrate real-time visual tracking, trajectory prediction, and successful striking using purely onboard sensing.
CLMar 26, 2024
A Large Language Model Guided Topic Refinement Mechanism for Short Text ModelingShuyu Chang, Rui Wang, Peng Ren et al.
Modeling topics effectively in short texts, such as tweets and news snippets, is crucial to capturing rapidly evolving social trends. Existing topic models often struggle to accurately capture the underlying semantic patterns of short texts, primarily due to the sparse nature of such data. This nature of texts leads to an unavoidable lack of co-occurrence information, which hinders the coherence and granularity of mined topics. This paper introduces a novel model-agnostic mechanism, termed Topic Refinement, which leverages the advanced text comprehension capabilities of Large Language Models (LLMs) for short-text topic modeling. Unlike traditional methods, this post-processing mechanism enhances the quality of topics extracted by various topic modeling methods through prompt engineering. We guide LLMs in identifying semantically intruder words within the extracted topics and suggesting coherent alternatives to replace these words. This process mimics human-like identification, evaluation, and refinement of the extracted topics. Extensive experiments on four diverse datasets demonstrate that Topic Refinement boosts the topic quality and improves the performance in topic-related text classification tasks.
CVOct 17, 2025
LILAC: Long-sequence Incremental Low-latency Arbitrary Motion Stylization via Streaming VAE-Diffusion with Causal DecodingPeng Ren, Hai Yang
Generating long and stylized human motions in real time is critical for applications that demand continuous and responsive character control. Despite its importance, existing streaming approaches often operate directly in the raw motion space, leading to substantial computational overhead and making it difficult to maintain temporal stability. In contrast, latent-space VAE-Diffusion-based frameworks alleviate these issues and achieve high-quality stylization, but they are generally confined to offline processing. To bridge this gap, LILAC (Long-sequence Incremental Low-latency Arbitrary Motion Stylization via Streaming VAE-Diffusion with Causal Decoding) builds upon a recent high-performing offline framework for arbitrary motion stylization and extends it to an online setting through a latent-space streaming architecture with a sliding-window causal design and the injection of decoded motion features to ensure smooth motion transitions. This architecture enables long-sequence real-time arbitrary stylization without relying on future frames or modifying the diffusion model architecture, achieving a favorable balance between stylization quality and responsiveness as demonstrated by experiments on benchmark datasets. Supplementary video and examples are available at the project page: https://pren1.github.io/lilac/
IRAug 22, 2025
DSRAG: A Domain-Specific Retrieval Framework Based on Document-derived Multimodal Knowledge GraphMengzheng Yang, Yanfei Ren, David Osei Opoku et al.
Current general-purpose large language models (LLMs) commonly exhibit knowledge hallucination and insufficient domain-specific adaptability in domain-specific tasks, limiting their effectiveness in specialized question answering scenarios. Retrieval-augmented generation (RAG) effectively tackles these challenges by integrating external knowledge to enhance accuracy and relevance. However, traditional RAG still faces limitations in domain knowledge accuracy and context modeling.To enhance domain-specific question answering performance, this work focuses on a graph-based RAG framework, emphasizing the critical role of knowledge graph quality during the generation process. We propose DSRAG (Domain-Specific RAG), a multimodal knowledge graph-driven retrieval-augmented generation framework designed for domain-specific applications. Our approach leverages domain-specific documents as the primary knowledge source, integrating heterogeneous information such as text, images, and tables to construct a multimodal knowledge graph covering both conceptual and instance layers. Building on this foundation, we introduce semantic pruning and structured subgraph retrieval mechanisms, combining knowledge graph context and vector retrieval results to guide the language model towards producing more reliable responses. Evaluations using the Langfuse multidimensional scoring mechanism show that our method excels in domain-specific question answering, validating the efficacy of integrating multimodal knowledge graphs with retrieval-augmented generation.
CVJun 22, 2025
OSDMamba: Enhancing Oil Spill Detection from Remote Sensing Images Using Selective State Space ModelShuaiyu Chen, Fu Wang, Peng Ren et al.
Semantic segmentation is commonly used for Oil Spill Detection (OSD) in remote sensing images. However, the limited availability of labelled oil spill samples and class imbalance present significant challenges that can reduce detection accuracy. Furthermore, most existing methods, which rely on convolutional neural networks (CNNs), struggle to detect small oil spill areas due to their limited receptive fields and inability to effectively capture global contextual information. This study explores the potential of State-Space Models (SSMs), particularly Mamba, to overcome these limitations, building on their recent success in vision applications. We propose OSDMamba, the first Mamba-based architecture specifically designed for oil spill detection. OSDMamba leverages Mamba's selective scanning mechanism to effectively expand the model's receptive field while preserving critical details. Moreover, we designed an asymmetric decoder incorporating ConvSSM and deep supervision to strengthen multi-scale feature fusion, thereby enhancing the model's sensitivity to minority class samples. Experimental results show that the proposed OSDMamba achieves state-of-the-art performance, yielding improvements of 8.9% and 11.8% in OSD across two publicly available datasets.
LGDec 17, 2021
Oil Spill SAR Image Segmentation via Probability Distribution ModellingFang Chen, Aihua Zhang, Heiko Balzter et al.
Segmentation of marine oil spills in Synthetic Aperture Radar (SAR) images is a challenging task because of the complexity and irregularities in SAR images. In this work, we aim to develop an effective segmentation method which addresses marine oil spill identification in SAR images by investigating the distribution representation of SAR images. To seek effective oil spill segmentation, we revisit the SAR imaging mechanism in order to attain the probability distribution representation of oil spill SAR images, in which the characteristics of SAR images are properly modelled. We then exploit the distribution representation to formulate the segmentation energy functional, by which oil spill characteristics are incorporated to guide oil spill segmentation. Moreover, the oil spill segmentation model contains the oil spill contour regularisation term and the updated level set regularisation term which enhance the representational power of the segmentation energy functional. Benefiting from the synchronisation of SAR image representation and oil spill segmentation, our proposed method establishes an effective oil spill segmentation framework. Experimental evaluations demonstrate the effectiveness of our proposed segmentation framework for different types of marine oil spill SAR image segmentation.
LGJan 5, 2020
Fractional order graph neural networkZijian Liu, Chunbo Luo, Shuai Li et al.
This paper proposes fractional order graph neural networks (FGNNs), optimized by the approximation strategy to address the challenges of local optimum of classic and fractional graph neural networks which are specialised at aggregating information from the feature and adjacent matrices of connected nodes and their neighbours to solve learning tasks on non-Euclidean data such as graphs. Meanwhile the approximate calculation of fractional order gradients also overcomes the high computational complexity of fractional order derivations. We further prove that such an approximation is feasible and the FGNN is unbiased towards global optimization solution. Extensive experiments on citation networks show that FGNN achieves great advantage over baseline models when selected appropriate fractional order.
CRJan 23, 2019
Deep Adversarial Learning in Intrusion Detection: A Data Augmentation Enhanced FrameworkHe Zhang, Xingrui Yu, Peng Ren et al.
Intrusion detection systems (IDSs) play an important role in identifying malicious attacks and threats in networking systems. As fundamental tools of IDSs, learning based classification methods have been widely employed. When it comes to detecting network intrusions in small sample sizes (e.g., emerging intrusions), the limited number and imbalanced proportion of training samples usually cause significant challenges in training supervised and semi-supervised classifiers. In this paper, we propose a general network intrusion detection framework to address the challenges of both \emph{data scarcity} and \emph{data imbalance}. The novelty of the proposed framework focuses on incorporating deep adversarial learning with statistical learning and exploiting learning based data augmentation. Given a small set of network intrusion samples, it first derives a Poisson-Gamma joint probabilistic generative model to generate synthesised intrusion data using Monte Carlo methods. Those synthesised data are then augmented by deep generative neural networks through adversarial learning. Finally, it adopts the augmented intrusion data to train supervised models for detecting network intrusions. Comprehensive experimental validations on KDD Cup 99 dataset show that the proposed framework outperforms the existing learning based IDSs in terms of improved accuracy, precision, recall, and F1-score.