AINov 6, 2023
Deep Learning-Empowered Semantic Communication Systems with a Shared Knowledge BasePeng Yi, Yang Cao, Xin Kang et al.
Deep learning-empowered semantic communication is regarded as a promising candidate for future 6G networks. Although existing semantic communication systems have achieved superior performance compared to traditional methods, the end-to-end architecture adopted by most semantic communication systems is regarded as a black box, leading to the lack of explainability. To tackle this issue, in this paper, a novel semantic communication system with a shared knowledge base is proposed for text transmissions. Specifically, a textual knowledge base constructed by inherently readable sentences is introduced into our system. With the aid of the shared knowledge base, the proposed system integrates the message and corresponding knowledge from the shared knowledge base to obtain the residual information, which enables the system to transmit fewer symbols without semantic performance degradation. In order to make the proposed system more reliable, the semantic self-information and the source entropy are mathematically defined based on the knowledge base. Furthermore, the knowledge base construction algorithm is developed based on a similarity-comparison method, in which a pre-configured threshold can be leveraged to control the size of the knowledge base. Moreover, the simulation results have demonstrated that the proposed approach outperforms existing baseline methods in terms of transmitted data size and sentence similarity.
CVApr 15, 2023
Region-Enhanced Feature Learning for Scene Semantic SegmentationXin Kang, Chaoqun Wang, Xuejin Chen
Semantic segmentation in complex scenes relies not only on object appearance but also on object location and the surrounding environment. Nonetheless, it is difficult to model long-range context in the format of pairwise point correlations due to the huge computational cost for large-scale point clouds. In this paper, we propose using regions as the intermediate representation of point clouds instead of fine-grained points or voxels to reduce the computational burden. We introduce a novel Region-Enhanced Feature Learning Network (REFL-Net) that leverages region correlations to enhance point feature learning. We design a region-based feature enhancement (RFE) module, which consists of a Semantic-Spatial Region Extraction stage and a Region Dependency Modeling stage. In the first stage, the input points are grouped into a set of regions based on their semantic and spatial proximity. In the second stage, we explore inter-region semantic and spatial relationships by employing a self-attention block on region features and then fuse point features with the region features to obtain more discriminative representations. Our proposed RFE module is plug-and-play and can be integrated with common semantic segmentation backbones. We conduct extensive experiments on ScanNetV2 and S3DIS datasets and evaluate our RFE module with different segmentation backbones. Our REFL-Net achieves 1.8% mIoU gain on ScanNetV2 and 1.7% mIoU gain on S3DIS with negligible computational cost compared with backbone models. Both quantitative and qualitative results show the powerful long-range context modeling ability and strong generalization ability of our REFL-Net.
ROMar 16
LiDAR-EVS: Enhance Extrapolated View Synthesis for 3D Gaussian Splatting with Pseudo-LiDAR SupervisionYiming Huang, Xin Kang, Sipeng Zhang et al.
3D Gaussian Splatting (3DGS) has emerged as a powerful technique for real-time LiDAR and camera synthesis in autonomous driving simulation. However, simulating LiDAR with 3DGS remains challenging for extrapolated views beyond the training trajectory, as existing methods are typically trained on single-traversal sensor scans, suffer from severe overfitting and poor generalization to novel ego-vehicle paths. To enable reliable simulation of LiDAR along unseen driving trajectories without external multi-pass data, we present LiDAR-EVS, a lightweight framework for robust extrapolated-view LiDAR simulation in autonomous driving. Designed to be plug-and-play, LiDAR-EVS readily extends to diverse LiDAR sensors and neural rendering baselines with minimal modification. Our framework comprises two key components: (1) pseudo extrapolated-view point cloud supervision with multi-frame LiDAR fusion, view transformation, occlusion curling, and intensity adjustment; (2) spatially-constrained dropout regularization that promotes robustness to diverse trajectory variations encountered in real-world driving. Extensive experiments demonstrate that LiDAR-EVS achieves SOTA performance on extrapolated-view LiDAR synthesis across three datasets, making it a promising tool for data-driven simulation, closed-loop evaluation, and synthetic data generation in autonomous driving systems.
CVApr 12, 2024
MSSTNet: A Multi-Scale Spatio-Temporal CNN-Transformer Network for Dynamic Facial Expression RecognitionLinhuang Wang, Xin Kang, Fei Ding et al.
Unlike typical video action recognition, Dynamic Facial Expression Recognition (DFER) does not involve distinct moving targets but relies on localized changes in facial muscles. Addressing this distinctive attribute, we propose a Multi-Scale Spatio-temporal CNN-Transformer network (MSSTNet). Our approach takes spatial features of different scales extracted by CNN and feeds them into a Multi-scale Embedding Layer (MELayer). The MELayer extracts multi-scale spatial information and encodes these features before sending them into a Temporal Transformer (T-Former). The T-Former simultaneously extracts temporal information while continually integrating multi-scale spatial information. This process culminates in the generation of multi-scale spatio-temporal features that are utilized for the final classification. Our method achieves state-of-the-art results on two in-the-wild datasets. Furthermore, a series of ablation experiments and visualizations provide further validation of our approach's proficiency in leveraging spatio-temporal information within DFER.
CVDec 10, 2024
ResGS: Residual Densification of 3D Gaussian for Efficient Detail RecoveryYanzhe Lyu, Kai Cheng, Xin Kang et al.
Recently, 3D Gaussian Splatting (3D-GS) has prevailed in novel view synthesis, achieving high fidelity and efficiency. However, it often struggles to capture rich details and complete geometry. Our analysis reveals that the 3D-GS densification operation lacks adaptiveness and faces a dilemma between geometry coverage and detail recovery. To address this, we introduce a novel densification operation, residual split, which adds a downscaled Gaussian as a residual. Our approach is capable of adaptively retrieving details and complementing missing geometry. To further support this method, we propose a pipeline named ResGS. Specifically, we integrate a Gaussian image pyramid for progressive supervision and implement a selection scheme that prioritizes the densification of coarse Gaussians over time. Extensive experiments demonstrate that our method achieves SOTA rendering quality. Consistent performance improvements can be achieved by applying our residual split on various 3D-GS variants, underscoring its versatility and potential for broader application in 3D-GS-based applications.
CVOct 14, 2025
SimULi: Real-Time LiDAR and Camera Simulation with Unscented TransformsHaithem Turki, Qi Wu, Xin Kang et al.
Rigorous testing of autonomous robots, such as self-driving vehicles, is essential to ensure their safety in real-world deployments. This requires building high-fidelity simulators to test scenarios beyond those that can be safely or exhaustively collected in the real-world. Existing neural rendering methods based on NeRF and 3DGS hold promise but suffer from low rendering speeds or can only render pinhole camera models, hindering their suitability to applications that commonly require high-distortion lenses and LiDAR data. Multi-sensor simulation poses additional challenges as existing methods handle cross-sensor inconsistencies by favoring the quality of one modality at the expense of others. To overcome these limitations, we propose SimULi, the first method capable of rendering arbitrary camera models and LiDAR data in real-time. Our method extends 3DGUT, which natively supports complex camera models, with LiDAR support, via an automated tiling strategy for arbitrary spinning LiDAR models and ray-based culling. To address cross-sensor inconsistencies, we design a factorized 3D Gaussian representation and anchoring strategy that reduces mean camera and depth error by up to 40% compared to existing methods. SimULi renders 10-20x faster than ray tracing approaches and 1.5-10x faster than prior rasterization-based work (and handles a wider range of camera models). When evaluated on two widely benchmarked autonomous driving datasets, SimULi matches or exceeds the fidelity of existing state-of-the-art methods across numerous camera and LiDAR metrics.
CRSep 18, 2025
Toward a Unified Security Framework for AI Agents: Trust, Risk, and LiabilityJiayun Mo, Xin Kang, Tieyan Li et al.
The excitement brought by the development of AI agents came alongside arising problems. These concerns centered around users' trust issues towards AIs, the risks involved, and the difficulty of attributing responsibilities and liabilities. Current solutions only attempt to target each problem separately without acknowledging their inter-influential nature. The Trust, Risk and Liability (TRL) framework proposed in this paper, however, ties together the interdependent relationships of trust, risk, and liability to provide a systematic method of building and enhancing trust, analyzing and mitigating risks, and allocating and attributing liabilities. It can be applied to analyze any application scenarios of AI agents and suggest appropriate measures fitting to the context. The implications of the TRL framework lie in its potential societal impacts, economic impacts, ethical impacts, and more. It is expected to bring remarkable values to addressing potential challenges and promoting trustworthy, risk-free, and responsible usage of AI in 6G networks.
CVMay 30, 2025
LTM3D: Bridging Token Spaces for Conditional 3D Generation with Auto-Regressive Diffusion FrameworkXin Kang, Zihan Zheng, Lei Chu et al.
We present LTM3D, a Latent Token space Modeling framework for conditional 3D shape generation that integrates the strengths of diffusion and auto-regressive (AR) models. While diffusion-based methods effectively model continuous latent spaces and AR models excel at capturing inter-token dependencies, combining these paradigms for 3D shape generation remains a challenge. To address this, LTM3D features a Conditional Distribution Modeling backbone, leveraging a masked autoencoder and a diffusion model to enhance token dependency learning. Additionally, we introduce Prefix Learning, which aligns condition tokens with shape latent tokens during generation, improving flexibility across modalities. We further propose a Latent Token Reconstruction module with Reconstruction-Guided Sampling to reduce uncertainty and enhance structural fidelity in generated shapes. Our approach operates in token space, enabling support for multiple 3D representations, including signed distance fields, point clouds, meshes, and 3D Gaussian Splatting. Extensive experiments on image- and text-conditioned shape generation tasks demonstrate that LTM3D outperforms existing methods in prompt fidelity and structural accuracy while offering a generalizable framework for multi-modal, multi-representation 3D generation.
CLFeb 12, 2025
Neuro-Conceptual Artificial Intelligence: Integrating OPM with Deep Learning to Enhance Question Answering QualityXin Kang, Veronika Shteingardt, Yuhan Wang et al.
Knowledge representation and reasoning are critical challenges in Artificial Intelligence (AI), particularly in integrating neural and symbolic approaches to achieve explainable and transparent AI systems. Traditional knowledge representation methods often fall short of capturing complex processes and state changes. We introduce Neuro-Conceptual Artificial Intelligence (NCAI), a specialization of the neuro-symbolic AI approach that integrates conceptual modeling using Object-Process Methodology (OPM) ISO 19450:2024 with deep learning to enhance question-answering (QA) quality. By converting natural language text into OPM models using in-context learning, NCAI leverages the expressive power of OPM to represent complex OPM elements-processes, objects, and states-beyond what traditional triplet-based knowledge graphs can easily capture. This rich structured knowledge representation improves reasoning transparency and answer accuracy in an OPM-QA system. We further propose transparency evaluation metrics to quantitatively measure how faithfully the predicted reasoning aligns with OPM-based conceptual logic. Our experiments demonstrate that NCAI outperforms traditional methods, highlighting its potential for advancing neuro-symbolic AI by providing rich knowledge representations, measurable transparency, and improved reasoning.
CRJan 7, 2022
Towards Trustworthy DeFi Oracles: Past,Present and FutureYinjie Zhao, Xin Kang, Tieyan Li et al.
With the rapid development of blockchain technology in recent years, all kinds of blockchain-based applications have emerged. Among them, the decentralized finance (DeFi) is one of the most successful applications, which is regarded as the future of finance. The great success of DeFi relies on the real-world data which is not directly available on the blockchain. Besides, due to the deterministic nature of blockchain,the blockchain cannot directly obtain in-deterministic data from the outside world (off-chain). Thus, oracles have appeared as a viable solution to feed off-chain data to blockchain applications. In this paper, we carryout a comprehensive study on oracles, especially on DeFi oracles. We first briefly introduce the application scenarios of DeFi oracles, and then we talk about the past of DeFi oracles by categorizing them into several types based on their design features. After that, we introduce five popular DeFi oracles currently in use(such as Chainlink and Band Protocol), with the focus on their system architecture, data validation process,and their incentive mechanisms. We compare these present DeFi oracles from their data trustworthiness,data source trustworthiness and their overall trust models. Finally, we propose a set of metrics for designing trustworthiness DeFi oracles, and propose a potential trust architecture and a few promising techniques for building trustworthiness oracles.
SYDec 20, 2021
Optimization for Master-UAV-powered Auxiliary-Aerial-IRS-assisted IoT Networks: An Option-based Multi-agent Hierarchical Deep Reinforcement Learning ApproachJingren Xu, Xin Kang, Ronghaixiang Zhang et al.
This paper investigates a master unmanned aerial vehicle (MUAV)-powered Internet of Things (IoT) network, in which we propose using a rechargeable auxiliary UAV (AUAV) equipped with an intelligent reflecting surface (IRS) to enhance the communication signals from the MUAV and also leverage the MUAV as a recharging power source. Under the proposed model, we investigate the optimal collaboration strategy of these energy-limited UAVs to maximize the accumulated throughput of the IoT network. Depending on whether there is charging between the two UAVs, two optimization problems are formulated. To solve them, two multi-agent deep reinforcement learning (DRL) approaches are proposed, which are centralized training multi-agent deep deterministic policy gradient (CT-MADDPG) and multi-agent deep deterministic policy option critic (MADDPOC). It is shown that the CT-MADDPG can greatly reduce the requirement on the computing capability of the UAV hardware, and the proposed MADDPOC is able to support low-level multi-agent cooperative learning in the continuous action domains, which has great advantages over the existing option-based hierarchical DRL that only support single-agent learning and discrete actions.
CRJun 26, 2021
A Trust-Centric Privacy-Preserving Blockchain for Dynamic Spectrum Management in IoT NetworksJingwei Ye, Xin Kang, Ying-Chang Liang et al.
In this paper, we propose a trust-centric privacy-preserving blockchain for dynamic spectrum access in IoT networks. To be specific, we propose a trust evaluation mechanism to evaluate the trustworthiness of sensing nodes and design a Proof-of-Trust (PoT) consensus mechanism to build a scalable blockchain with high transaction-per-second (TPS). Moreover, a privacy protection scheme is proposed to protect sensors' real-time geolocatioin information when they upload sensing data to the blockchain. Two smart contracts are designed to make the whole procedure (spectrum sensing, spectrum auction, and spectrum allocation) run automatically. Simulation results demonstrate the expected computation cost of the PoT consensus algorithm for reliable sensing nodes is low, and the cooperative sensing performance is improved with the help of trust value evaluation mechanism.In addition, incentivization and security are also analyzed, which show that our design not only can encourage nodes' participation, but also resist to many kinds of attacks which are frequently encountered in trust-based blockchain systems.
CRJun 14, 2021
On the Trust and Trust Modelling for the Future Fully-Connected Digital World: A Comprehensive StudyHannah Lim Jing Ting, Xin Kang, Tieyan Li et al.
With the fast development of digital technologies, we are running into a digital world. The relationship among people and the connections among things become more and more complex, and new challenges arise. To tackle these challenges, trust-a soft security mechanism-is considered as a promising technology. Thus, in this survey, we do a comprehensive study on the trust and trust modelling for the future digital world. We revisit the definitions and properties of trust, and analysis the trust theories and discuss their impact on digital trust modelling. We analyze the digital world and its corresponding environment where people, things, and infrastructure connect with each other. We detail the challenges that require trust in these digital scenarios. Under our analysis of trust and the digital world, we define different types of trust relationships and find out the factors that are needed to ensure a fully representative model. Next, to meet the challenges of digital trust modelling, comprehensive trust model evaluation criteria are proposed, and potential securities and privacy issues of trust modelling are analyzed. Finally, we provide a wide-ranging analysis of different methodologies, mathematical theories, and how they can be applied to trust modelling.