LG May 26
TED: Related Party Transaction guided Tax Evasion Detection on Heterogeneous Graph
Yiming Xu, Bin Shi, Bo Dong et al.
Tax evasion causes severe losses of government revenues and disturbs the economic order of fair competition. To help alleviate this problem, the latest tax evasion detection solutions utilize expert knowledge to extract features and then train classifiers to determine whether a company is suspected of tax evasion. However, existing solutions mainly focus on the statistical features of the company and fail to exploit the rich interactive information in tax scenarios, which limits detection performance. In this paper, we first model the tax scenario as a heterogeneous graph and study the tax evasion detection problem under the heterogeneous graph model. To improve the performance of tax evasion detection, a novel graph neural network model is proposed to extract the comprehensive information of heterogeneous graphs. Specifically, we use heterogeneous and complex related party transaction groups to filter low-level noise information. Moreover, a hierarchical attention mechanism is designed to capture the deeper structure and semantic information hidden in the related party transaction group. We apply our method to the real risk management system of the tax bureau, and evaluate it on two human-labeled real-world tax datasets. The results demonstrate that our method significantly outperforms the state-of-the-art in the tax evasion detection task.
IT May 14
Digital Twin Synchronization Over Mobile Embodied AI Network With Agentic Intelligence
Zhouxiang Zhao, Jiaxiang Wang, Yahao Ding et al.
Efficient digital twin (DT) synchronization relies on maintaining high-fidelity virtual representations with minimal age of information (AoI). However, the synergistic potential of cooperative sensing and autonomous mobility of the sensing agent remains underexplored in existing DT synchronization frameworks. In this paper, we propose an agentic AI-empowered mobile embodied AI network (MEAN) framework for DT synchronization. In the proposed hybrid architecture, the base station (BS) conducts global orchestration, while the agents autonomously execute a five-stage closed-loop workflow: move-to-sense, cooperative sensing, onboard semantic processing, channel-aware mobility, and uplink transmission. To optimize synchronization performance, we formulate a joint topology dispatching and multidimensional resource allocation problem aimed at minimizing the maximum twin deviation across regions, subject to heterogeneous sensing fidelity and energy budget constraints. To tackle this, we develop a hierarchical two-layer optimization algorithm, where the outer-layer refines multi-agent assignment via a dynamic matching game, and the inner-layer iteratively optimizes the continuous resources. Extensive simulation results verify the convergence of the proposed algorithm and demonstrate its substantial superiority over multiple baseline schemes in reducing synchronization deviation. Furthermore, the results reveal that semantic compression serves as a vital substitute for channel resources in latency reduction under constrained bandwidth, while autonomous velocity adaptation provides an essential degree of freedom for the system to navigate the fundamental energy-time trade-off.
CV Mar 14
Sparse-Dense Mixture of Experts Adapter for Multi-Modal Tracking
Yabin Zhu, Jianqi Li, Chenglong Li et al.
Parameter-efficient fine-tuning (PEFT) techniques, such as prompts and adapters, are widely used in multi-modal tracking because they alleviate issues of full-model fine-tuning, including time inefficiency, high resource consumption, parameter storage burden, and catastrophic forgetting. However, due to cross-modal heterogeneity, most existing PEFT-based methods struggle to effectively represent multi-modal features within a unified framework with shared parameters. To address this problem, we propose a novel Sparse-Dense Mixture of Experts Adapter (SDMoEA) framework for PEFT-based multi-modal tracking under a unified model structure. Specifically, we design an SDMoE module as the multi-modal adapter to model modality-specific and shared information efficiently. SDMoE consists of a sparse MoE and a dense-shared MoE: the former captures modality-specific information, while the latter models shared cross-modal information. Furthermore, to overcome limitations of existing tracking methods in modeling high-order correlations during multi-level multi-modal fusion, we introduce a Gram-based Semantic Alignment Hypergraph Fusion (GSAHF) module. It first employs Gram matrices for cross-modal semantic alignment, ensuring that the constructed hypergraph accurately reflects semantic similarity and high-order dependencies between modalities. The aligned features are then integrated into the hypergraph structure to exploit its ability to model high-order relationships, enabling deep fusion of multi-level multi-modal information. Extensive experiments demonstrate that the proposed method achieves superior performance compared with other PEFT approaches on several multi-modal tracking benchmarks, including LasHeR, RGBT234, VTUAV, VisEvent, COESOT, DepthTrack, and VOT-RGBD2022.
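The abstract does not say how the Gram matrices are used for alignment, so the following is only a minimal sketch of the general idea, not the GSAHF module itself. It assumes each modality is given as a small list of feature vectors and measures how far apart their Gram matrices (pairwise inner products) are. The function names `gram_matrix` and `gram_alignment_gap` are hypothetical.

```python
def gram_matrix(features):
    """Return G where G[i][j] is the inner product of feature rows i and j."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features]
            for fi in features]


def gram_alignment_gap(feats_a, feats_b):
    """Mean squared difference between the Gram matrices of two modalities.

    Assumption for illustration: both inputs hold the same number of rows.
    A gap of zero means the two feature sets share the same pairwise
    similarity structure, even if the raw features differ.
    """
    ga, gb = gram_matrix(feats_a), gram_matrix(feats_b)
    n = len(ga)
    return sum((ga[i][j] - gb[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)


# Toy features for two modalities (made-up numbers, not from the paper).
rgb_feats = [[1.0, 0.0], [0.0, 1.0]]
thermal_feats = [[0.0, 1.0], [1.0, 0.0]]
gap = gram_alignment_gap(rgb_feats, thermal_feats)
```

In this toy case the two feature sets differ only by swapping coordinates, so their Gram matrices coincide and the gap is zero. This illustrates why a Gram-based comparison looks at similarity structure rather than raw feature values.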
CV Dec 9, 2020
DS-Net: Dynamic Spatiotemporal Network for Video Salient Object Detection
Jing Liu, Jiaxiang Wang, Weikang Wang et al.
Because moving objects tend to attract human attention, temporal motion information is commonly exploited together with spatial information to detect salient objects in videos. Although efficient tools such as optical flow have been proposed to extract temporal motion information, they often run into difficulties in saliency detection because of camera movement or partial movement of salient objects. In this paper, we investigate the complementary roles of spatial and temporal information and propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of spatiotemporal information. We construct a symmetric two-bypass network to explicitly extract spatial and temporal features. A dynamic weight generator (DWG) is designed to automatically learn the reliability of the corresponding saliency branch, and a top-down cross attentive aggregation (CAA) procedure is designed to facilitate dynamic complementary aggregation of spatiotemporal features. Finally, the features are modified by spatial attention under the guidance of a coarse saliency map and then passed through the decoder to produce the final saliency map. Experimental results on five benchmarks (VOS, DAVIS, FBMS, SegTrack-v2, and ViSal) demonstrate that the proposed method outperforms state-of-the-art algorithms. The source code is available at https://github.com/TJUMMG/DS-Net.
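The idea of weighting each branch by its learned reliability can be sketched as follows. This is an illustrative stand-in rather than the DWG from the paper: it assumes the reliability of the spatial and temporal branches is given as two raw scores, normalizes them with a softmax, and blends the two feature vectors. The names `softmax` and `fuse` and all numbers are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(spatial, temporal, reliability_scores):
    """Blend two feature vectors with weights derived from reliability scores.

    reliability_scores is a pair (spatial_score, temporal_score). In a real
    network these would be predicted per sample; here they are plain inputs.
    """
    w_s, w_t = softmax(reliability_scores)
    return [w_s * s + w_t * t for s, t in zip(spatial, temporal)]


# Equal scores give an even blend; a low temporal score (for example under
# strong camera motion) pushes the result toward the spatial branch.
even = fuse([1.0, 2.0], [3.0, 4.0], [0.0, 0.0])
spatial_heavy = fuse([1.0, 2.0], [3.0, 4.0], [5.0, -5.0])
```

The design point is that the weights always sum to one, so an unreliable branch is down-weighted rather than discarded.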
NI Apr 1
Agentic AI-Empowered Wireless Agent Networks With Semantic-Aware Collaboration via ILAC
Zhouxiang Zhao, Jiaxiang Wang, Zhaohui Yang et al.
The rapid development of agentic artificial intelligence (AI) is driving future wireless networks to evolve from passive data pipes into intelligent collaborative ecosystems under the emerging paradigm of integrated learning and communication (ILAC). However, realizing efficient agentic collaboration faces challenges not only in handling semantic redundancy but also in the lack of an integrated mechanism for communication, computation, and control. To address this, we propose a wireless agent network (WAN) framework that orchestrates a progressive knowledge aggregation mechanism. Specifically, we formulate the aggregation process as a joint energy minimization problem where the agents perform semantic compression to eliminate redundancy, optimize transmission power to deliver semantic payloads, and adjust physical trajectories to proactively enhance channel qualities. To solve this problem, we develop a hierarchical algorithm that integrates inner-level resource optimization with outer-level topology evolution. Theoretically, we reveal that incorporating a potential field into the topology evolution effectively overcomes the short-sightedness of greedy matching, providing a mathematically rigorous heuristic for long-term energy minimization. Simulation results demonstrate that the proposed framework achieves superior energy efficiency and scalability compared to conventional benchmarks, validating the efficacy of semantic-aware collaboration in dynamic environments.
IT Apr 7
Near-Field Integrated Sensing, Computing and Semantic Communication in Digital Twin-Assisted Vehicular Networks
Yinchao Yang, Yahao Ding, Jiaxiang Wang et al.
Digital twin (DT) technology offers transformative potential for vehicular networks, enabling high-fidelity virtual representations for enhanced safety and automation. However, seamless DT synchronization in dynamic environments faces challenges such as massive data transmission, precision sensing, and strict computational constraints. This paper proposes an integrated sensing, computing, and semantic communication (ISCSC) framework tailored for DT-assisted vehicular networks in the near-field (NF) regime. Leveraging a multi-user multiple-input multiple-output (MU-MIMO) configuration, each roadside unit (RSU) employs semantic communication to serve vehicles while simultaneously utilizing millimeter-wave (mmWave) radar for environmental mapping. We implement particle filtering at RSUs to achieve high-precision vehicle tracking. To optimize performance, we formulate a joint optimization problem balancing semantic communication rates and sensing accuracy under limited computational resources and power budget. Our solution includes a hybrid heuristic algorithm for vehicle-to-RSU assignment and an alternating optimization approach for determining semantic extraction ratios and beamforming matrices. Performance is extensively evaluated via the Cramér-Rao bound (CRB) for angle and distance estimation, semantic transmission rates, and resource utilization. Numerical results demonstrate that the proposed ISCSC framework achieves a 20% improvement in transmission rate while maintaining the sensing accuracy of existing integrated sensing and communication (ISAC) schemes under constrained resource conditions.
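For readers unfamiliar with particle filtering, the sketch below shows a generic bootstrap particle filter tracking a one-dimensional position. It is not the RSU implementation from the paper: the constant-velocity motion model, the Gaussian noise levels, and the names `particle_filter_step` and `track` are all assumptions made for illustration.

```python
import math
import random


def particle_filter_step(particles, measurement, velocity, dt, rng,
                         process_std=0.5, meas_std=1.0):
    """One predict, weight, and resample cycle of a bootstrap particle filter."""
    # Predict: move every particle with the assumed motion model plus noise.
    predicted = [p + velocity * dt + rng.gauss(0.0, process_std)
                 for p in particles]
    # Weight: Gaussian likelihood of the measurement given each particle.
    weights = [math.exp(-0.5 * ((measurement - p) / meas_std) ** 2)
               for p in predicted]
    if sum(weights) == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        weights = [1.0] * len(predicted)
    # Resample: draw particles in proportion to their weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def track(measurements, velocity=2.0, dt=1.0, num_particles=500, seed=0):
    """Return the position estimate (particle mean) after each measurement."""
    rng = random.Random(seed)
    particles = [rng.uniform(-10.0, 10.0) for _ in range(num_particles)]
    estimates = []
    for z in measurements:
        particles = particle_filter_step(particles, z, velocity, dt, rng)
        estimates.append(sum(particles) / len(particles))
    return estimates


# Hypothetical range readings of a vehicle moving 2 units per step.
readings = [2.0 * (k + 1) for k in range(20)]
estimates = track(readings)
```

The particles start spread over a wide interval and concentrate around the vehicle position as measurements arrive, which is what makes the method attractive for non-linear or non-Gaussian tracking problems.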
CV Feb 16, 2025
Exploiting Point-Language Models with Dual-Prompts for 3D Anomaly Detection
Jiaxiang Wang, Haote Xu, Xiaolu Chen et al.
Anomaly detection (AD) in 3D point clouds is crucial in a wide range of industrial applications, especially in various forms of precision manufacturing. Considering the industrial demand for reliable 3D AD, several methods have been developed. However, most of these approaches typically require training separate models for each category, which is memory-intensive and lacks flexibility. In this paper, we propose a novel Point-Language model with dual-prompts for 3D ANomaly dEtection (PLANE). The approach leverages multi-modal prompts to extend the strong generalization capabilities of pre-trained Point-Language Models (PLMs) to the domain of 3D point cloud AD, achieving impressive detection performance across multiple categories using a single model. Specifically, we propose a dual-prompt learning method, incorporating both text and point cloud prompts. The method utilizes a dynamic prompt creator module (DPCM) to produce sample-specific dynamic prompts, which are then integrated with class-specific static prompts for each modality, effectively driving the PLMs. Additionally, based on the characteristics of point cloud data, we propose a pseudo 3D anomaly generation method (Ano3D) to improve the model's detection capabilities in an unsupervised setting. Experimental results demonstrate that the proposed method, which is under the multi-class-one-model paradigm, achieves a +8.7%/+17% gain on anomaly detection and localization performance as compared to the state-of-the-art one-class-one-model methods for the Anomaly-ShapeNet dataset, and obtains +4.3%/+4.1% gain for the Real3D-AD dataset. Code will be available upon publication.
CV Feb 1
MedAD-R1: Eliciting Consistent Reasoning in Interpretible Medical Anomaly Detection via Consistency-Reinforced Policy Optimization
Haitao Zhang, Yingying Wang, Jiaxiang Wang et al.
Medical Anomaly Detection (MedAD) presents a significant opportunity to enhance diagnostic accuracy using Large Multimodal Models (LMMs) to interpret and answer questions based on medical images. However, the reliance on Supervised Fine-Tuning (SFT) on simplistic and fragmented datasets has hindered the development of models capable of plausible reasoning and robust multimodal generalization. To overcome this, we introduce MedAD-38K, the first large-scale, multi-modal, and multi-center benchmark for MedAD featuring diagnostic Chain-of-Thought (CoT) annotations alongside structured Visual Question-Answering (VQA) pairs. On this foundation, we propose a two-stage training framework. The first stage, Cognitive Injection, uses SFT to instill foundational medical knowledge and align the model with a structured think-then-answer paradigm. Given that standard policy optimization can produce reasoning that is disconnected from the final answer, the second stage incorporates Consistency Group Relative Policy Optimization (Con-GRPO). This novel algorithm incorporates a crucial consistency reward to ensure the generated reasoning process is relevant and logically coherent with the final diagnosis. Our proposed model, MedAD-R1, achieves state-of-the-art (SOTA) performance on the MedAD-38K benchmark, outperforming strong baselines by more than 10%. This superior performance stems from its ability to generate transparent and logically consistent reasoning pathways, offering a promising approach to enhancing the trustworthiness and interpretability of AI for clinical decision support.
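A rough sketch of how a consistency term could enter a group-relative advantage computation is given below. The abstract does not specify the reward design, so the additive form, the weight of 0.5, and the function names `total_reward` and `group_relative_advantages` are assumptions. Only the group normalization (reward minus group mean, divided by group standard deviation) follows the usual GRPO recipe.

```python
from statistics import mean, pstdev


def total_reward(answer_correct, reasoning_consistent, consistency_weight=0.5):
    """Hypothetical reward: answer correctness plus a weighted consistency bonus."""
    return float(answer_correct) + consistency_weight * float(reasoning_consistent)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group of responses.

    Each advantage is (reward - group mean) / (group std + eps), so responses
    are scored relative to their siblings rather than on an absolute scale.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to the same question (made-up outcomes):
# correct and consistent, correct only, neither, consistent only.
rewards = [
    total_reward(True, True),
    total_reward(True, False),
    total_reward(False, False),
    total_reward(False, True),
]
advantages = group_relative_advantages(rewards)
```

With this shaping, a correct answer backed by consistent reasoning receives a larger advantage than a correct answer whose reasoning does not support it, which is the behavior the consistency reward is meant to encourage.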
RO Aug 12, 2020
Automatic assembly of aero engine low pressure turbine shaft based on 3D vision measurement
Jiaxiang Wang, Kunyong Chen
To address the low level of automation in aero-engine turbine shaft assembly and the difficulty of non-contact high-precision measurement, this paper proposes a structured-light binocular measurement technology for key aero-engine components. Combined with three-dimensional point cloud data processing and an assembly position matching algorithm, it achieves high-precision measurement of the shaft-hole assembly posture during turbine shaft docking. First, the screw thread curve on the bolt surface is segmented based on PCA projection and edge point cloud clustering, and the Hough transform is used to fit a model to the three-dimensional thread curve. Then, after preprocessing, a two-dimensional convex hull is constructed to segment the key hole location features, and the segmented mounting surface and hole locations are fitted with the RANSAC method. Finally, geometric feature matching is used and an evaluation index of turbine shaft assembly is established to optimize the pose. The final measurement accuracy of mounting surface matching is within 0.05 mm, and the measurement accuracy of mounting hole matching based on minimum ance optimization is within 0.1 degree. The measurement algorithm is implemented on the automatic assembly test-bed of a certain type of aero-engine low-pressure turbine rotor. In the narrow installation space, the assembly steps for the turbine shaft, such as automatic alignment and docking of the shaft hole, automatic heating and temperature measurement of the installation seam, and automatic tightening with the two guns, are carried out with guidance, real-time inspection, and assembly result evaluation.
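The RANSAC fitting step mentioned above can be illustrated with a generic plane fit. This is a textbook sketch under assumed parameters, not the authors' pipeline: the inlier threshold, iteration count, function names `plane_from_points` and `ransac_plane`, and the toy point cloud are all hypothetical.

```python
import random


def plane_from_points(p, q, r):
    """Return (unit normal, d) for the plane n.x + d = 0, or None if collinear."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = sum(c * c for c in n) ** 0.5
    if norm < 1e-12:
        return None
    n = [c / norm for c in n]
    d = -sum(n[i] * p[i] for i in range(3))
    return n, d


def ransac_plane(points, threshold=0.05, iterations=200, seed=0):
    """Fit a plane by repeatedly sampling 3 points and keeping the best consensus."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        model = plane_from_points(*rng.sample(points, 3))
        if model is None:
            continue  # degenerate sample, try again
        n, d = model
        inliers = [pt for pt in points
                   if abs(sum(n[i] * pt[i] for i in range(3)) + d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# Toy cloud: a flat 5 x 5 patch at z = 1 plus three stray points.
surface = [(float(x), float(y), 1.0) for x in range(5) for y in range(5)]
strays = [(0.0, 0.0, 5.0), (1.0, 2.0, -3.0), (3.0, 1.0, 9.0)]
model, inliers = ransac_plane(surface + strays)
```

Because each candidate plane is scored by how many points agree with it, the stray points cannot pull the fit away from the dominant surface, which is the main reason RANSAC is preferred over plain least squares on noisy scans.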