CVSep 6, 2022
Finger Multimodal Feature Fusion and Recognition Based on Channel Spatial AttentionJian Guo, Jiaxiang Tu, Hengyi Ren et al.
Due to the instability and limitations of unimodal biometric systems, multimodal systems have attracted more and more attention from researchers. However, how to exploit the independent and complementary information between different modalities remains a key and challenging problem. In this paper, we propose a multimodal biometric fusion recognition algorithm based on fingerprints and finger veins (Fingerprint Finger Veins-Channel Spatial Attention Fusion Module, FPV-CSAFM). Specifically, for each pair of fingerprint and finger vein images, we first propose a simple and effective Convolutional Neural Network (CNN) to extract features. Then, we build a multimodal feature fusion module (Channel Spatial Attention Fusion Module, CSAFM) to fully fuse the complementary information between fingerprints and finger veins. Different from existing fusion strategies, our fusion method can dynamically adjust the fusion weights according to the importance of different modalities in channel and spatial dimensions, so as to better combine the information between different modalities and improve the overall recognition performance. To evaluate the performance of our method, we conduct a series of experiments on multiple public datasets. Experimental results show that the proposed FPV-CSAFM achieves excellent recognition performance on three multimodal datasets based on fingerprints and finger veins.
86.9CVMay 5Code
Can Multimodal Large Language Models Understand Pathologic Movements? A Pilot Study on Seizure SemiologyLina Zhang, Tonmoy Monsoor, Mehmet Efe Lorasdagi et al.
Multimodal Large Language Models (MLLMs) have demonstrated robust capabilities in recognizing everyday human activities, yet their potential for analyzing clinically significant involuntary movements in neurological disorders remains largely unexplored. This pilot study evaluates the capability of MLLMs for automated recognition of pathological movements in seizure videos. We assessed the zero-shot performance of state-of-the-art MLLMs on 20 ILAE-defined semiological features across 90 clinical seizure recordings. MLLMs outperformed fine-tuned Convolutional Neural Network (CNN) and Vision Transformer (ViT) baseline models on 13 of 18 features without task-specific training, demonstrating particular strength in recognizing salient postural and contextual features while struggling with subtle, high-frequency movements. Feature-targeted signal enhancement (facial cropping, pose estimation, audio denoising) improved performance on 10 of 20 features. Expert evaluation showed that 94.3 percent of MLLM-generated explanations for correctly predicted cases achieved at least 60 percent faithfulness scores, aligning with epileptologist reasoning. These findings demonstrate the potential of adapting general-purpose MLLMs for specialized clinical video analysis through targeted preprocessing strategies, offering a path toward interpretable, efficient diagnostic assistance. Our code is publicly available at https://github.com/LinaZhangUCLA/PathMotionMLLM.
66.0CVMay 21
Seizure-Semiology-Suite (S3): A Clinically Multimodal Dataset, Benchmark, and Models for Seizure Semiology UnderstandingLina Zhang, Tonmoy Monsoor, Peizheng Li et al.
While Multimodal Large Language Models (MLLMs) have demonstrated remarkable proficiency in general video understanding, their capacity to interpret involuntary, and spatio-temporally evolving pathologic motor behaviors such as seizure semiology remains largely untested. To address this gap, we introduce Seizure-Semiology-Suite, a clinically grounded dataset and benchmark for fine-grained, structured seizure semiology understanding. The dataset includes 438 seizure videos annotated with over 35,000 dense labels covering 20 ILAE-defined semiological features. Building on this dataset, we propose a seven-task hierarchical benchmark that systematically evaluates MLLMs from low-level visual perception to temporal sequencing, narrative report generation, and seizure diagnosis. To enable clinically meaningful evaluation of generated reports, we further introduce the Report Quality Index for Seizure Semiology (Seizure-RQI). Extensive baselines across 11 open-weight MLLMs reveal systematic weaknesses in laterality reasoning, temporal localization, symptom sequencing, and clinically faithful reporting. We show that seizure-specific fine-tuning substantially improves performance across tasks, and that a two-stage neuro-symbolic framework achieves an F1 score of 0.96 on epileptic versus non-epileptic seizure classification. Seizure-Semiology-Suite establishes a rigorous benchmark for evaluating multimodal models in safety-critical medical video understanding and guides the development of clinically reliable, domain-adaptive multimodal intelligence.
16.2ITMar 25
A Measurement-Calibrated AI-Assisted Digital Twin for Terahertz Wireless Data CentersMingjie Zhu, Yejian Lyu, Ziming Yu et al.
Terahertz (THz) wireless communication has emerged as a promising solution for future data center interconnects; however, accurate channel characterization and system-level performance evaluation in complex indoor environments remain challenging. In this work, a measurement-calibrated AI-assisted digital twin (DT) framework is developed for THz wireless data centers by tightly integrating channel measurements, ray-tracing (RT), and implicit neural field (INF) modeling. Specifically, channel measurements are first conducted using a vector network analyzer at 300 GHz under both line-of-sight (LoS) and non-line-of-sight (NLoS) scenarios. RT simulations performed on the Sionna platform capture the dominant multipath structures and show good consistency with measured results. Building upon measurement and RT data, an RT-conditioned INF is developed to construct a continuous radio-frequency (RF) field representation, enabling accurate prediction in RT-missing NLoS regions. The comprehensive RF map generated by DT can provide system-level analysis and decisions for wireless data centers.
LGSep 12, 2024
Tera-SpaceCom: GNN-based Deep Reinforcement Learning for Joint Resource Allocation and Task Offloading in TeraHertz Band Space NetworksZhifeng Hu, Chong Han, Wolfgang Gerstacker et al.
Terahertz (THz) space communications (Tera-SpaceCom) is envisioned as a promising technology to enable various space science and communication applications. Mainly, the realm of Tera-SpaceCom consists of THz sensing for space exploration, data centers in space providing cloud services for space exploration tasks, and a low earth orbit (LEO) mega-constellation relaying these tasks to ground stations (GSs) or data centers via THz links. Moreover, to reduce the computational burden on data centers as well as resource consumption and latency in the relaying process, the LEO mega-constellation provides satellite edge computing (SEC) services to directly compute space exploration tasks without relaying these tasks to data centers. The LEO satellites that receive space exploration tasks offload (i.e., distribute) partial tasks to their neighboring LEO satellites, to further reduce their computational burden. However, efficient joint communication resource allocation and computing task offloading for the Tera-SpaceCom SEC network is an NP-hard mixed-integer nonlinear programming problem (MINLP), due to the discrete nature of space exploration tasks and sub-arrays as well as the continuous nature of transmit power. To tackle this challenge, a graph neural network (GNN)-deep reinforcement learning (DRL)-based joint resource allocation and task offloading (GRANT) algorithm is proposed with the target of long-term resource efficiency (RE). Particularly, GNNs learn relationships among different satellites from their connectivity information. Furthermore, multi-agent and multi-task mechanisms cooperatively train task offloading and resource allocation. Compared with benchmark solutions, GRANT not only achieves the highest RE with relatively low latency, but realizes the fewest trainable parameters and the shortest running time.
LGOct 8, 2023
Deep Reinforcement Learning Based Cross-Layer Design in Terahertz Mesh Backhaul NetworksZhifeng Hu, Chong Han, Xudong Wang
Supporting ultra-high data rates and flexible reconfigurability, Terahertz (THz) mesh networks are attractive for next-generation wireless backhaul systems that empower the integrated access and backhaul (IAB). In THz mesh backhaul networks, the efficient cross-layer routing and long-term resource allocation is yet an open problem due to dynamic traffic demands as well as possible link failures caused by the high directivity and high non-line-of-sight (NLoS) path loss of THz spectrum. In addition, unpredictable data traffic and the mixed integer programming property with the NP-hard nature further challenge the effective routing and long-term resource allocation design. In this paper, a deep reinforcement learning (DRL) based cross-layer design in THz mesh backhaul networks (DEFLECT) is proposed, by considering dynamic traffic demands and possible sudden link failures. In DEFLECT, a heuristic routing metric is first devised to facilitate resource efficiency (RE) enhancement regarding energy and sub-array usages. Furthermore, a DRL based resource allocation algorithm is developed to realize long-term RE maximization and fast recovery from broken links. Specifically in the DRL method, the exploited multi-task structure cooperatively benefits joint power and sub-array allocation. Additionally, the leveraged hierarchical architecture realizes tailored resource allocation for each base station and learned knowledge transfer for fast recovery. Simulation results show that DEFLECT routing consumes less resource, compared to the minimal hop-count metric. Moreover, unlike conventional DRL methods causing packet loss and second-level latency, DEFLECT DRL realizes the long-term RE maximization with no packet loss and millisecond-level latency, and recovers resource-efficient backhaul from broken links within 1s.
CVApr 20, 2024
PAFedFV: Personalized and Asynchronous Federated Learning for Finger Vein RecognitionHengyu Mu, Jian Guo, Chong Han et al.
With the increasing emphasis on user privacy protection, biometric recognition based on federated learning have become the latest research hotspot. However, traditional federated learning methods cannot be directly applied to finger vein recognition, due to heterogeneity of data and open-set verification. Therefore, only a few application cases have been proposed. And these methods still have two drawbacks. (1) Uniform model results in poor performance in some clients, as the finger vein data is highly heterogeneous and non-Independently Identically Distributed (non-IID). (2) On individual client, a large amount of time is underutilized, such as the time to wait for returning model from server. To address those problems, this paper proposes a Personalized and Asynchronous Federated Learning for Finger Vein Recognition (PAFedFV) framework. PAFedFV designs personalized model aggregation method to solve the heterogeneity among non-IID data. Meanwhile, it employs an asynchronized training module for clients to utilize their waiting time. Finally, extensive experiments on six finger vein datasets are conducted. Base on these experiment results, the impact of non-IID finger vein data on performance of federated learning are analyzed, and the superiority of PAFedFV in accuracy and robustness are demonstrated.
LGMay 8, 2025
Graph Neural Network Aided Deep Reinforcement Learning for Resource Allocation in Dynamic Terahertz UAV NetworksZhifeng Hu, Chong Han
Terahertz (THz) unmanned aerial vehicle (UAV) networks with flexible topologies and ultra-high data rates are expected to empower numerous applications in security surveillance, disaster response, and environmental monitoring, among others. However, the dynamic topologies hinder the efficient long-term joint power and antenna array resource allocation for THz links among UAVs. Furthermore, the continuous nature of power and the discrete nature of antennas cause this joint resource allocation problem to be a mixed-integer nonlinear programming (MINLP) problem with non-convexity and NP-hardness. Inspired by recent rapid advancements in deep reinforcement learning (DRL), a graph neural network (GNN) aided DRL algorithm for resource allocation in the dynamic THz UAV network with an emphasis on self-node features (GLOVE) is proposed in this paper, with the aim of resource efficiency (RE) maximization. When training the allocation policy for each UAV, GLOVE learns the relationship between this UAV and its neighboring UAVs via GNN, while also emphasizing the important self-node features of this UAV. In addition, a multi-task structure is leveraged by GLOVE to cooperatively train resource allocation decisions for the power and sub-arrays of all UAVs. Experimental results illustrate that GLOVE outperforms benchmark schemes in terms of the highest RE and the lowest latency. Moreover, unlike the benchmark methods with severe packet loss, GLOVE maintains zero packet loss during the entire training process, demonstrating its better robustness under the highly dynamic THz UAV network.
SPOct 21, 2025
DiffPace: Diffusion-based Plug-and-play Augmented Channel Estimation in mmWave and Terahertz Ultra-Massive MIMO SystemsZhengdong Hu, Chong Han, Wolfgang Gerstacker et al.
Millimeter-wave (mmWave) and Terahertz (THz)-band communications hold great promise in meeting the growing data-rate demands of next-generation wireless networks, offering abundant bandwidth. To mitigate the severe path loss inherent to these high frequencies and reduce hardware costs, ultra-massive multiple-input multiple-output (UM-MIMO) systems with hybrid beamforming architectures can deliver substantial beamforming gains and enhanced spectral efficiency. However, accurate channel estimation (CE) in mmWave and THz UM-MIMO systems is challenging due to high channel dimensionality and compressed observations from a limited number of RF chains, while the hybrid near- and far-field radiation patterns, arising from large array apertures and high carrier frequencies, further complicate CE. Conventional compressive sensing based frameworks rely on predefined sparsifying matrices, which cannot faithfully capture the hybrid near-field and far-field channel structures, leading to degraded estimation performance. This paper introduces DiffPace, a diffusion-based plug-and-play method for channel estimation. DiffPace uses a diffusion model (DM) to capture the channel distribution based on the hybrid spherical and planar-wave (HPSM) model. By applying the plug-and-play approach, it leverages the DM as prior knowledge, improving CE accuracy. Moreover, DM performs inference by solving an ordinary differential equation, minimizing the number of required inference steps compared with stochastic sampling method. Experimental results show that DiffPace achieves competitive CE performance, attaining -15 dB normalized mean square error (NMSE) at a signal-to-noise ratio (SNR) of 10 dB, with 90\% fewer inference steps compared to state-of-the-art schemes, simultaneously providing high estimation precision and enhanced computational efficiency.