LG | Jul 1, 2024 | Code
ZeroDDI: A Zero-Shot Drug-Drug Interaction Event Prediction Method with Semantic Enhanced Learning and Dual-Modal Uniform Alignment
Ziyan Wang, Zhankun Xiong, Feng Huang et al.
Drug-drug interactions (DDIs) can result in various pharmacological changes, which can be categorized into different classes known as DDI events (DDIEs). In recent years, previously unobserved/unseen DDIEs have been emerging, posing a new classification task in which unseen classes have no labelled instances at the training stage; this is formulated as a zero-shot DDIE prediction (ZS-DDIE) task. However, existing computational methods are not directly applicable to ZS-DDIE, which poses two primary challenges: obtaining suitable DDIE representations and handling the class imbalance issue. To overcome these challenges, we propose a novel method named ZeroDDI for the ZS-DDIE task. Specifically, we design a biological semantic enhanced DDIE representation learning module, which emphasizes the key biological semantics and distills discriminative molecular substructure-related semantics for DDIE representation learning. Furthermore, we propose a dual-modal uniform alignment strategy to distribute drug pair representations and DDIE semantic representations uniformly in a unit sphere and align the matched ones, which can mitigate the issue of class imbalance. Extensive experiments show that ZeroDDI surpasses the baselines and indicate that it is a promising tool for detecting unseen DDIEs. Our code has been released at https://github.com/wzy-Sarah/ZeroDDI.
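The idea of spreading representations uniformly on a unit sphere while pulling matched ones together can be illustrated with a generic alignment/uniformity objective. The sketch below is not taken from the ZeroDDI code: the function names, the temperature `t`, and the toy 2-D embeddings are assumptions, and the paper's actual loss may differ.

```python
import math

def normalize(v):
    """Scale a vector to unit length so it lies on the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def alignment_loss(pair_emb, class_emb, labels):
    """Mean squared distance between each drug-pair embedding and its matched class embedding."""
    total = sum(sq_dist(p, class_emb[y]) for p, y in zip(pair_emb, labels))
    return total / len(pair_emb)

def uniformity_loss(points, t=2.0):
    """Log of the mean Gaussian potential over all distinct pairs; lower means more spread out."""
    total, count = 0.0, 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            total += math.exp(-t * sq_dist(points[i], points[j]))
            count += 1
    return math.log(total / count)

# Hypothetical toy data: three class (DDIE) embeddings and three drug-pair embeddings.
class_emb = [normalize(v) for v in ([1.0, 0.0], [0.0, 1.0], [-1.0, 0.0])]
pair_emb = [normalize(v) for v in ([0.9, 0.1], [0.1, 0.9], [-0.8, 0.2])]
labels = [0, 1, 2]

align = alignment_loss(pair_emb, class_emb, labels)
spread = uniformity_loss(class_emb)
# Class embeddings that crowd together score worse (higher) on uniformity.
clumped = uniformity_loss([normalize(v) for v in ([1.0, 0.0], [0.99, 0.1], [0.98, 0.2])])
print(align, spread, clumped)
```

Minimizing the sum of both terms rewards matched pairs that sit close together and classes that do not crowd into one region, which is one plausible way such a strategy could reduce the dominance of frequent classes.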
13.9 | RO | May 24
A Decentralized LiDAR-SLAM System with Certifiably Optimal Pose Graph Optimization
Baoshan Song, Feng Huang, Li-Ta Hsu
Decentralized multi-robot LiDAR-SLAM is essential for collaborative missions but faces significant challenges in maintaining global consistency. Existing frameworks predominantly rely on local-search optimization or one-time coordinate alignment, which are prone to suboptimal convergence and long-term inconsistency, especially in large-scale or degenerate environments. To address these limitations, this paper presents the first decentralized LiDAR-SLAM system that integrates a state-of-the-art certifiably optimal Pose Graph Optimization (PGO) backend. By leveraging the Riemannian Block Coordinate Descent (RBCD) algorithm, our system ensures globally consistent trajectory estimation without requiring accurate initial guesses. Experimental results demonstrate that the proposed framework achieves superior robustness, improving trajectory RMSE by up to 48.9% compared to the state-of-the-art DiSCo-SLAM.
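For readers unfamiliar with the metric, the following minimal sketch shows how a trajectory RMSE and a relative improvement figure of this kind are typically computed. It assumes the estimated and ground-truth positions are already time-synchronized and expressed in a common frame; the toy trajectories are hypothetical and not taken from the paper.

```python
import math

def trajectory_rmse(estimated, ground_truth):
    """Root-mean-square position error over time-aligned 3D positions."""
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(p, q))
        for p, q in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def relative_improvement(baseline_rmse, proposed_rmse):
    """Percentage reduction of the proposed RMSE relative to the baseline RMSE."""
    return 100.0 * (baseline_rmse - proposed_rmse) / baseline_rmse

# Hypothetical toy trajectories: constant lateral offsets of 0.2 m and 0.1 m.
truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
baseline = [(x, y + 0.2, z) for x, y, z in truth]
proposed = [(x, y + 0.1, z) for x, y, z in truth]

rmse_baseline = trajectory_rmse(baseline, truth)
rmse_proposed = trajectory_rmse(proposed, truth)
gain = relative_improvement(rmse_baseline, rmse_proposed)
print(rmse_baseline, rmse_proposed, gain)
```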
CV | Mar 1 | Code
UD-SfPNet: An Underwater Descattering Shape-from-Polarization Network for 3D Normal Reconstruction
Puyun Wang, Kaimin Yu, Huayang He et al.
Underwater optical imaging is severely hindered by scattering, but polarization imaging offers the unique dual advantages of descattering and shape-from-polarization (SfP) 3D reconstruction. To exploit these advantages, this paper proposes UD-SfPNet, an underwater descattering shape-from-polarization network that leverages polarization cues for improved 3D surface normal prediction. The framework jointly models polarization-based image descattering and SfP normal estimation in a unified pipeline, avoiding error accumulation from sequential processing and enabling global optimization across both tasks. UD-SfPNet further incorporates a novel color embedding module to enhance geometric consistency by exploiting the relationship between color encodings and surface orientation. A detail enhancement convolution module is also included to better preserve high-frequency geometric details that are lost under scattering. Experiments on the MuS-Polar3D dataset show that the proposed method significantly improves reconstruction accuracy, achieving a mean surface normal angular error of 15.12$^\circ$ (the lowest among compared methods). These results confirm the efficacy of combining descattering with polarization-based shape inference, and highlight the practical significance and potential applications of UD-SfPNet for optical 3D imaging in challenging underwater environments. The code is available at https://github.com/WangPuyun/UD-SfPNet.
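The mean surface normal angular error quoted above is a standard metric; a minimal sketch of how it can be computed follows. The per-pixel normals here are hypothetical toy values, and the paper's evaluation may additionally apply masks or other preprocessing.

```python
import math

def angular_error_deg(n_pred, n_true):
    """Angle in degrees between a predicted and a reference surface normal."""
    dot = sum(a * b for a, b in zip(n_pred, n_true))
    norm = math.sqrt(sum(a * a for a in n_pred)) * math.sqrt(sum(b * b for b in n_true))
    # Clamp to guard against rounding slightly outside the domain of acos.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))

def mean_angular_error_deg(pred_normals, true_normals):
    """Average angular error over all pixels that carry a normal."""
    errors = [angular_error_deg(p, t) for p, t in zip(pred_normals, true_normals)]
    return sum(errors) / len(errors)

# Hypothetical toy normals: errors of 0, 90 and 45 degrees against a flat surface.
predicted = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 1.0)]
reference = [(0.0, 0.0, 1.0)] * 3

mae = mean_angular_error_deg(predicted, reference)
print(round(mae, 3))
```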
CV | Jan 15
Global Context Compression with Interleaved Vision-Text Transformation
Dian Jiao, Jiaxin Duan, Shuai Zhao et al.
Recent achievements of vision-language models in end-to-end OCR point to a new avenue for low-loss compression of textual information. This has motivated earlier works that render the Transformer's input into images for prefilling, which effectively reduces the number of tokens through visual encoding and thereby alleviates the quadratic growth of attention computation. However, this partial compression fails to save computational or memory costs during token-by-token inference. In this paper, we investigate global context compression, which saves tokens at both the prefilling and inference stages. To this end, we propose VIST2, a novel Transformer that interleaves input text chunks alongside their visual encoding, while depending exclusively on visual tokens in the pre-context to predict the next text token distribution. Around this idea, we render text chunks into sketch images and train VIST2 in multiple stages, starting from curriculum-scheduled pretraining for optical language modeling, followed by modal-interleaved instruction tuning. We conduct extensive experiments using VIST2 families scaled from 0.6B to 8B to explore the training recipe and hyperparameters. With a 4$\times$ compression ratio, the resulting models demonstrate significant superiority over baselines on long writing tasks, achieving, on average, a 3$\times$ speedup in first-token generation, 77% reduction in memory usage, and 74% reduction in FLOPS. Our code and datasets will be made public to support further studies.
86.0 | OPTICS | Apr 8
Enhanced Self-Supervised Multi-Image Super-Resolution for Camera Array Images
Yating Chen, Feng Huang, Xianyu Wu et al.
Conventional multi-image super-resolution (MISR) methods, such as burst and video SR, rely on sequential frames from a single camera. Consequently, they suffer from complex image degradation and severe occlusion, increasing the difficulty of accurate image restoration. In contrast, multi-aperture camera-array imaging captures spatially distributed views with sampling offsets forming a stable disk-like distribution, which enhances the non-redundancy of observed data. Existing MISR algorithms fail to fully exploit these unique properties. Supervised MISR methods tend to overfit the degradation patterns in training data, and current self-supervised learning (SSL) techniques struggle to recover fine-grained details. To address these issues, this paper thoroughly investigates the strengths, limitations and applicability boundaries of multi-image-to-single-image (Multi-to-Single) and multi-image-to-multi-image (Multi-to-Multi) SSL methods. We propose the Multi-to-Single-Guided Multi-to-Multi SSL framework, which combines the advantages of Multi-to-Single and Multi-to-Multi to generate visually appealing and high-fidelity images rich in texture details. The framework provides a new paradigm for integrating deep neural networks with classical physics-based variational methods. To enhance the ability of the MISR network to recover high-frequency details from aliased artifacts, this paper proposes a novel camera-array SR network, called dual Transformer, that is suitable for SSL. Experiments on synthetic and real-world datasets demonstrate the superiority of the proposed method.
CV | Aug 4, 2024
Single-Point Supervised High-Resolution Dynamic Network for Infrared Small Target Detection
Jing Wu, Rixiang Ni, Feng Huang et al.
Infrared small target detection (IRSTD) tasks are extremely challenging for two main reasons: 1) it is difficult to obtain accurate labelling information that is critical to existing methods, and 2) infrared (IR) small target information is easily lost in deep networks. To address these issues, we propose a single-point supervised high-resolution dynamic network (SSHD-Net). In contrast to existing methods, we achieve state-of-the-art (SOTA) detection performance using only single-point supervision. Specifically, we first design a high-resolution cross-feature extraction module (HCEM) that achieves bi-directional feature interaction through stepped feature cascade channels (SFCC). It balances network depth and feature resolution to preserve IR small-target information in deep layers. Second, the dynamic coordinate fusion module (DCFM) effectively integrates global and local features, which enhances the anti-interference ability in complex backgrounds. In addition, we introduce the high-resolution multilevel residual module (HMRM) to enhance the semantic information extraction capability. Finally, we design the adaptive target localization detection head (ATLDH) to improve detection accuracy. Experiments on the publicly available datasets NUDT-SIRST and IRSTD-1k demonstrate the effectiveness of our method. Compared to other SOTA methods, our method achieves better detection performance with only a single point of supervision.
CV | Sep 21, 2025 | Code
MO R-CNN: Multispectral Oriented R-CNN for Object Detection in Remote Sensing Image
Leiyu Wang, Biao Jin, Feng Huang et al.
Oriented object detection for multi-spectral imagery faces significant challenges due to differences both within and between modalities. Although existing methods have improved detection accuracy through complex network architectures, their high computational complexity and memory consumption severely restrict their performance. Motivated by the success of large kernel convolutions in remote sensing, we propose MO R-CNN, a lightweight framework for multi-spectral oriented detection featuring a heterogeneous feature extraction network (HFEN), single modality supervision (SMS), and condition-based multimodal label fusion (CMLF). HFEN leverages inter-modal differences to adaptively align, merge, and enhance multi-modal features. SMS constrains multi-scale features and enables the model to learn from multiple modalities. CMLF fuses multimodal labels based on specific rules, providing the model with a more robust and consistent supervisory signal. Experiments on the DroneVehicle, VEDAI and OGSOD datasets demonstrate the superiority of our method. The source code is available at https://github.com/Iwill-github/MORCNN.
CV | May 9, 2024 | Code
Multi-Level Feature Fusion Network for Lightweight Stereo Image Super-Resolution
Yunxiang Li, Wenbin Zou, Qiaomu Wei et al.
Stereo image super-resolution utilizes the cross-view complementary information brought by the disparity effect of left and right perspective images to reconstruct higher-quality images. Many methods focus on cascading feature extraction modules and cross-view feature interaction modules to make use of the information from stereo images. However, this adds a great deal of network parameters and structural redundancy. To facilitate the application of stereo image super-resolution in downstream tasks, we propose an efficient Multi-Level Feature Fusion Network for Lightweight Stereo Image Super-Resolution (MFFSSR). Specifically, MFFSSR utilizes the Hybrid Attention Feature Extraction Block (HAFEB) to extract multi-level intra-view features. Using the channel separation strategy, HAFEB can efficiently interact with the embedded cross-view interaction module. This structural configuration can efficiently mine features inside the view while improving the efficiency of cross-view information sharing, and hence reconstructs image details and textures more accurately. Extensive experiments demonstrate the effectiveness of MFFSSR. We achieve superior performance with fewer parameters. The source code is available at https://github.com/KarosLYX/MFFSSR.
QM | Dec 25, 2023
A Multi-Modal Contrastive Diffusion Model for Therapeutic Peptide Generation
Yongkang Wang, Xuan Liu, Feng Huang et al.
Therapeutic peptides represent a unique class of pharmaceutical agents crucial for the treatment of human diseases. Recently, deep generative models have exhibited remarkable potential for generating therapeutic peptides, but they utilize either sequence or structure information alone, which hinders generation performance. In this study, we propose a Multi-Modal Contrastive Diffusion model (MMCD), fusing both sequence and structure modalities in a diffusion framework to co-generate novel peptide sequences and structures. Specifically, MMCD constructs a sequence-modal and a structure-modal diffusion model, and devises a multi-modal contrastive learning strategy with inter-contrastive and intra-contrastive components in each diffusion timestep, aiming to capture the consistency between the two modalities and boost model performance. The inter-contrastive component aligns sequences and structures of peptides by maximizing the agreement of their embeddings, while the intra-contrastive component differentiates therapeutic and non-therapeutic peptides by simultaneously maximizing the disagreement of their sequence/structure embeddings. Extensive experiments demonstrate that MMCD performs better than other state-of-the-art deep generative methods in generating therapeutic peptides across various metrics, including antimicrobial/anticancer score, diversity, and peptide-docking.
34.4 | CL | Apr 24
Chinese-SkillSpan: A Span-Level Dataset for ESCO-Aligned Competency Extraction from Chinese Job Ads
Guojing Li, Zichuan Fu, Junyi Li et al.
Job Skill Named Entity Recognition (JobSkillNER) aims to automatically extract key skill information from large-scale job posting data, which is important for improving talent-market matching efficiency and supporting personalized employment services. To the best of our knowledge, this work presents the first Chinese JobSkillNER dataset for recruitment texts. We propose annotation guidelines tailored to Chinese job postings and an LLM-empowered Macro-Micro collaborative annotation pipeline. The pipeline leverages the contextual understanding ability of large language models (LLMs) for initial annotation and then refines the results through expert sentence-level adjudication. Using this pipeline, we annotate more than 20,000 instances collected from four major recruitment platforms over the period 2014-2025. Based on these efforts, we release Chinese-SkillSpan, the first Chinese JobSkillNER dataset aligned with the ESCO occupational skill standard across four dimensions: knowledge, skill, transversal competence, and language competence (LSKT). Experimental results show that the dataset supports effective model training and evaluation, indicating that Chinese-SkillSpan helps fill a major gap in Chinese JobSkillNER resources and provides a useful benchmark for intelligent recruitment research. Code and data are available at https://sites.google.com/view/cn-skillspan-resources .
CV | Dec 10, 2024
RAP-SR: RestorAtion Prior Enhancement in Diffusion Models for Realistic Image Super-Resolution
Jiangang Wang, Qingnan Fan, Jinwei Chen et al.
Benefiting from their powerful generative capabilities, pretrained diffusion models have garnered significant attention for real-world image super-resolution (Real-SR). Existing diffusion-based SR approaches typically utilize semantic information from degraded images and restoration prompts to activate the prior for producing realistic high-resolution images. However, general-purpose pretrained diffusion models, not designed for restoration tasks, often have a suboptimal prior, and manually defined prompts may fail to fully exploit their generative potential. To address these limitations, we introduce RAP-SR, a novel restoration prior enhancement approach in pretrained diffusion models for Real-SR. First, we develop the High-Fidelity Aesthetic Image Dataset (HFAID), curated through a Quality-Driven Aesthetic Image Selection Pipeline (QDAISP). Our dataset not only surpasses existing ones in fidelity but also excels in aesthetic quality. Second, we propose the Restoration Priors Enhancement Framework, which includes Restoration Priors Refinement (RPR) and Restoration-Oriented Prompt Optimization (ROPO) modules. RPR refines the restoration prior using the HFAID, while ROPO optimizes the unique restoration identifier, improving the quality of the resulting images. RAP-SR effectively bridges the gap between general-purpose models and the demands of Real-SR by enhancing the restoration prior. Leveraging the plug-and-play nature of RAP-SR, our approach can be seamlessly integrated into existing diffusion-based SR methods, boosting their performance. Extensive experiments demonstrate its broad applicability and state-of-the-art results. Code and datasets will be available upon acceptance.
CV | Apr 29, 2024
Efficient Meta-Learning Enabled Lightweight Multiscale Few-Shot Object Detection in Remote Sensing Images
Wenbin Guan, Zijiu Yang, Xiaohong Wu et al.
Few-shot object detection (FSOD) in remote sensing images (RSIs) has recently become a focal point of attention. Numerous few-shot detectors, particularly those based on two-stage detectors, face challenges when dealing with the multiscale complexities inherent in RSIs. Moreover, these detectors are impractical in real-world applications, mainly due to their unwieldy model parameters when handling large amounts of data. In contrast, we recognize the advantages of one-stage detectors, including high detection speed and a global receptive field. Consequently, we choose the YOLOv7 one-stage detector as a baseline and subject it to a novel meta-learning training framework. This transformation allows the detector to address FSOD tasks while retaining its inherent advantage of being lightweight. Additionally, we thoroughly investigate the samples generated by the meta-learning strategy and introduce a novel meta-sampling approach to retain samples produced by our designed meta-detection head. Coupled with our devised meta-cross loss, we deliberately utilize "negative samples" that are often overlooked to extract valuable knowledge from them. This approach enhances detection accuracy and efficiently refines the overall meta-learning strategy. To validate the effectiveness of our proposed detector, we conducted performance comparisons with current state-of-the-art detectors using the DIOR and NWPU VHR-10.v2 datasets, yielding satisfactory results.
CV | Mar 10, 2025
Text-IRSTD: Leveraging Semantic Text to Promote Infrared Small Target Detection in Complex Scenes
Feng Huang, Shuyuan Zheng, Zhaobing Qiu et al.
Infrared small target detection is currently a hot and challenging task in computer vision. Existing methods usually focus on mining visual features of targets and struggle to cope with complex and diverse detection scenarios. The main reason is that infrared small targets carry limited image information on their own, so relying only on visual features fails to discriminate targets from interference, leading to lower detection performance. To address this issue, we introduce a novel approach leveraging semantic text to guide infrared small target detection, called Text-IRSTD. It expands classical IRSTD to text-guided IRSTD, providing a new research idea. On the one hand, we devise a novel fuzzy semantic text prompt to accommodate ambiguous target categories. On the other hand, we propose a progressive cross-modal semantic interaction decoder (PCSID) to facilitate information fusion between texts and images. In addition, we construct a new benchmark, called FZDT, consisting of 2,755 infrared images of different scenarios with fuzzy semantic textual annotations. Extensive experimental results demonstrate that our method achieves better detection performance and target contour recovery than the state-of-the-art methods. Moreover, the proposed Text-IRSTD shows strong generalization and wide application prospects in unseen detection scenarios. The dataset and code will be publicly released after acceptance of this paper.
LG | Jun 27, 2024
Heterogeneous Causal Metapath Graph Neural Network for Gene-Microbe-Disease Association Prediction
Kexin Zhang, Feng Huang, Luotao Liu et al.
The recent focus on microbes in human medicine highlights their potential role in the genetic framework of diseases. To decode the complex interactions among genes, microbes, and diseases, computational predictions of gene-microbe-disease (GMD) associations are crucial. Existing methods primarily address gene-disease and microbe-disease associations, but the more intricate triple-wise GMD associations remain less explored. In this paper, we propose a Heterogeneous Causal Metapath Graph Neural Network (HCMGNN) to predict GMD associations. HCMGNN constructs a heterogeneous graph linking genes, microbes, and diseases through their pairwise associations, and utilizes six predefined causal metapaths to extract directed causal subgraphs, which facilitate the multi-view analysis of causal relations among the three entity types. Within each subgraph, we employ a causal semantic sharing message passing network for node representation learning, coupled with an attentive fusion method to integrate these representations for predicting GMD associations. Our extensive experiments show that HCMGNN effectively predicts GMD associations and addresses the association sparsity issue by enhancing the graph's semantics and structure.
CV | Jun 17, 2024
Video Frame Interpolation for Polarization via Swin-Transformer
Feng Huang, Xin Zhang, Yixuan Xu et al.
Video Frame Interpolation (VFI) has been extensively explored and demonstrated, yet its application to polarization remains largely unexplored. Due to the selective transmission of light by polarized filters, longer exposure times are typically required to ensure sufficient light intensity, which consequently lowers the temporal sampling rate. Furthermore, because the polarization reflected by objects varies with shooting perspective, focusing solely on estimating pixel displacement is insufficient to accurately reconstruct the intermediate polarization. To tackle these challenges, this study proposes a multi-stage and multi-scale network called Swin-VFI based on the Swin-Transformer and introduces a tailored loss function to facilitate the network's understanding of polarization changes. To ensure the practicality of our proposed method, this study evaluates its interpolated frames in Shape from Polarization (SfP) and Human Shape Reconstruction tasks, comparing them with other state-of-the-art methods such as CAIN, FLAVR, and VFIT. Experimental results demonstrate our approach's superior reconstruction accuracy across all tasks.
LG | May 6, 2024
Coefficient Decomposition for Spectral Graph Convolution
Feng Huang, Wen Zhang
The spectral graph convolutional network (SGCN) is a kind of graph neural network (GNN) based on graph signal filters, and has shown compelling expressivity for modeling graph-structured data. Most SGCNs adopt polynomial filters and learn the coefficients from the training data. Many of them focus on which polynomial basis leads to optimal expressive power, while the model architecture is little discussed. In this paper, we propose a general form of spectral graph convolution, where the coefficients of the polynomial basis are stored in a third-order tensor. Then, we show that the convolution block in existing SGCNs can be derived by performing a certain coefficient decomposition operation on the coefficient tensor. Based on this generalized view, we develop novel spectral graph convolutions, CoDeSGC-CP and CoDeSGC-Tucker, by applying CP and Tucker tensor decompositions to the coefficient tensor. Extensive experimental results demonstrate that the proposed convolutions achieve favorable performance improvements.
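One plausible reading of a polynomial spectral convolution with a third-order coefficient tensor is Y = sum_k P^k X W[k], where W has shape (K+1, C_in, C_out). The sketch below uses a monomial basis and the standard CP form W[k][i][j] = sum_r A[k][r] B[i][r] C[j][r]. The propagation matrix `P`, the factor names `A`, `B`, `C`, and the toy values are assumptions for illustration, not the paper's exact parameterization.

```python
def matmul(a, b):
    """Plain dense matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]

def spectral_conv(P, X, W):
    """Compute Y = sum_k P^k X W[k] for a coefficient tensor W of shape (K+1, C_in, C_out)."""
    Y = [[0.0] * len(W[0][0]) for _ in range(len(X))]
    Z = X  # holds P^k X, starting at k = 0
    for Wk in W:
        Y = add(Y, matmul(Z, Wk))
        Z = matmul(P, Z)
    return Y

def cp_tensor(A, B, C):
    """Rebuild W[k][i][j] = sum_r A[k][r] * B[i][r] * C[j][r] from CP factors."""
    rank = len(A[0])
    return [
        [[sum(A[k][r] * B[i][r] * C[j][r] for r in range(rank)) for j in range(len(C))]
         for i in range(len(B))]
        for k in range(len(A))
    ]

# Hypothetical toy graph: two nodes joined by one edge, one input and one output channel.
P = [[0.0, 1.0], [1.0, 0.0]]
X = [[1.0], [0.0]]
W_full = [[[0.5]], [[0.25]]]  # K = 1
W_cp = cp_tensor([[0.5], [0.25]], [[1.0]], [[1.0]])  # rank-1 factors

Y_full = spectral_conv(P, X, W_full)
Y_cp = spectral_conv(P, X, W_cp)
print(Y_full, Y_cp)
```

In this toy case the rank-1 CP factors reproduce the full tensor exactly, so both calls give the same output; with larger tensors, a low-rank factorization trades expressiveness for fewer parameters.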
CL | Jan 24, 2024
Fine-grained Stateful Knowledge Exploration: Effective and Efficient Graph Retrieval with Large Language Models
Dehao Tao, Congqi Wang, Feng Huang et al.
Large Language Models (LLMs) have shown impressive capabilities, yet updating their knowledge remains a significant challenge, often leading to outdated or inaccurate responses. A proposed solution is the integration of external knowledge bases, such as knowledge graphs, with LLMs. Most existing methods use a paradigm that treats the whole question as the objective, with relevant knowledge being incrementally retrieved from the knowledge graph. However, this paradigm often leads to a granularity mismatch between the target question and the retrieved entities and relations. As a result, the information in the question cannot precisely correspond to the retrieved knowledge. This may cause redundant exploration or omission of vital knowledge, thereby leading to increased computational cost and reduced retrieval accuracy. To address the limitations of coarse-grained knowledge exploration, we propose FiSKE, a novel paradigm for Fine-grained Stateful Knowledge Exploration. FiSKE first decomposes questions into fine-grained clues, then employs an adaptive mapping strategy during the knowledge exploration process to resolve ambiguity in clue-to-graph mappings. This strategy dynamically infers contextual correspondences while maintaining a stateful record of the mappings. A clue-driven termination mechanism ensures rigorous augmentation, leveraging fully mapped paths for LLMs while reverting to chain-of-thought reasoning when necessary. Our approach balances precision and efficiency. Experiments on multiple datasets show that our paradigm surpasses current advanced methods in knowledge retrieval while significantly reducing the average number of LLM invocations.
LG | Dec 16, 2021
HampDTI: a heterogeneous graph automatic meta-path learning method for drug-target interaction prediction
Hongzhun Wang, Feng Huang, Wen Zhang
Motivation: Identifying drug-target interactions (DTIs) is a key step in drug repositioning. In recent years, the accumulation of a large amount of genomics and pharmacology data has formed massive drug- and target-related heterogeneous networks (HNs), which provide new opportunities for developing HN-based computational models to accurately predict DTIs. The HN implies lots of useful information about DTIs but also contains irrelevant data, and how to make the best of heterogeneous networks remains a challenge. Results: In this paper, we propose a heterogeneous graph automatic meta-path learning based DTI prediction method (HampDTI). HampDTI automatically learns the important meta-paths between drugs and targets from the HN, and generates meta-path graphs. For each meta-path graph, the features learned from drug molecule graphs and target protein sequences serve as the node attributes, and then a node-type specific graph convolutional network (NSGCN), which efficiently considers node type information (drugs or targets), is designed to learn embeddings of drugs and targets. Finally, the embeddings from multiple meta-path graphs are combined to predict novel DTIs. The experiments on benchmark datasets show that our proposed HampDTI achieves superior performance compared with state-of-the-art DTI prediction methods. More importantly, HampDTI identifies the important meta-paths for DTI prediction, which could explain how drugs connect with targets in HNs.
CR | Nov 15, 2021
Authentication of optical physical unclonable functions based on single-pixel detection
Pidong Wang, Feiliang Chen, Dong Li et al.
The physical unclonable function (PUF) has been proposed as a promising and trustworthy solution for a variety of cryptographic applications. Here we propose a non-imaging authentication scheme for optical PUFs materialized by random scattering media, in which the characteristic fingerprints of optical PUFs are extracted from stochastic fluctuations of the scattered light intensity in response to laser challenges, as detected by a single-pixel detector. The randomness, uniqueness, unpredictability, and robustness of the extracted fingerprints are validated as sufficient for real authentication applications. By increasing the key length and improving the signal-to-noise ratio, the false accept rate of a fake PUF can be dramatically lowered to the order of $10^{-28}$. In comparison to conventional laser-speckle-imaging based authentication, which obtains unique identity information from textures of laser speckle patterns, this non-imaging scheme can be implemented at small speckle sizes below the Nyquist-Shannon sampling criterion of commonly used CCD or CMOS cameras, offering benefits in system miniaturization and immunity against reverse engineering attacks simultaneously.
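To see why a longer key lowers the false accept rate, consider a simplified model in which a fake PUF produces independent, unbiased key bits and a key is accepted when its Hamming distance to the enrolled key is within a threshold. The sketch below computes the resulting binomial tail. The independence assumption and the 25% threshold are illustrative choices, not the paper's measured Hamming-distance distributions.

```python
from math import comb

def false_accept_rate(key_bits, max_mismatches):
    """Probability that a random key differs from the enrolled key in at most max_mismatches bits.

    Assumes independent, unbiased bits, so the Hamming distance is Binomial(key_bits, 0.5).
    """
    accepted_keys = sum(comb(key_bits, k) for k in range(max_mismatches + 1))
    return accepted_keys / 2 ** key_bits

# Hypothetical acceptance threshold: up to 25% mismatched bits.
for bits in (64, 128, 256):
    print(bits, false_accept_rate(bits, bits // 4))
```

With the threshold fixed as a fraction of the key length, the rate falls roughly exponentially as bits are added. A better signal-to-noise ratio permits a tighter threshold, which shrinks the tail further.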
CR | Sep 8, 2021
Bionic Optical Physical Unclonable Functions for Authentication and Encryption
Yongbiao Wan, Pidong Wang, Feng Huang et al.
Information security is of great importance for modern society with all things connected. The physical unclonable function (PUF), as a promising hardware primitive, has been intensively studied for information security. However, the widely investigated silicon PUF with low entropy is vulnerable to various attacks. Herein, we introduce the concept of bionic optical PUFs inspired by unique biological architectures, and fabricate four types of bionic PUFs by molding the surface micro-nano structures of natural plant tissues with a simple, low-cost, and environmentally friendly manufacturing process. The laser speckle responses of all bionic PUFs are statistically demonstrated to be random, unique, unpredictable and robust enough for cryptographic applications, indicating the broad applicability of bionic PUFs. On this basis, the feasibility of implementing bionic PUFs as cryptographic primitives in entity authentication and encrypted communication is experimentally validated, which shows their promising potential for future information security applications.
CR | Sep 8, 2021
Fast random number generator based on optical physical unclonable functions
Kun Chen, Feng Huang, Pidong Wang et al.
We propose an approach for fast random number generation based on homemade optical physical unclonable functions (PUFs). The optical PUF is illuminated with a continuously modulated input laser wavefront to obtain different speckle patterns. Random numbers are extracted from the speckle patterns through a simple post-processing algorithm. Our proof-of-principle experiment achieves a total random number generation rate of 0.96 Gbit/s with verified randomness, which is far faster than previous optical-PUF-based schemes. Our results demonstrate that the proposed random number generator (RNG) has great potential to achieve ultrafast generation rates of up to several hundred Gbit/s.
OC | May 16, 2021
Robust optimal policies for team Markov games
Feng Huang, Ming Cao, Long Wang
In stochastic dynamic environments, team Markov games have emerged as a versatile paradigm for studying sequential decision-making problems of fully cooperative multi-agent systems. However, the optimality of the derived policies is usually sensitive to model parameters, which are typically unknown and required to be estimated from noisy data in practice. To mitigate the sensitivity of optimal policies to these uncertain parameters, we propose a robust model of team Markov games in this paper, where agents utilize robust optimization approaches to update strategies. This model extends team Markov games to the scenario of incomplete information and meanwhile provides an alternative solution concept of robust team optimality. To seek such a solution, we develop a robust iterative learning algorithm of team policies and prove its convergence. This algorithm, compared with robust dynamic programming, not only possesses a faster convergence rate, but also allows for using approximation calculations to alleviate the curse of dimensionality. Moreover, some numerical simulations are presented to demonstrate the effectiveness of the algorithm by generalizing the game model of sequential social dilemmas to uncertain scenarios.
RO Apr 12, 2021
Point wise or Feature wise? Benchmark Comparison of Public Available LiDAR Odometry Algorithms in Urban Canyons
Feng Huang, Weisong Wen, Jiachen Zhang et al.
Robust and precise localization is essential for autonomous systems with navigation requirements. Light detection and ranging (LiDAR) odometry has been extensively studied over the past decades to achieve this goal. Existing LiDAR odometry (LO) algorithms achieve satisfactory accuracy in scenarios with abundant environmental features. Unfortunately, the performance of LiDAR odometry degrades significantly in urban canyons with numerous dynamic objects and complex environmental structures. Moreover, it is still not clear from the existing literature which LO algorithms perform well in such challenging environments. To fill this gap, this paper evaluates an array of popular and extensively studied LO pipelines using datasets collected in urban canyons of Hong Kong. We present the results in terms of positioning accuracy and computational efficiency. We identify three major factors that dominate the performance of LO in urban canyons: the ego-vehicle dynamics, moving objects, and the degree of urbanization. According to our experimental results, point-wise methods achieve better accuracy in urban canyons, while feature-wise methods achieve cost-efficiency and satisfactory positioning accuracy.
CL Jun 2, 2020
Event Arguments Extraction via Dilate Gated Convolutional Neural Network with Enhanced Local Features
Zhigang Kan, Linbo Qiao, Sen Yang et al.
Event extraction plays an important role in information extraction for understanding the world. It can be split into two subtasks: event trigger extraction and event arguments extraction. However, the F-score of event arguments extraction is much lower than that of event trigger extraction; for example, in the most recent work, event trigger extraction achieves 80.7%, while event arguments extraction achieves only 58%. In pipelined structures, the difficulty of event arguments extraction lies in its lack of classification features and its much higher computational cost. In this work, we propose a novel event extraction approach based on a multi-layer Dilate Gated Convolutional Neural Network (EE-DGCNN), which has fewer parameters. In addition, enhanced local information is incorporated into word features to assign event argument roles for triggers predicted by the first subtask. Numerical experiments demonstrate significant performance improvement over state-of-the-art event extraction approaches on real-world datasets. We also present further analysis of the extraction procedure and conduct experiments to analyze the factors behind the performance improvement.
LG Nov 13, 2019
Tensor Decomposition with Relational Constraints for Predicting Multiple Types of MicroRNA-disease Associations
Feng Huang, Xiang Yue, Zhankun Xiong et al.
MicroRNAs (miRNAs) play crucial roles in multifarious biological processes associated with human diseases. Identifying potential miRNA-disease associations contributes to understanding the molecular mechanisms of miRNA-related diseases. Most existing computational methods focus on predicting whether a miRNA-disease association exists or not. However, the roles of miRNAs in diseases are highly diverse. For instance, genetic variants of microRNA (mir-15) may affect the expression level of miRNAs, leading to B cell chronic lymphocytic leukemia, while circulating miRNAs (including mir-1246, mir-1307-3p, etc.) have the potential to detect breast cancer at an early stage. In this paper, we aim to predict multi-type miRNA-disease associations instead of treating them as binary. To this end, we represent miRNA-disease-type triplets as a tensor and introduce tensor decomposition methods to solve the prediction task. Experimental results on two widely adopted miRNA-disease datasets, HMDD v2.0 and HMDD v3.2, show that tensor decomposition methods improve on a recent baseline by a large margin (up to 38% in top-1 F1). We further propose a novel method, Tensor Decomposition with Relational Constraints (TDRC), which incorporates biological features as relational constraints to further improve existing tensor decomposition methods. Compared with two existing tensor decomposition methods, TDRC produces better performance while being more efficient.
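The idea of storing miRNA-disease-type triplets in a tensor and scoring unobserved triplets from a low-rank factorization can be sketched with a plain CP decomposition fitted by alternating least squares. This is not TDRC: it uses no relational constraints or biological features. The triplet indices, tensor size, rank, and the names `cp_als` and `reconstruct` are illustrative assumptions.

```python
import numpy as np


def reconstruct(A, B, C):
    # X_hat[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def cp_als(X, rank, n_iter=50, seed=0):
    """Fit a rank-`rank` CP model to a 3-way tensor by alternating least squares."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((dim, rank)) for dim in X.shape)
    for _ in range(n_iter):
        # Each update solves the least-squares problem for one factor exactly,
        # holding the other two fixed (normal equations: factor @ gram = mttkrp).
        A = np.einsum("ijk,jr,kr->ir", X, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum("ijk,ir,kr->jr", X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum("ijk,ir,jr->kr", X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C


# Hypothetical toy data: 4 miRNAs, 3 diseases, 2 association types.
triplets = [(0, 0, 0), (0, 1, 1), (1, 0, 0), (2, 2, 1), (3, 1, 1), (3, 2, 0)]
X = np.zeros((4, 3, 2))
for mirna, disease, assoc_type in triplets:
    X[mirna, disease, assoc_type] = 1.0

A, B, C = cp_als(X, rank=2)
scores = reconstruct(A, B, C)  # higher score = more plausible triplet

# Rank the association types for an unobserved (miRNA, disease) pair.
predicted_type = int(np.argmax(scores[1, 1, :]))
```

Each entry of `scores` can be read as a plausibility score for the corresponding triplet, so multi-type prediction reduces to ranking entries along the type axis.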
IV Sep 24, 2019
pISTA-SENSE-ResNet for Parallel MRI Reconstruction
Tieyuan Lu, Xinlin Zhang, Yihui Huang et al.
Magnetic resonance imaging has been widely applied in clinical diagnosis but is limited by its long data acquisition time. Although imaging can be accelerated by sparse sampling and parallel imaging, achieving promising reconstructed images at a fast reconstruction speed remains a challenge. Recently, deep learning approaches have attracted a lot of attention for their encouraging reconstruction results, but they lack proper interpretability. In this letter, to enable high-quality image reconstruction for parallel magnetic resonance imaging, we design the network structure from the perspective of sparse iterative reconstruction and enhance it with a residual structure. Experimental results on a public knee dataset show that, compared with the optimization-based method and the latest deep learning parallel imaging methods, the proposed network yields lower reconstruction error and is more stable under different acceleration factors.
IV Sep 17, 2019
A Guaranteed Convergence Analysis for the Projected Fast Iterative Soft-Thresholding Algorithm in Parallel MRI
Xinlin Zhang, Hengfa Lu, Di Guo et al.
The boom of non-uniform sampling and compressed sensing techniques has dramatically alleviated the lengthy data acquisition problem of magnetic resonance imaging. Sparse reconstruction, thanks to its fast computation and promising performance, has attracted considerable research effort and has been adopted in commercial scanners. To perform sparse reconstruction, choosing a proper algorithm is essential for obtaining satisfactory results and saving time in tuning parameters. The pFISTA, a simple and efficient algorithm for sparse reconstruction, has been successfully extended to parallel imaging. However, its convergence criterion is still an open question. Moreover, the existing convergence criterion of single-coil pFISTA cannot be applied to the parallel imaging pFISTA, which leaves users uncertain about how to choose its only parameter, the step size. In this work, we provide a guaranteed convergence analysis of the parallel imaging version of pFISTA for solving two well-known parallel imaging reconstruction models, SENSE and SPIRiT. Along with the convergence analysis, we provide recommended step size values for SENSE and SPIRiT reconstructions to obtain fast and promising reconstructions. Experiments on in vivo brain images demonstrate the validity of the convergence criterion. In addition, experimental results show that, compared with using backtracking and power iteration to determine the step size, our recommended step size achieves more than five times acceleration in reconstruction time in most tested cases.
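To make the role of the step size concrete, here is a minimal sketch of generic FISTA for a real-valued l1-regularized least-squares problem, with the step size derived from a power-iteration estimate of the Lipschitz constant (the kind of step-size search the abstract compares against). It is not pFISTA and involves no SENSE or SPIRiT model; the random matrix, the regularization weight, the 0.95 safety factor, and all function names are assumptions for illustration.

```python
import numpy as np


def soft_threshold(x, tau):
    # Proximal operator of tau * ||x||_1.
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def largest_eigenvalue(A, n_iter=200, seed=0):
    """Power-iteration estimate of the largest eigenvalue of A^T A."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        w = A.T @ (A @ v)
        v = w / np.linalg.norm(w)
    return float(v @ (A.T @ (A @ v)))  # Rayleigh quotient


def objective(A, b, x, lam):
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))


def fista(A, b, lam, step, n_iter=500):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 with a fixed step size."""
    x = np.zeros(A.shape[1])
    y = x.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)
        x_new = soft_threshold(y - step * grad, step * lam)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)  # momentum step
        x, t = x_new, t_new
    return x


# Hypothetical toy problem: recover a sparse vector from random measurements.
rng = np.random.default_rng(1)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[rng.choice(100, size=5, replace=False)] = 1.0
b = A @ x_true
lam = 0.1

L_est = largest_eigenvalue(A)
# The Rayleigh quotient never exceeds the true largest eigenvalue, so a small
# safety factor keeps the step at or below 1/L in practice.
step = 0.95 / L_est
x_hat = fista(A, b, lam, step)
```

The power iteration costs repeated applications of the forward and adjoint operators before reconstruction even starts, which is the overhead a recommended closed-form step size avoids.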
SD Apr 23, 2019
Harmonic-aligned Frame Mask Based on Non-stationary Gabor Transform with Application to Content-dependent Speaker Comparison
Feng Huang, Peter Balazs
We propose a harmonic-aligned frame mask for speech signals using the non-stationary Gabor transform (NSGT). A frame mask operates on the transfer coefficients of a signal and consequently converts the signal into a counterpart signal. It depicts the difference between the two signals. In preceding studies, frame masks based on the regular Gabor transform were applied to single-note instrumental sound analysis. This study extends the frame mask approach to speech signals. For voiced speech, the fundamental frequency usually changes continuously over time. We employ NSGT with pitch-dependent, and therefore time-varying, frequency resolution to attain harmonic alignment in the transform domain and hence yield harmonic-aligned frame masks for speech signals. We propose to apply the harmonic-aligned frame mask to content-dependent speaker comparison. Frame masks, computed from voiced signals of the same vowel but from different speakers, are used as similarity measures to compare and distinguish speaker identities (SID). Results obtained with deep neural networks demonstrate that the proposed frame mask is valid for representing speaker characteristics and shows potential for SID applications in limited-data scenarios.
CV Aug 8, 2015
A straightforward method to assess motion blur for different types of displays
Fuhao Chen, Jun Chen, Feng Huang
A simulation method based on the liquid crystal response and the human visual system is suitable to characterize motion blur for LCDs but not other display types. We propose a more straightforward and widely applicable method to quantify motion blur based on the width of the moving object. We thus compare various types of displays objectively. A perceptual experiment was conducted to validate the proposed method. We test varying motion velocities for nine commercial displays. We compare the three motion blur evaluation methods (simulation, human perception, and our method) using z-scores. Our comparisons indicate that our method accurately characterizes motion blur for various display types.