Hongyang Chen

LG
h-index20
53papers
947citations
Novelty49%
AI Score58

53 Papers

AINov 21, 2022
Intelligent Computing: The Latest Advances, Challenges and Future

Shiqiang Zhu, Ting Yu, Tao Xu et al.

Computing is a critical driving force in the development of human civilization. In recent years, we have witnessed the emergence of intelligent computing, a new computing paradigm that is reshaping traditional computing and promoting digital revolution in the era of big data, artificial intelligence and internet-of-things with new computing theories, architectures, methods, systems, and applications. Intelligent computing has greatly broadened the scope of computing, extending it from traditional computing on data to increasingly diverse computing paradigms such as perceptual intelligence, cognitive intelligence, autonomous intelligence, and human-computer fusion intelligence. Intelligence and computing have undergone paths of different evolution and development for a long time but have become increasingly intertwined in recent years: intelligent computing is not only intelligence-oriented but also intelligence-driven. Such cross-fertilization has prompted the emergence and rapid advancement of intelligent computing. Intelligent computing is still in its infancy and an abundance of innovations in the theories, systems, and applications of intelligent computing are expected to occur soon. We present the first comprehensive survey of literature on intelligent computing, covering its theory fundamentals, the technological fusion of intelligence and computing, important applications, challenges, and future perspectives. We believe that this survey is highly timely and will provide a comprehensive reference and cast valuable insights into intelligent computing for academic and industrial researchers and practitioners.

LGOct 29, 2023
Boosting Decision-Based Black-Box Adversarial Attack with Gradient Priors

Han Liu, Xingshuo Huang, Xiaotong Zhang et al. · apple-ml

Decision-based methods have shown to be effective in black-box adversarial attacks, as they can obtain satisfactory performance and only require to access the final model prediction. Gradient estimation is a critical step in black-box adversarial attacks, as it will directly affect the query efficiency. Recent works have attempted to utilize gradient priors to facilitate score-based methods to obtain better results. However, these gradient priors still suffer from the edge gradient discrepancy issue and the successive iteration gradient direction issue, thus are difficult to simply extend to decision-based methods. In this paper, we propose a novel Decision-based Black-box Attack framework with Gradient Priors (DBA-GP), which seamlessly integrates the data-dependent gradient prior and time-dependent prior into the gradient estimation procedure. First, by leveraging the joint bilateral filter to deal with each random perturbation, DBA-GP can guarantee that the generated perturbations in edge locations are hardly smoothed, i.e., alleviating the edge gradient discrepancy, thus remaining the characteristics of the original image as much as possible. Second, by utilizing a new gradient updating strategy to automatically adjust the successive iteration gradient direction, DBA-GP can accelerate the convergence speed, thus improving the query efficiency. Extensive experiments have demonstrated that the proposed method outperforms other strong baselines significantly.

CLJun 20, 2023
Give Us the Facts: Enhancing Large Language Models with Knowledge Graphs for Fact-aware Language Modeling

Linyao Yang, Hongyang Chen, Zhao Li et al.

Recently, ChatGPT, a representative large language model (LLM), has gained considerable attention due to its powerful emergent abilities. Some researchers suggest that LLMs could potentially replace structured knowledge bases like knowledge graphs (KGs) and function as parameterized knowledge bases. However, while LLMs are proficient at learning probabilistic language patterns based on large corpus and engaging in conversations with humans, they, like previous smaller pre-trained language models (PLMs), still have difficulty in recalling facts while generating knowledge-grounded contents. To overcome these limitations, researchers have proposed enhancing data-driven PLMs with knowledge-based KGs to incorporate explicit factual knowledge into PLMs, thus improving their performance to generate texts requiring factual knowledge and providing more informed responses to user queries. This paper reviews the studies on enhancing PLMs with KGs, detailing existing knowledge graph enhanced pre-trained language models (KGPLMs) as well as their applications. Inspired by existing studies on KGPLM, this paper proposes to enhance LLMs with KGs by developing knowledge graph-enhanced large language models (KGLLMs). KGLLM provides a solution to enhance LLMs' factual reasoning ability, opening up new avenues for LLM research.

LGOct 28, 2023
Debunking Free Fusion Myth: Online Multi-view Anomaly Detection with Disentangled Product-of-Experts Modeling

Hao Wang, Zhi-Qi Cheng, Jingdong Sun et al. · cmu, uw

Multi-view or even multi-modal data is appealing yet challenging for real-world applications. Detecting anomalies in multi-view data is a prominent recent research topic. However, most of the existing methods 1) are only suitable for two views or type-specific anomalies, 2) suffer from the issue of fusion disentanglement, and 3) do not support online detection after model deployment. To address these challenges, our main ideas in this paper are three-fold: multi-view learning, disentangled representation learning, and generative model. To this end, we propose dPoE, a novel multi-view variational autoencoder model that involves (1) a Product-of-Experts (PoE) layer in tackling multi-view data, (2) a Total Correction (TC) discriminator in disentangling view-common and view-specific representations, and (3) a joint loss function in wrapping up all components. In addition, we devise theoretical information bounds to control both view-common and view-specific representations. Extensive experiments on six real-world datasets markedly demonstrate that the proposed dPoE outperforms baselines.

LGOct 22, 2023Code
Learning Invariant Molecular Representation in Latent Discrete Space

Xiang Zhuang, Qiang Zhang, Keyan Ding et al.

Molecular representation learning lays the foundation for drug discovery. However, existing methods suffer from poor out-of-distribution (OOD) generalization, particularly when data for training and testing originate from different environments. To address this issue, we propose a new framework for learning molecular representations that exhibit invariance and robustness against distribution shifts. Specifically, we propose a strategy called ``first-encoding-then-separation'' to identify invariant molecule features in the latent space, which deviates from conventional practices. Prior to the separation step, we introduce a residual vector quantization module that mitigates the over-fitting to training data distributions while preserving the expressivity of encoders. Furthermore, we design a task-agnostic self-supervised learning objective to encourage precise invariance identification, which enables our method widely applicable to a variety of tasks, such as regression and multi-label classification. Extensive experiments on 18 real-world molecular datasets demonstrate that our model achieves stronger generalization against state-of-the-art baselines in the presence of various distribution shifts. Our code is available at https://github.com/HICAI-ZJU/iMoLD.

CLMar 26, 2023
Boosting Few-Shot Text Classification via Distribution Estimation

Han Liu, Feng Zhang, Xiaotong Zhang et al. · apple-ml

Distribution estimation has been demonstrated as one of the most effective approaches in dealing with few-shot image classification, as the low-level patterns and underlying representations can be easily transferred across different tasks in computer vision domain. However, directly applying this approach to few-shot text classification is challenging, since leveraging the statistics of known classes with sufficient samples to calibrate the distributions of novel classes may cause negative effects due to serious category difference in text domain. To alleviate this issue, we propose two simple yet effective strategies to estimate the distributions of the novel classes by utilizing unlabeled query samples, thus avoiding the potential negative transfer issue. Specifically, we first assume a class or sample follows the Gaussian distribution, and use the original support set and the nearest few query samples to estimate the corresponding mean and covariance. Then, we augment the labeled samples by sampling from the estimated distribution, which can provide sufficient supervision for training the classification model. Extensive experiments on eight few-shot text classification datasets show that the proposed method outperforms state-of-the-art baselines significantly.

SPNov 8, 2023
Deep Learning Assisted Multiuser MIMO Load Modulated Systems for Enhanced Downlink mmWave Communications

Ercong Yu, Jinle Zhu, Qiang Li et al.

This paper is focused on multiuser load modulation arrays (MU-LMAs) which are attractive due to their low system complexity and reduced cost for millimeter wave (mmWave) multi-input multi-output (MIMO) systems. The existing precoding algorithm for downlink MU-LMA relies on a sub-array structured (SAS) transmitter which may suffer from decreased degrees of freedom and complex system configuration. Furthermore, a conventional LMA codebook with codewords uniformly distributed on a hypersphere may not be channel-adaptive and may lead to increased signal detection complexity. In this paper, we conceive an MU-LMA system employing a full-array structured (FAS) transmitter and propose two algorithms accordingly. The proposed FAS-based system addresses the SAS structural problems and can support larger numbers of users. For LMA-imposed constant-power downlink precoding, we propose an FAS-based normalized block diagonalization (FAS-NBD) algorithm. However, the forced normalization may result in performance degradation. This degradation, together with the aforementioned codebook design problems, is difficult to solve analytically. This motivates us to propose a Deep Learning-enhanced (FAS-DL-NBD) algorithm for adaptive codebook design and codebook-independent decoding. It is shown that the proposed algorithms are robust to imperfect knowledge of channel state information and yield excellent error performance. Moreover, the FAS-DL-NBD algorithm enables signal detection with low complexity as the number of bits per codeword increases.

IRMar 15Code
Learning Image-Text Matching with Optimal Partial Transport

Zhengxin Pan, Haishuai Wang, Fangyu Wu et al.

Cross-modal matching, a fundamental task in bridging vision and language, has recently garnered substantial research interest. Despite the development of numerous methods aimed at quantifying the semantic relatedness between image-text pairs, these methods often fall short of achieving both outstanding performance and high efficiency. In this paper, we propose the crOss-Modal sInkhorn maTching (OMIT) network as an effective solution to effectively improving performance while maintaining efficiency. Rooted in the theoretical foundations of Optimal Transport, OMIT harnesses the capabilities of Cross-modal Mover's Distance to precisely compute the similarity between fine-grained visual and textual fragments, utilizing Sinkhorn iterations for efficient approximation. To further alleviate the issue of redundant alignments, we seamlessly integrate partial matching into OMIT, leveraging local-to-global similarities to eliminate the interference of irrelevant fragments. We conduct extensive evaluations of OMIT on two benchmark image-text retrieval datasets, namely Flickr30K and MS-COCO. The superior performance achieved by OMIT on both datasets unequivocally demonstrates its effectiveness in cross-modal matching. Furthermore, through comprehensive visualization analysis, we elucidate OMIT's inherent tendency towards focal matching, thereby shedding light on its efficacy. Our code is publicly available at https://github.com/ppanzx/OMIT.

LGJul 4, 2023
HAGNN: Hybrid Aggregation for Heterogeneous Graph Neural Networks

Guanghui Zhu, Zhennan Zhu, Hongyang Chen et al.

Heterogeneous graph neural networks (GNNs) have been successful in handling heterogeneous graphs. In existing heterogeneous GNNs, meta-path plays an essential role. However, recent work pointed out that simple homogeneous graph model without meta-path can also achieve comparable results, which calls into question the necessity of meta-path. In this paper, we first present the intrinsic difference about meta-path-based and meta-path-free models, i.e., how to select neighbors for node aggregation. Then, we propose a novel framework to utilize the rich type semantic information in heterogeneous graphs comprehensively, namely HAGNN (Hybrid Aggregation for Heterogeneous GNNs). The core of HAGNN is to leverage the meta-path neighbors and the directly connected neighbors simultaneously for node aggregations. HAGNN divides the overall aggregation process into two phases: meta-path-based intra-type aggregation and meta-path-free inter-type aggregation. During the intra-type aggregation phase, we propose a new data structure called fused meta-path graph and perform structural semantic aware aggregation on it. Finally, we combine the embeddings generated by each phase. Compared with existing heterogeneous GNN models, HAGNN can take full advantage of the heterogeneity in heterogeneous graphs. Extensive experimental results on node classification, node clustering, and link prediction tasks show that HAGNN outperforms the existing modes, demonstrating the effectiveness of HAGNN.

CVOct 8, 2022
LW-ISP: A Lightweight Model with ISP and Deep Learning

Hongyang Chen, Kaisheng Ma

The deep learning (DL)-based methods of low-level tasks have many advantages over the traditional camera in terms of hardware prospects, error accumulation and imaging effects. Recently, the application of deep learning to replace the image signal processing (ISP) pipeline has appeared one after another; however, there is still a long way to go towards real landing. In this paper, we show the possibility of learning-based method to achieve real-time high-performance processing in the ISP pipeline. We propose LW-ISP, a novel architecture designed to implicitly learn the image mapping from RAW data to RGB image. Based on U-Net architecture, we propose the fine-grained attention module and a plug-and-play upsampling block suitable for low-level tasks. In particular, we design a heterogeneous distillation algorithm to distill the implicit features and reconstruction information of the clean image, so as to guide the learning of the student model. Our experiments demonstrate that LW-ISP has achieved a 0.38 dB improvement in PSNR compared to the previous best method, while the model parameters and calculation have been reduced by 23 times and 81 times. The inference efficiency has been accelerated by at least 15 times. Without bells and whistles, LW-ISP has achieved quite competitive results in ISP subtasks including image denoising and enhancement.

LGFeb 15, 2023
Adaptive incentive for cross-silo federated learning: A multi-agent reinforcement learning approach

Shijing Yuan, Hongze Liu, Hongtao Lv et al.

Cross-silo federated learning (FL) is a typical FL that enables organizations(e.g., financial or medical entities) to train global models on isolated data. Reasonable incentive is key to encouraging organizations to contribute data. However, existing works on incentivizing cross-silo FL lack consideration of the environmental dynamics (e.g., precision of the trained global model and data owned by uncertain clients during the training processes). Moreover, most of them assume that organizations share private information, which is unrealistic. To overcome these limitations, we propose a novel adaptive mechanism for cross-silo FL, towards incentivizing organizations to contribute data to maximize their long-term payoffs in a real dynamic training environment. The mechanism is based on multi-agent reinforcement learning, which learns near-optimal data contribution strategy from the history of potential games without organizations' private information. Experiments demonstrate that our mechanism achieves adaptive incentive and effectively improves the long-term payoffs for organizations.

CVMar 25Code
From Feature Learning to Spectral Basis Learning: A Unifying and Flexible Framework for Efficient and Robust Shape Matching

Feifan Luo, Hongyang Chen

Shape matching is a fundamental task in computer graphics and vision, with deep functional maps becoming a prominent paradigm. However, existing methods primarily focus on learning informative feature representations by constraining pointwise and functional maps, while neglecting the optimization of the spectral basis-a critical component of the functional map pipeline. This oversight often leads to suboptimal matching results. Furthermore, many current approaches rely on conventional, time-consuming functional map solvers, incurring significant computational overhead. To bridge these gaps, we introduce Advanced Functional Maps, a framework that generalizes standard functional maps by replacing fixed basis functions with learnable ones, supported by rigorous theoretical guarantees. Specifically, the spectral basis is optimized through a set of learned inhibition functions. Building on this, we propose the first unsupervised spectral basis learning method for robust non-rigid 3D shape matching, enabling the joint, end-to-end optimization of feature extraction and basis functions. Our approach incorporates a novel heat diffusion module and an unsupervised loss function, alongside a streamlined architecture that bypasses expensive solvers and auxiliary losses. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art feature-learning approaches, particularly in challenging non-isometric and topological noise scenarios, while maintaining high efficiency. Finally, we reveal that optimizing basis functions is equivalent to spectral convolution, where inhibition functions act as filters. This insight enables enhanced representations inspired by spectral graph networks, opening new avenues for future research. Our code is available at https://github.com/LuoFeifan77/Unsupervised-Spectral-Basis-Learning.

LGAug 10, 2022
Path-aware Siamese Graph Neural Network for Link Prediction

Jingsong Lv, Zhao Li, Hongyang Chen et al.

In this paper, we propose a Path-aware Siamese Graph neural network(PSG) for link prediction tasks. First, PSG captures both nodes and edge features for given two nodes, namely the structure information of k-neighborhoods and relay paths information of the nodes. Furthermore, a novel multi-task GNN framework with self-supervised contrastive learning is proposed for differentiation of positive links and negative links while content and behavior of nodes can be captured simultaneously. We evaluate the proposed algorithm PSG on two link property prediction datasets, ogbl-ddi and ogbl-collab. PSG achieves top 1 performance on ogbl-ddi until submission and top 3 performance on ogbl-collab. The experimental results verify the superiority of our proposed PSG

LGSep 25, 2024
Pre-trained Graphformer-based Ranking at Web-scale Search (Extended Abstract)

Yuchen Li, Haoyi Xiong, Linghe Kong et al.

Both Transformer and Graph Neural Networks (GNNs) have been employed in the domain of learning to rank (LTR). However, these approaches adhere to two distinct yet complementary problem formulations: ranking score regression based on query-webpage pairs, and link prediction within query-webpage bipartite graphs, respectively. While it is possible to pre-train GNNs or Transformers on source datasets and subsequently fine-tune them on sparsely annotated LTR datasets, the distributional shifts between the pair-based and bipartite graph domains present significant challenges in integrating these heterogeneous models into a unified LTR framework at web scale. To address this, we introduce the novel MPGraf model, which leverages a modular and capsule-based pre-training strategy, aiming to cohesively integrate the regression capabilities of Transformers with the link prediction strengths of GNNs. We conduct extensive offline and online experiments to rigorously evaluate the performance of MPGraf.

LGNov 18, 2024Code
TSINR: Capturing Temporal Continuity via Implicit Neural Representations for Time Series Anomaly Detection

Mengxuan Li, Ke Liu, Hongyang Chen et al.

Time series anomaly detection aims to identify unusual patterns in data or deviations from systems' expected behavior. The reconstruction-based methods are the mainstream in this task, which learn point-wise representation via unsupervised learning. However, the unlabeled anomaly points in training data may cause these reconstruction-based methods to learn and reconstruct anomalous data, resulting in the challenge of capturing normal patterns. In this paper, we propose a time series anomaly detection method based on implicit neural representation (INR) reconstruction, named TSINR, to address this challenge. Due to the property of spectral bias, TSINR enables prioritizing low-frequency signals and exhibiting poorer performance on high-frequency abnormal data. Specifically, we adopt INR to parameterize time series data as a continuous function and employ a transformer-based architecture to predict the INR of given data. As a result, the proposed TSINR method achieves the advantage of capturing the temporal continuity and thus is more sensitive to discontinuous anomaly data. In addition, we further design a novel form of INR continuous function to learn inter- and intra-channel information, and leverage a pre-trained large language model to amplify the intense fluctuations in anomalies. Extensive experiments demonstrate that TSINR achieves superior overall performance on both univariate and multivariate time series anomaly detection benchmarks compared to other state-of-the-art reconstruction-based methods. Our codes are available.

LGAug 2, 2024
GNN-SKAN: Harnessing the Power of SwallowKAN to Advance Molecular Representation Learning with GNNs

Ruifeng Li, Mingqian Li, Wei Liu et al.

Effective molecular representation learning is crucial for advancing molecular property prediction and drug design. Mainstream molecular representation learning approaches are based on Graph Neural Networks (GNNs). However, these approaches struggle with three significant challenges: insufficient annotations, molecular diversity, and architectural limitations such as over-squashing, which leads to the loss of critical structural details. To address these challenges, we introduce a new class of GNNs that integrates the Kolmogorov-Arnold Networks (KANs), known for their robust data-fitting capabilities and high accuracy in small-scale AI + Science tasks. By incorporating KANs into GNNs, our model enhances the representation of molecular structures. We further advance this approach with a variant called SwallowKAN (SKAN), which employs adaptive Radial Basis Functions (RBFs) as the core of the non-linear neurons. This innovation improves both computational efficiency and adaptability to diverse molecular structures. Building on the strengths of SKAN, we propose a new class of GNNs, GNN-SKAN, and its augmented variant, GNN-SKAN+, which incorporates a SKAN-based classifier to further boost performance. To our knowledge, this is the first work to integrate KANs into GNN architectures tailored for molecular representation learning. Experiments across 6 classification datasets, 6 regression datasets, and 4 few-shot learning datasets demonstrate that our approach achieves new state-of-the-art performance in terms of accuracy and computational cost.

AIAug 27, 2024
CL4KGE: A Curriculum Learning Method for Knowledge Graph Embedding

Yang Liu, Chuan Zhou, Peng Zhang et al.

Knowledge graph embedding (KGE) constitutes a foundational task, directed towards learning representations for entities and relations within knowledge graphs (KGs), with the objective of crafting representations comprehensive enough to approximate the logical and symbolic interconnections among entities. In this paper, we define a metric Z-counts to measure the difficulty of training each triple ($<$head entity, relation, tail entity$>$) in KGs with theoretical analysis. Based on this metric, we propose \textbf{CL4KGE}, an efficient \textbf{C}urriculum \textbf{L}earning based training strategy for \textbf{KGE}. This method includes a difficulty measurer and a training scheduler that aids in the training of KGE models. Our approach possesses the flexibility to act as a plugin within a wide range of KGE models, with the added advantage of adaptability to the majority of KGs in existence. The proposed method has been evaluated on popular KGE models, and the results demonstrate that it enhances the state-of-the-art methods. The use of Z-counts as a metric has enabled the identification of challenging triples in KGs, which helps in devising effective training strategies.

LGJan 28, 2024Code
Diffusion-based Graph Generative Methods

Hongyang Chen, Can Xu, Lingyu Zheng et al.

Being the most cutting-edge generative methods, diffusion methods have shown great advances in wide generation tasks. Among them, graph generation attracts significant research attention for its broad application in real life. In our survey, we systematically and comprehensively review on diffusion-based graph generative methods. We first make a review on three mainstream paradigms of diffusion methods, which are denoising diffusion probabilistic models, score-based genrative models, and stochastic differential equations. Then we further categorize and introduce the latest applications of diffusion models on graphs. In the end, we point out some limitations of current studies and future directions of future explorations. The summary of existing methods metioned in this survey is in https://github.com/zhejiangzhuque/Diffusion-based-Graph-Generative-Methods.

SDSep 19, 2023
Exploring Sentence Type Effects on the Lombard Effect and Intelligibility Enhancement: A Comparative Study of Natural and Grid Sentences

Hongyang Chen, Yuhong Yang, Zhongyuan Wang et al.

This study explores how sentence types affect the Lombard effect and intelligibility enhancement, focusing on comparisons between natural and grid sentences. Using the Lombard Chinese-TIMIT (LCT) corpus and the Enhanced MAndarin Lombard Grid (EMALG) corpus, we analyze changes in phonetic and acoustic features across different noise levels. Our results show that grid sentences produce more pronounced Lombard effects than natural sentences. Then, we develop and test a normal-to-Lombard conversion model, trained separately on LCT and EMALG corpora. Through subjective and objective evaluations, natural sentences are superior in maintaining speech quality in intelligibility enhancement. In contrast, grid sentences could provide superior intelligibility due to the more pronounced Lombard effect. This study provides a valuable perspective on enhancing speech communication in noisy environments.

AIMay 11
ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdependent, and Large-Scale Tool Sandbox

Yuanyang Li, Xue Yang, Longyue Wang et al.

Current LLM agents are proficient at calling isolated APIs but struggle with the "last mile" of commercial software automation. In real-world scenarios, tools are not independent; they are atomic, interdependent, and prone to environmental noise. We introduce $\textbf{ComplexMCP}$, a benchmark designed to evaluate agents in these rigorous conditions. Built on the Model Context Protocol (MCP), $\textbf{ComplexMCP}$ provides over 300 meticulously tested tools derived from 7 stateful sandboxes, ranging from office suites to financial systems. Unlike existing datasets, our benchmark utilizes a seed-driven architecture to simulate dynamic environment states and unpredictable API failures, ensuring a deterministic yet diverse evaluation. We evaluate various LLMs across full-context and RAG paradigms, revealing a stark performance gap: even top-tier models fail to exceed a 60% success rate, far trailing human performance 90%. Granular trajectory analysis identifies three fundamental bottlenecks: (1) $\textbf{tool retrieval saturation}$ as action spaces scale; (2) $\textbf{over-confidence}$, where agents skip essential environment verifications; and (3) $\textbf{strategic defeatism}$, a tendency to rationalize failure rather than pursuing recovery. These findings underscore the insufficiency of current agents for interdependent workflows, positioning $\textbf{ComplexMCP}$ as a critical testbed for the next generation of resilient autonomous systems.

CVAug 18, 2025Code
SEDEG:Sequential Enhancement of Decoder and Encoder's Generality for Class Incremental Learning with Small Memory

Hongyang Chen, Shaoling Pu, Lingyu Zheng et al.

In incremental learning, enhancing the generality of knowledge is crucial for adapting to dynamic data inputs. It can develop generalized representations or more balanced decision boundaries, preventing the degradation of long-term knowledge over time and thus mitigating catastrophic forgetting. Some emerging incremental learning methods adopt an encoder-decoder architecture and have achieved promising results. In the encoder-decoder achitecture, improving the generalization capabilities of both the encoder and decoder is critical, as it helps preserve previously learned knowledge while ensuring adaptability and robustness to new, diverse data inputs. However, many existing continual methods focus solely on enhancing one of the two components, which limits their effectiveness in mitigating catastrophic forgetting. And these methods perform even worse in small-memory scenarios, where only a limited number of historical samples can be stored. To mitigate this limitation, we introduces SEDEG, a two-stage training framework for vision transformers (ViT), focusing on sequentially improving the generality of both Decoder and Encoder. Initially, SEDEG trains an ensembled encoder through feature boosting to learn generalized representations, which subsequently enhance the decoder's generality and balance the classifier. The next stage involves using knowledge distillation (KD) strategies to compress the ensembled encoder and develop a new, more generalized encoder. This involves using a balanced KD approach and feature KD for effective knowledge transfer. Extensive experiments on three benchmark datasets show SEDEG's superior performance, and ablation studies confirm the efficacy of its components. The code is available at https://github.com/ShaolingPu/CIL.

IRAug 18, 2025Code
GeoGPT-RAG Technical Report

Fei Huang, Fan Wu, Zeqing Zhang et al.

GeoGPT is an open large language model system built to advance research in the geosciences. To enhance its domain-specific capabilities, we integrated Retrieval Augmented Generation(RAG), which augments model outputs with relevant information retrieved from an external knowledge source. GeoGPT uses RAG to draw from the GeoGPT Library, a specialized corpus curated for geoscientific content, enabling it to generate accurate, context-specific answers. Users can also create personalized knowledge bases by uploading their own publication lists, allowing GeoGPT to retrieve and respond using user-provided materials. To further improve retrieval quality and domain alignment, we fine-tuned both the embedding model and a ranking model that scores retrieved passages by relevance to the query. These enhancements optimize RAG for geoscience applications and significantly improve the system's ability to deliver precise and trustworthy outputs. GeoGPT reflects a strong commitment to open science through its emphasis on collaboration, transparency, and community driven development. As part of this commitment, we have open-sourced two core RAG components-GeoEmbedding and GeoReranker-to support geoscientists, researchers, and professionals worldwide with powerful, accessible AI tools.

CVMar 19
Unsupervised Contrastive Learning for Efficient and Robust Spectral Shape Matching

Feifan Luo, Hongyang Chen

Estimating correspondences between pairs of non-rigid deformable 3D shapes remains a significant challenge in computer vision and graphics. While deep functional map methods have become the go-to solution for addressing this problem, they primarily focus on optimizing pointwise and functional maps either individually or jointly, rather than directly enhancing feature representations in the embedding space, which often results in inadequate feature quality and suboptimal matching performance. Furthermore, these approaches heavily rely on traditional functional map techniques, such as time-consuming functional map solvers, which incur substantial computational costs. In this work, we introduce, for the first time, a novel unsupervised contrastive learning-based approach for efficient and robust 3D shape matching. We begin by presenting an unsupervised contrastive learning framework that promotes feature learning by maximizing consistency within positive similarity pairs and minimizing it within negative similarity pairs, thereby improving both the consistency and discriminability of the learned features.We then design a significantly simplified functional map learning architecture that eliminates the need for computationally expensive functional map solvers and multiple auxiliary functional map losses, greatly enhancing computational efficiency. By integrating these two components into a unified two-branch pipeline, our method achieves state-of-the-art performance in both accuracy and efficiency. Extensive experiments demonstrate that our approach is not only computationally efficient but also outperforms current state-of-the-art methods across various challenging benchmarks, including near-isometric, non-isometric, and topologically inconsistent scenarios, even surpassing supervised techniques.

CLMay 7
Navigating by Old Maps: The Pitfalls of Static Mechanistic Localization in LLM Post-Training

Hang Chen, Jiaying Zhu, Hongyang Chen et al.

The "Locate-then-Update" paradigm has become a predominant approach in the post-training of large language models (LLMs), identifying critical components via mechanistic interpretability for targeted parameter updates. However, this paradigm rests on a fundamental yet unverified assumption: can mechanisms derived from current static parameters reliably guide future dynamic parameter updates? To investigate this, we systematically track the structural evolution of Transformer circuits throughout the supervised fine-tuning (SFT) process, revealing the underlying dynamics of task mechanisms. We introduce three novel metrics-Circuit Distance, Circuit Stability, and Circuit Conflict-to analyze circuit evolution across three dimensions: neural migration, semantic stability, and cross-task interference. Our empirical results reveal that circuits inherently exhibit "Free Evolution" during parameter updates. Consequently, static mechanisms extracted from current states inevitably suffer from temporal latency, making them fundamentally inadequate for guiding future states. Moreover, by deconstructing the "illusion of effectiveness" in existing methods, this work underscores the necessity of "foresight" in mechanistic localization and proposes a predictive framework for future research.

CVJan 13, 2025
Pedestrian Trajectory Prediction Based on Social Interactions Learning With Random Weights

Jiajia Xie, Sheng Zhang, Beihao Xia et al.

Pedestrian trajectory prediction is a critical technology in the evolution of self-driving cars toward complete artificial intelligence. Over recent years, focusing on the trajectories of pedestrians to model their social interactions has surged with great interest in more accurate trajectory predictions. However, existing methods for modeling pedestrian social interactions rely on pre-defined rules, struggling to capture non-explicit social interactions. In this work, we propose a novel framework named DTGAN, which extends the application of Generative Adversarial Networks (GANs) to graph sequence data, with the primary objective of automatically capturing implicit social interactions and achieving precise predictions of pedestrian trajectory. DTGAN innovatively incorporates random weights within each graph to eliminate the need for pre-defined interaction rules. We further enhance the performance of DTGAN by exploring diverse task loss functions during adversarial training, which yields improvements of 16.7\% and 39.3\% on metrics ADE and FDE, respectively. The effectiveness and accuracy of our framework are verified on two public datasets. The experimental results show that our proposed DTGAN achieves superior performance and is well able to understand pedestrians' intentions.

LGMar 3, 2024
Collaborate to Adapt: Source-Free Graph Domain Adaptation via Bi-directional Adaptation

Zhen Zhang, Meihan Liu, Anhui Wang et al.

Unsupervised Graph Domain Adaptation (UGDA) has emerged as a practical solution to transfer knowledge from a label-rich source graph to a completely unlabelled target graph. However, most methods require a labelled source graph to provide supervision signals, which might not be accessible in the real-world settings due to regulations and privacy concerns. In this paper, we explore the scenario of source-free unsupervised graph domain adaptation, which tries to address the domain adaptation problem without accessing the labelled source graph. Specifically, we present a novel paradigm called GraphCTA, which performs model adaptation and graph adaptation collaboratively through a series of procedures: (1) conduct model adaptation based on node's neighborhood predictions in target graph considering both local and global information; (2) perform graph adaptation by updating graph structure and node attributes via neighborhood contrastive learning; and (3) the updated graph serves as an input to facilitate the subsequent iteration of model adaptation, thereby establishing a collaborative loop between model adaptation and graph adaptation. Comprehensive experiments are conducted on various public datasets. The experimental results demonstrate that our proposed model outperforms recent source-free baselines by large margins.

CLFeb 2, 2024
HQA-Attack: Toward High Quality Black-Box Hard-Label Adversarial Attack on Text

Han Liu, Zhi Xu, Xiaotong Zhang et al.

Black-box hard-label adversarial attack on text is a practical and challenging task, as the text data space is inherently discrete and non-differentiable, and only the predicted label is accessible. Research on this problem is still in the embryonic stage and only a few methods are available. Nevertheless, existing methods rely on the complex heuristic algorithm or unreliable gradient estimation strategy, which probably fall into the local optimum and inevitably consume numerous queries, thus are difficult to craft satisfactory adversarial examples with high semantic similarity and low perturbation rate in a limited query budget. To alleviate above issues, we propose a simple yet effective framework to generate high quality textual adversarial examples under the black-box hard-label attack scenarios, named HQA-Attack. Specifically, after initializing an adversarial example randomly, HQA-attack first constantly substitutes original words back as many as possible, thus shrinking the perturbation rate. Then it leverages the synonym set of the remaining changed words to further optimize the adversarial example with the direction which can improve the semantic similarity and satisfy the adversarial condition simultaneously. In addition, during the optimizing procedure, it searches a transition synonym word for each changed word, thus avoiding traversing the whole synonym set and reducing the query number to some extent. Extensive experimental results on five text classification datasets, three natural language inference datasets and two real-world APIs have shown that the proposed HQA-Attack method outperforms other strong baselines significantly.

LGJan 5, 2024
Geometric-Facilitated Denoising Diffusion Model for 3D Molecule Generation

Can Xu, Haosen Wang, Weigang Wang et al.

Denoising diffusion models have shown great potential in multiple research areas. Existing diffusion-based generative methods on de novo 3D molecule generation face two major challenges. Since majority heavy atoms in molecules allow connections to multiple atoms through single bonds, solely using pair-wise distance to model molecule geometries is insufficient. Therefore, the first one involves proposing an effective neural network as the denoising kernel that is capable to capture complex multi-body interatomic relationships and learn high-quality features. Due to the discrete nature of graphs, mainstream diffusion-based methods for molecules heavily rely on predefined rules and generate edges in an indirect manner. The second challenge involves accommodating molecule generation to diffusion and accurately predicting the existence of bonds. In our research, we view the iterative way of updating molecule conformations in diffusion process is consistent with molecular dynamics and introduce a novel molecule generation method named Geometric-Facilitated Molecular Diffusion (GFMDiff). For the first challenge, we introduce a Dual-Track Transformer Network (DTN) to fully excevate global spatial relationships and learn high quality representations which contribute to accurate predictions of features and geometries. As for the second challenge, we design Geometric-Facilitated Loss (GFLoss) which intervenes the formation of bonds during the training period, instead of directly embedding edges into the latent space. Comprehensive experiments on current benchmarks demonstrate the superiority of GFMDiff.

SISep 11, 2023
Circle Feature Graphormer: Can Circle Features Stimulate Graph Transformer?

Jingsong Lv, Hongyang Chen, Yao Qi et al.

In this paper, we introduce two local graph features for missing link prediction tasks on ogbl-citation2. We define the features as Circle Features, which are borrowed from the concept of circle of friends. We propose the detailed computing formulas for the above features. Firstly, we define the first circle feature as modified swing for common graph, which comes from bipartite graph. Secondly, we define the second circle feature as bridge, which indicates the importance of two nodes for different circle of friends. In addition, we firstly propose the above features as bias to enhance graph transformer neural network, such that graph self-attention mechanism can be improved. We implement a Circled Feature aware Graph transformer (CFG) model based on SIEG network, which utilizes a double tower structure to capture both global and local structure features. Experimental results show that CFG achieves the state-of-the-art performance on dataset ogbl-citation2.

CLJan 30, 2024
Two Heads Are Better Than One: Integrating Knowledge from Knowledge Graphs and Large Language Models for Entity Alignment

Linyao Yang, Hongyang Chen, Xiao Wang et al.

Entity alignment, which is a prerequisite for creating a more comprehensive Knowledge Graph (KG), involves pinpointing equivalent entities across disparate KGs. Contemporary methods for entity alignment have predominantly utilized knowledge embedding models to procure entity embeddings that encapsulate various similarities-structural, relational, and attributive. These embeddings are then integrated through attention-based information fusion mechanisms. Despite this progress, effectively harnessing multifaceted information remains challenging due to inherent heterogeneity. Moreover, while Large Language Models (LLMs) have exhibited exceptional performance across diverse downstream tasks by implicitly capturing entity semantics, this implicit knowledge has yet to be exploited for entity alignment. In this study, we propose a Large Language Model-enhanced Entity Alignment framework (LLMEA), integrating structural knowledge from KGs with semantic knowledge from LLMs to enhance entity alignment. Specifically, LLMEA identifies candidate alignments for a given entity by considering both embedding similarities between entities across KGs and edit distances to a virtual equivalent entity. It then engages an LLM iteratively, posing multiple multi-choice questions to draw upon the LLM's inference capability. The final prediction of the equivalent entity is derived from the LLM's output. Experiments conducted on three public datasets reveal that LLMEA surpasses leading baseline models. Additional ablation studies underscore the efficacy of our proposed framework.

CLMar 6, 2025
Knowledge-Decoupled Synergetic Learning: An MLLM based Collaborative Approach to Few-shot Multimodal Dialogue Intention Recognition

Bin Chen, Yu Zhang, Hongfei Ye et al.

Few-shot multimodal dialogue intention recognition is a critical challenge in the e-commerce domainn. Previous methods have primarily enhanced model classification capabilities through post-training techniques. However, our analysis reveals that training for few-shot multimodal dialogue intention recognition involves two interconnected tasks, leading to a seesaw effect in multi-task learning. This phenomenon is attributed to knowledge interference stemming from the superposition of weight matrix updates during the training process. To address these challenges, we propose Knowledge-Decoupled Synergetic Learning (KDSL), which mitigates these issues by utilizing smaller models to transform knowledge into interpretable rules, while applying the post-training of larger models. By facilitating collaboration between the large and small multimodal large language models for prediction, our approach demonstrates significant improvements. Notably, we achieve outstanding results on two real Taobao datasets, with enhancements of 6.37\% and 6.28\% in online weighted F1 scores compared to the state-of-the-art method, thereby validating the efficacy of our framework.

CLMar 13
Continual Learning in Large Language Models: Methods, Challenges, and Opportunities

Hongyang Chen, Zhongwu Sun, Hongfei Ye et al.

Continual learning (CL) has emerged as a pivotal paradigm to enable large language models (LLMs) to dynamically adapt to evolving knowledge and sequential tasks while mitigating catastrophic forgetting-a critical limitation of the static pre-training paradigm inherent to modern LLMs. This survey presents a comprehensive overview of CL methodologies tailored for LLMs, structured around three core training stages: continual pre-training, continual fine-tuning, and continual alignment.Beyond the canonical taxonomy of rehearsal-, regularization-, and architecture-based methods, we further subdivide each category by its distinct forgetting mitigation mechanisms and conduct a rigorous comparative analysis of the adaptability and critical improvements of traditional CL methods for LLMs. In doing so, we explicitly highlight core distinctions between LLM CL and traditional machine learning, particularly with respect to scale, parameter efficiency, and emergent capabilities. Our analysis covers essential evaluation metrics, including forgetting rates and knowledge transfer efficiency, along with emerging benchmarks for assessing CL performance. This survey reveals that while current methods demonstrate promising results in specific domains, fundamental challenges persist in achieving seamless knowledge integration across diverse tasks and temporal scales. This systematic review contributes to the growing body of knowledge on LLM adaptation, providing researchers and practitioners with a structured framework for understanding current achievements and future opportunities in lifelong learning for language models.

LGFeb 18, 2025
UniMatch: Universal Matching from Atom to Task for Few-Shot Drug Discovery

Ruifeng Li, Mingqian Li, Wei Liu et al.

Drug discovery is crucial for identifying candidate drugs for various diseases.However, its low success rate often results in a scarcity of annotations, posing a few-shot learning problem. Existing methods primarily focus on single-scale features, overlooking the hierarchical molecular structures that determine different molecular properties. To address these issues, we introduce Universal Matching Networks (UniMatch), a dual matching framework that integrates explicit hierarchical molecular matching with implicit task-level matching via meta-learning, bridging multi-level molecular representations and task-level generalization. Specifically, our approach explicitly captures structural features across multiple levels, such as atoms, substructures, and molecules, via hierarchical pooling and matching, facilitating precise molecular representation and comparison. Additionally, we employ a meta-learning strategy for implicit task-level matching, allowing the model to capture shared patterns across tasks and quickly adapt to new ones. This unified matching framework ensures effective molecular alignment while leveraging shared meta-knowledge for fast adaptation. Our experimental results demonstrate that UniMatch outperforms state-of-the-art methods on the MoleculeNet and FS-Mol benchmarks, achieving improvements of 2.87% in AUROC and 6.52% in delta AUPRC. UniMatch also shows excellent generalization ability on the Meta-MolNet benchmark.

LGMay 11, 2024
Input Snapshots Fusion for Scalable Discrete-Time Dynamic Graph Neural Networks

QingGuo Qi, Hongyang Chen, Minhao Cheng et al.

In recent years, there has been a surge in research on dynamic graph representation learning, primarily focusing on modeling the evolution of temporal-spatial patterns in real-world applications. However, within the domain of discrete-time dynamic graphs, the exploration of temporal edges remains underexplored. Existing approaches often rely on additional sequential models to capture dynamics, leading to high computational and memory costs, particularly for large-scale graphs. To address this limitation, we propose the Input {\bf S}napshots {\bf F}usion based {\bf Dy}namic {\bf G}raph Neural Network (SFDyG), which combines Hawkes processes with graph neural networks to capture temporal and structural patterns in dynamic graphs effectively. By fusing multiple snapshots into a single temporal graph, SFDyG decouples computational complexity from the number of snapshots, enabling efficient full-batch and mini-batch training. Experimental evaluations on eight diverse dynamic graph datasets for future link prediction tasks demonstrate that SFDyG consistently outperforms existing methods.

LGDec 3, 2024
GQWformer: A Quantum-based Transformer for Graph Representation Learning

Lei Yu, Hongyang Chen, Jingsong Lv et al.

Graph Transformers (GTs) have demonstrated significant advantages in graph representation learning through their global attention mechanisms. However, the self-attention mechanism in GTs tends to neglect the inductive biases inherent in graph structures, making it chanllenging to effectively capture essential structural information. To address this issue, we propose a novel approach that integrate graph inductive bias into self-attention mechanisms by leveraging quantum technology for structural encoding. In this paper, we introduce the Graph Quantum Walk Transformer (GQWformer), a groundbreaking GNN framework that utilizes quantum walks on attributed graphs to generate node quantum states. These quantum states encapsulate rich structural attributes and serve as inductive biases for the transformer, thereby enabling the generation of more meaningful attention scores. By subsequently incorporating a recurrent neural network, our design amplifies the model's ability to focus on both local and global information. We conducted comprehensive experiments across five publicly available datasets to evaluate the effectiveness of our model. These results clearly indicate that GQWformer outperforms existing state-of-the-art graph classification algorithms. These findings highlight the significant potential of integrating quantum computing methodologies with traditional GNNs to advance the field of graph representation learning, providing a promising direction for future research and applications.

QMApr 16, 2024
Physical formula enhanced multi-task learning for pharmacokinetics prediction

Ruifeng Li, Dongzhan Zhou, Ancheng Shen et al.

Artificial intelligence (AI) technology has demonstrated remarkable potential in drug dis-covery, where pharmacokinetics plays a crucial role in determining the dosage, safety, and efficacy of new drugs. A major challenge for AI-driven drug discovery (AIDD) is the scarcity of high-quality data, which often requires extensive wet-lab work. A typical example of this is pharmacokinetic experiments. In this work, we develop a physical formula enhanced mul-ti-task learning (PEMAL) method that predicts four key parameters of pharmacokinetics simultaneously. By incorporating physical formulas into the multi-task framework, PEMAL facilitates effective knowledge sharing and target alignment among the pharmacokinetic parameters, thereby enhancing the accuracy of prediction. Our experiments reveal that PEMAL significantly lowers the data demand, compared to typical Graph Neural Networks. Moreover, we demonstrate that PEMAL enhances the robustness to noise, an advantage that conventional Neural Networks do not possess. Another advantage of PEMAL is its high flexibility, which can be potentially applied to other multi-task machine learning scenarios. Overall, our work illustrates the benefits and potential of using PEMAL in AIDD and other scenarios with data scarcity and noise.

AIOct 8, 2025
Tool-Augmented Policy Optimization: Synergizing Reasoning and Adaptive Tool Use with Reinforcement Learning

Wenxun Wu, Yuanyang Li, Guhan Chen et al.

Recent advances in large language models (LLMs) have popularized test-time scaling, where models generate additional reasoning tokens before producing final answers. These approaches have demonstrated significant performance improvements on benchmarks involving mathematical reasoning. However, language models relying solely on direct inference still struggle with tasks demanding up-to-date knowledge or computational tools such as calculators and code interpreters for complex arithmetic operations. To overcome these limitations, we propose Tool-Augmented Policy Optimization (TAPO), a novel reinforcement learning framework that systematically integrates multi-hop reasoning with adaptive tool-calling capabilities. Our approach employs a modified version of Dynamic Sampling Policy Optimization (DAPO), a recently developed RL paradigm, which we adapt specifically for tool invocation scenarios, enabling models to dynamically interleave complex reasoning with on-demand tool usage (including search APIs and Python interpreters). To support this research, we introduce two new datasets: TAPO-easy-60K and TAPO-hard-18K, specifically designed to train and evaluate both fact-based reasoning and mathematical calculation capabilities. Our experiments on Qwen2.5-3B and Qwen2.5-7B models demonstrate the effectiveness of our approach, with both models achieving state-of-the-art performance on tasks requiring external knowledge and mathematical computation among methods with comparable parameters. Notably, TAPO achieves more efficient tool utilization than baseline methods while preventing excessive calls caused by reward hacking. These results highlight the significant potential of combining advanced reasoning with tool usage to enhance model performance in knowledge-intensive and computationally demanding tasks.

CVJul 15, 2025
Assessing Color Vision Test in Large Vision-language Models

Hongfei Ye, Bin Chen, Wenxi Liu et al.

With the widespread adoption of large vision-language models, the capacity for color vision in these models is crucial. However, the color vision abilities of large visual-language models have not yet been thoroughly explored. To address this gap, we define a color vision testing task for large vision-language models and construct a dataset \footnote{Anonymous Github Showing some of the data https://anonymous.4open.science/r/color-vision-test-dataset-3BCD} that covers multiple categories of test questions and tasks of varying difficulty levels. Furthermore, we analyze the types of errors made by large vision-language models and propose fine-tuning strategies to enhance their performance in color vision tests.

SPDec 12, 2024
Residual Channel Boosts Contrastive Learning for Radio Frequency Fingerprint Identification

Rui Pan, Hui Chen, Guanxiong Shen et al.

In order to address the issue of limited data samples for the deployment of pre-trained models in unseen environments, this paper proposes a residual channel-based data augmentation strategy for Radio Frequency Fingerprint Identification (RFFI), coupled with a lightweight SimSiam contrastive learning framework. By applying least square (LS) and minimum mean square error (MMSE) channel estimations followed by equalization, signals with different residual channel effects are generated. These residual channels enable the model to learn more effective representations. Then the pre-trained model is fine-tuned with 1% samples in a novel environment for RFFI. Experimental results demonstrate that our method significantly enhances both feature extraction ability and generalization while requiring fewer samples and less time, making it suitable for practical wireless security applications.

LGOct 28, 2024
Contextual Representation Anchor Network to Alleviate Selection Bias in Few-Shot Drug Discovery

Ruifeng Li, Wei Liu, Xiangxin Zhou et al.

In the drug discovery process, the low success rate of drug candidate screening often leads to insufficient labeled data, causing the few-shot learning problem in molecular property prediction. Existing methods for few-shot molecular property prediction overlook the sample selection bias, which arises from non-random sample selection in chemical experiments. This bias in data representativeness leads to suboptimal performance. To overcome this challenge, we present a novel method named contextual representation anchor Network (CRA), where an anchor refers to a cluster center of the representations of molecules and serves as a bridge to transfer enriched contextual knowledge into molecular representations and enhance their expressiveness. CRA introduces a dual-augmentation mechanism that includes context augmentation, which dynamically retrieves analogous unlabeled molecules and captures their task-specific contextual knowledge to enhance the anchors, and anchor augmentation, which leverages the anchors to augment the molecular representations. We evaluate our approach on the MoleculeNet and FS-Mol benchmarks, as well as in domain transfer experiments. The results demonstrate that CRA outperforms the state-of-the-art by 2.60% and 3.28% in AUC and $Δ$AUC-PR metrics, respectively, and exhibits superior generalization capabilities.

LGOct 18, 2024
Dual-Label Learning With Irregularly Present Labels

Mingqian Li, Qiao Han, Ruifeng Li et al.

In multi-task learning, labels are often missing irregularly across samples, which can be fully labeled, partially labeled or unlabeled. The irregular label presence often appears in scientific studies due to experimental limitations. It triggers a demand for a new training and inference mechanism that could accommodate irregularly present labels and maximize their utility. This work focuses on the two-label learning task and proposes a novel training and inference framework, Dual-Label Learning (DLL). The DLL framework formulates the problem into a dual-function system, in which the two functions should simultaneously satisfy standard supervision, structural duality and probabilistic duality. DLL features a dual-tower model architecture that allows for explicit information exchange between labels, aimed at maximizing the utility of partially available labels. During training, missing labels are imputed as part of the forward propagation process, while during inference, labels are predicted jointly as unknowns of a bivariate system of equations. Our theoretical analysis guarantees the feasibility of DLL, and extensive experiments are conducted to verify that by explicitly modeling label correlation and maximizing label utility, our method makes consistently better prediction than baseline approaches by up to 9.6% gain in F1-score or 10.2% reduction in MAPE. Remarkably, DLL maintains robust performance at a label missing rate of up to 60%, achieving even better results than baseline approaches at lower missing rates down to only 10%.

CVOct 15, 2024
Development and Testing of a Wood Panels Bark Removal Equipment Based on Deep Learning

Rijun Wang, Guanghao Zhang, Hongyang Chen et al.

Attempting to apply deep learning methods to wood panels bark removal equipment to enhance the quality and efficiency of bark removal is a significant and challenging endeavor. This study develops and tests a deep learning-based wood panels bark removal equipment. In accordance with the practical requirements of sawmills, a wood panels bark removal equipment equipped with a vision inspection system is designed. Based on a substantial collection of wood panel images obtained using the visual inspection system, the first general wood panels semantic segmentation dataset is constructed for training the BiSeNetV1 model employed in this study. Furthermore, the calculation methods and processes for the essential key data required in the bark removal process are presented in detail. Comparative experiments of the BiSeNetV1 model and tests of bark removal effectiveness are conducted in both laboratory and sawmill environments. The results of the comparative experiments indicate that the application of the BiSeNetV1 segmentation model is rational and feasible. The results of the bark removal effectiveness tests demonstrate a significant improvement in both the quality and efficiency of bark removal. The developed equipment fully meets the sawmill's requirements for precision and efficiency in bark removal processing.

LGJun 5, 2024
Decision-focused Graph Neural Networks for Combinatorial Optimization

Yang Liu, Chuan Zhou, Peng Zhang et al.

In recent years, there has been notable interest in investigating combinatorial optimization (CO) problems by neural-based framework. An emerging strategy to tackle these challenging problems involves the adoption of graph neural networks (GNNs) as an alternative to traditional algorithms, a subject that has attracted considerable attention. Despite the growing popularity of GNNs and traditional algorithm solvers in the realm of CO, there is limited research on their integrated use and the correlation between them within an end-to-end framework. The primary focus of our work is to formulate a more efficient and precise framework for CO by employing decision-focused learning on graphs. Additionally, we introduce a decision-focused framework that utilizes GNNs to address CO problems with auxiliary support. To realize an end-to-end approach, we have designed two cascaded modules: (a) an unsupervised trained graph predictive model, and (b) a solver for quadratic binary unconstrained optimization. Empirical evaluations are conducted on various classical tasks, including maximum cut, maximum independent set, and minimum vertex cover. The experimental results on classical CO problems (i.e. MaxCut, MIS, and MVC) demonstrate the superiority of our method over both the standalone GNN approach and classical methods.

LGJun 5, 2024
Combinatorial Optimization with Automated Graph Neural Networks

Yang Liu, Peng Zhang, Yang Gao et al.

In recent years, graph neural networks (GNNs) have become increasingly popular for solving NP-hard combinatorial optimization (CO) problems, such as maximum cut and maximum independent set. The core idea behind these methods is to represent a CO problem as a graph and then use GNNs to learn the node/graph embedding with combinatorial information. Although these methods have achieved promising results, given a specific CO problem, the design of GNN architectures still requires heavy manual work with domain knowledge. Existing automated GNNs are mostly focused on traditional graph learning problems, which is inapplicable to solving NP-hard CO problems. To this end, we present a new class of \textbf{AUTO}mated \textbf{G}NNs for solving \textbf{NP}-hard problems, namely \textbf{AutoGNP}. We represent CO problems by GNNs and focus on two specific problems, i.e., mixed integer linear programming and quadratic unconstrained binary optimization. The idea of AutoGNP is to use graph neural architecture search algorithms to automatically find the best GNNs for a given NP-hard combinatorial optimization problem. Compared with existing graph neural architecture search algorithms, AutoGNP utilizes two-hop operators in the architecture search space. Moreover, AutoGNP utilizes simulated annealing and a strict early stopping policy to avoid local optimal solutions. Empirical results on benchmark combinatorial problems demonstrate the superiority of our proposed model.

CVMay 6, 2024
Liberating Seen Classes: Boosting Few-Shot and Zero-Shot Text Classification via Anchor Generation and Classification Reframing

Han Liu, Siyang Zhao, Xiaotong Zhang et al.

Few-shot and zero-shot text classification aim to recognize samples from novel classes with limited labeled samples or no labeled samples at all. While prevailing methods have shown promising performance via transferring knowledge from seen classes to unseen classes, they are still limited by (1) Inherent dissimilarities among classes make the transformation of features learned from seen classes to unseen classes both difficult and inefficient. (2) Rare labeled novel samples usually cannot provide enough supervision signals to enable the model to adjust from the source distribution to the target distribution, especially for complicated scenarios. To alleviate the above issues, we propose a simple and effective strategy for few-shot and zero-shot text classification. We aim to liberate the model from the confines of seen classes, thereby enabling it to predict unseen categories without the necessity of training on seen classes. Specifically, for mining more related unseen category knowledge, we utilize a large pre-trained language model to generate pseudo novel samples, and select the most representative ones as category anchors. After that, we convert the multi-class classification task into a binary classification task and use the similarities of query-anchor pairs for prediction to fully leverage the limited supervision signals. Extensive experiments on six widely used public datasets show that our proposed method can outperform other strong baselines significantly in few-shot and zero-shot tasks, even without using any seen class samples.

CVFeb 6, 2024
Deep Frequency-Aware Functional Maps for Robust Shape Matching

Feifan Luo, Qinsong Li, Ling Hu et al.

Deep functional map frameworks are widely employed for 3D shape matching. However, most existing deep functional map methods cannot adaptively capture important frequency information for functional map estimation in specific matching scenarios, i.e., lacking \textit{frequency awareness}, resulting in poor performance when dealing with large deformable shape matching. To this end, we propose a novel unsupervised learning-based framework called Deep Frequency-Aware Functional Maps, which can gracefully cope with various shape matching scenarios. We first introduce a general constraint called Spectral Filter Operator Preservation to compute desirable functional maps, where the spectral filter operator encodes informative frequency information and can promote frequency awareness for deep functional map frameworks by learning a set of filter functions. Then, we directly utilize the proposed constraint as a loss function to supervise functional maps, pointwise maps, and filter functions simultaneously, where the filter functions are derived from the orthonormal Jacobi basis, and the coefficients of the basis are learnable parameters. Finally, we develop an effective refinement strategy to improve the final pointwise map, which incorporates our constraint and learned filter functions, leading to more robust and accurate correspondences during the inference process. Extensive experimental results on various datasets demonstrate that our approach outperforms the existing state-of-the-art methods, especially in challenging settings like datasets with non-isometric deformation and inconsistent topology.

CLJan 26, 2024
Scientific Large Language Models: A Survey on Biological & Chemical Domains

Qiang Zhang, Keyang Ding, Tianwen Lyv et al.

Large Language Models (LLMs) have emerged as a transformative power in enhancing natural language comprehension, representing a significant stride toward artificial general intelligence. The application of LLMs extends beyond conventional linguistic boundaries, encompassing specialized linguistic systems developed within various scientific disciplines. This growing interest has led to the advent of scientific LLMs, a novel subclass specifically engineered for facilitating scientific discovery. As a burgeoning area in the community of AI for Science, scientific LLMs warrant comprehensive exploration. However, a systematic and up-to-date survey introducing them is currently lacking. In this paper, we endeavor to methodically delineate the concept of "scientific language", whilst providing a thorough review of the latest advancements in scientific LLMs. Given the expansive realm of scientific disciplines, our analysis adopts a focused lens, concentrating on the biological and chemical domains. This includes an in-depth examination of LLMs for textual knowledge, small molecules, macromolecular proteins, genomic sequences, and their combinations, analyzing them in terms of model architectures, capabilities, datasets, and evaluation. Finally, we critically examine the prevailing challenges and point out promising research directions along with the advances of LLMs. By offering a comprehensive overview of technical developments in this field, this survey aspires to be an invaluable resource for researchers navigating the intricate landscape of scientific LLMs.

LGMay 30, 2023
Graph Neural Processes for Spatio-Temporal Extrapolation

Junfeng Hu, Yuxuan Liang, Zhencheng Fan et al.

We study the task of spatio-temporal extrapolation that generates data at target locations from surrounding contexts in a graph. This task is crucial as sensors that collect data are sparsely deployed, resulting in a lack of fine-grained information due to high deployment and maintenance costs. Existing methods either use learning-based models like Neural Networks or statistical approaches like Gaussian Processes for this task. However, the former lacks uncertainty estimates and the latter fails to capture complex spatial and temporal correlations effectively. To address these issues, we propose Spatio-Temporal Graph Neural Processes (STGNP), a neural latent variable model which commands these capabilities simultaneously. Specifically, we first learn deterministic spatio-temporal representations by stacking layers of causal convolutions and cross-set graph neural networks. Then, we learn latent variables for target locations through vertical latent state transitions along layers and obtain extrapolations. Importantly during the transitions, we propose Graph Bayesian Aggregation (GBA), a Bayesian graph aggregator that aggregates contexts considering uncertainties in context data and graph structure. Extensive experiments show that STGNP has desirable properties such as uncertainty estimates and strong learning capabilities, and achieves state-of-the-art results by a clear margin.

LGOct 24, 2021
SenseMag: Enabling Low-Cost Traffic Monitoring using Non-invasive Magnetic Sensing

Kafeng Wang, Haoyi Xiong, Jie Zhang et al.

The operation and management of intelligent transportation systems (ITS), such as traffic monitoring, relies on real-time data aggregation of vehicular traffic information, including vehicular types (e.g., cars, trucks, and buses), in the critical roads and highways. While traditional approaches based on vehicular-embedded GPS sensors or camera networks would either invade drivers' privacy or require high deployment cost, this paper introduces a low-cost method, namely SenseMag, to recognize the vehicular type using a pair of non-invasive magnetic sensors deployed on the straight road section. SenseMag filters out noises and segments received magnetic signals by the exact time points that the vehicle arrives or departs from every sensor node. Further, SenseMag adopts a hierarchical recognition model to first estimate the speed/velocity, then identify the length of vehicle using the predicted speed, sampling cycles, and the distance between the sensor nodes. With the vehicle length identified and the temporal/spectral features extracted from the magnetic signals, SenseMag classify the types of vehicles accordingly. Some semi-automated learning techniques have been adopted for the design of filters, features, and the choice of hyper-parameters. Extensive experiment based on real-word field deployment (on the highways in Shenzhen, China) shows that SenseMag significantly outperforms the existing methods in both classification accuracy and the granularity of vehicle types (i.e., 7 types by SenseMag versus 4 types by the existing work in comparisons). To be specific, our field experiment results validate that SenseMag is with at least $90\%$ vehicle type classification accuracy and less than 5\% vehicle length classification error.

LGJun 15, 2021
Thompson Sampling for Unimodal Bandits

Long Yang, Zhao Li, Zehong Hu et al.

In this paper, we propose a Thompson Sampling algorithm for \emph{unimodal} bandits, where the expected reward is unimodal over the partially ordered arms. To exploit the unimodal structure better, at each step, instead of exploration from the entire decision space, our algorithm makes decision according to posterior distribution only in the neighborhood of the arm that has the highest empirical mean estimate. We theoretically prove that, for Bernoulli rewards, the regret of our algorithm reaches the lower bound of unimodal bandits, thus it is asymptotically optimal. For Gaussian rewards, the regret of our algorithm is $\mathcal{O}(\log T)$, which is far better than standard Thompson Sampling algorithms. Extensive experiments demonstrate the effectiveness of the proposed algorithm on both synthetic data sets and the real-world applications.