SDSep 21, 2023Code
TMac: Temporal Multi-Modal Graph Learning for Acoustic Event ClassificationMeng Liu, Ke Liang, Dayu Hu et al.
Audiovisual data is everywhere in this digital age, which raises higher requirements for the deep learning models developed on them. To well handle the information of the multi-modal data is the key to a better audiovisual modal. We observe that these audiovisual data naturally have temporal attributes, such as the time information for each frame in the video. More concretely, such data is inherently multi-modal according to both audio and visual cues, which proceed in a strict chronological order. It indicates that temporal information is important in multi-modal acoustic event modeling for both intra- and inter-modal. However, existing methods deal with each modal feature independently and simply fuse them together, which neglects the mining of temporal relation and thus leads to sub-optimal performance. With this motivation, we propose a Temporal Multi-modal graph learning method for Acoustic event Classification, called TMac, by modeling such temporal information via graph learning techniques. In particular, we construct a temporal graph for each acoustic event, dividing its audio data and video data into multiple segments. Each segment can be considered as a node, and the temporal relationships between nodes can be considered as timestamps on their edges. In this case, we can smoothly capture the dynamic information in intra-modal and inter-modal. Several experiments are conducted to demonstrate TMac outperforms other SOTA models in performance. Our code is available at https://github.com/MGitHubL/TMac.
GNNov 28, 2023
Single-cell Multi-view Clustering via Community Detection with Unknown Number of ClustersDayu Hu, Zhibin Dong, Ke Liang et al.
Single-cell multi-view clustering enables the exploration of cellular heterogeneity within the same cell from different views. Despite the development of several multi-view clustering methods, two primary challenges persist. Firstly, most existing methods treat the information from both single-cell RNA (scRNA) and single-cell Assay of Transposase Accessible Chromatin (scATAC) views as equally significant, overlooking the substantial disparity in data richness between the two views. This oversight frequently leads to a degradation in overall performance. Additionally, the majority of clustering methods necessitate manual specification of the number of clusters by users. However, for biologists dealing with cell data, precisely determining the number of distinct cell types poses a formidable challenge. To this end, we introduce scUNC, an innovative multi-view clustering approach tailored for single-cell data, which seamlessly integrates information from different views without the need for a predefined number of clusters. The scUNC method comprises several steps: initially, it employs a cross-view fusion network to create an effective embedding, which is then utilized to generate initial clusters via community detection. Subsequently, the clusters are automatically merged and optimized until no further clusters can be merged. We conducted a comprehensive evaluation of scUNC using three distinct single-cell datasets. The results underscored that scUNC outperforms the other baseline methods.
LGNov 28, 2023
Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell TypesDayu Hu, Ke Liang, Hao Yu et al.
In recent years, the field of single-cell data analysis has seen a marked advancement in the development of clustering methods. Despite advancements, most of these algorithms still concentrate on analyzing the provided single-cell matrix data. However, in medical applications, single-cell data often involves a wealth of exogenous information, including gene networks. Overlooking this aspect could lead to information loss and clustering results devoid of significant clinical relevance. An innovative single-cell deep clustering method, incorporating exogenous gene information, has been proposed to overcome this limitation. This model leverages exogenous gene network information to facilitate the clustering process, generating discriminative representations. Specifically, we have developed an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells. Concurrently, we conducted a random walk on an exogenous Protein-Protein Interaction (PPI) network, thereby acquiring the gene's topological features. Ultimately, during the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation. Extensive experiments have validated the effectiveness of our proposed method. This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
CVNov 14, 2023
Dual-channel Prototype Network for few-shot Classification of Pathological ImagesHao Quan, Xinjia Li, Dayu Hu et al.
In pathology, the rarity of certain diseases and the complexity in annotating pathological images significantly hinder the creation of extensive, high-quality datasets. This limitation impedes the progress of deep learning-assisted diagnostic systems in pathology. Consequently, it becomes imperative to devise a technology that can discern new disease categories from a minimal number of annotated examples. Such a technology would substantially advance deep learning models for rare diseases. Addressing this need, we introduce the Dual-channel Prototype Network (DCPN), rooted in the few-shot learning paradigm, to tackle the challenge of classifying pathological images with limited samples. DCPN augments the Pyramid Vision Transformer (PVT) framework for few-shot classification via self-supervised learning and integrates it with convolutional neural networks. This combination forms a dual-channel architecture that extracts multi-scale, highly precise pathological features. The approach enhances the versatility of prototype representations and elevates the efficacy of prototype networks in few-shot pathological image classification tasks. We evaluated DCPN using three publicly available pathological datasets, configuring small-sample classification tasks that mirror varying degrees of clinical scenario domain shifts. Our experimental findings robustly affirm DCPN's superiority in few-shot pathological image classification, particularly in tasks within the same domain, where it achieves the benchmarks of supervised learning.
CVDec 1, 2025
DB-KAUNet: An Adaptive Dual Branch Kolmogorov-Arnold UNet for Retinal Vessel SegmentationHongyu Xu, Panpan Meng, Meng Wang et al.
Accurate segmentation of retinal vessels is crucial for the clinical diagnosis of numerous ophthalmic and systemic diseases. However, traditional Convolutional Neural Network (CNN) methods exhibit inherent limitations, struggling to capture long-range dependencies and complex nonlinear relationships. To address the above limitations, an Adaptive Dual Branch Kolmogorov-Arnold UNet (DB-KAUNet) is proposed for retinal vessel segmentation. In DB-KAUNet, we design a Heterogeneous Dual-Branch Encoder (HDBE) that features parallel CNN and Transformer pathways. The HDBE strategically interleaves standard CNN and Transformer blocks with novel KANConv and KAT blocks, enabling the model to form a comprehensive feature representation. To optimize feature processing, we integrate several critical components into the HDBE. First, a Cross-Branch Channel Interaction (CCI) module is embedded to facilitate efficient interaction of channel features between the parallel pathways. Second, an attention-based Spatial Feature Enhancement (SFE) module is employed to enhance spatial features and fuse the outputs from both branches. Building upon the SFE module, an advanced Spatial Feature Enhancement with Geometrically Adaptive Fusion (SFE-GAF) module is subsequently developed. In the SFE-GAF module, adaptive sampling is utilized to focus on true vessel morphology precisely. The adaptive process strengthens salient vascular features while significantly reducing background noise and computational overhead. Extensive experiments on the DRIVE, STARE, and CHASE_DB1 datasets validate that DB-KAUNet achieves leading segmentation performance and demonstrates exceptional robustness.
NEDec 23, 2024
Learn from Global Correlations: Enhancing Evolutionary Algorithm via Spectral GNNKaichen Ouyang, Zong Ke, Shengwei Fu et al.
Evolutionary algorithms (EAs) simulate natural selection but have two main limitations: (1) they rarely update individuals based on global correlations, limiting comprehensive learning; (2) they struggle with balancing exploration and exploitation, where excessive exploitation causes premature convergence, and excessive exploration slows down the search. Moreover, EAs often depend on manual parameter settings, which can disrupt the exploration-exploitation balance. To address these issues, we propose Graph Neural Evolution (GNE), a novel EA framework. GNE represents the population as a graph, where nodes represent individuals, and edges capture their relationships, enabling global information usage. GNE utilizes spectral graph neural networks (GNNs) to decompose evolutionary signals into frequency components, applying a filtering function to fuse these components. High-frequency components capture diverse global information, while low-frequency ones capture more consistent information. This explicit frequency filtering strategy directly controls global-scale features through frequency components, overcoming the limitations of manual parameter settings and making the exploration-exploitation control more interpretable and manageable. Tests on nine benchmark functions (e.g., Sphere, Rastrigin, Rosenbrock) show that GNE outperforms classical (GA, DE, CMA-ES) and advanced algorithms (SDAES, RL-SHADE) under various conditions, including noise-corrupted and optimal solution deviation scenarios. GNE achieves solutions several orders of magnitude better (e.g., 3.07e-20 mean on Sphere vs. 1.51e-07).
LGAug 14, 2025
SPHENIC: Topology-Informed Multi-View Clustering for Spatial TranscriptomicsChenkai Guo, Yikai Zhu, Jing Yangum et al.
By incorporating spatial location information, spatial-transcriptomics clustering yields more comprehensive insights into cell subpopulation identification. Despite recent progress, existing methods have at least two limitations: (i) topological learning typically considers only representations of individual cells or their interaction graphs; however, spatial transcriptomic profiles are often noisy, making these approaches vulnerable to low-quality topological signals, and (ii) insufficient modeling of spatial neighborhood information leads to low-quality spatial embeddings. To address these limitations, we propose SPHENIC, a novel Spatial Persistent Homology Enhanced Neighborhood Integrative Clustering method. Specifically, SPHENIC incorporates invariant topological features into the clustering network to achieve stable representation learning. Additionally, to construct high-quality spatial embeddings that reflect the true cellular distribution, we design the Spatial Constraint and Distribution Optimization Module (SCDOM). This module increases the similarity between a cell's embedding and those of its spatial neighbors, decreases similarity with non-neighboring cells, and thereby produces clustering-friendly spatial embeddings. Extensive experiments on 14 benchmark spatial transcriptomic slices demonstrate that SPHENIC achieves superior performance on the spatial clustering task, outperforming existing state-of-the-art methods by 3.31%-6.54% over the best alternative.