Hong Kang

CV
8papers
673citations
Novelty42%
AI Score50

8 Papers

IVMay 31, 2022
Progressive Multi-scale Consistent Network for Multi-class Fundus Lesion Segmentation

Along He, Kai Wang, Tao Li et al.

Effectively integrating multi-scale information is of considerable significance for the challenging multi-class segmentation of fundus lesions because different lesions vary significantly in scales and shapes. Several methods have been proposed to successfully handle the multi-scale object segmentation. However, two issues are not considered in previous studies. The first is the lack of interaction between adjacent feature levels, and this will lead to the deviation of high-level features from low-level features and the loss of detailed cues. The second is the conflict between the low-level and high-level features, this occurs because they learn different scales of features, thereby confusing the model and decreasing the accuracy of the final prediction. In this paper, we propose a progressive multi-scale consistent network (PMCNet) that integrates the proposed progressive feature fusion (PFF) block and dynamic attention block (DAB) to address the aforementioned issues. Specifically, PFF block progressively integrates multi-scale features from adjacent encoding layers, facilitating feature learning of each layer by aggregating fine-grained details and high-level semantics. As features at different scales should be consistent, DAB is designed to dynamically learn the attentive cues from the fused features at different scales, thus aiming to smooth the essential conflicts existing in multi-scale features. The two proposed PFF and DAB blocks can be integrated with the off-the-shelf backbone networks to address the two issues of multi-scale and feature inconsistency in the multi-class segmentation of fundus lesions, which will produce better feature representation in the feature space. Experimental results on three public datasets indicate that the proposed method is more effective than recent state-of-the-art methods.

IVSep 23, 2024Code
TransUKAN:Computing-Efficient Hybrid KAN-Transformer for Enhanced Medical Image Segmentation

Yanlin Wu, Tao Li, Zhihong Wang et al.

U-Net is currently the most widely used architecture for medical image segmentation. Benefiting from its unique encoder-decoder architecture and skip connections, it can effectively extract features from input images to segment target regions. The commonly used U-Net is typically based on convolutional operations or Transformers, modeling the dependencies between local or global information to accomplish medical image analysis tasks. However, convolutional layers, fully connected layers, and attention mechanisms used in this process introduce a significant number of parameters, often requiring the stacking of network layers to model complex nonlinear relationships, which can impact the training process. To address these issues, we propose TransUKAN. Specifically, we have improved the KAN to reduce memory usage and computational load. On this basis, we explored an effective combination of KAN, Transformer, and U-Net structures. This approach enhances the model's capability to capture nonlinear relationships by introducing only a small number of additional parameters and compensates for the Transformer structure's deficiency in local information extraction. We validated TransUKAN on multiple medical image segmentation tasks. Experimental results demonstrate that TransUKAN achieves excellent performance with significantly reduced parameters. The code will be available athttps://github.com/wuyanlin-wyl/TransUKAN.

27.2CLApr 24
Preference Heads in Large Language Models: A Mechanistic Framework for Interpretable Personalization

Weixu Zhang, Ye Yuan, Changjiang Han et al.

Large Language Models (LLMs) exhibit strong implicit personalization ability, yet most existing approaches treat this behavior as a black box, relying on prompt engineering or fine tuning on user data. In this work, we adopt a mechanistic interpretability perspective and hypothesize the existence of a sparse set of Preference Heads, attention heads that encode user specific stylistic and topical preferences and exert a causal influence on generation. We introduce Differential Preference Steering (DPS), a training free framework that (1) identifies Preference Heads through causal masking analysis and (2) leverages them for controllable and interpretable personalization at inference time. DPS computes a Preference Contribution Score (PCS) for each attention head, directly measuring its causal impact on user aligned outputs. During decoding, we contrast model predictions with and without Preference Heads, amplifying the difference between personalized and generic logits to selectively strengthen preference aligned continuations. Experiments on widely used personalization benchmarks across multiple LLMs demonstrate consistent gains in personalization fidelity while preserving content coherence and low computational overhead. Beyond empirical improvements, DPS provides a mechanistic explanation of where and how personalization emerges within transformer architectures. Our implementation is publicly available.

CRFeb 13
TensorCommitments: A Lightweight Verifiable Inference for Language Models

Oguzhan Baser, Elahe Sadeghi, Eric Wang et al.

Most large language models (LLMs) run on external clouds: users send a prompt, pay for inference, and must trust that the remote GPU executes the LLM without any adversarial tampering. We critically ask how to achieve verifiable LLM inference, where a prover (the service) must convince a verifier (the client) that an inference was run correctly without rerunning the LLM. Existing cryptographic works are too slow at the LLM scale, while non-cryptographic ones require a strong verifier GPU. We propose TensorCommitments (TCs), a tensor-native proof-of-inference scheme. TC binds the LLM inference to a commitment, an irreversible tag that breaks under tampering, organized in our multivariate Terkle Trees. For LLaMA2, TC adds only 0.97% prover and 0.12% verifier time over inference while improving robustness to tailored LLM attacks by up to 48% over the best prior work requiring a verifier GPU.

IVJan 25, 2021Code
Applications of Deep Learning in Fundus Images: A Review

Tao Li, Wang Bo, Chunyu Hu et al.

The use of fundus images for the early screening of eye diseases is of great clinical importance. Due to its powerful performance, deep learning is becoming more and more popular in related applications, such as lesion segmentation, biomarkers segmentation, disease diagnosis and image synthesis. Therefore, it is very necessary to summarize the recent developments in deep learning for fundus images with a review paper. In this review, we introduce 143 application papers with a carefully designed hierarchy. Moreover, 33 publicly available datasets are presented. Summaries and analyses are provided for each task. Finally, limitations common to all tasks are revealed and possible solutions are given. We will also release and regularly update the state-of-the-art results and newly-released datasets at https://github.com/nkicsl/Fundus Review to adapt to the rapid development of this field.

CVMar 11, 2018Code
BTS-DSN: Deeply Supervised Neural Network with Short Connections for Retinal Vessel Segmentation

Song Guo, Kai Wang, Hong Kang et al.

Background and Objective: The condition of vessel of the human eye is an important factor for the diagnosis of ophthalmological diseases. Vessel segmentation in fundus images is a challenging task due to complex vessel structure, the presence of similar structures such as microaneurysms and hemorrhages, micro-vessel with only one to several pixels wide, and requirements for finer results. Methods:In this paper, we present a multi-scale deeply supervised network with short connections (BTS-DSN) for vessel segmentation. We used short connections to transfer semantic information between side-output layers. Bottom-top short connections pass low level semantic information to high level for refining results in high-level side-outputs, and top-bottom short connection passes much structural information to low level for reducing noises in low-level side-outputs. In addition, we employ cross-training to show that our model is suitable for real world fundus images. Results: The proposed BTS-DSN has been verified on DRIVE, STARE and CHASE_DB1 datasets, and showed competitive performance over other state-of-the-art methods. Specially, with patch level input, the network achieved 0.7891/0.8212 sensitivity, 0.9804/0.9843 specificity, 0.9806/0.9859 AUC, and 0.8249/0.8421 F1-score on DRIVE and STARE, respectively. Moreover, our model behaves better than other methods in cross-training experiments. Conclusions: BTS-DSN achieves competitive performance in vessel segmentation task on three public datasets. It is suitable for vessel segmentation. The source code of our method is available at https://github.com/guomugong/BTS-DSN.

CVNov 18, 2025
CPSL: Representing Volumetric Video via Content-Promoted Scene Layers

Kaiyuan Hu, Yili Jin, Junhua Liu et al.

Volumetric video enables immersive and interactive visual experiences by supporting free viewpoint exploration and realistic motion parallax. However, existing volumetric representations from explicit point clouds to implicit neural fields, remain costly in capture, computation, and rendering, which limits their scalability for on-demand video and reduces their feasibility for real-time communication. To bridge this gap, we propose Content-Promoted Scene Layers (CPSL), a compact 2.5D video representation that brings the perceptual benefits of volumetric video to conventional 2D content. Guided by per-frame depth and content saliency, CPSL decomposes each frame into a small set of geometry-consistent layers equipped with soft alpha bands and an edge-depth cache that jointly preserve occlusion ordering and boundary continuity. These lightweight, 2D-encodable assets enable parallax-corrected novel-view synthesis via depth-weighted warping and front-to-back alpha compositing, bypassing expensive 3D reconstruction. Temporally, CPSL maintains inter-frame coherence using motion-guided propagation and per-layer encoding, supporting real-time playback with standard video codecs. Across multiple benchmarks, CPSL achieves superior perceptual quality and boundary fidelity compared with layer-based and neural-field baselines while reducing storage and rendering cost by several folds. Our approach offer a practical path from 2D video to scalable 2.5D immersive media.

CVFeb 16, 2021
A Benchmark of Ocular Disease Intelligent Recognition: One Shot for Multi-disease Detection

Ning Li, Tao Li, Chunyu Hu et al.

In ophthalmology, early fundus screening is an economic and effective way to prevent blindness caused by ophthalmic diseases. Clinically, due to the lack of medical resources, manual diagnosis is time-consuming and may delay the condition. With the development of deep learning, some researches on ophthalmic diseases have achieved good results, however, most of them are just based on one disease. During fundus screening, ophthalmologists usually give diagnoses of multi-disease on binocular fundus image, so we release a dataset with 8 diseases to meet the real medical scene, which contains 10,000 fundus images from both eyes of 5,000 patients. We did some benchmark experiments on it through some state-of-the-art deep neural networks. We found simply increasing the scale of network cannot bring good results for multi-disease classification, and a well-structured feature fusion method combines characteristics of multi-disease is needed. Through this work, we hope to advance the research of related fields.