Hongyu Kan

CV
h-index9
3papers
22citations
Novelty43%
AI Score42

3 Papers

IVJul 4, 2023Code
H-DenseFormer: An Efficient Hybrid Densely Connected Transformer for Multimodal Tumor Segmentation

Jun Shi, Hongyu Kan, Shulan Ruan et al.

Recently, deep learning methods have been widely used for tumor segmentation of multimodal medical images with promising results. However, most existing methods are limited by insufficient representational ability, specific modality number and high computational complexity. In this paper, we propose a hybrid densely connected network for tumor segmentation, named H-DenseFormer, which combines the representational power of the Convolutional Neural Network (CNN) and the Transformer structures. Specifically, H-DenseFormer integrates a Transformer-based Multi-path Parallel Embedding (MPE) module that can take an arbitrary number of modalities as input to extract the fusion features from different modalities. Then, the multimodal fusion features are delivered to different levels of the encoder to enhance multimodal learning representation. Besides, we design a lightweight Densely Connected Transformer (DCT) block to replace the standard Transformer block, thus significantly reducing computational complexity. We conduct extensive experiments on two public multimodal datasets, HECKTOR21 and PI-CAI22. The experimental results show that our proposed method outperforms the existing state-of-the-art methods while having lower computational complexity. The source code is available at https://github.com/shijun18/H-DenseFormer.

CVFeb 6
3D Transport-based Morphometry (3D-TBM) for medical image analysis

Hongyu Kan, Kristofor Pas, Ivan Medri et al.

Transport-Based Morphometry (TBM) has emerged as a new framework for 3D medical image analysis. By embedding images into a transport domain via invertible transformations, TBM facilitates effective classification, regression, and other tasks using transport-domain features. Crucially, the inverse mapping enables the projection of analytic results back into the original image space, allowing researchers to directly interpret clinical features associated with model outputs in a spatially meaningful way. To facilitate broader adoption of TBM in clinical imaging research, we present 3D-TBM, a tool designed for morphological analysis of 3D medical images. The framework includes data preprocessing, computation of optimal transport embeddings, and analytical methods such as visualization of main transport directions, together with techniques for discerning discriminating directions and related analysis methods. We also provide comprehensive documentation and practical tutorials to support researchers interested in applying 3D-TBM in their own medical imaging studies. The source code is publicly available through PyTransKit.

LGSep 29, 2025
FlashOmni: A Unified Sparse Attention Engine for Diffusion Transformers

Liang Qiao, Yue Dai, Yeqi Huang et al.

Multi-Modal Diffusion Transformers (DiTs) demonstrate exceptional capabilities in visual synthesis, yet their deployment remains constrained by substantial computational demands. To alleviate this bottleneck, many sparsity-based acceleration methods have been proposed. However, their diverse sparsity patterns often require customized kernels for high-performance inference, limiting universality. We propose FlashOmni, a unified sparse attention engine compatible with arbitrary DiT architectures. FlashOmni introduces flexible sparse symbols to standardize the representation of a wide range of sparsity strategies, such as feature caching and block-sparse skipping. This unified abstraction enables the execution of diverse sparse computations within a single attention kernel. In addition, FlashOmni designs optimized sparse GEMMs for attention blocks, leveraging sparse symbols to eliminate redundant computations and further improve efficiency. Experiments demonstrate that FlashOmni delivers near-linear, closely matching the sparsity ratio speedup (1:1) in attention and GEMM-$Q$, and achieves 2.5$\times$-3.8$\times$ acceleration in GEMM-$O$ (max peaking at about 87.5% of the theoretical limit). Applied with a multi-granularity sparsity strategy, it enables the Hunyuan model (33K) to achieve about 1.5$\times$ end-to-end acceleration without degrading visual quality.