Kun Tian

CV
h-index: 59
Papers: 7
Citations: 45
Novelty: 60%
AI Score: 55

7 Papers

CV | May 27, 2025 | Code
RainFusion: Adaptive Video Generation Acceleration via Multi-Dimensional Visual Redundancy

Aiyue Chen, Bin Dong, Jingru Li et al.

Video generation using diffusion models is highly computationally intensive, with 3D attention in Diffusion Transformer (DiT) models accounting for over 80% of the total computational resources. In this work, we introduce RainFusion, a novel training-free sparse attention method that exploits the inherent sparsity of visual data to accelerate attention computation while preserving video quality. Specifically, we identify three unique sparse patterns in video generation attention calculations: Spatial Pattern, Temporal Pattern, and Textural Pattern. The sparse pattern for each attention head is determined online during inference, with negligible overhead (~0.2%), by our proposed ARM (Adaptive Recognition Module). RainFusion is a plug-and-play method that can be seamlessly integrated into state-of-the-art 3D-attention video generation models without additional training or calibration. We evaluate our method on leading open-source models including HunyuanVideo, OpenSoraPlan-1.2, and CogVideoX-5B, demonstrating its broad applicability and effectiveness. Experimental results show that RainFusion achieves over 2× speedup in attention computation while maintaining video quality, with only a minimal impact on VBench scores (-0.2%).
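To make the idea of sparse attention concrete, the following is a minimal sketch of attention restricted by a boolean mask, written in plain Python. It is not RainFusion's method: the fixed local-window mask, the function names `local_window_mask` and `masked_attention`, and the toy inputs are all assumptions chosen for illustration, and the paper's Spatial, Temporal, and Textural patterns and its ARM selection step are not reproduced here.

```python
import math

def local_window_mask(n, window):
    """Hypothetical fixed pattern: token i may attend to token j only if |i - j| <= window."""
    return [[abs(i - j) <= window for j in range(n)] for i in range(n)]

def masked_attention(queries, keys, values, mask):
    """Scaled dot-product attention that computes scores only where mask[i][j] is True."""
    dim = len(queries[0])
    outputs = []
    for i, query in enumerate(queries):
        # Masked-out pairs are skipped entirely, which is where the savings come from.
        kept = [j for j in range(len(keys)) if mask[i][j]]
        scores = [
            sum(a * b for a, b in zip(query, keys[j])) / math.sqrt(dim)
            for j in kept
        ]
        peak = max(scores)  # subtract the maximum for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * values[j][t] for w, j in zip(weights, kept)) / total
            for t in range(len(values[0]))
        ])
    return outputs

# Toy input: 4 tokens with 2-dimensional features (illustrative values only).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
mask = local_window_mask(len(tokens), window=1)
output = masked_attention(tokens, tokens, tokens, mask)

# Fraction of query-key pairs for which a score is actually computed.
density = sum(map(sum, mask)) / len(tokens) ** 2
```

In this sketch `density` is the share of the full attention matrix that is evaluated; a sparser mask lowers it and skips proportionally more score computations.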

58.2 | CR | Mar 18
Linearly Homomorphic Signature with Tight Security on Lattice

Heng Guo, Fengxia Liu, Kun Tian et al.

Constructing cryptographic schemes with tight or almost-tight security has long been one of the central problems in theoretical cryptography. At ASIACRYPT 2016, Boyen and Li posed an open problem: whether it is possible to construct a homomorphic signature scheme with tight or almost-tight security under the Short Integer Solution (SIS) assumption in the standard model. In 2024, Chen achieved the first construction with almost-tight security under a weaker security model. To further achieve tight security in the standard model, this paper introduces a new security model whose security requirements are weaker than those of the standard adaptive model but stronger than the model adopted by Chen. Under this model, we construct a linearly homomorphic signature scheme with tight security.

88.2 | CR | Mar 27
Linearly Homomorphic Ring Signature Scheme over Lattices

Heng Guo, Jia Li, Yanan Wang et al.

This paper constructs the first provably secure linear homomorphic ring signature scheme. Ring signatures allow a signer to anonymously sign a message on behalf of a user group (ring) and are widely applied in areas such as identity protection, electronic voting, and privacy enhancement in blockchain. Homomorphic signatures, on the other hand, support verifiable computations on signed data. The integration of anonymity and computability in homomorphic ring signatures holds the potential to create new application scenarios for privacy-preserving distributed systems. Choi and Kim first introduced the concept of linear homomorphic ring signatures in 2017 and proposed a specific scheme. However, their scheme lacks a complete security proof, leaving its security theoretically unconfirmed. To address this research gap, this paper presents the first provably secure lattice-based linear homomorphic ring signature scheme, designed for scenarios where the ring size is O(log n). This scheme not only combines the anonymity of ring signatures with the malleability of homomorphic signatures but also achieves resistance against quantum attacks.

IV | Oct 15, 2024 | Code
Deep unrolled primal dual network for TOF-PET list-mode image reconstruction

Rui Hu, Chenxu Li, Kun Tian et al.

Time-of-flight (TOF) information provides more accurate location data for annihilation photons, thereby enhancing the quality of PET reconstruction images and reducing noise. List-mode reconstruction has a significant advantage in handling TOF information. However, current advanced TOF-PET list-mode reconstruction algorithms still require improvements when dealing with low-count data. Deep learning algorithms have shown promising results in PET image reconstruction. Nevertheless, the incorporation of TOF information poses significant challenges related to the storage space required by deep learning methods, particularly for the advanced deep unrolled methods. In this study, we propose a deep unrolled primal dual network for TOF-PET list-mode reconstruction. The network is unrolled into multiple phases, with each phase comprising a dual network for list-mode domain updates and a primal network for image domain updates. We utilize CUDA for parallel acceleration and computation of the system matrix for TOF list-mode data, and we adopt a dynamic access strategy to mitigate memory consumption. Reconstructed images at different TOF resolutions and count levels show that the proposed method outperforms the LM-OSEM, LM-EMTV, LM-SPDHG, LM-SPDHG-TV, and FastPET methods in both visual and quantitative analysis. These results demonstrate the potential of deep unrolled methods for TOF-PET list-mode data and show better performance than current mainstream TOF-PET list-mode reconstruction algorithms, providing new insights for the application of deep learning methods to TOF list-mode data. The code for this work is available at https://github.com/RickHH/LMPDnet

39.9 | QUANT-PH | Apr 26
Efficient Quantum Fully Homomorphic Encryption

Fengxia Liu, Zixian Gong, Kun Tian et al.

Quantum fully homomorphic encryption (QFHE) promises secure delegated quantum computation but has been impeded by the prohibitive quantum resource demands of existing constructions. This paper introduces a unified framework that achieves an exponential improvement in efficiency by synergistically integrating three theoretical tools: modular arithmetic programs (MAP), the garden-hose model, and measurement-based quantum computation (MBQC). Our central innovation is a novel MAP tailored to the algebraic structure of Learning-with-Errors (LWE) decryption. Unlike generic approaches that incur exponential overhead, our MAP computes the inner product $\langle \boldsymbol{sk}, \boldsymbol{c} \rangle \bmod q$ by tracking a partial sum modulo $q$, requiring only $O(\log q)$ bits of state width. This yields branching programs of width $O(\log λ)$ and length $O(λ\log λ)$, thereby reducing the size of the essential quantum gadget from $O(λ^{2.58})$ to $O(λ\log^2 λ)$ EPR pairs, a concrete improvement factor of $2^{15}$ to $2^{18}$ for standard security parameters. Critically, we demonstrate that LWE decryption is not a symmetric function, necessitating our specialized MAP design beyond prior symmetric-function optimizations. The framework provides a direct mapping from the MAP to an efficient gadget via the garden-hose model, with MBQC furnishing the deterministic control flow for homomorphic evaluation. The resulting QFHE scheme supports fully classical clients, relies solely on the classical LWE assumption (avoiding circular security or quantum hardness assumptions), and maintains compactness. This work dramatically lowers the quantum resource barrier for practical QFHE, paving the way for realistic privacy-preserving quantum cloud computing.
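The partial-sum idea described in this abstract can be illustrated classically. The sketch below is only an ordinary streaming computation, not the paper's MAP, branching program, or quantum gadget; the function name `inner_product_mod_q`, the binary secret key, and the toy values are assumptions made for the example.

```python
def inner_product_mod_q(sk, c, q):
    """Compute <sk, c> mod q in one pass, keeping a single residue in [0, q) as state."""
    acc = 0  # partial sum, reduced after every step, so it fits in about log2(q) bits
    for s_i, c_i in zip(sk, c):
        acc = (acc + s_i * c_i) % q
    return acc

# Toy parameters (illustrative only; real LWE parameters are far larger).
sk = [1, 0, 1, 1]   # assumed binary secret key
c = [5, 7, 11, 3]   # assumed ciphertext vector
q = 13

result = inner_product_mod_q(sk, c, q)
```

Because the accumulator is reduced modulo q at each step, the state carried between steps never exceeds q - 1, which is the property the abstract relies on for its O(log q) state width.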

CV | Mar 30, 2024
Reusable Architecture Growth for Continual Stereo Matching

Chenghao Zhang, Gaofeng Meng, Bin Fan et al.

The remarkable performance of recent stereo depth estimation models benefits from the successful use of convolutional neural networks to regress dense disparity. As with most tasks, this requires gathering training data that covers the heterogeneous scenes encountered at deployment time. However, training samples are typically acquired continuously in practical applications, making the capability to learn new scenes continually even more crucial. For this purpose, we propose to perform continual stereo matching where a model is tasked to 1) continually learn new scenes, 2) overcome forgetting previously learned scenes, and 3) continuously predict disparities at inference. We achieve this goal by introducing a Reusable Architecture Growth (RAG) framework. RAG leverages task-specific neural unit search and architecture growth to learn new scenes continually in both supervised and self-supervised manners. It can maintain high reusability during growth by reusing previous units while obtaining good performance. Additionally, we present a Scene Router module to adaptively select the scene-specific architecture path at inference. Comprehensive experiments on numerous datasets show that our framework performs impressively in various weather, road, and city circumstances and surpasses the state-of-the-art methods in more challenging cross-dataset settings. Further experiments also demonstrate the adaptability of our method to unseen scenes, which can facilitate end-to-end stereo architecture learning and practical deployment.

CV | Sep 29, 2025
StreamForest: Efficient Online Video Understanding with Persistent Event Memory

Xiangyu Zeng, Kefan Qiu, Qingyu Zhang et al.

Multimodal Large Language Models (MLLMs) have recently achieved remarkable progress in video understanding. However, their effectiveness in real-time streaming scenarios remains limited due to storage constraints on historical visual features and insufficient real-time spatiotemporal reasoning. To address these challenges, we propose StreamForest, a novel architecture specifically designed for streaming video understanding. Central to StreamForest is the Persistent Event Memory Forest, a memory mechanism that adaptively organizes video frames into multiple event-level tree structures. This process is guided by penalty functions based on temporal distance, content similarity, and merge frequency, enabling efficient long-term memory retention under limited computational resources. To enhance real-time perception, we introduce a Fine-grained Spatiotemporal Window, which captures detailed short-term visual cues to improve current scene perception. Additionally, we present OnlineIT, an instruction-tuning dataset tailored for streaming video tasks. OnlineIT significantly boosts MLLM performance in both real-time perception and future prediction. To evaluate generalization in practical applications, we introduce ODV-Bench, a new benchmark focused on real-time streaming video understanding in autonomous driving scenarios. Experimental results demonstrate that StreamForest achieves state-of-the-art performance, with accuracies of 77.3% on StreamingBench, 60.5% on OVBench, and 55.6% on OVO-Bench. In particular, even under extreme visual token compression (limited to 1024 tokens), the model retains 96.8% of its average accuracy across eight benchmarks relative to the default setting. These results underscore the robustness, efficiency, and generalizability of StreamForest for streaming video understanding.
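As a rough illustration of penalty-guided memory merging, the sketch below compresses a sequence of frame features to a fixed budget by repeatedly merging the adjacent pair with the lowest penalty. It is a flat-list simplification, not StreamForest's tree-structured memory: the `Node` class, the additive penalty form, the weights `w_time`, `w_sim`, and `w_freq`, and the averaging merge rule are all hypothetical choices made for this example.

```python
import math
from dataclasses import dataclass

@dataclass
class Node:
    time: float      # representative timestamp of the node
    feat: list       # feature vector (toy 2-D values below)
    merges: int = 0  # how many merge operations produced this node

def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def merge_penalty(a, b, w_time=1.0, w_sim=1.0, w_freq=0.5):
    """Assumed penalty: temporal gap + content dissimilarity + prior merge count."""
    return (w_time * abs(a.time - b.time)
            + w_sim * (1.0 - cosine(a.feat, b.feat))
            + w_freq * (a.merges + b.merges))

def compress(nodes, budget):
    """Merge the cheapest adjacent pair until at most `budget` nodes remain."""
    nodes = list(nodes)
    while len(nodes) > budget:
        i = min(range(len(nodes) - 1),
                key=lambda k: merge_penalty(nodes[k], nodes[k + 1]))
        a, b = nodes[i], nodes[i + 1]
        merged = Node(time=(a.time + b.time) / 2,
                      feat=[(x + y) / 2 for x, y in zip(a.feat, b.feat)],
                      merges=a.merges + b.merges + 1)
        nodes[i:i + 2] = [merged]
    return nodes

# Four toy frames: two similar frames followed by two different similar frames.
frames = [Node(0.0, [1.0, 0.0]), Node(1.0, [1.0, 0.0]),
          Node(2.0, [0.0, 1.0]), Node(3.0, [0.0, 1.0])]
memory = compress(frames, budget=2)
```

With these toy inputs the two visually similar pairs are merged first, so the compressed memory keeps one node per "event"; the merge-count term discourages folding everything into a single already-merged node.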