Cheng Su

AI
h-index18
3papers
35citations
Novelty52%
AI Score29

3 Papers

AIJul 1, 2024
Hybrid RAG-empowered Multi-modal LLM for Secure Data Management in Internet of Medical Things: A Diffusion-based Contract Approach

Cheng Su, Jinbo Wen, Jiawen Kang et al.

Secure data management and effective data sharing have become paramount in the rapidly evolving healthcare landscape, especially with the growing integration of the Internet of Medical Things (IoMT). The rise of generative artificial intelligence has further elevated Multi-modal Large Language Models (MLLMs) as essential tools for managing and optimizing healthcare data in IoMT. MLLMs can support multi-modal inputs and generate diverse types of content by leveraging large-scale training on vast amounts of multi-modal data. However, critical challenges persist in developing medical MLLMs, including security and freshness issues of healthcare data, affecting the output quality of MLLMs. To this end, in this paper, we propose a hybrid Retrieval-Augmented Generation (RAG)-empowered medical MLLM framework for healthcare data management. This framework leverages a hierarchical cross-chain architecture to facilitate secure data training. Moreover, it enhances the output quality of MLLMs through hybrid RAG, which employs multi-modal metrics to filter various unimodal RAG results and incorporates these retrieval results as additional inputs to MLLMs. Additionally, we employ age of information to indirectly evaluate the data freshness impact of MLLMs and utilize contract theory to incentivize healthcare data holders to share their fresh data, mitigating information asymmetry during data sharing. Finally, we utilize a generative diffusion model-based deep reinforcement learning algorithm to identify the optimal contract for efficient data sharing. Numerical results demonstrate the effectiveness of the proposed schemes, which achieve secure and efficient healthcare data management.

DCJan 16, 2025
The Streaming Batch Model for Efficient and Fault-Tolerant Heterogeneous Execution

Frank Sifei Luan, Ron Yifeng Wang, Yile Gu et al.

While ML model training and inference are both GPU-intensive, CPU-based data processing is often the bottleneck. Distributed data processing systems based on the batch or stream processing models assume homogeneous resource requirements. They excel at CPU-based computation but either under-utilize heterogeneous resources or impose high overheads on failure and reconfiguration. We introduce the streaming batch model, a hybrid of batch and streaming that enables efficient and fault-tolerant heterogeneous execution. The key idea is to use partitions as the unit of execution to achieve elasticity, but to allow partitions to be dynamically created and streamed between heterogeneous operators for memory-efficient pipelining. We present Ray Data, a streaming batch system that improves throughput on heterogeneous batch inference pipelines by 2.5-12$\times$ compared to traditional batch and stream processing systems. By leveraging heterogeneous clusters, Ray Data improves training throughput for multimodal models such as Stable Diffusion by 31% compared to single-node ML data loaders.

CRJun 27, 2018
Verifying Security Protocols using Dynamic Strategies

Yan Xiong, Cheng Su, Wenchao Huang et al.

Current formal approaches have been successfully used to find design flaws in many security protocols. However, it is still challenging to automatically analyze protocols due to their large or infinite state spaces. In this paper, we propose a novel framework that can automatically verifying security protocols without any human intervention. Experimental results show that SmartVerif automatically verifies security protocols that cannot be automatically verified by existing approaches. The case study also validates the effectiveness of our dynamic strategy.