Hao Li

h-index21

6papers

49citations

Novelty56%

AI Score46

Ranked #36,528 of 194,257 authors (top 19%)#2,081 in AI (top 17%)

6 Papers

23.0CLAug 18, 2025Code

Beyond GPT-5: Making LLMs Cheaper and Better via Performance-Efficiency Optimized Routing

Yiqun Zhang, Hao Li, Jianhao Chen et al.

Balancing performance and efficiency is a central challenge in large language model (LLM) advancement. GPT-5 addresses this with test-time routing, dynamically assigning queries to either an efficient or a high-capacity model during inference. In this work, we present Avengers-Pro, a test-time routing framework that ensembles LLMs of varying capacities and efficiencies, providing a unified solution for all performance-efficiency tradeoffs. The Avengers-Pro embeds and clusters incoming queries, then routes each to the most suitable model based on a performance-efficiency score. Across 6 challenging benchmarks and 8 leading models -- including GPT-5-medium, Gemini-2.5-pro, and Claude-opus-4.1 -- Avengers-Pro achieves state-of-the-art results: by varying a performance-efficiency trade-off parameter, it can surpass the strongest single model (GPT-5-medium) by +7% in average accuracy. Moreover, it can match the average accuracy of the strongest single model at 27% lower cost, and reach ~90% of that performance at 63% lower cost. Last but not least, it achieves a Pareto frontier, consistently yielding the highest accuracy for any given cost, and the lowest cost for any given accuracy, among all single models. Code is available at https://github.com/ZhangYiqun018/AvengersPro.

5.1IVNov 16, 2025Code

DEMIST: \underline{DE}coupled \underline{M}ulti-stream latent d\underline{I}ffusion for Quantitative Myelin Map \underline{S}yn\underline{T}hesis

Jiacheng Wang, Hao Li, Xing Yao et al.

Quantitative magnetization transfer (qMT) imaging provides myelin-sensitive biomarkers, such as the pool size ratio (PSR), which is valuable for multiple sclerosis (MS) assessment. However, qMT requires specialized 20-30 minute scans. We propose DEMIST to synthesize PSR maps from standard T1w and FLAIR images using a 3D latent diffusion model with three complementary conditioning mechanisms. Our approach has two stages: first, we train separate autoencoders for PSR and anatomical images to learn aligned latent representations. Second, we train a conditional diffusion model in this latent space on top of a frozen diffusion foundation backbone. Conditioning is decoupled into: (i) \textbf{semantic} tokens via cross-attention, (ii) \textbf{spatial} per-scale residual hints via a 3D ControlNet branch, and (iii) \textbf{adaptive} LoRA-modulated attention. We include edge-aware loss terms to preserve lesion boundaries and alignment losses to maintain quantitative consistency, while keeping the number of trainable parameters low and retaining the inductive bias of the pretrained model. We evaluate on 163 scans from 99 subjects using 5-fold cross-validation. Our method outperforms VAE, GAN and diffusion baselines on multiple metrics, producing sharper boundaries and better quantitative agreement with ground truth. Our code is publicly available at https://github.com/MedICL-VU/MS-Synthesis-3DcLDM.

14.7AIDec 18, 2024

LLMs can Realize Combinatorial Creativity: Generating Creative Ideas via LLMs for Scientific Research

Tianyang Gu, Jingjin Wang, Zhihao Zhang et al.

Scientific idea generation has been extensively studied in creativity theory and computational creativity research, providing valuable frameworks for understanding and implementing creative processes. However, recent work using Large Language Models (LLMs) for research idea generation often overlooks these theoretical foundations. We present a framework that explicitly implements combinatorial creativity theory using LLMs, featuring a generalization-level retrieval system for cross-domain knowledge discovery and a structured combinatorial process for idea generation. The retrieval system maps concepts across different abstraction levels to enable meaningful connections between disparate domains, while the combinatorial process systematically analyzes and recombines components to generate novel solutions. Experiments on the OAG-Bench dataset demonstrate our framework's effectiveness, consistently outperforming baseline approaches in generating ideas that align with real research developments (improving similarity scores by 7\%-10\% across multiple metrics). Our results provide strong evidence that LLMs can effectively realize combinatorial creativity when guided by appropriate theoretical frameworks, contributing both to practical advancement of AI-assisted research and theoretical understanding of machine creativity.

6.1CLDec 23, 2024

DiffusionAttacker: Diffusion-Driven Prompt Manipulation for LLM Jailbreak

Hao Wang, Hao Li, Junda Zhu et al.

Large Language Models (LLMs) are susceptible to generating harmful content when prompted with carefully crafted inputs, a vulnerability known as LLM jailbreaking. As LLMs become more powerful, studying jailbreak methods is critical to enhancing security and aligning models with human values. Traditionally, jailbreak techniques have relied on suffix addition or prompt templates, but these methods suffer from limited attack diversity. This paper introduces DiffusionAttacker, an end-to-end generative approach for jailbreak rewriting inspired by diffusion models. Our method employs a sequence-to-sequence (seq2seq) text diffusion model as a generator, conditioning on the original prompt and guiding the denoising process with a novel attack loss. Unlike previous approaches that use autoregressive LLMs to generate jailbreak prompts, which limit the modification of already generated tokens and restrict the rewriting space, DiffusionAttacker utilizes a seq2seq diffusion model, allowing more flexible token modifications. This approach preserves the semantic content of the original prompt while producing harmful content. Additionally, we leverage the Gumbel-Softmax technique to make the sampling process from the diffusion model's output distribution differentiable, eliminating the need for iterative token search. Extensive experiments on Advbench and Harmbench demonstrate that DiffusionAttacker outperforms previous methods across various evaluation metrics, including attack success rate (ASR), fluency, and diversity.

4.2AIApr 8, 2024Code

360$^\circ$REA: Towards A Reusable Experience Accumulation with 360° Assessment for Multi-Agent System

Shen Gao, Hao Li, Chengrui Huang et al.

Large language model agents have demonstrated remarkable advancements across various complex tasks. Recent works focus on optimizing the agent team or employing self-reflection to iteratively solve complex tasks. Since these agents are all based on the same LLM, only conducting self-evaluation or removing underperforming agents does not substantively enhance the capability of the agents. We argue that a comprehensive evaluation and accumulating experience from evaluation feedback is an effective approach to improving system performance. In this paper, we propose Reusable Experience Accumulation with 360$^\circ$ Assessment (360$^\circ$REA), a hierarchical multi-agent framework inspired by corporate organizational practices. The framework employs a novel 360$^\circ$ performance assessment method for multi-perspective performance evaluation with fine-grained assessment. To enhance the capability of agents in addressing complex tasks, we introduce dual-level experience pool for agents to accumulate experience through fine-grained assessment. Extensive experiments on complex task datasets demonstrate the effectiveness of 360$^\circ$REA.

2.0CVDec 30, 2024

E2ED^2:Direct Mapping from Noise to Data for Enhanced Diffusion Models

Zhiyu Tan, WenXu Qian, Hesen Chen et al.

Diffusion models have established themselves as the de facto primary paradigm in visual generative modeling, revolutionizing the field through remarkable success across various diverse applications ranging from high-quality image synthesis to temporal aware video generation. Despite these advancements, three fundamental limitations persist, including 1) discrepancy between training and inference processes, 2) progressive information leakage throughout the noise corruption procedures, and 3) inherent constraints preventing effective integration of modern optimization criteria like perceptual and adversarial loss. To mitigate these critical challenges, we in this paper present a novel end-to-end learning paradigm that establishes direct optimization from the final generated samples to initial noises. Our proposed End-to-End Differentiable Diffusion, dubbed E2ED^2, introduces several key improvements: it eliminates the sequential training-sampling mismatch and intermediate information leakage via conceptualizing training as a direct transformation from isotropic Gaussian noise to the target data distribution. Additionally, such training framework enables seamless incorporation of adversarial and perceptual losses into the core optimization objective. Comprehensive evaluation across standard benchmarks including COCO30K and HW30K reveals that our method achieves substantial performance gains in terms of Fréchet Inception Distance (FID) and CLIP score, even with fewer sampling steps (less than 4). Our findings highlight that the end-to-end mechanism might pave the way for more robust and efficient solutions, \emph{i.e.,} combining diffusion stability with GAN-like discriminative optimization in an end-to-end manner.