CV · Sep 26, 2024
Flexiffusion: Segment-wise Neural Architecture Search for Flexible Denoising Schedule
Hongtao Huang, Xiaojun Chang, Lina Yao
Diffusion models are cutting-edge generative models adept at producing diverse, high-quality images. Despite their effectiveness, these models often demand substantial computational resources owing to their many sequential denoising steps and the high inference cost of each step. Recently, Neural Architecture Search (NAS) techniques have been employed to automatically search for faster generation processes. However, NAS for diffusion is inherently time-consuming, as it requires evaluating thousands of candidate diffusion models to find the optimal one. In this paper, we introduce Flexiffusion, a novel training-free NAS paradigm designed to accelerate diffusion models by jointly optimizing generation steps and network structures. Specifically, we partition the generation process into isometric (equal-length) step segments, each composed sequentially of a full step, multiple partial steps, and several null steps. The full step computes all network blocks, a partial step computes only some of the blocks, and a null step performs no computation. Flexiffusion autonomously explores flexible step combinations for each segment, substantially reducing search costs and enabling greater acceleration than the state-of-the-art (SOTA) method for diffusion models. Our searched models achieve speedups of $2.6\times$ over the original LDM-4-G and $1.5\times$ over the SOTA; for Stable Diffusion V1.5, the speedups are $5.1\times$ over the original model and $2.0\times$ over the SOTA. We also evaluated Flexiffusion on multiple datasets, and the positive experimental results indicate that it can effectively reduce redundancy in diffusion models.
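To make the segment structure concrete, here is a minimal sketch under stated assumptions: each segment has a fixed length, always starts with one full step, and a candidate is described only by how many partial steps follow it (the remaining steps are null). Which blocks a partial step actually computes, and the per-step cost figures, are hypothetical placeholders (`n_blocks`, `partial_blocks`); the abstract does not specify them. The helper names are illustrative, not from the paper.

```python
from itertools import product

# Step types from the abstract: full = all blocks, partial = some blocks, null = none.
FULL, PARTIAL, NULL = "F", "P", "N"


def segment_schedule(seg_len: int, n_partial: int) -> list:
    """One segment: a full step, then n_partial partial steps, then null steps."""
    if not 0 <= n_partial <= seg_len - 1:
        raise ValueError("n_partial must leave room for the leading full step")
    n_null = seg_len - 1 - n_partial
    return [FULL] + [PARTIAL] * n_partial + [NULL] * n_null


def build_schedule(seg_len: int, partial_counts: list) -> list:
    """Concatenate equal-length segments into one denoising schedule."""
    steps = []
    for k in partial_counts:
        steps.extend(segment_schedule(seg_len, k))
    return steps


def block_cost(schedule: list, n_blocks: int, partial_blocks: int) -> int:
    """Hypothetical cost model: total network blocks executed over the schedule."""
    per_step = {FULL: n_blocks, PARTIAL: partial_blocks, NULL: 0}
    return sum(per_step[s] for s in schedule)


def enumerate_candidates(total_steps: int, seg_len: int):
    """Yield every segment-wise candidate (one partial-step count per segment)."""
    n_segments = total_steps // seg_len
    for counts in product(range(seg_len), repeat=n_segments):
        yield build_schedule(seg_len, list(counts))


if __name__ == "__main__":
    sched = build_schedule(5, [2, 1])
    print(sched)
    print(block_cost(sched, n_blocks=12, partial_blocks=4))
    n_seg = sum(1 for _ in enumerate_candidates(20, 5))
    print(n_seg, "segment-wise candidates vs", 3 ** 20, "step-wise")
```

Under this simplification, a 20-step schedule split into segments of length 5 has $5^4 = 625$ candidates, whereas assigning one of the three step types to every step independently gives $3^{20} = 3{,}486{,}784{,}401$. That gap is the intuition behind the reduced search cost; the real search space also includes choices that this sketch leaves out.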
IR · Oct 29, 2024
Dual Conditional Diffusion Models for Sequential Recommendation
Hongtao Huang, Chengkai Huang, Tong Yu et al.
Recent advancements in diffusion models have shown promising results in sequential recommendation (SR). Existing approaches predominantly rely on implicit conditional diffusion models, which compress user behaviors into a single representation during the forward diffusion process. While effective to some extent, this oversimplification often leads to the loss of sequential and contextual information, which is critical for understanding user behavior. Moreover, explicit information, such as user-item interactions or sequential patterns, remains underutilized, despite its potential to directly guide the recommendation process and improve precision. However, combining implicit and explicit information is non-trivial, as it requires dynamically integrating these complementary signals while avoiding noise and irrelevant patterns within user behaviors. To address these challenges, we propose Dual Conditional Diffusion Models for Sequential Recommendation (DCRec), which effectively integrates implicit and explicit information by embedding dual conditions into both the forward and reverse diffusion processes. This allows the model to retain valuable sequential and contextual information while leveraging explicit user-item interactions to guide the recommendation process. Specifically, we introduce the Dual Conditional Diffusion Transformer (DCDT), which employs a cross-attention mechanism to dynamically integrate explicit signals throughout the diffusion stages, ensuring contextual understanding and minimizing the influence of irrelevant patterns. This design enables precise and contextually relevant recommendations. Extensive experiments on public benchmark datasets demonstrate that DCRec significantly outperforms state-of-the-art methods in both accuracy and computational efficiency.
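The abstract says DCDT uses cross-attention to integrate explicit signals during diffusion. As a rough illustration only, the sketch below is a generic single-head scaled dot-product cross-attention in NumPy. The choice of queries (a noisy target embedding fused with the implicit condition) and keys/values (embeddings of interacted items) is an assumption; the actual DCDT layer layout, dimensions, and fusion scheme are not described here.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (n_q, d) e.g. a noisy target embedding fused with the implicit
             condition (an assumption for illustration).
    context: (n_c, d) explicit signals, e.g. embeddings of the user's
             interacted items (also an assumption).
    """
    q = queries @ w_q                        # (n_q, d_k)
    k = context @ w_k                        # (n_c, d_k)
    v = context @ w_v                        # (n_c, d_v)
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (n_q, n_c)
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, d_k = 8, 4
    queries = rng.standard_normal((1, d))   # one noisy target token
    context = rng.standard_normal((5, d))   # five interacted-item embeddings
    w_q, w_k, w_v = (rng.standard_normal((d, d_k)) for _ in range(3))
    out, attn = cross_attention(queries, context, w_q, w_k, w_v)
    print(out.shape, attn.round(3))
```

The attention weights show how strongly each explicit item signal contributes to the denoised representation; low weights on irrelevant items are one way such a layer could down-weight noisy behaviors.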
IR · Apr 23, 2025
A Survey of Foundation Model-Powered Recommender Systems: From Feature-Based, Generative to Agentic Paradigms
Chengkai Huang, Hongtao Huang, Tong Yu et al. · amazon-science
Recommender systems (RS) have become essential for filtering information and personalizing content for users. RS techniques have traditionally relied on modeling interactions between users and items, as well as content features, using task-specific models. The emergence of foundation models (FMs), large-scale models trained on vast amounts of data such as GPT, LLaMA, and CLIP, is reshaping the recommendation paradigm. This survey provides a comprehensive overview of Foundation Models for Recommender Systems (FM4RecSys), covering their integration in three paradigms: (1) Feature-Based augmentation of representations, (2) Generative recommendation approaches, and (3) Agentic interactive systems. We first review the data foundations of RS, from traditional explicit or implicit feedback to multimodal content sources. We then introduce FMs and their capabilities for representation learning, natural language understanding, and multimodal reasoning in RS contexts. The core of the survey discusses how FMs enhance RS under different paradigms. Afterward, we examine FM applications in various recommendation tasks. Through an analysis of recent research, we highlight key opportunities that have been realized as well as challenges encountered. Finally, we outline open research directions and technical challenges for next-generation FM4RecSys. Beyond reviewing state-of-the-art methods, the survey provides a critical analysis of the trade-offs among the feature-based, generative, and agentic paradigms.
IR · Aug 30, 2025
Beyond Negative Transfer: Disentangled Preference-Guided Diffusion for Cross-Domain Sequential Recommendation
Xiaoxin Ye, Chengkai Huang, Hongtao Huang et al.
Cross-Domain Sequential Recommendation (CDSR) leverages user behaviors across domains to enhance recommendation quality. However, naive aggregation of sequential signals can introduce conflicting domain-specific preferences, leading to negative transfer. While Sequential Recommendation (SR) already suffers from noisy behaviors such as misclicks and impulsive actions, CDSR further amplifies this issue due to domain heterogeneity arising from diverse item types and user intents. The core challenge is disentangling three intertwined signals: domain-invariant preferences, domain-specific preferences, and noise. Diffusion Models (DMs) offer a generative denoising framework well suited to disentangling complex user preferences and enhancing robustness to noise. Their iterative refinement process enables gradual denoising, making them effective at capturing subtle preference signals. However, existing applications in recommendation face notable limitations: sequential DMs often conflate shared and domain-specific preferences, while cross-domain collaborative filtering DMs neglect temporal dynamics, limiting their ability to model evolving user preferences. To bridge these gaps, we propose DPG-Diff, a novel Disentangled Preference-Guided Diffusion Model and, to the best of our knowledge, the first diffusion-based approach tailored for CDSR. DPG-Diff decomposes user preferences into domain-invariant and domain-specific components, which jointly guide the reverse diffusion process. This disentangled guidance enables robust cross-domain knowledge transfer, mitigates negative transfer, and filters sequential noise. Extensive experiments on real-world datasets demonstrate that DPG-Diff consistently outperforms state-of-the-art baselines across multiple metrics.
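As an illustration of "two preference components jointly guiding the reverse process", here is a minimal sketch assuming a standard DDPM with a linear beta schedule and an $x_0$-predicting denoiser that receives both components at every step. The denoiser here (`toy_denoiser`) is a hypothetical stand-in for a learned network, and the weights `w_inv`/`w_spec` are placeholders; none of this is taken from DPG-Diff itself.

```python
import numpy as np


def make_schedule(T: int, beta_start: float = 1e-4, beta_end: float = 0.02):
    """Linear beta schedule and cumulative products (standard DDPM setup)."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def reverse_step(x_t, t, x0_hat, betas, alphas, alpha_bars, rng):
    """Sample x_{t-1} from the DDPM posterior q(x_{t-1} | x_t, x0_hat)."""
    ab_t = alpha_bars[t]
    ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
    coef_x0 = np.sqrt(ab_prev) * betas[t] / (1.0 - ab_t)
    coef_xt = np.sqrt(alphas[t]) * (1.0 - ab_prev) / (1.0 - ab_t)
    mean = coef_x0 * x0_hat + coef_xt * x_t
    if t == 0:
        return mean  # no noise is added at the final step
    var = (1.0 - ab_prev) / (1.0 - ab_t) * betas[t]
    return mean + np.sqrt(var) * rng.standard_normal(x_t.shape)


def guided_sample(denoiser, g_inv, g_spec, T: int = 50, seed: int = 0):
    """Run the reverse process; both preference components condition every step."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(T)
    x = rng.standard_normal(g_inv.shape)
    for t in reversed(range(T)):
        x0_hat = denoiser(x, t, g_inv, g_spec)
        x = reverse_step(x, t, x0_hat, betas, alphas, alpha_bars, rng)
    return x


def toy_denoiser(x_t, t, g_inv, g_spec, w_inv=0.6, w_spec=0.4):
    """Hypothetical stand-in for a learned network: predicts x0 as a weighted
    mix of the domain-invariant and domain-specific preference vectors."""
    return w_inv * g_inv + w_spec * g_spec


if __name__ == "__main__":
    g_inv = np.array([1.0, 0.0, 0.5])    # shared (domain-invariant) preference
    g_spec = np.array([0.0, 1.0, -0.5])  # target-domain-specific preference
    x0 = guided_sample(toy_denoiser, g_inv, g_spec)
    item_emb = np.eye(3)                 # three toy item embeddings
    print(np.argsort(-(item_emb @ x0)))  # items ranked by dot-product score
```

Because the toy denoiser ignores $x_t$, the last reverse step returns its prediction exactly; a learned denoiser would instead refine its estimate from $x_t$ at each step, which is where disentangled guidance can filter sequential noise.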
CV · Jun 3, 2025
Flexiffusion: Training-Free Segment-Wise Neural Architecture Search for Efficient Diffusion Models
Hongtao Huang, Xiaojun Chang, Lina Yao
Diffusion models (DMs) are powerful generative models capable of producing high-fidelity images but are constrained by high computational costs due to iterative multi-step inference. While Neural Architecture Search (NAS) can optimize DMs, existing methods are hindered by retraining requirements, exponential search complexity from step-wise optimization, and slow evaluation relying on massive image generation. To address these challenges, we propose Flexiffusion, a training-free NAS framework that jointly optimizes generation schedules and model architectures without modifying pre-trained parameters. Our key insight is to decompose the generation process into flexible segments of equal length, where each segment dynamically combines three step types: full (complete computation), partial (cache-reused computation), and null (skipped computation). This segment-wise search space reduces the candidate pool exponentially compared to step-wise NAS while preserving architectural diversity. Further, we introduce relative FID (rFID), a lightweight evaluation metric for NAS that measures divergence from a teacher model's outputs instead of ground truth, slashing evaluation time by over $90\%$. In practice, Flexiffusion achieves at least $2\times$ acceleration across LDMs, Stable Diffusion, and DDPMs on ImageNet and MS-COCO, with FID degradation under $5\%$, outperforming prior NAS and caching methods. Notably, it attains $5.1\times$ speedup on Stable Diffusion with near-identical CLIP scores. Our work pioneers a resource-efficient paradigm for searching high-speed DMs without sacrificing quality.
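The rFID idea can be sketched by assuming it keeps the standard FID formula, $\lVert\mu_c - \mu_t\rVert^2 + \mathrm{Tr}\big(\Sigma_c + \Sigma_t - 2(\Sigma_c\Sigma_t)^{1/2}\big)$, but fits the reference Gaussian ($\mu_t, \Sigma_t$) to features of the teacher model's outputs rather than to real images. The sketch takes precomputed feature arrays as input; the feature extractor and sample counts Flexiffusion actually uses are not stated here. The trace of the matrix square root is computed via the similar symmetric matrix $\Sigma_c^{1/2}\Sigma_t\Sigma_c^{1/2}$, which avoids a SciPy dependency.

```python
import numpy as np


def _sqrtm_psd(m: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    w, v = np.linalg.eigh(m)
    w = np.clip(w, 0.0, None)  # guard against tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.T


def frechet_distance(mu1, cov1, mu2, cov2) -> float:
    """Frechet distance between N(mu1, cov1) and N(mu2, cov2)."""
    s1 = _sqrtm_psd(cov1)
    inner = s1 @ cov2 @ s1
    inner = (inner + inner.T) / 2.0  # symmetrize to remove round-off asymmetry
    # Tr((cov1 cov2)^{1/2}) equals the sum of sqrt-eigenvalues of s1 cov2 s1.
    tr_covmean = np.sqrt(np.clip(np.linalg.eigvalsh(inner), 0.0, None)).sum()
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * tr_covmean)


def relative_fid(candidate_feats: np.ndarray, teacher_feats: np.ndarray) -> float:
    """Assumed rFID: FID of a candidate's features against the teacher's features.

    Both arrays have shape (n_samples, feature_dim).
    """
    mu_c, mu_t = candidate_feats.mean(axis=0), teacher_feats.mean(axis=0)
    cov_c = np.cov(candidate_feats, rowvar=False)
    cov_t = np.cov(teacher_feats, rowvar=False)
    return frechet_distance(mu_c, cov_c, mu_t, cov_t)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.standard_normal((500, 4))
    candidate = teacher + 0.1 * rng.standard_normal((500, 4))
    print(relative_fid(candidate, teacher))
```

A candidate whose outputs match the teacher's scores near zero, so candidates can be ranked by how closely they reproduce the pre-trained model. A plausible reason this is cheaper is that the teacher's reference samples can be generated once and reused for every candidate, rather than comparing against a large ground-truth set.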
LG · Feb 21, 2024
MatchNAS: Optimizing Edge AI in Sparse-Label Data Contexts via Automating Deep Neural Network Porting for Mobile Deployment
Hongtao Huang, Xiaojun Chang, Wen Hu et al.
Recent years have seen an explosion of edge intelligence powered by Deep Neural Networks (DNNs). One popular scheme is to train DNNs on powerful cloud servers and then port them to mobile devices after making them lightweight. Conventional approaches manually specialise DNNs for each edge platform and retrain them with real-world data. However, as the number of platforms grows, these approaches become labour-intensive and computationally prohibitive. Additionally, real-world data tends to be sparsely labelled, which makes training lightweight models even harder. In this paper, we propose MatchNAS, a novel scheme for porting DNNs to mobile devices. Specifically, we simultaneously optimise a large network family using both labelled and unlabelled data and then automatically search for tailored networks for different hardware platforms. MatchNAS acts as an intermediary that bridges the gap between cloud-based DNNs and edge-based DNNs.
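The final step, searching the trained network family for a network tailored to each hardware platform, can be illustrated with a deliberately simple stand-in: pick the most accurate candidate whose latency on a given device fits a budget. The abstract does not describe MatchNAS's actual search algorithm or how accuracy and latency are estimated, so the `Subnet` record, the per-platform latency table, and all numbers below are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Subnet:
    """A candidate network drawn from the trained family (hypothetical record)."""
    name: str
    accuracy: float               # e.g. validation accuracy with inherited weights
    latency_ms: Dict[str, float]  # measured or predicted latency per platform


def select_for_platform(subnets: List[Subnet], platform: str,
                        budget_ms: float) -> Optional[Subnet]:
    """Return the most accurate subnet whose latency on `platform` fits the budget."""
    feasible = [s for s in subnets
                if s.latency_ms.get(platform, float("inf")) <= budget_ms]
    if not feasible:
        return None  # nothing fits; a real system might relax the budget
    return max(feasible, key=lambda s: s.accuracy)


if __name__ == "__main__":
    # Illustrative placeholder numbers, not measurements from the paper.
    family = [
        Subnet("small", 0.70, {"phone_a": 8.0, "phone_b": 15.0}),
        Subnet("medium", 0.75, {"phone_a": 14.0, "phone_b": 25.0}),
        Subnet("large", 0.78, {"phone_a": 22.0, "phone_b": 40.0}),
    ]
    for platform, budget in [("phone_a", 15.0), ("phone_b", 20.0)]:
        best = select_for_platform(family, platform, budget)
        print(platform, budget, best.name if best else None)
```

The point of the design is that the expensive step (training the shared family) happens once, while per-platform specialisation reduces to a cheap lookup or search, so adding a new device does not require retraining.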