Di Huang

h-index 13

3 papers

2,332 citations

3 Papers

20.3 CV Jul 15

Conditioning Residuals for Diffusion Models via Representation Feedback

Weilai Xiang, Hongyu Yang, Di Huang et al.

Diffusion models now serve as a common foundation for multimedia generation, and useful intermediate representations emerge during their generative training. Standard architectures, however, propagate these representations through the main feature stream, without explicitly reintroducing their encoded semantics to later denoising layers. Meanwhile, such backbones already provide a conditioning pathway for global modulation by predefined inputs. This work examines whether this native pathway can also route internally inferred semantics as evolving, sample-dependent cues. We propose Conditioning Residuals, a lightweight feedback mechanism that converts aggregated features into residuals added to condition embeddings. By feeding back compact feature summaries, it provides adaptive generative guidance and encourages a tighter semantic bottleneck, without external encoders, auxiliary objectives, or sampling-time changes. It supports feedback at one or multiple depths in UNet and DiT backbones, with negligible overhead. Across diffusion formulations, backbone configurations, and datasets, experiments show consistent gains in generative performance, along with stronger representations in downstream linear probing and segmentation. Mechanistic analyses reveal improved generative training dynamics and reshaped feature structure, suggesting a grounded, generalizable way to enhance diffusion backbones from within.
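The feedback step described in this abstract (aggregated features converted into a residual that is added to the condition embedding) can be illustrated with a minimal sketch. This is not the authors' implementation: mean pooling as the aggregator, a single bias-free linear projection, and all names (`mean_pool`, `linear`, `conditioning_residual`, `weight`) are assumptions made for illustration only.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def mean_pool(tokens: Matrix) -> Vector:
    """Aggregate per-token features (N x D) into one D-dim summary.

    Mean pooling is an assumed aggregator; the abstract only says "aggregated".
    """
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def linear(weight: Matrix, x: Vector) -> Vector:
    """Apply a bias-free linear map; weight has shape (C x D)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def conditioning_residual(cond: Vector, tokens: Matrix, weight: Matrix) -> Vector:
    """Return cond + W @ pool(tokens), the feedback-updated condition embedding."""
    residual = linear(weight, mean_pool(tokens))
    return [c + r for c, r in zip(cond, residual)]


# Toy example: N=2 tokens with D=3 features, condition embedding with C=2 dims.
tokens = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
cond = [0.5, -0.5]
weight = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # hypothetical projection weights

updated = conditioning_residual(cond, tokens, weight)

# With an all-zero projection the condition embedding passes through unchanged.
zero_weight = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
unchanged = conditioning_residual(cond, tokens, zero_weight)
```

In this sketch the updated embedding would then modulate later denoising layers through the backbone's existing conditioning pathway; where the features are tapped and where the residual is injected are details the abstract does not specify.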

2.4 AI Feb 1

EvoOpt-LLM: Evolving industrial optimization models with large language models

Yiliu He, Tianle Li, Binghao Ji et al.

Optimization modeling via mixed-integer linear programming (MILP) is fundamental to industrial planning and scheduling, yet translating natural-language requirements into solver-executable models and maintaining them under evolving business rules remains highly expertise-intensive. While large language models (LLMs) offer promising avenues for automation, existing methods often suffer from low data efficiency, limited solver-level validity, and poor scalability to industrial-scale problems. To address these challenges, we present EvoOpt-LLM, a unified LLM-based framework supporting the full lifecycle of industrial optimization modeling, including automated model construction, dynamic business-constraint injection, and end-to-end variable pruning. Built on a 7B-parameter LLM and adapted via parameter-efficient LoRA fine-tuning, EvoOpt-LLM achieves a generation rate of 91% and an executability rate of 65.9% with only 3,000 training samples, with critical performance gains emerging under 1,500 samples. The constraint injection module reliably augments existing MILP models while preserving original objectives, and the variable pruning module enhances computational efficiency, achieving an F1 score of ~0.56 on medium-sized LP models with only 400 samples. EvoOpt-LLM demonstrates a practical, data-efficient approach to industrial optimization modeling, reducing reliance on expert intervention while improving adaptability and solver efficiency.
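For readers unfamiliar with the LoRA adaptation mentioned in this abstract, the sketch below shows the standard low-rank update, in which a frozen weight matrix W is augmented as W + (alpha / r) * B A. It is a generic illustration of LoRA, not EvoOpt-LLM's code; the abstract does not give the rank, scaling, or target modules, so the matrices and the values of `alpha` and `rank` here are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (m x k) matrix by a (k x n) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(base: Matrix, lora_b: Matrix, lora_a: Matrix,
                alpha: float, rank: int) -> Matrix:
    """Return base + (alpha / rank) * (lora_b @ lora_a).

    base:   frozen weight, shape (out x in)
    lora_b: trainable factor, shape (out x rank)
    lora_a: trainable factor, shape (rank x in)
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(base, delta)]


# Toy example with a 2 x 2 base weight and rank-1 factors (hypothetical values).
base = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]    # rank x in
lora_b = [[0.5], [0.0]]  # out x rank

merged = lora_weight(base, lora_b, lora_a, alpha=2.0, rank=1)

# With lora_b at zero, the adapted weight equals the frozen base weight.
initial = lora_weight(base, [[0.0], [0.0]], lora_a, alpha=2.0, rank=1)
```

Only the two small factors are trained, which is why this kind of adaptation is described as parameter-efficient: here the factors hold 4 values against the 4 of the base matrix, but for realistic layer sizes the ratio is far smaller.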

6.5 CV Dec 9, 2024

World-Consistent Data Generation for Vision-and-Language Navigation

Yu Zhong, Rui Zhang, Zihao Zhang et al.

Vision-and-Language Navigation (VLN) is a challenging task that requires an agent to navigate through photorealistic environments following natural-language instructions. A main obstacle in VLN is data scarcity, which leads to poor generalization to unseen environments. Although data augmentation is a promising way to scale up the dataset, generating VLN data that is both diverse and world-consistent remains difficult. To address this issue, we propose world-consistent data generation (WCGEN), an efficacious data-augmentation framework that satisfies both diversity and world consistency and is aimed at enhancing the generalization of agents to novel environments. The framework consists of two stages: the trajectory stage, which leverages a point-cloud-based technique to ensure spatial coherence among viewpoints, and the viewpoint stage, which adopts a novel angle synthesis method to guarantee spatial and wraparound consistency within the entire observation. By accurately predicting viewpoint changes with 3D knowledge, our approach maintains world consistency during the generation procedure. Experiments on a wide range of datasets verify the effectiveness of our method, demonstrating that our data augmentation strategy enables agents to achieve new state-of-the-art results on all navigation tasks and enhances VLN agents' ability to generalize to unseen environments.