Xun Gao

QUANT-PH
h-index12
10papers
176citations
Novelty62%
AI Score53

10 Papers

QUANT-PHJul 5, 2022
Many-body localized hidden generative models

Weishun Zhong, Xun Gao, Susanne F. Yelin et al.

Born machines are quantum-inspired generative models that leverage the probabilistic nature of quantum states. Here, we present a new architecture called many-body localized (MBL) hidden Born machine that utilizes both MBL dynamics and hidden units as learning resources. We show that the hidden units act as an effective thermal bath that enhances the trainability of the system, while the MBL dynamics stabilize the training trajectories. We numerically demonstrate that the MBL hidden Born machine is capable of learning a variety of tasks, including a toy version of MNIST handwritten digits, quantum data obtained from quantum many-body states, and non-local parity data. Our architecture and algorithm provide novel strategies of utilizing quantum many-body systems as learning resources, and reveal a powerful connection between disorder, interaction, and learning in quantum many-body systems.

72.8LGMay 6
Concurrence of Symmetry Breaking and Nonlocality Phase Transitions in Diffusion Models

Yifan F. Zhang, Fangjun Hu, Guangkuo Liu et al.

Diffusion models undergo a phase transition in a critical time window during generation dynamics, with two complementary diagnoses of criticality. The symmetry breaking picture views the critical window as when trajectories bifurcate into different semantic minima of the energy landscape, whereas the nonlocality picture views the critical window as when local denoising fails. We study whether two notions of such phase transitions are concurrent in modern diffusion transformers. By evaluating the dynamics and outcomes of the generation trajectory, we observe a near-simultaneous occurrence of the non-locality and symmetry breaking critical times. Our work is the first to unify the two notions of phase transitions in practice: it provides a concrete diagnostic for when and why diffusion models rely on conditioning and global denoising, enabling principled evaluation of model efficiency and guiding the design of architectures and sampling schemes that avoid unnecessary computation.

QUANT-PHFeb 13, 2024
Arbitrary Polynomial Separations in Trainable Quantum Machine Learning

Eric R. Anschuetz, Xun Gao

Recent theoretical results in quantum machine learning have demonstrated a general trade-off between the expressive power of quantum neural networks (QNNs) and their trainability; as a corollary of these results, practical exponential separations in expressive power over classical machine learning models are believed to be infeasible as such QNNs take a time to train that is exponential in the model size. We here circumvent these negative results by constructing a hierarchy of efficiently trainable QNNs that exhibit unconditionally provable, polynomial memory separations of arbitrary constant degree over classical neural networks -- including state-of-the-art models, such as Transformers -- in performing a classical sequence modeling task. This construction is also computationally efficient, as each unit cell of the introduced class of QNNs only has constant gate complexity. We show that contextuality -- informally, a quantitative notion of semantic ambiguity -- is the source of the expressivity separation, suggesting that other learning tasks with this property may be a natural setting for the use of quantum learning algorithms.

91.2QUANT-PHApr 1
Learning and Generating Mixed States Prepared by Shallow Channel Circuits

Fangjun Hu, Christian Kokail, Milan Kornjača et al.

Learning quantum states from measurement data is a central problem in quantum information and computational complexity. In this work, we study the problem of learning to generate mixed states on a finite-dimensional lattice. Motivated by recent developments in mixed state phases of matter, we focus on arbitrary states in the trivial phase. A state belongs to the trivial phase if there exists a shallow preparation channel circuit under which local reversibility is preserved throughout the preparation. We prove that any mixed state in this class can be efficiently learned from measurement access alone. Specifically, given copies of an unknown trivial phase mixed state, our algorithm outputs a shallow local channel circuit that approximately generates this state in trace distance. The sample complexity and runtime are polynomial (or quasi-polynomial) in the number of qubits, assuming constant (or polylogarithmic) circuit depth and gate locality. Importantly, the learner is not given the original preparation circuit and relies only on its existence. Our results provide a structural foundation for quantum generative models based on shallow channel circuits. In the classical limit, our framework also inspires an efficient algorithm for classical diffusion models using only a polynomial overhead of training and generation.

LGAug 8, 2025
Local Diffusion Models and Phases of Data Distributions

Fangjun Hu, Guangkuo Liu, Yifan Zhang et al.

As a class of generative artificial intelligence frameworks inspired by statistical physics, diffusion models have shown extraordinary performance in synthesizing complicated data distributions through a denoising process gradually guided by score functions. Real-life data, like images, is often spatially structured in low-dimensional spaces. However, ordinary diffusion models ignore this local structure and learn spatially global score functions, which are often computationally expensive. In this work, we introduce a new perspective on the phases of data distributions, which provides insight into constructing local denoisers with reduced computational costs. We define two distributions as belonging to the same data distribution phase if they can be mutually connected via spatially local operations such as local denoisers. Then, we show that the reverse denoising process consists of an early trivial phase and a late data phase, sandwiching a rapid phase transition where local denoisers must fail. To diagnose such phase transitions, we prove an information-theoretic bound on the fidelity of local denoisers based on conditional mutual information, and conduct numerical experiments in a real-world dataset. This work suggests simpler and more efficient architectures of diffusion models: far from the phase transition point, we can use small local neural networks to compute the score function; global neural networks are only necessary around the narrow time interval of phase transitions. This result also opens up new directions for studying phases of data distributions, the broader science of generative artificial intelligence, and guiding the design of neural networks inspired by physics concepts.

CVMar 29, 2025
FreeInv: Free Lunch for Improving DDIM Inversion

Yuxiang Bao, Huijie Liu, Xun Gao et al.

Naive DDIM inversion process usually suffers from a trajectory deviation issue, i.e., the latent trajectory during reconstruction deviates from the one during inversion. To alleviate this issue, previous methods either learn to mitigate the deviation or design cumbersome compensation strategy to reduce the mismatch error, exhibiting substantial time and computation cost. In this work, we present a nearly free-lunch method (named FreeInv) to address the issue more effectively and efficiently. In FreeInv, we randomly transform the latent representation and keep the transformation the same between the corresponding inversion and reconstruction time-step. It is motivated from a statistical perspective that an ensemble of DDIM inversion processes for multiple trajectories yields a smaller trajectory mismatch error on expectation. Moreover, through theoretical analysis and empirical study, we show that FreeInv performs an efficient ensemble of multiple trajectories. FreeInv can be freely integrated into existing inversion-based image and video editing techniques. Especially for inverting video sequences, it brings more significant fidelity and efficiency improvements. Comprehensive quantitative and qualitative evaluation on PIE benchmark and DAVIS dataset shows that FreeInv remarkably outperforms conventional DDIM inversion, and is competitive among previous state-of-the-art inversion methods, with superior computation efficiency.

LGJun 26, 2025
DiLoCoX: A Low-Communication Large-Scale Training Framework for Decentralized Cluster

Ji Qi, WenPeng Zhu, Li Li et al.

The distributed training of foundation models, particularly large language models (LLMs), demands a high level of communication. Consequently, it is highly dependent on a centralized cluster with fast and reliable interconnects. Can we conduct training on slow networks and thereby unleash the power of decentralized clusters when dealing with models exceeding 100 billion parameters? In this paper, we propose DiLoCoX, a low-communication large-scale decentralized cluster training framework. It combines Pipeline Parallelism with Dual Optimizer Policy, One-Step-Delay Overlap of Communication and Local Training, and an Adaptive Gradient Compression Scheme. This combination significantly improves the scale of parameters and the speed of model pre-training. We justify the benefits of one-step-delay overlap of communication and local training, as well as the adaptive gradient compression scheme, through a theoretical analysis of convergence. Empirically, we demonstrate that DiLoCoX is capable of pre-training a 107B foundation model over a 1Gbps network. Compared to vanilla AllReduce, DiLoCoX can achieve a 357x speedup in distributed training while maintaining negligible degradation in model convergence. To the best of our knowledge, this is the first decentralized training framework successfully applied to models with over 100 billion parameters.

CVSep 23, 2021
Pairwise Emotional Relationship Recognition in Drama Videos: Dataset and Benchmark

Xun Gao, Yin Zhao, Jie Zhang et al.

Recognizing the emotional state of people is a basic but challenging task in video understanding. In this paper, we propose a new task in this field, named Pairwise Emotional Relationship Recognition (PERR). This task aims to recognize the emotional relationship between the two interactive characters in a given video clip. It is different from the traditional emotion and social relation recognition task. Varieties of information, consisting of character appearance, behaviors, facial emotions, dialogues, background music as well as subtitles contribute differently to the final results, which makes the task more challenging but meaningful in developing more advanced multi-modal models. To facilitate the task, we develop a new dataset called Emotional RelAtionship of inTeractiOn (ERATO) based on dramas and movies. ERATO is a large-scale multi-modal dataset for PERR task, which has 31,182 video clips, lasting about 203 video hours. Different from the existing datasets, ERATO contains interaction-centric videos with multi-shots, varied video length, and multiple modalities including visual, audio and text. As a minor contribution, we propose a baseline model composed of Synchronous Modal-Temporal Attention (SMTA) unit to fuse the multi-modal information for the PERR task. In contrast to other prevailing attention mechanisms, our proposed SMTA can steadily improve the performance by about 1\%. We expect the ERATO as well as our proposed SMTA to open up a new way for PERR task in video understanding and further improve the research of multi-modal fusion methodology.

QUANT-PHJan 20, 2021
Enhancing Generative Models via Quantum Correlations

Xun Gao, Eric R. Anschuetz, Sheng-Tao Wang et al.

Generative modeling using samples drawn from the probability distribution constitutes a powerful approach for unsupervised machine learning. Quantum mechanical systems can produce probability distributions that exhibit quantum correlations which are difficult to capture using classical models. We show theoretically that such quantum correlations provide a powerful resource for generative modeling. In particular, we provide an unconditional proof of separation in expressive power between a class of widely-used generative models, known as Bayesian networks, and its minimal quantum extension. We show that this expressivity advantage is associated with quantum nonlocality and quantum contextuality. Furthermore, we numerically test this separation on standard machine learning data sets and show that it holds for practical problems. The possibility of quantum advantage demonstrated in this work not only sheds light on the design of useful quantum machine learning protocols but also provides inspiration to draw on ideas from quantum foundations to improve purely classical algorithms.

QUANT-PHNov 6, 2017
An efficient quantum algorithm for generative machine learning

Xun Gao, Zhengyu Zhang, Luming Duan

A central task in the field of quantum computing is to find applications where quantum computer could provide exponential speedup over any classical computer. Machine learning represents an important field with broad applications where quantum computer may offer significant speedup. Several quantum algorithms for discriminative machine learning have been found based on efficient solving of linear algebraic problems, with potential exponential speedup in runtime under the assumption of effective input from a quantum random access memory. In machine learning, generative models represent another large class which is widely used for both supervised and unsupervised learning. Here, we propose an efficient quantum algorithm for machine learning based on a quantum generative model. We prove that our proposed model is exponentially more powerful to represent probability distributions compared with classical generative models and has exponential speedup in training and inference at least for some instances under a reasonable assumption in computational complexity theory. Our result opens a new direction for quantum machine learning and offers a remarkable example in which a quantum algorithm shows exponential improvement over any classical algorithm in an important application field.