Kiyoung Seong

LG
h-index4
4papers
13citations
Novelty51%
AI Score40

4 Papers

LGFeb 23
Multimodal Crystal Flow: Any-to-Any Modality Generation for Unified Crystal Modeling

Kiyoung Seong, Sungsoo Ahn, Sehui Han et al.

Crystal modeling spans a family of conditional and unconditional generation tasks across different modalities, including crystal structure prediction (CSP) and \emph{de novo} generation (DNG). While recent deep generative models have shown promising performance, they remain largely task-specific, lacking a unified framework that shares crystal representations across different generation tasks. To address this limitation, we propose \emph{Multimodal Crystal Flow (MCFlow)}, a unified multimodal flow model that realizes multiple crystal generation tasks as distinct inference trajectories via independent time variables for atom types and crystal structures. To enable multimodal flow in a standard transformer model, we introduce a composition- and symmetry-aware atom ordering with hierarchical permutation augmentation, injecting strong compositional and crystallographic priors without explicit structural templates. Experiments on the MP-20 and MPTS-52 benchmarks show that MCFlow achieves competitive performance against task-specific baselines across multiple crystal generation tasks.

LGJul 10, 2025
Learning Collective Variables from Time-lagged Generation

Seonghyun Park, Kiyoung Seong, Soojung Yang et al.

Rare events such as state transitions are difficult to observe directly with molecular dynamics simulations due to long timescales. Enhanced sampling techniques overcome this by introducing biases along carefully chosen low-dimensional features, known as collective variables (CVs), which capture the slow degrees of freedom. Machine learning approaches (MLCVs) have automated CV discovery, but existing methods typically focus on discriminating meta-stable states without fully encoding the detailed dynamics essential for accurate sampling. We propose TLC, a framework that learns CVs directly from time-lagged conditions of a generative model. Instead of modeling the static Boltzmann distribution, TLC models a time-lagged conditional distribution yielding CVs to capture the slow dynamic behavior. We validate TLC on the Alanine Dipeptide system using two CV-based enhanced sampling tasks: (i) steered molecular dynamics (SMD) and (ii) on-the-fly probability enhanced sampling (OPES), demonstrating equal or superior performance compared to existing MLCV methods in both transition path sampling and state discrimination.

LGMay 26, 2025
Energy-based generator matching: A neural sampler for general state space

Dongyeop Woo, Minsu Kim, Minkyu Kim et al.

We propose Energy-based generator matching (EGM), a modality-agnostic approach to train generative models from energy functions in the absence of data. Extending the recently proposed generator matching, EGM enables training of arbitrary continuous-time Markov processes, e.g., diffusion, flow, and jump, and can generate data from continuous, discrete, and a mixture of two modalities. To this end, we propose estimating the generator matching loss using self-normalized importance sampling with an additional bootstrapping trick to reduce variance in the importance weight. We validate EGM on both discrete and multimodal tasks up to 100 and 20 dimensions, respectively.

LGMay 26, 2025
On scalable and efficient training of diffusion samplers

Minkyu Kim, Kiyoung Seong, Dongyeop Woo et al.

We address the challenge of training diffusion models to sample from unnormalized energy distributions in the absence of data, the so-called diffusion samplers. Although these approaches have shown promise, they struggle to scale in more demanding scenarios where energy evaluations are expensive and the sampling space is high-dimensional. To address this limitation, we propose a scalable and sample-efficient framework that properly harmonizes the powerful classical sampling method and the diffusion sampler. Specifically, we utilize Monte Carlo Markov chain (MCMC) samplers with a novelty-based auxiliary energy as a Searcher to collect off-policy samples, using an auxiliary energy function to compensate for exploring modes the diffusion sampler rarely visits. These off-policy samples are then combined with on-policy data to train the diffusion sampler, thereby expanding its coverage of the energy landscape. Furthermore, we identify primacy bias, i.e., the preference of samplers for early experience during training, as the main cause of mode collapse during training, and introduce a periodic re-initialization trick to resolve this issue. Our method significantly improves sample efficiency on standard benchmarks for diffusion samplers and also excels at higher-dimensional problems and real-world molecular conformer generation.