LGJun 29, 2023Code
RL4CO: an Extensive Reinforcement Learning for Combinatorial Optimization BenchmarkFederico Berto, Chuanbo Hua, Junyoung Park et al. · pku
Combinatorial optimization (CO) is fundamental to several real-world applications, from logistics and scheduling to hardware design and resource allocation. Deep reinforcement learning (RL) has recently shown significant benefits in solving CO problems, reducing reliance on domain expertise and improving computational efficiency. However, the absence of a unified benchmarking framework leads to inconsistent evaluations, limits reproducibility, and increases engineering overhead, raising barriers to adoption for new researchers. To address these challenges, we introduce RL4CO, a unified and extensive benchmark with in-depth library coverage of 27 CO problem environments and 23 state-of-the-art baselines. Built on efficient software libraries and best practices in implementation, RL4CO features modularized implementation and flexible configurations of diverse environments, policy architectures, RL algorithms, and utilities with extensive documentation. RL4CO helps researchers build on existing successes while exploring and developing their own designs, facilitating the entire research process by decoupling science from heavy engineering. We finally provide extensive benchmark studies to inspire new insights and future work. RL4CO has already attracted numerous researchers in the community and is open-sourced at https://github.com/ai4co/rl4co.
LGJun 5, 2023Code
Equity-Transformer: Solving NP-hard Min-Max Routing Problems as Sequential Generation with Equity ContextJiwoo Son, Minsu Kim, Sanghyeok Choi et al.
Min-max routing problems aim to minimize the maximum tour length among multiple agents by having agents conduct tasks in a cooperative manner. These problems include impactful real-world applications but are known as NP-hard. Existing methods are facing challenges, particularly in large-scale problems that require the coordination of numerous agents to cover thousands of cities. This paper proposes Equity-Transformer to solve large-scale min-max routing problems. First, we employ sequential planning approach to address min-max routing problems, allowing us to harness the powerful sequence generators (e.g., Transformer). Second, we propose key inductive biases that ensure equitable workload distribution among agents. The effectiveness of Equity-Transformer is demonstrated through its superior performance in two representative min-max routing tasks: the min-max multi-agent traveling salesman problem (min-max mTSP) and the min-max multi-agent pick-up and delivery problem (min-max mPDP). Notably, our method achieves significant reductions of runtime, approximately 335 times, and cost values of about 53\% compared to a competitive heuristic (LKH3) in the case of 100 vehicles with 1,000 cities of mTSP. We provide reproducible source code: \url{https://github.com/kaist-silab/equity-transformer}.
96.7LGMay 26Code
Aligning Few-Step Generative Models by Amortizing Sample-based Variational InferenceJaewoo Lee, Hyeongyu Kang, Dohyun Kim et al.
Aligning a few-step generative model is challenging, since existing alignment frameworks typically rely on restrictive assumptions: a tractable likelihood, a specific ODE/SDE solver, or a particular model family. We introduce FAV, Few-step Generative Models Alignment via Sample-based Variational Inference, a general alignment framework that requires only sample access to the generator and the reference distribution. We cast alignment as sampling from a reward-tilted distribution anchored to a reference distribution. We leverage Stein Variational Gradient Descent as a sample-based variational inference scheme and amortize its particle updates into the generator parameters via fixed-point regression. We evaluate FAV on two domains: robotics manipulation and image generator alignment. On generative policy alignment for robotic manipulation, FAV outperforms prevailing policy extraction baselines across 56 offline and 30 offline-to-online RL tasks. For image generator alignment, FAV fine-tunes diverse few-step backbones, including GAN, drifting model, consistency models, and flow maps, scaling from ImageNet-$256$ to 1024$^2$ text-to-image synthesis. Code is available at https://github.com/Jaewoopudding/FAV.
67.6LGMay 28
Solving Integer Linear Programming with Parallel TemperingKyuil Sim, Sanghyeok Choi, Jinkyoo Park
Integer Linear Programming (ILP) serves as a versatile framework for modeling a wide range of combinatorial optimization problems, typically addressed by sophisticated exact solvers or heuristics. While learning-based approaches have recently shown their effectiveness, they suffer from poor generalization to out-of-distribution instances and inherent dependence on external solvers. In this work, we propose a solver-free, sampling-based optimization framework for ILP that directly explores discrete feasible regions without training or external solvers. Exploiting the linear structure of ILP, we employ a Locally-Balanced Proposal to construct a transition kernel, thereby avoiding the gradient approximation. To overcome the highly multimodal nature of ILP energy landscapes, we integrate Parallel Tempering. In addition to standard temperature tempering, we introduce penalty tempering, which modulates constraint barriers while preserving the objective landscape over feasible solutions. Empirically, our method consistently outperforms SCIP across all four benchmarks, matches or exceeds Gurobi on two of four tasks within a 200-second budget, and is substantially more robust to distribution shift than learning-based methods. Furthermore, on MIPLIB 2017 instances, our framework remains competitive with classical solvers without any problem-specific tuning.
LGFeb 5
Discrete diffusion samplers and bridges: Off-policy algorithms and applications in latent spacesArran Carter, Sanghyeok Choi, Kirill Tamogashev et al.
Sampling from a distribution $p(x) \propto e^{-\mathcal{E}(x)}$ known up to a normalising constant is an important and challenging problem in statistics. Recent years have seen the rise of a new family of amortised sampling algorithms, commonly referred to as diffusion samplers, that enable fast and efficient sampling from an unnormalised density. Such algorithms have been widely studied for continuous-space sampling tasks; however, their application to problems in discrete space remains largely unexplored. Although some progress has been made in this area, discrete diffusion samplers do not take full advantage of ideas commonly used for continuous-space sampling. In this paper, we propose to bridge this gap by introducing off-policy training techniques for discrete diffusion samplers. We show that these techniques improve the performance of discrete samplers on both established and new synthetic benchmarks. Next, we generalise discrete diffusion samplers to the task of bridging between two arbitrary distributions, introducing data-to-energy Schrödinger bridge training for the discrete domain for the first time. Lastly, we showcase the application of the proposed diffusion samplers to data-free posterior sampling in the discrete latent spaces of image generative models.
LGMay 24, 2023Code
torchgfn: A PyTorch GFlowNet libraryJoseph D. Viviano, Omar G. Younis, Sanghyeok Choi et al.
The growing popularity of generative flow networks (GFlowNets or GFNs) from a range of researchers with diverse backgrounds and areas of expertise necessitates a library that facilitates the testing of new features (e.g., training losses and training policies) against standard benchmark implementations, or on a set of common environments. We present torchgfn, a PyTorch library that aims to address this need. Its core contribution is a modular and decoupled architecture which treats environments, neural network modules, and training objectives as interchangeable components. This provides users with a simple yet powerful API to facilitate rapid prototyping and novel research. Multiple examples are provided, replicating and unifying published results. The library is available on GitHub (https://github.com/GFNOrg/torchgfn) and on pypi (https://pypi.org/project/torchgfn/).
LGMar 11, 2024
Ant Colony Sampling with GFlowNets for Combinatorial OptimizationMinsu Kim, Sanghyeok Choi, Hyeonah Kim et al.
We present the Generative Flow Ant Colony Sampler (GFACS), a novel meta-heuristic method that hierarchically combines amortized inference and parallel stochastic search. Our method first leverages Generative Flow Networks (GFlowNets) to amortize a \emph{multi-modal} prior distribution over combinatorial solution space that encompasses both high-reward and diversified solutions. This prior is iteratively updated via parallel stochastic search in the spirit of Ant Colony Optimization (ACO), leading to the posterior distribution that generates near-optimal solutions. Extensive experiments across seven combinatorial optimization problems demonstrate GFACS's promising performances.
BMFeb 5, 2024
Genetic-guided GFlowNets for Sample Efficient Molecular OptimizationHyeonah Kim, Minsu Kim, Sanghyeok Choi et al.
The challenge of discovering new molecules with desired properties is crucial in domains like drug discovery and material design. Recent advances in deep learning-based generative methods have shown promise but face the issue of sample efficiency due to the computational expense of evaluating the reward function. This paper proposes a novel algorithm for sample-efficient molecular optimization by distilling a powerful genetic algorithm into deep generative policy using GFlowNets training, the off-policy method for amortized inference. This approach enables the deep generative policy to learn from domain knowledge, which has been explicitly integrated into the genetic algorithm. Our method achieves state-of-the-art performance in the official molecular optimization benchmark, significantly outperforming previous methods. It also demonstrates effectiveness in designing inhibitors against SARS-CoV-2 with substantially fewer reward calls.
CLJan 15
SocraticKG: Knowledge Graph Construction via QA-Driven Fact ExtractionSanghyeok Choi, Woosang Jeon, Kyuseok Yang et al.
Constructing Knowledge Graphs (KGs) from unstructured text provides a structured framework for knowledge representation and reasoning, yet current LLM-based approaches struggle with a fundamental trade-off: factual coverage often leads to relational fragmentation, while premature consolidation causes information loss. To address this, we propose SocraticKG, an automated KG construction method that introduces question-answer pairs as a structured intermediate representation to systematically unfold document-level semantics prior to triple extraction. By employing 5W1H-guided QA expansion, SocraticKG captures contextual dependencies and implicit relational links typically lost in direct KG extraction pipelines, providing explicit grounding in the source document that helps mitigate implicit reasoning errors. Evaluation on the MINE benchmark demonstrates that our approach effectively addresses the coverage-connectivity trade-off, achieving superior factual retention while maintaining high structural cohesion even as extracted knowledge volume substantially expands. These results highlight that QA-mediated semantic scaffolding plays a critical role in structuring semantics prior to KG extraction, enabling more coherent and reliable graph construction in subsequent stages.
NEFeb 9, 2025
Neural Genetic Search in Discrete SpacesHyeonah Kim, Sanghyeok Choi, Jiwoo Son et al.
Effective search methods are crucial for improving the performance of deep generative models at test time. In this paper, we introduce a novel test-time search method, Neural Genetic Search (NGS), which incorporates the evolutionary mechanism of genetic algorithms into the generation procedure of deep models. The core idea behind NGS is its crossover, which is defined as parent-conditioned generation using trained generative models. This approach offers a versatile and easy-to-implement search algorithm for deep generative models. We demonstrate the effectiveness and flexibility of NGS through experiments across three distinct domains: routing problems, adversarial prompt generation for language models, and molecular design.
LGOct 13, 2025
Reinforced sequential Monte Carlo for amortised samplingSanghyeok Choi, Sarthak Mittal, Víctor Elvira et al.
This paper proposes a synergy of amortised and particle-based methods for sampling from distributions defined by unnormalised density functions. We state a connection between sequential Monte Carlo (SMC) and neural sequential samplers trained by maximum-entropy reinforcement learning (MaxEnt RL), wherein learnt sampling policies and value functions define proposal kernels and twist functions. Exploiting this connection, we introduce an off-policy RL training procedure for the sampler that uses samples from SMC -- using the learnt sampler as a proposal -- as a behaviour policy that better explores the target distribution. We describe techniques for stable joint training of proposals and twist functions and an adaptive weight tempering scheme to reduce training signal variance. Furthermore, building upon past attempts to use experience replay to guide the training of neural samplers, we derive a way to combine historical samples with annealed importance sampling weights within a replay buffer. On synthetic multi-modal targets (in both continuous and discrete spaces) and the Boltzmann distribution of alanine dipeptide conformations, we demonstrate improvements in approximating the true distribution as well as training stability compared to both amortised and Monte Carlo methods.
LGOct 1, 2025
Diffusion Alignment as Variational Expectation-MaximizationJaewoo Lee, Minsu Kim, Sanghyeok Choi et al.
Diffusion alignment aims to optimize diffusion models for the downstream objective. While existing methods based on reinforcement learning or direct backpropagation achieve considerable success in maximizing rewards, they often suffer from reward over-optimization and mode collapse. We introduce Diffusion Alignment as Variational Expectation-Maximization (DAV), a framework that formulates diffusion alignment as an iterative process alternating between two complementary phases: the E-step and the M-step. In the E-step, we employ test-time search to generate diverse and reward-aligned samples. In the M-step, we refine the diffusion model using samples discovered by the E-step. We demonstrate that DAV can optimize reward while preserving diversity for both continuous and discrete tasks: text-to-image synthesis and DNA sequence design.
CLFeb 3, 2025
Knowledge Synthesis of Photosynthesis Research Using a Large Language ModelSeungri Yoon, Woosang Jeon, Sanghyeok Choi et al.
The development of biological data analysis tools and large language models (LLMs) has opened up new possibilities for utilizing AI in plant science research, with the potential to contribute significantly to knowledge integration and research gap identification. Nonetheless, current LLMs struggle to handle complex biological data and theoretical models in photosynthesis research and often fail to provide accurate scientific contexts. Therefore, this study proposed a photosynthesis research assistant (PRAG) based on OpenAI's GPT-4o with retrieval-augmented generation (RAG) techniques and prompt optimization. Vector databases and an automated feedback loop were used in the prompt optimization process to enhance the accuracy and relevance of the responses to photosynthesis-related queries. PRAG showed an average improvement of 8.7% across five metrics related to scientific writing, with a 25.4% increase in source transparency. Additionally, its scientific depth and domain coverage were comparable to those of photosynthesis research papers. A knowledge graph was used to structure PRAG's responses with papers within and outside the database, which allowed PRAG to match key entities with 63% and 39.5% of the database and test papers, respectively. PRAG can be applied for photosynthesis research and broader plant science domains, paving the way for more in-depth data analysis and predictive capabilities.