LG · AI · CL · NE · Dec 18, 2025

Topic Modelling Black Box Optimization

arXiv:2512.16445v1
Originality: Incremental advance
AI Analysis

The paper addresses a key design decision in topic modeling that matters to both researchers and practitioners, but the contribution is incremental: it applies existing optimization methods to a known bottleneck.

The paper tackles the problem of selecting the number of topics in Latent Dirichlet Allocation by formulating it as a discrete black-box optimization problem and comparing evolutionary and amortized optimizers. Its results show that amortized methods such as SABBO reach near-optimal topic numbers after essentially one evaluation, whereas GA and ES require almost the full evaluation budget.
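
The sample-efficiency claim can be made concrete by counting how many evaluations an optimizer needs before its best-so-far perplexity comes within a tolerance of a reference value. The sketch below assumes a relative tolerance and a known reference perplexity; the function name `evals_to_near_optimal`, the `tol` parameter, and the example traces are hypothetical illustrations, not values or code from the paper.

```python
from typing import Optional, Sequence


def evals_to_near_optimal(trace: Sequence[float], target: float,
                          tol: float = 0.01) -> Optional[int]:
    """Hypothetical metric (not from the paper).

    Returns the 1-based index of the first evaluation at which the
    best-so-far perplexity is within a relative tolerance `tol` of
    `target`, or None if the budget runs out first. `trace` holds the
    perplexity observed at each evaluation, in order; lower is better.
    """
    best_so_far = float("inf")
    for i, value in enumerate(trace, start=1):
        best_so_far = min(best_so_far, value)
        if best_so_far <= target * (1.0 + tol):
            return i
    return None


if __name__ == "__main__":
    # Made-up traces: one optimizer lands close on its first evaluation,
    # the other only approaches the target near the end of its budget.
    fast = [1012.0, 1030.0, 1008.0]
    slow = [1500.0, 1300.0, 1150.0, 1040.0, 1015.0]
    print(evals_to_near_optimal(fast, target=1000.0, tol=0.02))
    print(evals_to_near_optimal(slow, target=1000.0, tol=0.02))
```

Using the best-so-far value rather than the raw per-evaluation value means a later poor evaluation cannot undo progress already made, which matches how a fixed-budget comparison is usually read.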

Choosing the number of topics $T$ in Latent Dirichlet Allocation (LDA) is a key design decision that strongly affects both the statistical fit and the interpretability of topic models. In this work, we formulate the selection of $T$ as a discrete black-box optimization problem, where each function evaluation corresponds to training an LDA model and measuring its validation perplexity. Under a fixed evaluation budget, we compare four families of optimizers: two hand-designed evolutionary methods, Genetic Algorithm (GA) and Evolution Strategy (ES); and two learned, amortized approaches, Preferential Amortized Black-Box Optimization (PABBO) and Sharpness-Aware Black-Box Optimization (SABBO). Our experiments show that, although GA, ES, PABBO, and SABBO eventually reach a similar band of final perplexity, the amortized optimizers are substantially more sample- and time-efficient. SABBO typically identifies a near-optimal topic number after essentially a single evaluation, and PABBO finds competitive configurations within a few evaluations, whereas GA and ES require almost the full budget to approach the same region.
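
The formulation above can be sketched as a budgeted black-box objective over integer $T$, paired with a simple evolutionary baseline. This is a minimal sketch under stated assumptions: `BudgetedObjective`, `one_plus_one_es`, and `toy_perplexity` are hypothetical names; the (1+1)-ES with expand-on-success, shrink-on-failure step sizes is a generic variant, not the paper's ES configuration; and `toy_perplexity` is a synthetic quadratic stand-in for actually training LDA. In a real setup, `fn` could, for example, fit scikit-learn's `LatentDirichletAllocation(n_components=T)` on a training split and return its `perplexity()` on a validation split.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical names throughout; not the paper's implementation.


@dataclass
class BudgetedObjective:
    """Wraps an expensive objective f(T) and enforces a fixed evaluation budget.

    In the LDA setting, each call would train a model with T topics and
    return its validation perplexity; here `fn` is any callable.
    """
    fn: Callable[[int], float]
    budget: int
    history: List[float] = field(default_factory=list)

    def __call__(self, t: int) -> float:
        if len(self.history) >= self.budget:
            raise RuntimeError("evaluation budget exhausted")
        value = self.fn(t)
        self.history.append(value)  # every call counts, even repeated T
        return value


def one_plus_one_es(obj: BudgetedObjective, lo: int, hi: int,
                    sigma: float = 8.0, seed: int = 0) -> int:
    """Generic (1+1)-ES over integer T in [lo, hi]; minimizes obj.

    Assumed variant, not the paper's ES: elitist acceptance and a simple
    step-size rule (grow on success, shrink on failure).
    """
    rng = random.Random(seed)
    parent = rng.randint(lo, hi)
    parent_val = obj(parent)
    while len(obj.history) < obj.budget:
        # Gaussian step rounded to an integer; force a move of at least 1.
        step = round(rng.gauss(0.0, sigma)) or rng.choice((-1, 1))
        child = min(hi, max(lo, parent + step))  # clip to the search range
        child_val = obj(child)
        if child_val <= parent_val:  # keep the child if it is no worse
            parent, parent_val = child, child_val
            sigma *= 1.5
        else:
            sigma = max(1.0, sigma * 0.9)
    return parent


def toy_perplexity(t: int) -> float:
    """Synthetic stand-in for LDA validation perplexity, minimized at T = 37."""
    return 1000.0 + 2.0 * (t - 37) ** 2


if __name__ == "__main__":
    obj = BudgetedObjective(toy_perplexity, budget=30)
    best_t = one_plus_one_es(obj, lo=2, hi=100)
    print(best_t, min(obj.history))
```

Here every call, including a repeated query of the same $T$, counts against the budget. Caching repeated queries would be a reasonable alternative, but because LDA training is stochastic, a cached value would hide run-to-run variance in perplexity.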
