Zizhuo Wang

LG
h-index21
10papers
409citations
Novelty59%
AI Score57

10 Papers

95.8OCApr 28Code
From Soliloquy to Agora: Memory-Enhanced LLM Agents with Decentralized Debate for Optimization Modeling

Jianghao Lin, Zi Ling, Chenyu Zhou et al.

Optimization modeling underpins real-world decision-making in logistics, manufacturing, energy, and public services, but reliably solving such problems from natural-language requirements remains challenging for current large language models (LLMs). In this paper, we propose \emph{Agora-Opt}, a modular agentic framework for optimization modeling that combines decentralized debate with a read-write memory bank. Agora-Opt allows multiple agent teams to independently produce end-to-end solutions and reconcile them through an outcome-grounded debate protocol, while memory stores solver-verified artifacts and past disagreement resolutions to support training-free improvement over time. This design is flexible across both backbones and methods: it reduces base-model lock-in, transfers across different LLM families, and can be layered onto existing pipelines with minimal coupling. Across public benchmarks, Agora-Opt achieves the strongest overall performance among all compared methods, outperforming strong zero-shot LLMs, training-centric approaches, and prior agentic baselines. Further analyses show robust gains across backbone choices and component variants, and demonstrate that decentralized debate offers a structural advantage over centralized selection by enabling agents to refine candidate solutions through interaction and even recover correct formulations when all initial candidates are flawed. These results suggest that reliable optimization modeling benefits from combining collaborative cross-checking with reusable experience, and position Agora-Opt as a practical and extensible foundation for trustworthy optimization modeling assistance. Our code and data are available at https://github.com/CHIANGEL/Agora-Opt.

TRMay 18, 2022
Price Interpretability of Prediction Markets: A Convergence Analysis

Dian Yu, Jianjun Gao, Weiping Wu et al.

Prediction markets are long known for prediction accuracy. This study systematically explores the fundamental properties of prediction markets, addressing questions about their information aggregation process and the factors contributing to their remarkable efficacy. We propose a novel multivariate utility (MU) based mechanism that unifies several existing automated market-making schemes. Using this mechanism, we establish the convergence results for markets comprised of risk-averse traders who have heterogeneous beliefs and repeatedly interact with the market maker. We demonstrate that the resulting limiting wealth distribution aligns with the Pareto efficient frontier defined by the utilities of all market participants. With the help of this result, we establish analytical and numerical results for the limiting price in different market models. Specifically, we show that the limiting price converges to the geometric mean of agent beliefs in exponential utility-based markets. In risk-measure-based markets, we construct a family of risk measures that satisfy the convergence criteria and prove that the price can converge to a unique level represented by the weighted power mean of agent beliefs. In broader markets with Constant Relative Risk Aversion (CRRA) utilities, we reveal that the limiting price can be characterized by systems of equations that encapsulate agent beliefs, risk parameters, and wealth. Despite the potential impact of traders' trading sequences on the limiting price, we establish a price invariance result for markets with a large trader population. Using this result, we propose an efficient approximation scheme for the limiting price.

CVMar 2
SimRecon: SimReady Compositional Scene Reconstruction from Real Videos

Chong Xia, Kai Zhu, Zizhuo Wang et al.

Compositional scene reconstruction seeks to create object-centric representations rather than holistic scenes from real-world videos, which is natively applicable for simulation and interaction. Conventional compositional reconstruction approaches primarily emphasize on visual appearance and show limited generalization ability to real-world scenarios. In this paper, we propose SimRecon, a framework that realizes a "Perception-Generation-Simulation" pipeline towards cluttered scene reconstruction, which first conducts scene-level semantic reconstruction from video input, then performs single-object generation, and finally assembles these assets in the simulator. However, naively combining these three stages leads to visual infidelity of generated assets and physical implausibility of the final scene, a problem particularly severe for complex scenes. Thus, we further propose two bridging modules between the three stages to address this problem. To be specific, for the transition from Perception to Generation, critical for visual fidelity, we introduce Active Viewpoint Optimization, which actively searches in 3D space to acquire optimal projected images as conditions for single-object completion. Moreover, for the transition from Generation to Simulation, essential for physical plausibility, we propose a Scene Graph Synthesizer, which guides the construction from scratch in 3D simulators, mirroring the native, constructive principle of the real world. Extensive experiments on the ScanNet dataset validate our method's superior performance over previous state-of-the-art approaches.

DSAug 1, 2024
Infrequent Resolving Algorithm for Online Linear Programming

Guokai Li, Zizhuo Wang, Jingwei Zhang

Online linear programming (OLP) has gained significant attention from both researchers and practitioners due to its extensive applications, such as online auction, network revenue management, order fulfillment and advertising. Existing OLP algorithms fall into two categories: LP-based algorithms and LP-free algorithms. The former one typically guarantees better performance but requires solving a large number of LPs, which could be computationally expensive. In contrast, LP-free algorithm only requires first-order computations but induces a worse performance. In this work, we bridge the gap between these two extremes by proposing a well-performing algorithm, that solves LPs at a few selected time points and conducts first-order computations at other time points. Specifically, for the case where the inputs are drawn from an unknown finite-support distribution, the proposed algorithm achieves a constant regret (even for the hard "degenerate" case) while solving LPs only O(log log T) times over the time horizon T. Moreover, when we are allowed to solve LPs only M times, we design the corresponding schedule such that the proposed algorithm can guarantee a nearly O(T^((1/2)^(M-1)) regret. Our work highlights the value of resolving both at the beginning and the end of the selling horizon, and provides a novel framework to prove the performance guarantee of the proposed policy under different infrequent resolving schedules. Numerical experiments are conducted to demonstrate the efficiency of the proposed algorithms.

LGJul 14, 2025
From Small to Large: A Graph Convolutional Network Approach for Solving Assortment Optimization Problems

Guokai Li, Pin Gao, Stefanus Jasin et al.

Assortment optimization seeks to select a subset of substitutable products, subject to constraints, to maximize expected revenue. The problem is NP-hard due to its combinatorial and nonlinear nature and arises frequently in industries such as e-commerce, where platforms must solve thousands of such problems each minute. We propose a graph convolutional network (GCN) framework to efficiently solve constrained assortment optimization problems. Our approach constructs a graph representation of the problem, trains a GCN to learn the mapping from problem parameters to optimal assortments, and develops three inference policies based on the GCN's output. Owing to the GCN's ability to generalize across instance sizes, patterns learned from small-scale samples can be transferred to large-scale problems. Numerical experiments show that a GCN trained on instances with 20 products achieves over 85% of the optimal revenue on problems with up to 2,000 products within seconds, outperforming existing heuristics in both accuracy and efficiency. We further extend the framework to settings with an unknown choice model using transaction data and demonstrate similar performance and scalability.

CLOct 5, 2025
CALM Before the STORM: Unlocking Native Reasoning for Optimization Modeling

Zhengyang Tang, Zihan Ye, Chenyu Huang et al.

Large Reasoning Models (LRMs) have demonstrated strong capabilities in complex multi-step reasoning, opening new opportunities for automating optimization modeling. However, existing domain adaptation methods, originally designed for earlier instruction-tuned models, often fail to exploit the advanced reasoning patterns of modern LRMs -- In particular, we show that direct fine-tuning on traditional \textit{non-reflective} datasets leads to limited gains. To fully leverage LRMs' inherent reasoning abilities, we propose \textbf{CALM} (\textit{Corrective Adaptation with Lightweight Modification}), a framework that progressively refines LRMs within their native reasoning modes for optimization modeling tasks. In CALM, an expert intervener identifies reasoning flaws and provides concise corrective hints, which the LRM incorporates to produce improved reasoning trajectories. These interventions modify fewer than 2.6\% of generated tokens, but generate high-quality data for soft adaptation through supervised fine-tuning. The adapted model is then further improved through reinforcement learning. Building on CALM, we develop \textbf{STORM} (\textit{Smart Thinking Optimization Reasoning Model}), a 4B-parameter LRM that achieves a new state-of-the-art average accuracy of 68.9\% across five popular optimization modeling benchmarks, matching the performance of a 671B LRM. These results demonstrate that dynamic, hint-based data synthesis both preserves and amplifies the native reasoning patterns of modern LRMs, offering a more effective and scalable path towards expert-level performance on challenging optimization modeling tasks.

LGSep 26, 2025
Learning to Price Bundles: A GCN Approach for Mixed Bundling

Liangyu Ding, Chenghan Wu, Guokai Li et al.

Bundle pricing refers to designing several product combinations (i.e., bundles) and determining their prices in order to maximize the expected profit. It is a classic problem in revenue management and arises in many industries, such as e-commerce, tourism, and video games. However, the problem is typically intractable due to the exponential number of candidate bundles. In this paper, we explore the usage of graph convolutional networks (GCNs) in solving the bundle pricing problem. Specifically, we first develop a graph representation of the mixed bundling model (where every possible bundle is assigned with a specific price) and then train a GCN to learn the latent patterns of optimal bundles. Based on the trained GCN, we propose two inference strategies to derive high-quality feasible solutions. A local-search technique is further proposed to improve the solution quality. Numerical experiments validate the effectiveness and efficiency of our proposed GCN-based framework. Using a GCN trained on instances with 5 products, our methods consistently achieve near-optimal solutions (better than 97%) with only a fraction of computational time for problems of small to medium size. It also achieves superior solutions for larger size of problems compared with other heuristic methods such as bundle size pricing (BSP). The method can also provide high quality solutions for instances with more than 30 products even for the challenging cases where product utilities are non-additive.

LGNov 3, 2020
Towards Fundamental Limits of Multi-armed Bandits with Random Walk Feedback

Tianyu Wang, Lin F. Yang, Zizhuo Wang

In this paper, we consider a new Multi-Armed Bandit (MAB) problem where arms are nodes in an unknown and possibly changing graph, and the agent (i) initiates random walks over the graph by pulling arms, (ii) observes the random walk trajectories, and (iii) receives rewards equal to the lengths of the walks. We provide a comprehensive understanding of this problem by studying both the stochastic and the adversarial setting. We show that this problem is not easier than a standard MAB in an information theoretical sense, although additional information is available through random walk trajectories. Behaviors of bandit algorithms on this problem are also studied.

MLJun 11, 2019
Probabilistic Forecasting with Temporal Convolutional Neural Network

Yitian Chen, Yanfei Kang, Yixiong Chen et al.

We present a probabilistic forecasting framework based on convolutional neural network for multiple related time series forecasting. The framework can be applied to estimate probability density under both parametric and non-parametric settings. More specifically, stacked residual blocks based on dilated causal convolutional nets are constructed to capture the temporal dependencies of the series. Combined with representation learning, our approach is able to learn complex patterns such as seasonality, holiday effects within and across series, and to leverage those patterns for more accurate forecasts, especially when historical data is sparse or unavailable. Extensive empirical studies are performed on several real-world datasets, including datasets from JD.com, China's largest online retailer. The results show that our framework outperforms other state-of-the-art methods in both accuracy and efficiency.

DSJul 23, 2013
A Near-Optimal Dynamic Learning Algorithm for Online Matching Problems with Concave Returns

Xiao Alison Chen, Zizhuo Wang

We consider an online matching problem with concave returns. This problem is a significant generalization of the Adwords allocation problem and has vast applications in online advertising. In this problem, a sequence of items arrive sequentially and each has to be allocated to one of the bidders, who bid a certain value for each item. At each time, the decision maker has to allocate the current item to one of the bidders without knowing the future bids and the objective is to maximize the sum of some concave functions of each bidder's aggregate value. In this work, we propose an algorithm that achieves near-optimal performance for this problem when the bids arrive in a random order and the input data satisfies certain conditions. The key idea of our algorithm is to learn the input data pattern dynamically: we solve a sequence of carefully chosen partial allocation problems and use their optimal solutions to assist with the future decision. Our analysis belongs to the primal-dual paradigm, however, the absence of linearity of the objective function and the dynamic feature of the algorithm makes our analysis quite unique.