CRMay 18
Prrr: Personal Random Rewards for Blockchain ReportingHongyin Chen, Yubin Ke, Xiaotie Deng et al.
Smart contracts, the stateful programs running on blockchains, often rely on reports. Publishers are paid to publish these reports on the blockchain. Designing protocols that incentivize timely reporting is the prevalent reporting problem. But existing solutions face a security-performance trade-off: Relying on a small set of trusted publishers introduces centralization risks, while allowing open publication results in an excessive number of reports on the blockchain. We identify the root cause of this trade-off to be the standard symmetric reward design, which treats all reports equally. We prove that no symmetric-reward mechanism can overcome the trade-off. We present Personal Random Rewards for Reporting (Prrr), a protocol that assigns random heterogeneous values to reports. We call this novel mechanism-design concept Ex-Ante Synthetic Asymmetry. To the best of our knowledge, Prrr is the first game-theoretic mechanism (in any context) that deliberately forms participant asymmetry. Prrr employs a second-price-style settlement to allocate rewards, ensuring incentive compatibility and achieving both security and efficiency. Following the protocol constitutes a Subgame-Perfect Nash Equilibrium, robust against collusion and Sybil attacks. Prrr is applicable to numerous smart contracts that rely on timely reports.
GTJun 13, 2023
Coordinated Dynamic Bidding in Repeated Second-Price Auctions with BudgetsYurong Chen, Qian Wang, Zhijian Duan et al. · pku
In online ad markets, a rising number of advertisers are employing bidding agencies to participate in ad auctions. These agencies are specialized in designing online algorithms and bidding on behalf of their clients. Typically, an agency usually has information on multiple advertisers, so she can potentially coordinate bids to help her clients achieve higher utilities than those under independent bidding. In this paper, we study coordinated online bidding algorithms in repeated second-price auctions with budgets. We propose algorithms that guarantee every client a higher utility than the best she can get under independent bidding. We show that these algorithms achieve maximal coalition welfare and discuss bidders' incentives to misreport their budgets, in symmetric cases. Our proofs combine the techniques of online learning and equilibrium analysis, overcoming the difficulty of competing with a multi-dimensional benchmark. The performance of our algorithms is further evaluated by experiments on both synthetic and real data. To the best of our knowledge, we are the first to consider bidder coordination in online repeated auctions with constraints.
GTMay 3, 2022
On the Convergence of Fictitious Play: A Decomposition ApproachYurong Chen, Xiaotie Deng, Chenchen Li et al. · pku
Fictitious play (FP) is one of the most fundamental game-theoretical learning frameworks for computing Nash equilibrium in $n$-player games, which builds the foundation for modern multi-agent learning algorithms. Although FP has provable convergence guarantees on zero-sum games and potential games, many real-world problems are often a mixture of both and the convergence property of FP has not been fully studied yet. In this paper, we extend the convergence results of FP to the combinations of such games and beyond. Specifically, we derive new conditions for FP to converge by leveraging game decomposition techniques. We further develop a linear relationship unifying cooperation and competition in the sense that these two classes of games are mutually transferable. Finally, we analyze a non-convergent example of FP, the Shapley game, and develop sufficient conditions for FP to converge.
GTNov 3, 2022
Sybil-Proof Diffusion Auction in Social NetworksHongyin Chen, Xiaotie Deng, Ying Wang et al.
A diffusion auction is a market to sell commodities over a social network, where the challenge is to incentivize existing buyers to invite their neighbors in the network to join the market. Existing mechanisms have been designed to solve the challenge in various settings, aiming at desirable properties such as non-deficiency, incentive compatibility and social welfare maximization. Since the mechanisms are employed in dynamic networks with ever-changing structures, buyers could easily generate fake nodes in the network to manipulate the mechanisms for their own benefits, which is commonly known as the Sybil attack. We observe that strategic agents may gain an unfair advantage in existing mechanisms through such attacks. To resist this potential attack, we propose two diffusion auction mechanisms, the Sybil tax mechanism (STM) and the Sybil cluster mechanism (SCM), to achieve both Sybil-proofness and incentive compatibility in the single-item setting. Our proposal provides the first mechanisms to protect the interests of buyers against Sybil attacks with a mild sacrifice of social welfare and revenue.
GTApr 4, 2022
On Convergence Lemma and Convergence Stability for Piecewise Analytic FunctionsXiaotie Deng, Hanyu Li, Ningyuan Li · pku
In this work, a convergence lemma for function $f$ being finite compositions of analytic mappings and the maximum operator is proved. The lemma shows that the set of $δ$-stationary points near an isolated local minimum point $x^*$ is shrinking to $x^*$ as $δ\to 0$. It is a natural extension of the version for strongly convex $C^1$ functions. However, the correctness of the lemma is subtle. Analytic mappings are necessary for the lemma in the sense that replacing it with differentiable or $C^\infty$ mappings makes the lemma false. The proof is based on stratification theorems of semi-analytic sets by Łojasiewicz. An extension of this proof presents a geometric characterization of the set of stationary points of $f$. Finally, a notion of stability on stationary points, called convergence stability, is proposed. It asks, under small numerical errors, whether a reasonable convergent optimization method started near a stationary point should eventually converge to the same stationary point. The concept of convergence stability becomes nontrivial qualitatively only when the objective function is both nonsmooth and nonconvex. Via the convergence lemma, an intuitive equivalent condition for convergence stability of $f$ is proved. These results together provide a new geometric perspective to study the problem of "where-to-converge" in nonsmooth nonconvex optimization.
GTFeb 23, 2023
Learning to Manipulate a Commitment OptimizerYurong Chen, Xiaotie Deng, Jiarui Gan et al. · pku
It is shown in recent studies that in a Stackelberg game the follower can manipulate the leader by deviating from their true best-response behavior. Such manipulations are computationally tractable and can be highly beneficial for the follower. Meanwhile, they may result in significant payoff losses for the leader, sometimes completely defeating their first-mover advantage. A warning to commitment optimizers, the risk these findings indicate appears to be alleviated to some extent by a strict information advantage the manipulations rely on. That is, the follower knows the full information about both players' payoffs whereas the leader only knows their own payoffs. In this paper, we study the manipulation problem with this information advantage relaxed. We consider the scenario where the follower is not given any information about the leader's payoffs to begin with but has to learn to manipulate by interacting with the leader. The follower can gather necessary information by querying the leader's optimal commitments against contrived best-response behaviors. Our results indicate that the information advantage is not entirely indispensable to the follower's manipulations: the follower can learn the optimal way to manipulate in polynomial time with polynomially many queries of the leader's optimal commitment.
GTJun 27, 2022
Optimal Private Payoff Manipulation against Commitment in Extensive-form GamesYurong Chen, Xiaotie Deng, Yuhao Li · pku
To take advantage of strategy commitment, a useful tactic of playing games, a leader must learn enough information about the follower's payoff function. However, this leaves the follower a chance to provide fake information and influence the final game outcome. Through a carefully contrived payoff function misreported to the learning leader, the follower may induce an outcome that benefits him more, compared to the ones when he truthfully behaves. We study the follower's optimal manipulation via such strategic behaviors in extensive-form games. Followers' different attitudes are taken into account. An optimistic follower maximizes his true utility among all game outcomes that can be induced by some payoff function. A pessimistic follower only considers misreporting payoff functions that induce a unique game outcome. For all the settings considered in this paper, we characterize all the possible game outcomes that can be induced successfully. We show that it is polynomial-time tractable for the follower to find the optimal way of misreporting his private payoff information. Our work completely resolves this follower's optimal manipulation problem on an extensive-form game tree.
GTMay 29, 2022
No-regret Learning in Repeated First-Price Auctions with Budget ConstraintsRui Ai, Chang Wang, Chenchen Li et al.
Recently the online advertising market has exhibited a gradual shift from second-price auctions to first-price auctions. Although there has been a line of works concerning online bidding strategies in first-price auctions, it still remains open how to handle budget constraints in the problem. In the present paper, we initiate the study for a buyer with budgets to learn online bidding strategies in repeated first-price auctions. We propose an RL-based bidding algorithm against the optimal non-anticipating strategy under stationary competition. Our algorithm obtains $\widetilde O(\sqrt T)$-regret if the bids are all revealed at the end of each round. With the restriction that the buyer only sees the winning bid after each round, our modified algorithm obtains $\widetilde O(T^{\frac{7}{12}})$-regret by techniques developed from survival analysis. Our analysis extends to the more general scenario where the buyer has any bounded instantaneous utility function with regrets of the same order.
GTJul 11, 2022
Dynamic Budget Throttling in Repeated Second-Price AuctionsZhaohua Chen, Chang Wang, Qian Wang et al.
In today's online advertising markets, a crucial requirement for an advertiser is to control her total expenditure within a time horizon under some budget. Among various budget control methods, throttling has emerged as a popular choice, managing an advertiser's total expenditure by selecting only a subset of auctions to participate in. This paper provides a theoretical panorama of a single advertiser's dynamic budget throttling process in repeated second-price auctions. We first establish a lower bound on the regret and an upper bound on the asymptotic competitive ratio for any throttling algorithm, respectively, when the advertiser's values are stochastic and adversarial. Regarding the algorithmic side, we propose the OGD-CB algorithm, which guarantees a near-optimal expected regret with stochastic values. On the other hand, when values are adversarial, we prove that this algorithm also reaches the upper bound on the asymptotic competitive ratio. We further compare throttling with pacing, another widely adopted budget control method, in repeated second-price auctions. In the stochastic case, we demonstrate that pacing is generally superior to throttling for the advertiser, supporting the well-known result that pacing is asymptotically optimal in this scenario. However, in the adversarial case, we give an exciting result indicating that throttling is also an asymptotically optimal dynamic bidding strategy. Our results bridge the gaps in theoretical research of throttling in repeated auctions and comprehensively reveal the ability of this popular budget-smoothing strategy.
LGNov 25, 2022
Contextual Decision-Making with Knapsacks Beyond the Worst CaseZhaohua Chen, Rui Ai, Mingwei Yang et al.
We study the framework of a dynamic decision-making scenario with resource constraints. In this framework, an agent, whose target is to maximize the total reward under the initial inventory, selects an action in each round upon observing a random request, leading to a reward and resource consumptions that are further associated with an unknown random external factor. While previous research has already established an $\widetilde{O}(\sqrt{T})$ worst-case regret for this problem, this work offers two results that go beyond the worst-case perspective: one for the worst-case gap between benchmarks and another for logarithmic regret rates. We first show that an $Ω(\sqrt{T})$ distance between the commonly used fluid benchmark and the online optimum is unavoidable when the former has a degenerate optimal solution. On the algorithmic side, we merge the re-solving heuristic with distribution estimation skills and propose an algorithm that achieves an $\widetilde{O}(1)$ regret as long as the fluid LP has a unique and non-degenerate solution. Furthermore, we prove that our algorithm maintains a near-optimal $\widetilde{O}(\sqrt{T})$ regret even in the worst cases and extend these results to the setting where the request and external factor are continuous. Regarding information structure, our regret results are obtained under two feedback models, respectively, where the algorithm accesses the external factor at the end of each round and at the end of a round only when a non-null action is executed.
AIFeb 5
RL-VLA$^3$: Reinforcement Learning VLA Accelerating via Full AsynchronismZhong Guan, Haoran Sun, Yongjian Guo et al.
In recent years, Vision-Language-Action (VLA) models have emerged as a crucial pathway towards general embodied intelligence, yet their training efficiency has become a key bottleneck. Although existing reinforcement learning (RL)-based training frameworks like RLinf can enhance model generalization, they still rely on synchronous execution, leading to severe resource underutilization and throughput limitations during environment interaction, policy generation (rollout), and model update phases (actor). To overcome this challenge, this paper, for the first time, proposes and implements a fully-asynchronous policy training framework encompassing the entire pipeline from environment interaction, rollout generation, to actor policy updates. Systematically drawing inspiration from asynchronous optimization ideas in large model RL, our framework designs a multi-level decoupled architecture. This includes asynchronous parallelization of environment interaction and trajectory collection, streaming execution for policy generation, and decoupled scheduling for training updates. We validated the effectiveness of our method across diverse VLA models and environments. On the LIBERO benchmark, the framework achieves throughput improvements of up to 59.25\% compared to existing synchronous strategies. When deeply optimizing separation strategies, throughput can be increased by as much as 126.67\%. We verified the effectiveness of each asynchronous component via ablation studies. Scaling law validation across 8 to 256 GPUs demonstrates our method's excellent scalability under most conditions.
GTJan 27, 2023
Are Equivariant Equilibrium Approximators Beneficial?Zhijian Duan, Yunxuan Ma, Xiaotie Deng
Recently, remarkable progress has been made by approximating Nash equilibrium (NE), correlated equilibrium (CE), and coarse correlated equilibrium (CCE) through function approximation that trains a neural network to predict equilibria from game representations. Furthermore, equivariant architectures are widely adopted in designing such equilibrium approximators in normal-form games. In this paper, we theoretically characterize benefits and limitations of equivariant equilibrium approximators. For the benefits, we show that they enjoy better generalizability than general ones and can achieve better approximations when the payoff distribution is permutation-invariant. For the limitations, we discuss their drawbacks in terms of equilibrium selection and social welfare. Together, our results help to understand the role of equivariance in equilibrium approximators.
IRMay 15
LERA: LLM-Enhanced RAG for Ad Auction in Generative ChatbotsHaoran Sun, Xinrui Song, Xinyu Zhang et al.
The integration of advertising auction mechanisms into large language model (LLM)-based chatbots presents a significant opportunity for commercialization, yet poses unique challenges in balancing relevance, efficiency, and user experience. Recently, Feizi et al.~\citep{feizi2023online} and Hajiaghayi et al.~\citep{hajiaghayi2024ad} outlined a retrieve-then-generate paradigm that decouples retrieval and generation, offering lightweight ad insertion and payment determination. However, current retrieval relies solely on text embedding similarity, which may lead to commercial misinterpretation and issues such as repetitive insertions. In this paper, we propose LERA, a two-stage retrieve-then-generate auction framework tailored for LLM chatbots. In the first stage, embedding-based coarse filtering pre-selects a small set of candidate advertisers. In the second stage, the LLM itself is queried with a carefully designed prompt to produce logits over candidates, which serve as refined organic relevance scores. These scores are combined with bids, and a critical-value payment rule accounts for both the coarse-filtering and fine-ranking thresholds, ensuring truthfulness for utility-maximizing advertisers. The framework naturally extends to multiple ad insertions within dynamic dialogue flows and long responses. Experiments on a synthetic advertiser-query benchmark show that LERA substantially improves ad selection accuracy and insertion diversity while incurring only controllable latency overhead.
GTJan 27
Ad Insertion in LLM-Generated ResponsesShengwei Xu, Zhaohua Chen, Xiaotie Deng et al.
Sustainable monetization of Large Language Models (LLMs) remains a critical open challenge. Traditional search advertising, which relies on static keywords, fails to capture the fleeting, context-dependent user intents--the specific information, goods, or services a user seeks--embedded in conversational flows. Beyond the standard goal of social welfare maximization, effective LLM advertising imposes additional requirements on contextual coherence (ensuring ads align semantically with transient user intents) and computational efficiency (avoiding user interaction latency), as well as adherence to ethical and regulatory standards, including preserving privacy and ensuring explicit ad disclosure. Although various recent solutions have explored bidding on token-level and query-level, both categories of approaches generally fail to holistically satisfy this multifaceted set of constraints. We propose a practical framework that resolves these tensions through two decoupling strategies. First, we decouple ad insertion from response generation to ensure safety and explicit disclosure. Second, we decouple bidding from specific user queries by using ``genres'' (high-level semantic clusters) as a proxy. This allows advertisers to bid on stable categories rather than sensitive real-time response, reducing computational burden and privacy risks. We demonstrate that applying the VCG auction mechanism to this genre-based framework yields approximately dominant strategy incentive compatibility (DSIC) and individual rationality (IR), as well as approximately optimal social welfare, while maintaining high computational efficiency. Finally, we introduce an "LLM-as-a-Judge" metric to estimate contextual coherence. Our experiments show that this metric correlates strongly with human ratings (Spearman's $ρ\approx 0.66$), outperforming 80% of individual human evaluators.
CLDec 19, 2025
Reinforcement Learning for Chain of Thought Compression with One-Domain-to-All GeneralizationHanyu Li, Jiangshan Duo, Bofei Gao et al.
Chain-of-thought reasoning in large language models can trigger an "overthinking trap": longer rollouts raise cost and latency yet often yield unreliable accuracy gains. Existing methods use global, static controls that may suppress needed reasoning. We propose mastery-gated, sample-level, soft reinforcement learning compression that penalizes long rollouts only when the model already solves the problem and has produced a shorter rollout. Across benchmarks, it cuts response length by 20-40% with comparable or higher accuracy and generalizes across domains: a model trained on math spontaneously shortens unseen tasks (code, instruction following, general-knowledge QA) without hurting accuracy. We further show two-way transfer between non-agent CoT and tool-use agents: non-agent training reduces SWE-Bench Verified rounds by 13%, while compressing a thinking agent cuts SWE trajectories by 67% tokens and 52% rounds and shortens non-agent outputs by up to 44%. Compression is thus not cosmetic brevity, but an inherent computation policy -- what to keep, and what to forget.
LGMay 11
An Information-Theoretic Criterion for Efficient Data SynthesisHanyu Li, Zhengqi Sun, Xiaotie Deng
Synthetic data becomes crucial for large language model training, but its effectiveness is highly inconsistent. We provide an information-theoretic account of this inconsistency: synthetic data improves a model only when the generation-training loop is information-open, i.e., shaped by external signals (verifiers, environments, or rubrics) that inject task-relevant information beyond the model's current distribution. When the loop is information-closed (relying on the model's own outputs without such signals), the data processing inequality ensures that task-relevant information can only decrease, making collapse a predicted outcome. Among information-open pipelines, both efficiency and generalization hinge on the meta-level of supervision: a coarser signal such as binary correctness treats all acceptable outputs as equivalent, so the behavior it teaches is not tied to any particular domain or surface form and generalizes naturally across tasks and domains. These observations lead to a guiding thesis: learning preferentially converges to the most information-efficient signal component available, which accelerates learning when that component is the intended one, but causes reward hacking when a spurious pattern happens to be simpler.
SEMar 19
HCAG: Hierarchical Abstraction and Retrieval-Augmented Generation on Theoretical Repositories with LLMsYusen Wu, Xiaotie Deng
Existing Retrieval-Augmented Generation (RAG) methods for code struggle to capture the high-level architectural patterns and cross-file dependencies inherent in complex, theory-driven codebases, such as those in algorithmic game theory (AGT), leading to a persistent semantic and structural gap between abstract concepts and executable implementations. To address this challenge, we propose Hierarchical Code/Architecture-guided Agent Generation (HCAG), a framework that reformulates repository-level code generation as a structured, planning-oriented process over hierarchical knowledge. HCAG adopts a two-phase design: an offline hierarchical abstraction phase that recursively parses code repositories and aligned theoretical texts to construct a multi-resolution semantic knowledge base explicitly linking theory, architecture, and implementation; and an online hierarchical retrieval and scaffolded generation phase that performs top-down, level-wise retrieval to guide LLMs in an architecture-then-module generation paradigm. To further improve robustness and consistency, HCAG integrates a multi-agent discussion inspired by cooperative game. We provide a theoretical analysis showing that hierarchical abstraction with adaptive node compression achieves cost-optimality compared to flat and iterative RAG baselines. Extensive experiments on diverse game-theoretic system generation tasks demonstrate that HCAG substantially outperforms representative repository-level methods in code quality, architectural coherence, and requirement pass rate. In addition, HCAG produces a large-scale, aligned theory-implementation dataset that effectively enhances domain-specific LLMs through post-training. Although demonstrated in AGT, HCAG paradigm also offers a general blueprint for mining, reusing, and generating complex systems from structured codebases in other domains.
AIMar 18
MALLES: A Multi-agent LLMs-based Economic Sandbox with Consumer Preference AlignmentYusen Wu, Yiran Liu, Xiaotie Deng
In the real economy, modern decision-making is fundamentally challenged by high-dimensional, multimodal environments, which are further complicated by agent heterogeneity and combinatorial data sparsity. This paper introduces a Multi-Agent Large Language Model-based Economic Sandbox (MALLES), leveraging the inherent generalization capabilities of large-sacle models to establish a unified simulation framework applicable to cross-domain and cross-category scenarios. Central to our approach is a preference learning paradigm in which LLMs are economically aligned via post-training on extensive, heterogeneous transaction records across diverse product categories. This methodology enables the models to internalize and transfer latent consumer preference patterns, thereby mitigating the data sparsity issues prevalent in individual categories. To enhance simulation stability, we implement a mean-field mechanism designed to model the dynamic interactions between the product environment and customer populations, effectively stabilizing sampling processes within high-dimensional decision spaces. Furthermore, we propose a multi-agent discussion framework wherein specialized agents collaboratively process extensive product information. This architecture distributes cognitive load to alleviate single-agent attention bottlenecks and captures critical decision factors through structured dialogue. Experiments demonstrate that our framework achieves significant improvements in product selection accuracy, purchase quantity prediction, and simulation stability compared to existing economic and financial LLM simulation baselines. Our results substantiate the potential of large language models as a foundational pillar for high-fidelity, scalable decision simulation and latter analysis in the real economy based on foundational database.
GTOct 12, 2023
The Search-and-Mix Paradigm in Approximate Nash Equilibrium AlgorithmsXiaotie Deng, Dongchen Li, Hanyu Li
AI in Math deals with mathematics in a constructive manner so that reasoning becomes automated, less laborious, and less error-prone. For algorithms, the question becomes how to automate analyses for specific problems. For the first time, this work provides an automatic method for approximation analysis on a well-studied problem in theoretical computer science: computing approximate Nash equilibria in two-player games. We observe that such algorithms can be reformulated into a search-and-mix paradigm, which involves a search phase followed by a mixing phase. By doing so, we are able to fully automate the procedure of designing and analyzing the mixing phase. For example, we illustrate how to perform our method with a program to analyze the approximation bounds of all the algorithms in the literature. Same approximation bounds are computed without any hand-written proof. Our automatic method heavily relies on the LP-relaxation structure in approximate Nash equilibria. Since many approximation algorithms and online algorithms adopt the LP relaxation, our approach may be extended to automate the analysis of other algorithms.
GTFeb 10
Enhancing Affine Maximizer Auctions with Correlation-Aware PaymentHaoran Sun, Xuanzhi Xia, Xu Chu et al.
Affine Maximizer Auctions (AMAs), a generalized mechanism family from VCG, are widely used in automated mechanism design due to their inherent dominant-strategy incentive compatibility (DSIC) and individual rationality (IR). However, as the payment form is fixed, AMA's expressiveness is restricted, especially in distributions where bidders' valuations are correlated. In this paper, we propose Correlation-Aware AMA (CA-AMA), a novel framework that augments AMA with a new correlation-aware payment. We show that any CA-AMA preserves the DSIC property and formalize finding optimal CA-AMA as a constraint optimization problem subject to the IR constraint. Then, we theoretically characterize scenarios where classic AMAs can perform arbitrarily poorly compared to the optimal revenue, while the CA-AMA can reach the optimal revenue. For optimizing CA-AMA, we design a practical two-stage training algorithm. We derive that the target function's continuity and the generalization bound on the degree of deviation from strict IR. Finally, extensive experiments showcase that our algorithm can find an approximate optimal CA-AMA in various distributions with improved revenue and a low degree of violation of IR.
AIDec 3, 2025
DeepRule: An Integrated Framework for Automated Business Rule Generation via Deep Predictive Modeling and Hybrid Search OptimizationYusen Wu, Xiaotie Deng
This paper proposes DeepRule, an integrated framework for automated business rule generation in retail assortment and pricing optimization. Addressing the systematic misalignment between existing theoretical models and real-world economic complexities, we identify three critical gaps: (1) data modality mismatch where unstructured textual sources (e.g. negotiation records, approval documents) impede accurate customer profiling; (2) dynamic feature entanglement challenges in modeling nonlinear price elasticity and time-varying attributes; (3) operational infeasibility caused by multi-tier business constraints. Our framework introduces a tri-level architecture for above challenges. We design a hybrid knowledge fusion engine employing large language models (LLMs) for deep semantic parsing of unstructured text, transforming distributor agreements and sales assessments into structured features while integrating managerial expertise. Then a game-theoretic constrained optimization mechanism is employed to dynamically reconcile supply chain interests through bilateral utility functions, encoding manufacturer-distributor profit redistribution as endogenous objectives under hierarchical constraints. Finally an interpretable decision distillation interface leveraging LLM-guided symbolic regression to find and optimize pricing strategies and auditable business rules embeds economic priors (e.g. non-negative elasticity) as hard constraints during mathematical expression search. We validate the framework in real retail environments achieving higher profits versus systematic B2C baselines while ensuring operational feasibility. This establishes a close-loop pipeline unifying unstructured knowledge injection, multi-agent optimization, and interpretable strategy synthesis for real economic intelligence.
CVApr 29
State Beyond Appearance: Diagnosing and Improving State Consistency in Dial-Based Measurement ReadingYuanze Hu, Gen Li, Yuqin Lan et al.
Multimodal large language models (MLLMs) have achieved impressive progress on general multimodal tasks, yet they remain brittle on dial-based measurement reading. In this paper, we study this problem through controlled benchmarks and feature-space probing, and show that current MLLMs not only achieve unsatisfactory accuracy on dial-based readout, but also suffer sharp performance drops under viewpoint and illumination changes even when the underlying dial state remains fixed. Our probing analysis further reveals that same-state samples under appearance variation are not consistently clustered, while neighboring states fail to preserve the local structure implied by continuous dial values. These findings suggest that existing MLLMs largely ignore the intrinsic state geometry of dial measurement tasks and instead rely on superficial appearance cues. Motivated by this diagnosis, we propose TriSCA, a tri-level state-consistent alignment framework for dial-based measurement reading. Specifically, TriSCA consists of state-distance-aware representation alignment, metadata-grounded observation-to-state supervision, and state-aware objective alignment. Extensive ablation studies and evaluation experiments on controlled clock and gauge benchmarks, together with evaluation on an external real-world benchmark, demonstrate the effectiveness of our method.
GTDec 18, 2023
A survey on algorithms for Nash equilibria in finite normal-form gamesHanyu Li, Wenhan Huang, Zhijian Duan et al.
Nash equilibrium is one of the most influential solution concepts in game theory. With the development of computer science and artificial intelligence, there is an increasing demand on Nash equilibrium computation, especially for Internet economics and multi-agent learning. This paper reviews various algorithms computing the Nash equilibrium and its approximation solutions in finite normal-form games from both theoretical and empirical perspectives. For the theoretical part, we classify algorithms in the literature and present basic ideas on algorithm design and analysis. For the empirical part, we present a comprehensive comparison on the algorithms in the literature over different kinds of games. Based on these results, we provide practical suggestions on implementations and uses of these algorithms. Finally, we present a series of open problems from both theoretical and practical considerations.
AIMay 19, 2024
Hummer: Towards Limited Competitive Preference DatasetLi Jiang, Yusen Wu, Junwu Xiong et al.
Preference datasets are essential for incorporating human preferences into pre-trained language models, playing a key role in the success of Reinforcement Learning from Human Feedback. However, these datasets often demonstrate conflicting alignment objectives, leading to increased vulnerability to jailbreak attacks and challenges in adapting downstream tasks to prioritize specific alignment objectives without negatively impacting others. In this work, we introduce a novel statistical metric, Alignment Dimension Conflict, to quantify the degree of conflict within preference datasets. We then present \texttt{Hummer} and its fine-grained variant, \texttt{Hummer-F}, as innovative pairwise preference datasets with reduced-conflict alignment objectives. \texttt{Hummer} is built based on UltraFeedback and is enhanced by AI feedback from GPT-4, marking as the first preference dataset aimed at reducing the competition between alignment objectives. Furthermore, we develop reward models, HummerRM and HummerRM-F, which employ a hybrid sampling approach to balance diverse alignment objectives effectively. This sampling method positions HummerRM as an ideal model for domain-specific further fine-tuning and reducing vulnerabilities to attacks.
GTFeb 22, 2024
Are Bounded Contracts Learnable and Approximately Optimal?Yurong Chen, Zhaohua Chen, Xiaotie Deng et al. · pku
This paper considers the hidden-action model of the principal-agent problem, in which a principal incentivizes an agent to work on a project using a contract. We investigate whether contracts with bounded payments are learnable and approximately optimal. Our main results are two learning algorithms that can find a nearly optimal bounded contract using a polynomial number of queries, under two standard assumptions in the literature: a costlier action for the agent leads to a better outcome distribution for the principal, and the agent's cost/effort has diminishing returns. Our polynomial query complexity upper bound shows that standard assumptions are sufficient for achieving an exponential improvement upon the known lower bound for general instances. Unlike the existing algorithms, which relied on discretizing the contract space, our algorithms directly learn the underlying outcome distributions. As for the approximate optimality of bounded contracts, we find that they could be far from optimal in terms of multiplicative or additive approximation, but satisfy a notion of mixed approximation.
GTJan 26, 2025
An Adaptable Budget Planner for Enhancing Budget-Constrained Auto-Bidding in Online AdvertisingZhijian Duan, Yusen Huo, Tianyu Wang et al.
In online advertising, advertisers commonly utilize auto-bidding services to bid for impression opportunities. A typical objective of the auto-bidder is to optimize the advertiser's cumulative value of winning impressions within specified budget constraints. However, such a problem is challenging due to the complex bidding environment faced by diverse advertisers. To address this challenge, we introduce ABPlanner, a few-shot adaptable budget planner designed to improve budget-constrained auto-bidding. ABPlanner is based on a hierarchical bidding framework that decomposes the bidding process into shorter, manageable stages. Within this framework, ABPlanner allocates the budget across all stages, allowing a low-level auto-bidder to bids based on the budget allocation plan. The adaptability of ABPlanner is achieved through a sequential decision-making approach, inspired by in-context reinforcement learning. For each advertiser, ABPlanner adjusts the budget allocation plan episode by episode, using data from previous episodes as prompt for current decisions. This enables ABPlanner to quickly adapt to different advertisers with few-shot data, providing a sample-efficient solution. Extensive simulation experiments and real-world A/B testing validate the effectiveness of ABPlanner, demonstrating its capability to enhance the cumulative value achieved by auto-bidders.
AIFeb 13, 2025
Game Theory Meets Large Language Models: A Systematic Survey with Taxonomy and New FrontiersHaoran Sun, Yusen Wu, Peng Wang et al.
Game theory is a foundational framework for analyzing strategic interactions, and its intersection with large language models (LLMs) is a rapidly growing field. However, existing surveys mainly focus narrowly on using game theory to evaluate LLM behavior. This paper provides the first comprehensive survey of the bidirectional relationship between Game Theory and LLMs. We propose a novel taxonomy that categorizes the research in this intersection into four distinct perspectives: (1) evaluating LLMs in game-based scenarios; (2) improving LLMs using game-theoretic concepts for better interpretability and alignment; (3) modeling the competitive landscape of LLM development and its societal impact; and (4) leveraging LLMs to advance game models and to solve corresponding game theory problems. Furthermore, we identify key challenges and outline future research directions. By systematically investigating this interdisciplinary landscape, our survey highlights the mutual influence of game theory and LLMs, fostering progress at the intersection of these fields.
CLMay 23, 2025
IDA-Bench: Evaluating LLMs on Interactive Guided Data AnalysisHanyu Li, Haoyu Liu, Tingyu Zhu et al.
Large Language Models (LLMs) show promise as data analysis agents, but existing benchmarks overlook the iterative nature of the field, where experts' decisions evolve with deeper insights of the dataset. To address this, we introduce IDA-Bench, a novel benchmark evaluating LLM agents in multi-round interactive scenarios. Derived from complex Kaggle notebooks, tasks are presented as sequential natural language instructions by an LLM-simulated user. Agent performance is judged by comparing its final numerical output to the human-derived baseline. Initial results show that even state-of-the-art coding agents (like Claude-3.7-thinking) succeed on < 50% of the tasks, highlighting limitations not evident in single-turn tests. This work underscores the need to improve LLMs' multi-round capabilities for building more reliable data analysis agents, highlighting the necessity of achieving a balance between instruction following and reasoning.
LGMay 19, 2025
TinyAlign: Boosting Lightweight Vision-Language Models by Mitigating Modal Alignment BottlenecksYuanze Hu, Zhaoxin Fan, Xinyu Wang et al.
Lightweight Vision-Language Models (VLMs) are indispensable for resource-constrained applications. The prevailing approach to aligning vision and language models involves freezing both the vision encoder and the language model while training small connector modules. However, this strategy heavily depends on the intrinsic capabilities of the language model, which can be suboptimal for lightweight models with limited representational capacity. In this work, we investigate this alignment bottleneck through the lens of mutual information, demonstrating that the constrained capacity of the language model inherently limits the Effective Mutual Information (EMI) between multimodal inputs and outputs, thereby compromising alignment quality. To address this challenge, we propose TinyAlign, a novel framework inspired by Retrieval-Augmented Generation, which strategically retrieves relevant context from a memory bank to enrich multimodal inputs and enhance their alignment. Extensive empirical evaluations reveal that TinyAlign significantly reduces training loss, accelerates convergence, and enhances task performance. Remarkably, it allows models to achieve baseline-level performance with only 40\% of the fine-tuning data, highlighting exceptional data efficiency. Our work thus offers a practical pathway for developing more capable lightweight VLMs while introducing a fresh theoretical lens to better understand and address alignment bottlenecks in constrained multimodal systems.
LGDec 7, 2023
Learning Thresholds with Latent Values and Censored FeedbackJiahao Zhang, Tao Lin, Weiqiang Zheng et al. · harvard, pku
In this paper, we investigate a problem of actively learning threshold in latent space, where the unknown reward $g(γ, v)$ depends on the proposed threshold $γ$ and latent value $v$ and it can be $only$ achieved if the threshold is lower than or equal to the unknown latent value. This problem has broad applications in practical scenarios, e.g., reserve price optimization in online auctions, online task assignments in crowdsourcing, setting recruiting bars in hiring, etc. We first characterize the query complexity of learning a threshold with the expected reward at most $ε$ smaller than the optimum and prove that the number of queries needed can be infinitely large even when $g(γ, v)$ is monotone with respect to both $γ$ and $v$. On the positive side, we provide a tight query complexity $\tildeΘ(1/ε^3)$ when $g$ is monotone and the CDF of value distribution is Lipschitz. Moreover, we show a tight $\tildeΘ(1/ε^3)$ query complexity can be achieved as long as $g$ satisfies one-sided Lipschitzness, which provides a complete characterization for this problem. Finally, we extend this model to an online learning setting and demonstrate a tight $Θ(T^{2/3})$ regret bound using continuous-arm bandit techniques and the aforementioned query complexity results.
GTDec 17, 2025
Will AI Trade? A Computational Inversion of the No-Trade TheoremHanyu Li, Xiaotie Deng
Classic no-trade theorems attribute trade to heterogeneous beliefs. We re-examine this conclusion for AI agents, asking if trade can arise from computational limitations, under common beliefs. We model agents' bounded computational rationality within an unfolding game framework, where computational power determines the complexity of its strategy. Our central finding inverts the classic paradigm: a stable no-trade outcome (Nash equilibrium) is reached only when "almost rational" agents have slightly different computational power. Paradoxically, when agents possess identical power, they may fail to converge to equilibrium, resulting in persistent strategic adjustments that constitute a form of trade. This instability is exacerbated if agents can strategically under-utilize their computational resources, which eliminates any chance of equilibrium in Matching Pennies scenarios. Our results suggest that the inherent computational limitations of AI agents can lead to situations where equilibrium is not reached, creating a more lively and unpredictable trade environment than traditional models would predict.
CLSep 24, 2025
How Large Language Models Need SymbolismXiaotie Deng, Hanyu Li
We argue that AI's future requires more than scaling. To unlock genuine discovery, large language models need a compass: human-crafted symbols to guide their powerful but blind intuition.
GTAug 16, 2025
Discovering Expert-Level Nash Equilibrium Algorithms with Large Language ModelsHanyu Li, Dongchen Li, Xiaotie Deng
Algorithm design and analysis is a cornerstone of computer science, but it confronts a major challenge. Proving an algorithm's performance guarantee across all inputs has traditionally required extensive and often error-prone human effort. While AI has shown great success in finding solutions to specific problem instances, automating the discovery of general algorithms with such provable guarantees has remained a significant barrier. This challenge stems from the difficulty of integrating the creative process of algorithm design with the rigorous process of formal analysis. To address this gap, we propose LegoNE, a framework that tightly fuses these two processes for the fundamental and notoriously difficult problem of computing approximate Nash equilibria. LegoNE automatically translates any algorithm written by a simple Python-like language into a constrained optimization problem. Solving this problem derives and proves the algorithm's approximation bound. Using LegoNE, a state-of-the-art large language model rediscovered the state-of-the-art algorithm for two-player games within hours, a feat that had taken human researchers 15 years to achieve. For three-player games, the model discovered a novel algorithm surpassing all existing human-designed ones. This work demonstrates a new human-machine collaborative paradigm for theoretical science: humans reason at a higher-abstract level, using symbols to compress the search space, and AI explores within it, achieving what neither could alone.
GTJun 28, 2025
Learning Truthful Mechanisms without DiscretizationYunxuan Ma, Siqiang Wang, Zhijian Duan et al.
This paper introduces TEDI (Truthful, Expressive, and Dimension-Insensitive approach), a discretization-free algorithm to learn truthful and utility-maximizing mechanisms. Existing learning-based approaches often rely on discretization of outcome spaces to ensure truthfulness, which leads to inefficiency with increasing problem size. To address this limitation, we formalize the concept of pricing rules, defined as functions that map outcomes to prices. Based on this concept, we propose a novel menu mechanism, which can be equivalent to a truthful direct mechanism under specific conditions. The core idea of TEDI lies in its parameterization of pricing rules using Partial GroupMax Network, a new network architecture designed to universally approximate partial convex functions. To learn optimal pricing rules, we develop novel training techniques, including covariance trick and continuous sampling, to derive unbiased gradient estimators compatible with first-order optimization. Theoretical analysis establishes that TEDI guarantees truthfulness, full expressiveness, and dimension-insensitivity. Experimental evaluation in the studied auction setting demonstrates that TEDI achieves strong performance, competitive with or exceeding state-of-the-art methods. This work presents the first approaches to learn truthful mechanisms without outcome discretization, thereby enhancing algorithmic efficiency. The proposed concepts, network architecture, and learning techniques might offer potential value and provide new insights for automated mechanism design and differentiable economics.
CLMay 11, 2025
Implementing Long Text Style Transfer with LLMs through Dual-Layered Sentence and Paragraph Structure Extraction and MappingYusen Wu, Xiaotie Deng
This paper addresses the challenge in long-text style transfer using zero-shot learning of large language models (LLMs), proposing a hierarchical framework that combines sentence-level stylistic adaptation with paragraph-level structural coherence. We argue that in the process of effective paragraph-style transfer, to preserve the consistency of original syntactic and semantic information, it is essential to perform style transfer not only at the sentence level but also to incorporate paragraph-level semantic considerations, while ensuring structural coherence across inter-sentential relationships. Our proposed framework, ZeroStylus, operates through two systematic phases: hierarchical template acquisition from reference texts and template-guided generation with multi-granular matching. The framework dynamically constructs sentence and paragraph template repositories, enabling context-aware transformations while preserving inter-sentence logical relationships. Experimental evaluations demonstrate significant improvements over baseline methods, with structured rewriting achieving 6.90 average score compared to 6.70 for direct prompting approaches in tri-axial metrics assessing style consistency, content preservation, and expression quality. Ablation studies validate the necessity of both template hierarchies during style transfer, showing higher content preservation win rate against sentence-only approaches through paragraph-level structural encoding, as well as direct prompting method through sentence-level pattern extraction and matching. The results establish new capabilities for coherent long-text style transfer without requiring parallel corpora or LLM fine-tuning.
CLApr 4, 2025
How Social is It? A Benchmark for LLMs' Capabilities in Multi-user Multi-turn Social Agent TasksYusen Wu, Junwu Xiong, Xiaotie Deng
Expanding the application of large language models (LLMs) to societal life, instead of primary function only as auxiliary assistants to communicate with only one person at a time, necessitates LLMs' capabilities to independently play roles in multi-user, multi-turn social agent tasks within complex social settings. However, currently the capability has not been systematically measured with available benchmarks. To address this gap, we first introduce an agent task leveling framework grounded in sociological principles. Concurrently, we propose a novel benchmark, How Social Is It (we call it HSII below), designed to assess LLM's social capabilities in comprehensive social agents tasks and benchmark representative models. HSII comprises four stages: format parsing, target selection, target switching conversation, and stable conversation, which collectively evaluate the communication and task completion capabilities of LLMs within realistic social interaction scenarios dataset, HSII-Dataset. The dataset is derived step by step from news dataset. We perform an ablation study by doing clustering to the dataset. Additionally, we investigate the impact of chain of thought (COT) method on enhancing LLMs' social performance. Since COT cost more computation, we further introduce a new statistical metric, COT-complexity, to quantify the efficiency of certain LLMs with COTs for specific social tasks and strike a better trade-off between measurement of correctness and efficiency. Various results of our experiments demonstrate that our benchmark is well-suited for evaluating social skills in LLMs.
LGFeb 24, 2025
Survey on Strategic Mining in Blockchain: A Reinforcement Learning ApproachJichen Li, Lijia Xie, Hanting Huang et al.
Strategic mining attacks, such as selfish mining, exploit blockchain consensus protocols by deviating from honest behavior to maximize rewards. Markov Decision Process (MDP) analysis faces scalability challenges in modern digital economics, including blockchain. To address these limitations, reinforcement learning (RL) provides a scalable alternative, enabling adaptive strategy optimization in complex dynamic environments. In this survey, we examine RL's role in strategic mining analysis, comparing it to MDP-based approaches. We begin by reviewing foundational MDP models and their limitations, before exploring RL frameworks that can learn near-optimal strategies across various protocols. Building on this analysis, we compare RL techniques and their effectiveness in deriving security thresholds, such as the minimum attacker power required for profitable attacks. Expanding the discussion further, we classify consensus protocols and propose open challenges, such as multi-agent dynamics and real-world validation. This survey highlights the potential of reinforcement learning (RL) to address the challenges of selfish mining, including protocol design, threat detection, and security analysis, while offering a strategic roadmap for researchers in decentralized systems and AI-driven analytics.
GTJun 11, 2024
Large-Scale Contextual Market Equilibrium Computation through Deep LearningYunxuan Ma, Yide Bian, Hao Xu et al.
Market equilibrium is one of the most fundamental solution concepts in economics and social optimization analysis. Existing works on market equilibrium computation primarily focus on settings with relatively few buyers. Motivated by this, our paper investigates the computation of market equilibrium in scenarios with a large-scale buyer population, where buyers and goods are represented by their contexts. Building on this realistic and generalized contextual market model, we introduce MarketFCNet, a deep learning-based method for approximating market equilibrium. We start by parameterizing the allocation of each good to each buyer using a neural network, which depends solely on the context of the buyer and the good. Next, we propose an efficient method to unbiasedly estimate the loss function of the training algorithm, enabling us to optimize the network parameters through gradient. To evaluate the approximated solution, we propose a metric called Nash Gap, which quantifies the deviation of the given allocation and price pair from the market equilibrium. Experimental results indicate that MarketFCNet delivers competitive performance and significantly lower running times compared to existing methods as the market scale expands, demonstrating the potential of deep learning-based methods to accelerate the approximation of large-scale contextual market equilibrium.
GTFeb 19, 2024
Automated Deterministic Auction Design with Objective DecompositionZhijian Duan, Haoran Sun, Yichong Xia et al.
Identifying high-revenue mechanisms that are both dominant strategy incentive compatible (DSIC) and individually rational (IR) is a fundamental challenge in auction design. While theoretical approaches have encountered bottlenecks in multi-item auctions, there has been much empirical progress in automated designing such mechanisms using machine learning. However, existing research primarily focuses on randomized auctions, with less attention given to the more practical deterministic auctions. Therefore, this paper investigates the automated design of deterministic auctions and introduces OD-VVCA, an objective decomposition approach for automated designing Virtual Valuations Combinatorial Auctions (VVCAs). Firstly, we restrict our mechanism to deterministic VVCAs, which are inherently DSIC and IR. Afterward, we utilize a parallelizable dynamic programming algorithm to compute the allocation and revenue outcomes of a VVCA efficiently. We then decompose the revenue objective function into continuous and piecewise constant discontinuous components, optimizing each using distinct methods. Extensive experiments show that OD-VVCA achieves high revenue in multi-item auctions, especially in large-scale settings where it outperforms both randomized and deterministic baselines, indicating its efficacy and scalability.
GTMay 20, 2023
A Scalable Neural Network for DSIC Affine Maximizer Auction DesignZhijian Duan, Haoran Sun, Yurong Chen et al.
Automated auction design aims to find empirically high-revenue mechanisms through machine learning. Existing works on multi item auction scenarios can be roughly divided into RegretNet-like and affine maximizer auctions (AMAs) approaches. However, the former cannot strictly ensure dominant strategy incentive compatibility (DSIC), while the latter faces scalability issue due to the large number of allocation candidates. To address these limitations, we propose AMenuNet, a scalable neural network that constructs the AMA parameters (even including the allocation menu) from bidder and item representations. AMenuNet is always DSIC and individually rational (IR) due to the properties of AMAs, and it enhances scalability by generating candidate allocations through a neural network. Additionally, AMenuNet is permutation equivariant, and its number of parameters is independent of auction scale. We conduct extensive experiments to demonstrate that AMenuNet outperforms strong baselines in both contextual and non-contextual multi-item auctions, scales well to larger auctions, generalizes well to different settings, and identifies useful deterministic allocations. Overall, our proposed approach offers an effective solution to automated DSIC auction design, with improved scalability and strong revenue performance in various settings.
GTFeb 17, 2022
Insightful Mining EquilibriaMengqian Zhang, Yuhao Li, Jichen Li et al.
The selfish mining attack, arguably the most famous game-theoretic attack in blockchain, indicates that the Bitcoin protocol is not incentive-compatible. Most subsequent works mainly focus on strengthening the selfish mining strategy, thus enabling a single strategic agent more likely to deviate. In sharp contrast, little attention has been paid to the resistant behavior against the selfish mining attack, let alone further equilibrium analysis for miners and mining pools in the blockchain as a multi-agent system. In this paper, first, we propose a strategy called insightful mining to counteract selfish mining. By infiltrating an undercover miner into the selfish pool, the insightful pool could acquire the number of its hidden blocks. We prove that, with this extra insight, the utility of the insightful pool could be strictly greater than the selfish pool's when they have the same mining power. Then we investigate the mining game where all pools can either choose to be honest or take the insightful mining strategy. We characterize the Nash equilibrium of this mining game, and derive three corollaries: (a) each mining game has a pure Nash equilibrium; (b) honest mining is a Nash equilibrium if the largest mining pool has a fraction of mining power no more than 1/3; (c) there are at most two insightful pools under equilibrium no matter how the mining power is distributed.
GTJan 29, 2022
A Context-Integrated Transformer-Based Neural Network for Auction DesignZhijian Duan, Jingwu Tang, Yutong Yin et al.
One of the central problems in auction design is developing an incentive-compatible mechanism that maximizes the auctioneer's expected revenue. While theoretical approaches have encountered bottlenecks in multi-item auctions, recently, there has been much progress on finding the optimal mechanism through deep learning. However, these works either focus on a fixed set of bidders and items, or restrict the auction to be symmetric. In this work, we overcome such limitations by factoring \emph{public} contextual information of bidders and items into the auction learning framework. We propose $\mathtt{CITransNet}$, a context-integrated transformer-based neural network for optimal auction design, which maintains permutation-equivariance over bids and contexts while being able to find asymmetric solutions. We show by extensive experiments that $\mathtt{CITransNet}$ can recover the known optimal solutions in single-item settings, outperform strong baselines in multi-item auctions, and generalize well to cases other than those in training.
CRDec 31, 2021
An Efficient and Robust Committee Structure for Sharding BlockchainMengqian Zhang, Jichen Li, Zhaohua Chen et al.
Nowadays, sharding is deemed as a promising way to save traditional blockchain protocols from their low scalability. However, such technique also brings several potential risks and huge communication overheads. An improper design may give rise to the inconsistent state among different committees. Further, the communication overheads arising from cross-shard transactions unfortunately reduce the system's performance. In this paper, we first summarize five essential issues that all sharding blockchain designers face. For each issue, we discuss its key challenge and propose our suggested solutions. In order to break the performance bottlenecks, we propose a reputation mechanism for selecting leaders. The term of reputation in our design reflects each node's honest computation resources. In addition, we introduce a referee committee and partial sets in each committee, and design a recovery procedure in case the leader is malicious. Under the design, we prove that malicious leaders will not hurt the system and will be evicted. Furthermore, we conduct a series of simulations to evaluate our design. The results show that selecting leaders by the reputation can dramatically improve the system performance.
GTOct 8, 2021
Nash Convergence of Mean-Based Learning Algorithms in First-Price AuctionsXiaotie Deng, Xinyan Hu, Tao Lin et al.
The convergence properties of learning dynamics in repeated auctions is a timely and important question, with numerous applications in, e.g., online advertising markets. This work focuses on repeated first-price auctions where bidders with fixed values learn to bid using mean-based algorithms -- a large class of online learning algorithms that include popular no-regret algorithms such as Multiplicative Weights Update and Follow the Perturbed Leader. We completely characterize the learning dynamics of mean-based algorithms, under two notions of convergence: (1) time-average: the fraction of rounds where bidders play a Nash equilibrium converges to 1; (2) last-iterate: the mixed strategy profile of bidders converges to a Nash equilibrium. Specifically, the results depend on the number of bidders with the highest value: - If the number is at least three, the dynamics almost surely converges to a Nash equilibrium of the auction, in both time-average and last-iterate. - If the number is two, the dynamics almost surely converges to a Nash equilibrium in time-average but not necessarily last-iterate. - If the number is one, the dynamics may not converge to a Nash equilibrium in time-average or last-iterate. Our discovery opens up new possibilities in the study of the convergence of learning dynamics.
GTSep 4, 2021
On the Complexity of Computing Markov Perfect Equilibrium in General-Sum Stochastic GamesXiaotie Deng, Ningyuan Li, David Mguni et al.
Similar to the role of Markov decision processes in reinforcement learning, Stochastic Games (SGs) lay the foundation for the study of multi-agent reinforcement learning (MARL) and sequential agent interactions. In this paper, we derive that computing an approximate Markov Perfect Equilibrium (MPE) in a finite-state discounted Stochastic Game within the exponential precision is \textbf{PPAD}-complete. We adopt a function with a polynomially bounded description in the strategy space to convert the MPE computation to a fixed-point problem, even though the stochastic game may demand an exponential number of pure strategies, in the number of states, for each agent. The completeness result follows the reduction of the fixed-point problem to {\sc End of the Line}. Our results indicate that finding an MPE in SGs is highly unlikely to be \textbf{NP}-hard unless \textbf{NP}=\textbf{co-NP}. Our work offers confidence for MARL research to study MPE computation on general-sum SGs and to develop fruitful algorithms as currently on zero-sum SGs.
GTAug 17, 2021
Is Nash Equilibrium Approximator Learnable?Zhijian Duan, Wenhan Huang, Dinghuai Zhang et al.
In this paper, we investigate the learnability of the function approximator that approximates Nash equilibrium (NE) for games generated from a distribution. First, we offer a generalization bound using the Probably Approximately Correct (PAC) learning model. The bound describes the gap between the expected loss and empirical loss of the NE approximator. Afterward, we prove the agnostic PAC learnability of the Nash approximator. In addition to theoretical analysis, we demonstrate an application of NE approximator in experiments. The trained NE approximator can be used to warm-start and accelerate classical NE solvers. Together, our results show the practicability of approximating NE through function approximation.
GTOct 12, 2020
A Game-Theoretic Analysis of the Empirical Revenue Maximization Algorithm with Endogenous SamplingXiaotie Deng, Ron Lavi, Tao Lin et al.
The Empirical Revenue Maximization (ERM) is one of the most important price learning algorithms in auction design: as the literature shows it can learn approximately optimal reserve prices for revenue-maximizing auctioneers in both repeated auctions and uniform-price auctions. However, in these applications the agents who provide inputs to ERM have incentives to manipulate the inputs to lower the outputted price. We generalize the definition of an incentive-awareness measure proposed by Lavi et al (2019), to quantify the reduction of ERM's outputted price due to a change of $m\ge 1$ out of $N$ input samples, and provide specific convergence rates of this measure to zero as $N$ goes to infinity for different types of input distributions. By adopting this measure, we construct an efficient, approximately incentive-compatible, and revenue-optimal learning algorithm using ERM in repeated auctions against non-myopic bidders, and show approximate group incentive-compatibility in uniform-price auctions.
CRFeb 17, 2020
An Efficient Permissioned Blockchain with Provable Reputation MechanismHongyin Chen, Zhaohua Chen, Yukun Cheng et al.
The design of permissioned blockchains places an access control requirement for members to read, access, and write information over the blockchains. In this paper, we study a hierarchical scenario to include three types of participants: providers, collectors, and governors. To be specific, providers forward transactions, collected from terminals, to collectors; collectors upload received transactions to governors after verifying and labeling them; and governors validate a part of received labeled transactions, pack valid ones into a block, and append a new block on the ledger. Collectors in the hierarchical model play a crucial role in the design: they have connections with both providers and governors, and are responsible for collecting, verifying, and uploading transactions. However, collectors are rational and some of them may behave maliciously (not necessarily for their own benefits). In this paper, we introduce a reputation protocol as a measure of the reliability of collectors in the permissioned blockchain environment. Its objective is to encourage collectors to behave truthfully and, in addition, to reduce the verification cost. The verification cost on provider $p$ is defined as the total number of invalid transactions provided by $p$ and checked by governors. Through theoretical analysis, our protocol with the reputation mechanism has a significant improvement in efficiency. Specifically, the verification loss that governors suffer is proved to be asymptotically $O(\sqrt{T_{total}})$ ($T_{total}$, representing the number of transactions verified by governors and provided by $p$), as long as there exists at least one collector who behaves well. At last, two typical cases where our model can be well applied are also demonstrated.
DCJan 19, 2020
CycLedger: A Scalable and Secure Parallel Protocol for Distributed Ledger via ShardingMengqian Zhang, Jichen Li, Zhaohua Chen et al.
Traditional public distributed ledgers have not been able to scale-out well and work efficiently. Sharding is deemed as a promising way to solve this problem. By partitioning all nodes into small committees and letting them work in parallel, we can significantly lower the amount of communication and computation, reduce the overhead on each node's storage, as well as enhance the throughput of the distributed ledger. Existing sharding-based protocols still suffer from several serious drawbacks. The first thing is that all non-faulty nodes must connect well with each other, which demands a huge number of communication channels in the network. Moreover, previous protocols have faced great loss in efficiency in the case where the honesty of each committee's leader is in question. At the same time, no explicit incentive is provided for nodes to actively participate in the protocol. We present CycLedger, a scalable and secure parallel protocol for distributed ledger via sharding. Our protocol selects a leader and a partial set for each committee, who are in charge of maintaining intra-shard consensus and communicating with other committees, to reduce the amortized complexity of communication, computation, and storage on all nodes. We introduce a novel semi-commitment scheme between committees and a recovery procedure to prevent the system from crashing even when leaders of committees are malicious. To add incentive for the network, we use the concept of reputation, which measures each node's trusty computing power. As nodes with a higher reputation receive more rewards, there is an encouragement for nodes with strong computing ability to work honestly to gain reputation. In this way, we strike out a new path to establish scalability, security, and incentive for the sharding-based distributed ledger.
LGNov 26, 2018
Reinforcement Learning for Uplift ModelingChenchen Li, Xiang Yan, Xiaotie Deng et al.
Uplift modeling aims to directly model the incremental impact of a treatment on an individual response. In this work, we address the problem from a new angle and reformulate it as a Markov Decision Process (MDP). We conducted extensive experiments on both a synthetic dataset and real-world scenarios, and showed that our method can achieve significant improvement over previous methods.