Berkay Anahtarcı

h-index5

6papers

84citations

Novelty53%

AI Score35

Ranked #103,522 of 194,257 authors (top 53%)#674 in SY (top 41%)

6 Papers

19.4LGJul 9

NL-PAC: Specification Ambiguity and Certified Minimax Risk Floors in LLM-Mediated Supervision

Berkay Anahtarci

Large language models increasingly provide labels, evaluations, and feedback for tasks specified in natural language. When a specification admits multiple readings but the supervision channel does not reveal which is operative, additional labels reduce sampling error without resolving the resulting identification problem. We introduce Natural Language PAC (NL-PAC), a framework that uses a fixed model's thresholded decoding law to define admissible labels and candidate targets. The probability that multiple labels are admissible equals the diameter of the pointwise-admissible target class, and under target-blind supervision every learner incurs worst-case risk of at least half this diameter, at every sample size; the exact randomized minimax risk over this class is attained by a data-independent strategy. Finite-sample confidence bounds make these quantities certifiable from held-out unlabeled inputs. In a frozen Qwen~2.5--3B audit, one prespecified prompt yields a positive model-relative certificate, whereas a paraphrase and exact-rule controls yield zero. A held-out bridge audit finds that supplied candidate reading clauses fail the admissibility condition needed to transfer the certificate to coherent readings. The guarantee is specific to the audited model, prompt, threshold, and input distribution; extending it to human interpretations requires external validation.

3.9LGJun 15

Maximum Entropy Inverse Reinforcement Learning for Mean-Field Games with Average Reward

Şevket Kaan Alkır, Naci Saldı, Berkay Anahtarcı et al.

We study inverse reinforcement learning for discrete-time, infinite-horizon mean-field games (MFGs) under an average-reward criterion. Expert demonstrations are assumed to arise from a stationary mean-field equilibrium under an unknown reward, and the goal is to recover a policy explaining the observed behaviour via the maximum causal entropy principle. We formulate the inverse problem by enforcing consistency with the expert mean-field term and long-run feature expectations, treating two reward classes within a unified occupation-measure framework. For finite-dimensional linear rewards, we give a convex dual reformulation with an explicit log-partition objective, and prove smoothness and curvature properties justifying constant-step-size gradient descent. For infinite-dimensional RKHS rewards, we develop a Lagrangian relaxation whose inner-maximising policy is characterised by a soft Bellman equation. The main obstacle is the absence of a discount-factor contraction. We resolve this by introducing a minorisation-based sub-stochastic kernel that yields a strict contraction of the soft Bellman operator. We establish Fréchet differentiability and Lipschitz smoothness of the log-likelihood score, leading to a gradient ascent algorithm with convergence guarantees. Two numerical examples, a malware-spread MFG and an RKHS-based consumer-choice model, show that the recovered policies closely match expert behaviour.

2.3SYJan 12, 2024

Maximum Causal Entropy IRL in Mean-Field Games and GNEP Framework for Forward RL

Berkay Anahtarci, Can Deha Kariksiz, Naci Saldi

This paper explores the use of Maximum Causal Entropy Inverse Reinforcement Learning (IRL) within the context of discrete-time stationary Mean-Field Games (MFGs) characterized by finite state spaces and an infinite-horizon, discounted-reward setting. Although the resulting optimization problem is non-convex with respect to policies, we reformulate it as a convex optimization problem in terms of state-action occupation measures by leveraging the linear programming framework of Markov Decision Processes. Based on this convex reformulation, we introduce a gradient descent algorithm with a guaranteed convergence rate to efficiently compute the optimal solution. Moreover, we develop a new method that conceptualizes the MFG problem as a Generalized Nash Equilibrium Problem (GNEP), enabling effective computation of the mean-field equilibrium for forward reinforcement learning (RL) problems and marking an advancement in MFG solution techniques. We further illustrate the practical applicability of our GNEP approach by employing this algorithm to generate data for numerical MFG examples.

4.1LGJul 19, 2025

Kernel Based Maximum Entropy Inverse Reinforcement Learning for Mean-Field Games

Berkay Anahtarci, Can Deha Kariksiz, Naci Saldi

We consider the maximum causal entropy inverse reinforcement learning problem for infinite-horizon stationary mean-field games, in which we model the unknown reward function within a reproducing kernel Hilbert space. This allows the inference of rich and potentially nonlinear reward structures directly from expert demonstrations, in contrast to most existing inverse reinforcement learning approaches for mean-field games that typically restrict the reward function to a linear combination of a fixed finite set of basis functions. We also focus on the infinite-horizon cost structure, whereas prior studies primarily rely on finite-horizon formulations. We introduce a Lagrangian relaxation to this maximum causal entropy inverse reinforcement learning problem that enables us to reformulate it as an unconstrained log-likelihood maximization problem, and obtain a solution \lk{via} a gradient ascent algorithm. To illustrate the theoretical consistency of the algorithm, we establish the smoothness of the log-likelihood objective by proving the Fréchet differentiability of the related soft Bellman operators with respect to the parameters in the reproducing kernel Hilbert space. We demonstrate the effectiveness of our method on a mean-field traffic routing game, where it accurately recovers expert behavior.

24.0OCMar 24, 2020

Q-Learning in Regularized Mean-field Games

Berkay Anahtarci, Can Deha Kariksiz, Naci Saldi

In this paper, we introduce a regularized mean-field game and study learning of this game under an infinite-horizon discounted reward function. Regularization is introduced by adding a strongly concave regularization function to the one-stage reward function in the classical mean-field game model. We establish a value iteration based learning algorithm to this regularized mean-field game using fitted Q-learning. The regularization term in general makes reinforcement learning algorithm more robust to the system components. Moreover, it enables us to establish error analysis of the learning algorithm without imposing restrictive convexity assumptions on the system components, which are needed in the absence of a regularization term.

6.6SYDec 31, 2019

Learning in Discounted-cost and Average-cost Mean-field Games

Berkay Anahtarcı, Can Deha Karıksız, Naci Saldi

We consider learning approximate Nash equilibria for discrete-time mean-field games with nonlinear stochastic state dynamics subject to both average and discounted costs. To this end, we introduce a mean-field equilibrium (MFE) operator, whose fixed point is a mean-field equilibrium (i.e. equilibrium in the infinite population limit). We first prove that this operator is a contraction, and propose a learning algorithm to compute an approximate mean-field equilibrium by approximating the MFE operator with a random one. Moreover, using the contraction property of the MFE operator, we establish the error analysis of the proposed learning algorithm. We then show that the learned mean-field equilibrium constitutes an approximate Nash equilibrium for finite-agent games.