Gal A. Kaminka

h-index43

11papers

194citations

Novelty46%

AI Score53

Ranked #32,700 of 201,326 authors (top 16%)#1,510 in AI (top 11%)

11 Papers

AIJun 7, 2022

Position Paper: Online Modeling for Offline Planning

Eyal Weiss, Gal A. Kaminka

The definition and representation of planning problems is at the heart of AI planning research. A key part is the representation of action models. Decades of advances improving declarative action model representations resulted in numerous theoretical advances, and capable, working, domain-independent planners. However, despite the maturity of the field, AI planning technology is still rarely used outside the research community, suggesting that current representations fail to capture real-world requirements, such as utilizing complex mathematical functions and models learned from data. We argue that this is because the modeling process is assumed to have taken place and completed prior to the planning process, i.e., offline modeling for offline planning. There are several challenges inherent to this approach, including: limited expressiveness of declarative modeling languages; early commitment to modeling choices and computation, that preclude using the most appropriate resolution for each action model -- which can only be known during planning; and difficulty in reliably using non-declarative, learned, models. We therefore suggest to change the AI planning process, such that is carries out online modeling in offline planning, i.e., the use of action models that are computed or even generated as part of the planning process, as they are accessed. This generalizes the existing approach (offline modeling). The proposed definition admits novel planning processes, and we suggest one concrete implementation, demonstrating the approach. We sketch initial results that were obtained as part of a first attempt to follow this approach by planning with action cost estimators. We conclude by discussing open challenges.

DSAug 22, 2022

A Generalization of the Shortest Path Problem to Graphs with Multiple Edge-Cost Estimates

Eyal Weiss, Ariel Felner, Gal A. Kaminka

The shortest path problem in graphs is a cornerstone of AI theory and applications. Existing algorithms generally ignore edge weight computation time. We present a generalized framework for weighted directed graphs, where edge weight can be computed (estimated) multiple times, at increasing accuracy and run-time expense. This raises several generalized variants of the shortest path problem. We introduce the problem of finding a path with the tightest lower-bound on the optimal cost. We then present two complete algorithms for the generalized problem, and empirically demonstrate their efficacy.

AIJun 8, 2022

Planning with Dynamically Estimated Action Costs

Eyal Weiss, Gal A. Kaminka

Information about action costs is critical for real-world AI planning applications. Rather than rely solely on declarative action models, recent approaches also use black-box external action cost estimators, often learned from data, that are applied during the planning phase. These, however, can be computationally expensive, and produce uncertain values. In this paper we suggest a generalization of deterministic planning with action costs that allows selecting between multiple estimators for action cost, to balance computation time against bounded estimation uncertainty. This enables a much richer -- and correspondingly more realistic -- problem representation. Importantly, it allows planners to bound plan accuracy, thereby increasing reliability, while reducing unnecessary computational burden, which is critical for scaling to large problems. We introduce a search algorithm, generalizing $A^*$, that solves such planning problems, and additional algorithmic extensions. In addition to theoretical guarantees, extensive experiments show considerable savings in runtime compared to alternatives.

AIJan 7

Personalized Medication Planning via Direct Domain Modeling and LLM-Generated Heuristics

Yonatan Vernik, Alexander Tuisov, David Izhaki et al.

Personalized medication planning involves selecting medications and determining a dosing schedule to achieve medical goals specific to each individual patient. Previous work successfully demonstrated that automated planners, using general domain-independent heuristics, are able to generate personalized treatments, when the domain and problems are modeled using a general domain description language (\pddlp). Unfortunately, this process was limited in practice to consider no more than seven medications. In clinical terms, this is a non-starter. In this paper, we explore the use of automatically-generated domain- and problem-specific heuristics to be used with general search, as a method of scaling up medication planning to levels allowing closer work with clinicians. Specifically, we specify the domain programmatically (specifying an initial state and a successor generation procedure), and use an LLM to generate a problem specific heuristic that can be used by a fixed search algorithm (GBFS). The results indicate dramatic improvements in coverage and planning time, scaling up the number of medications to at least 28, and bringing medication planning one step closer to practical applications.

AIJan 1

Multiagent Reinforcement Learning for Liquidity Games

Alicia Vidler, Gal A. Kaminka

Making use of swarm methods in financial market modeling of liquidity, and techniques from financial analysis in swarm analysis, holds the potential to advance both research areas. In swarm research, the use of game theory methods holds the promise of explaining observed phenomena of collective utility adherence with rational self-interested swarm participants. In financial markets, a better understanding of how independent financial agents may self-organize for the betterment and stability of the marketplace would be a boon for market design researchers. This paper unifies Liquidity Games, where trader payoffs depend on aggregate liquidity within a trade, with Rational Swarms, where decentralized agents use difference rewards to align self-interested learning with global objectives. We offer a theoretical frameworks where we define a swarm of traders whose collective objective is market liquidity provision while maintaining agent independence. Using difference rewards within a Markov team games framework, we show that individual liquidity-maximizing behaviors contribute to overall market liquidity without requiring coordination or collusion. This Financial Swarm model provides a framework for modeling rational, independent agents where they achieve both individual profitability and collective market efficiency in bilateral asset markets.

DSAug 15, 2023

Tightest Admissible Shortest Path

Eyal Weiss, Ariel Felner, Gal A. Kaminka

The shortest path problem in graphs is fundamental to AI. Nearly all variants of the problem and relevant algorithms that solve them ignore edge-weight computation time and its common relation to weight uncertainty. This implies that taking these factors into consideration can potentially lead to a performance boost in relevant applications. Recently, a generalized framework for weighted directed graphs was suggested, where edge-weight can be computed (estimated) multiple times, at increasing accuracy and run-time expense. We build on this framework to introduce the problem of finding the tightest admissible shortest path (TASP); a path with the tightest suboptimality bound on the optimal cost. This is a generalization of the shortest path problem to bounded uncertainty, where edge-weight uncertainty can be traded for computational cost. We present a complete algorithm for solving TASP, with guarantees on solution quality. Empirical evaluation supports the effectiveness of this approach.

LGMay 6

A Harmonic Mean Formulation of Average Reward Reinforcement Learning in SMDPs

Erel Shtossel, Alicia Vidler, Uri Shaham et al.

Recent research has revived and amplified interest in algorithms for undiscounted average reward reinforcement learning in infinite-horizon, non-episodic (continuing) tasks. Semi-Markov decision processes (SMDPs) are of particular interest. In SMDPs, discrete actions stochastically generate both rewards and durations, and the objective is to optimize the average reward rate. Existing algorithms approach this by optimizing the ratio of rewards to durations. However, when rewards and durations are non-stationary (in the infinite horizon), this can be incorrect. This paper presents a novel modified harmonic mean operator that correctly computes reward rates even under such conditions. This yields model-free learning algorithms that can work with SMDPs, while maintaining robustness to non-stationary reward and duration distributions over time. We prove theoretical properties of the modified harmonic mean operator, and empirically demonstrate its efficacy in comparison to existing algorithms.

ROMay 6

Modular Reinforcement Learning For Cooperative Swarms

Erel Shtossel, Gal A. Kaminka

A cooperative robot swarm is a collective of computationally-limited robots that share a common goal. Each robot can only interact with a small subset of its peers, without knowing how this affects the collective utility. Recent advances in distributed multi-agent reinforcement learning have demonstrated that it is possible for robots to learn how to interact effectively with others, in a manner that is aligned with the common goal, despite each robot learning independently of others. However, this requires each robot to represent a potentially combinatorial number of interaction states, challenging the memory capabilities of the robots. This paper proposes an alternative approach for representing spatial interaction states for multi-robot reinforcement learning in swarms. A modular (decomposed) representation is used, where each feature of the state is handled by a separate learning procedure, and the results aggregated. We demonstrate the efficacy of the approach in numerous experiments with simulated robot swarms carrying out foraging.

AIMay 3

A Language for Describing Agentic LLM Contexts

Noga Peleg Pelc, Gal A. Kaminka, Yoav Goldberg

Large language models are increasingly used within larger systems ("LLM agents"). These make a sequence of LLM calls, each call providing the LLM with a combination of instructions, observations, and interaction history. The design of the encoded information and its structure play a central role in the quality of the resulting system, leading to efforts spent on context engineering. It is therefore critical to communicate the composition of the LLM context in a system, and how it evolves over time. Yet, no standard exists for doing so: context construction is typically conveyed through informal prose, ad hoc diagrams, or direct inspection of code, none of which precisely capture how a prompt evolves across interaction steps or how two context representation strategies differ. To remedy this, we introduce the Agentic Context Description Language (ACDL), a language for specifying the structure and dynamics of LLM input contexts in a precise, readable, and standard manner, along with visualizations. ACDL provides constructs for specifying context aspects such as role message sequences, dynamic content, time-indexed references, and conditional or iterative structure, capturing the full architecture of a prompt independently of any particular implementation. ACDL diagrams can be hand drawn on a whiteboard, or written in formal language which can then be rendered. We describe the language, demonstrate it by documenting several existing systems and their variants, and encourage the community to adopt it for describing LLM systems context, both in day-to-day communication and in papers. Tooling, examples and documentation are available at www.acdlang.org.

AISep 28, 2017

Heuristic Online Goal Recognition in Continuous Domains

Mor Vered, Gal A. Kaminka

Goal recognition is the problem of inferring the goal of an agent, based on its observed actions. An inspiring approach - plan recognition by planning (PRP) - uses off-the-shelf planners to dynamically generate plans for given goals, eliminating the need for the traditional plan library. However, existing PRP formulation is inherently inefficient in online recognition, and cannot be used with motion planners for continuous spaces. In this paper, we utilize a different PRP formulation which allows for online goal recognition, and for application in continuous spaces. We present an online recognition algorithm, where two heuristic decision points may be used to improve run-time significantly over existing work. We specify heuristics for continuous domains, prove guarantees on their use, and empirically evaluate the algorithm over hundreds of experiments in both a 3D navigational environment and a cooperative robotic team task.

MAJan 16, 2014

Multi-Robot Adversarial Patrolling: Facing a Full-Knowledge Opponent

Noa Agmon, Gal A. Kaminka, Sarit Kraus

The problem of adversarial multi-robot patrol has gained interest in recent years, mainly due to its immediate relevance to various security applications. In this problem, robots are required to repeatedly visit a target area in a way that maximizes their chances of detecting an adversary trying to penetrate through the patrol path. When facing a strong adversary that knows the patrol strategy of the robots, if the robots use a deterministic patrol algorithm, then in many cases it is easy for the adversary to penetrate undetected (in fact, in some of those cases the adversary can guarantee penetration). Therefore this paper presents a non-deterministic patrol framework for the robots. Assuming that the strong adversary will take advantage of its knowledge and try to penetrate through the patrols weakest spot, hence an optimal algorithm is one that maximizes the chances of detection in that point. We therefore present a polynomial-time algorithm for determining an optimal patrol under the Markovian strategy assumption for the robots, such that the probability of detecting the adversary in the patrols weakest spot is maximized. We build upon this framework and describe an optimal patrol strategy for several robotic models based on their movement abilities (directed or undirected) and sensing abilities (perfect or imperfect), and in different environment models - either patrol around a perimeter (closed polygon) or an open fence (open polyline).