CL, Nov 21, 2024

Towards Full Delegation: Designing Ideal Agentic Behaviors for Travel Planning

arXiv:2411.13904v1, 9 citations, h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses the challenge of creating trustworthy and adaptive agents for personalized travel planning, representing an incremental improvement in agent behavior evaluation.

The paper tackles the design of LLM-based agents for full delegation in travel planning. It proposes the APEC Agent Constitution, a set of criteria covering Accuracy, Proactivity, Efficiency, and Credibility, and develops APEC-Travel, which surpasses baselines by 20.7% on rule-based metrics and 9.1% on LLM-as-a-Judge scores.

How will LLM-based agents be used in the future? While much of the existing work on agents has focused on improving performance on a specific family of objective and challenging tasks, in this work we take a different perspective by thinking about full delegation: agents take over humans' routine decision-making processes and are trusted by humans to find solutions that fit people's personalized needs and adapt to ever-changing context. To achieve such a goal, the behavior of the agents, i.e., agentic behaviors, should be evaluated not only on what they achieve (i.e., outcome evaluation), but also on how they achieve it (i.e., procedure evaluation). For this, we propose the APEC Agent Constitution, a list of criteria that an agent should follow for good agentic behaviors, including Accuracy, Proactivity, Efficiency, and Credibility. To verify whether APEC aligns with human preferences, we develop APEC-Travel, a travel planning agent that proactively extracts hidden personalized needs via multi-round dialog with travelers. APEC-Travel is constructed purely from synthetic data generated by Llama3.1-405B-Instruct with a diverse set of traveler personas to simulate a rich distribution of dialogs. Iteratively fine-tuned to follow the APEC Agent Constitution, APEC-Travel surpasses baselines by 20.7% on rule-based metrics and 9.1% on LLM-as-a-Judge scores across the constitution axes.

