CYMay 11

TourMart: A Parametric Audit Instrument for Commission Steering in LLM Travel Agents

arXiv:2605.1044095.0
AI Analysis

For regulators and OTAs, TourMart provides the first quantitative measure of commission steering in LLM travel agents, addressing a gap in existing audit methods.

TourMart is an audit instrument that detects commission steering in LLM-based travel agents. At a deployed setting, it found +7.69pp steering (p=0.003) for Qwen-14B and +2.96pp (p=0.008) for Llama-3.1-8B, enabling a compliance report to quote 7.7 extra steered recommendations per 100 sessions.

Online travel agents (Booking, Trip.com, Expedia) have replaced ranked-list interfaces with conversational LLM agents that compress many options into one sentence of advice. Each booking earns the OTA commission and different suppliers pay different rates: the agent has a structural incentive to favor higher-margin recommendations. Whether any deployed agent does this, and by how much, no one can currently measure. Disclosure banners, conversion A/B testing, UI dark-pattern taxonomies, and generic LLM safety scores were built for older interfaces and miss the prose-recommendation surface where the steering happens. We propose TourMart, an applied intelligent-system audit instrument for LLM-OTA commission governance. Two governance levers -- lambda (gain on message-induced perception in the traveler's accept/reject decision) and kappa (budget-normalized cap on how far the message can shift perceived welfare) -- drive a paired counterfactual: holding the traveler and bundle fixed, the steering delta is read off between a commission-aware prompt and a minimum-disclosure factual template. A symmetric six-gate producer audit separates LLM-engineering failures (template collapse, refusal, internal-ID leakage) from genuine commercial steering. At deployed (lambda=1, kappa=0.05), a Qwen-14B reader shows +7.69pp steering (exact McNemar p=0.003); a Llama-3.1-8B reader shows +3.50pp in the same direction at n=143, with an extended-n supplement (n=270) confirming significance (+2.96pp, p=0.008). Across the (lambda, kappa) grid both arms pass family-wise scenario-clustered correction (p<0.001 / p=0.008). TourMart outputs a sentence a compliance report can quote: "at this deployment, 7.7 extra commission-steered recommendations per 100 paired traveler sessions."

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes