SE PLSep 11, 2020

A Principled Approach to GraphQL Query Cost Analysis

Alan Cha, Erik Wittern, Guillaume Baudart, James C. Davis, Louis Mandel, Jim A. Laredo

arXiv:2009.05632v18.916 citations

Originality Highly original

AI Analysis

This provides a principled solution for service providers to manage GraphQL API risks, addressing a specific bottleneck in web API security and performance.

The paper tackles the problem of GraphQL queries potentially requesting excessive data, which can disrupt service availability, by presenting a linear-time query analysis that measures cost without execution, achieving consistently tight upper bounds on response sizes for commercial APIs.

The landscape of web APIs is evolving to meet new client requirements and to facilitate how providers fulfill them. A recent web API model is GraphQL, which is both a query language and a runtime. Using GraphQL, client queries express the data they want to retrieve or mutate, and servers respond with exactly those data or changes. GraphQL's expressiveness is risky for service providers because clients can succinctly request stupendous amounts of data, and responding to overly complex queries can be costly or disrupt service availability. Recent empirical work has shown that many service providers are at risk. Using traditional API management methods is not sufficient, and practitioners lack principled means of estimating and measuring the cost of the GraphQL queries they receive. In this work, we present a linear-time GraphQL query analysis that can measure the cost of a query without executing it. Our approach can be applied in a separate API management layer and used with arbitrary GraphQL backends. In contrast to existing static approaches, our analysis supports common GraphQL conventions that affect query cost, and our analysis is provably correct based on our formal specification of GraphQL semantics. We demonstrate the potential of our approach using a novel GraphQL query-response corpus for two commercial GraphQL APIs. Our query analysis consistently obtains upper cost bounds, tight enough relative to the true response sizes to be actionable for service providers. In contrast, existing static GraphQL query analyses exhibit over-estimates and under-estimates because they fail to support GraphQL conventions.

View on arXiv PDF

Similar