Trouble with the Curve: Predicting Future MLB Players Using Scouting Reports
This work is relevant to player evaluation for baseball teams and analysts, but its contribution is incremental: it applies existing methods to a new dataset.
The authors tackle the problem of predicting, from scouting reports, whether minor league baseball players will reach the MLB. They create a dataset of nearly 10,000 reports and apply deep neural networks to it, but no concrete prediction results or numbers are provided.
In baseball, a scouting report profiles a player's characteristics and traits, usually intended for use in player valuation. This work presents a first-of-its-kind dataset of almost 10,000 scouting reports for minor league, international, and draft prospects. Compiled from articles posted to MLB.com and Fangraphs.com, each report consists of a written description of the player, numerical grades for several skills, and unique IDs that reference the player's profiles on popular resources like MLB.com, FanGraphs, and Baseball-Reference. With this dataset, we employ several deep neural networks to predict whether minor league players will make the MLB given their scouting report. We open-source this data to share with the community, and present a web application demonstrating language variations in the reports of successful and unsuccessful prospects.
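To make the task concrete, the following is a minimal sketch of how a report with the three components described above (written description, skill grades, external IDs) might be represented and fed to a classifier. It is not the authors' method: it swaps their deep neural networks for a plain bag-of-words logistic regression trained by gradient descent, so that the example stays self-contained. The class name, field names, the 20-80 grading scale, and the two toy reports are all assumptions made for illustration and are not taken from the dataset.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ScoutingReport:
    """Hypothetical record layout; field names are assumptions, not the dataset's schema."""
    text: str                                          # written description of the player
    grades: dict[str, float]                           # numerical grades per skill
    ids: dict[str, str] = field(default_factory=dict)  # IDs for external profile pages
    made_mlb: bool = False                             # label: did the player reach the MLB?


def featurize(report: ScoutingReport) -> dict[str, float]:
    """Combine word counts from the text with rescaled skill grades."""
    feats = {"bias": 1.0}
    for word, count in Counter(report.text.lower().split()).items():
        feats[f"word={word}"] = float(count)
    for skill, grade in report.grades.items():
        feats[f"grade={skill}"] = grade / 80.0  # assumes grades on a 20-80 scale
    return feats


def predict_proba(weights: dict[str, float], feats: dict[str, float]) -> float:
    """Logistic regression: sigmoid of the weighted feature sum."""
    z = sum(weights.get(name, 0.0) * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def train(reports: list[ScoutingReport], epochs: int = 200, lr: float = 0.1) -> dict[str, float]:
    """Stochastic gradient ascent on the log-likelihood."""
    weights: dict[str, float] = {}
    for _ in range(epochs):
        for report in reports:
            feats = featurize(report)
            error = float(report.made_mlb) - predict_proba(weights, feats)
            for name, value in feats.items():
                weights[name] = weights.get(name, 0.0) + lr * error * value
    return weights


# Two invented toy reports, for illustration only (not real data).
toy_reports = [
    ScoutingReport("plus power advanced approach", {"hit": 60, "power": 65}, made_mlb=True),
    ScoutingReport("raw swing long strikeout concerns", {"hit": 40, "power": 45}, made_mlb=False),
]
weights = train(toy_reports)
for report in toy_reports:
    print(report.made_mlb, round(predict_proba(weights, featurize(report)), 2))
```

A linear model like this also gives a rough way to see which words push a prediction up or down, which is similar in spirit to the language variations the web application is described as demonstrating.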