Quantifying A Firm's AI Engagement: Constructing Objective, Data-Driven, AI Stock Indices Using 10-K Filings
This provides a more transparent and cost-effective alternative for investors, asset managers, and policymakers to construct AI-themed portfolios, though it is incremental as it applies existing NLP techniques to a new financial domain.
The paper tackled the problem of opaque and subjective selection criteria in AI-related ETFs by proposing an objective, data-driven NLP method to classify AI stocks using 10-K filings from 3,395 NASDAQ firms (2011-2023), constructing four AI indices that performed on par with or surpassed 14 existing AI ETFs and the Nasdaq Composite Index in risk-return profiles, with higher average daily returns and risk-adjusted metrics without increased volatility.
Following an analysis of existing AI-related exchange-traded funds (ETFs), we reveal the selection criteria for determining which stocks qualify as AI-related are often opaque and rely on vague phrases and subjective judgments. This paper proposes a new, objective, data-driven approach using natural language processing (NLP) techniques to classify AI stocks by analyzing annual 10-K filings from 3,395 NASDAQ-listed firms between 2011 and 2023. This analysis quantifies each company's engagement with AI through binary indicators and weighted AI scores based on the frequency and context of AI-related terms. Using these metrics, we construct four AI stock indices-the Equally Weighted AI Index (AII), the Size-Weighted AI Index (SAII), and two Time-Discounted AI Indices (TAII05 and TAII5X)-offering different perspectives on AI investment. We validate our methodology through an event study on the launch of OpenAI's ChatGPT, demonstrating that companies with higher AI engagement saw significantly greater positive abnormal returns, with analyses supporting the predictive power of our AI measures. Our indices perform on par with or surpass 14 existing AI-themed ETFs and the Nasdaq Composite Index in risk-return profiles, market responsiveness, and overall performance, achieving higher average daily returns and risk-adjusted metrics without increased volatility. These results suggest our NLP-based approach offers a reliable, market-responsive, and cost-effective alternative to existing AI-related ETF products. Our innovative methodology can also guide investors, asset managers, and policymakers in using corporate data to construct other thematic portfolios, contributing to a more transparent, data-driven, and competitive approach.