Machine Learning Fund Categorizations
This work addresses the need for automated fund categorization in applications such as portfolio management and marketing, but its contribution is incremental because it replicates an existing human-curated system.
The paper tackles the problem of identifying similar mutual funds in a diverse market by demonstrating that an industry-wide categorization system can be learned and largely reproduced using machine learning, resulting in a data-driven categorization method.
Given the surge in popularity of mutual funds, including exchange-traded funds (ETFs), as a diversified financial investment, a vast variety of funds spanning many investment management firms and diversification strategies has become available in the market. Identifying similar mutual funds across such a wide landscape has become more important than ever because of its many applications, ranging from sales and marketing to portfolio replication, portfolio diversification, and tax-loss harvesting. The current best method is the categorization provided by data vendors, which usually relies on curation by human experts with the help of available data. In this work, we establish that a well-regarded, industry-wide categorization system is learnable using machine learning and largely reproducible, thereby constructing a truly data-driven categorization. We discuss the intellectual challenges in learning this man-made system, our results, and their implications.