LG · Feb 1, 2022

Machine learning to assess relatedness: the advantage of using firm-level data

arXiv:2202.00458v3 · 4.6 · 15 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of accurately forecasting exports for firms and countries, offering an incremental improvement by leveraging underutilized firm-level data to enhance relatedness assessments.

The study tackles the problem of measuring economic relatedness between products by comparing networks and machine learning algorithms trained on country-level versus firm-level data. It finds that machine learning trained on the same type of data as the prediction target yields the best results, and that firm-level data also improve country-level predictions. In addition, partitioning companies and products into a larger number of community-detection blocks reduces computational time while keeping prediction performance above the network-based benchmarks.

The relatedness between a country or a firm and a product is a measure of the feasibility of that economic activity. As such, it drives investment at both the private and institutional level. Traditionally, relatedness is measured using networks derived from country-level co-occurrences of product pairs, that is, by counting how many countries export both products. In this work, we compare networks and machine learning algorithms trained not only on country-level data but also on firm-level data, which have been little studied because of their limited availability. We quantitatively compare the different measures of relatedness by using them to forecast exports at the country and firm levels, assuming that more related products have a higher likelihood of being exported in the future. Our results show that relatedness is scale-dependent: the best assessments are obtained by using machine learning on the same type of data one wants to predict. Moreover, we find that while relatedness measures based on country data are not suitable for firms, firm-level data are very informative for the development of countries as well. In this sense, models built on firm data provide a better assessment of relatedness. We also discuss the effect of parameter optimization and of using community detection algorithms to identify clusters of related companies and products, finding that a partition into a higher number of blocks decreases the computational time while maintaining prediction performance well above the network-based benchmarks.
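The network-based benchmark described above (counting how many countries export both products of a pair) and the forecast-based evaluation can be illustrated with a minimal sketch. Several choices here are assumptions rather than the paper's definitions: the binary export matrix (in practice often obtained by thresholding revealed comparative advantage), the hypothetical helpers `cooccurrence`, `density` and `auc`, the normalization of each product's links by their total, and the use of ROC-AUC as the forecast score. Because the rows can be countries or firms, the same code applies unchanged to a firm-by-product matrix.

```python
def cooccurrence(M):
    """B[p][q] = number of rows (countries or firms) exporting both p and q."""
    n_prod = len(M[0])
    B = [[0] * n_prod for _ in range(n_prod)]
    for row in M:
        exported = [p for p, x in enumerate(row) if x]
        for p in exported:
            for q in exported:
                if p != q:  # the diagonal is left at zero
                    B[p][q] += 1
    return B


def density(M, B):
    """Hypothetical relatedness score of each (row, product) pair: the share of
    product p's co-occurrence links that point to products the row already exports."""
    n_prod = len(B)
    scores = []
    for row in M:
        s = []
        for p in range(n_prod):
            total = sum(B[p])
            linked = sum(B[p][q] for q in range(n_prod) if row[q])
            s.append(linked / total if total else 0.0)
        scores.append(s)
    return scores


def auc(scores, M_now, M_next):
    """ROC-AUC of the scores at predicting which not-yet-exported products
    are exported in the next period (ties count as half a win)."""
    pos, neg = [], []
    for c, row in enumerate(M_now):
        for p, x in enumerate(row):
            if x:
                continue  # only products not exported yet are forecast targets
            (pos if M_next[c][p] else neg).append(scores[c][p])
    if not pos or not neg:
        return float("nan")
    wins = 0.0
    for a in pos:
        for b in neg:
            wins += 1.0 if a > b else 0.5 if a == b else 0.0
    return wins / (len(pos) * len(neg))


# Toy data (invented for illustration): 4 countries x 4 products.
M_now = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
M_next = [
    [1, 1, 0, 0],
    [1, 1, 1, 1],  # country 1 starts exporting product 3
    [1, 1, 1, 0],  # country 2 starts exporting product 0
    [0, 0, 1, 1],
]

B = cooccurrence(M_now)
scores = density(M_now, B)
print(auc(scores, M_now, M_next))  # → 0.9
```

In the paper's framing, this co-occurrence score is the network benchmark; the machine learning models it compares against would replace `density` with a trained predictor, while the evaluation step stays the same.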

