CL · AI · LG · Jun 11, 2025

Measuring Corporate Human Capital Disclosures: Lexicon, Data, Code, and Research Opportunities

arXiv:2506.10155v1 · 10 citations · h-index: 8 · J Inf Syst
Originality: Synthesis-oriented
AI Analysis

This paper gives researchers a tool for analyzing corporate human capital disclosures, but the contribution is incremental: it applies existing methods to a new domain without a major methodological breakthrough.

The authors address the lack of well-defined measurement for corporate human capital disclosures by training a machine learning algorithm (word2vec) on confirmed disclosures to build a comprehensive lexicon. The result is a categorized list of keywords, shared together with the data and code for research use.
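To make the idea of growing a lexicon from embeddings concrete, here is a minimal sketch of one common approach: start from seed words and keep vocabulary items whose vectors are close to a seed by cosine similarity. This is an assumption about the general technique, not the authors' released code. The vectors in `TOY_VECTORS` are invented stand-ins for trained word2vec embeddings, and `expand_seeds` and its `threshold` parameter are hypothetical names.

```python
import math

# Toy 3-dimensional vectors standing in for trained word2vec embeddings.
# ASSUMPTION: these values are invented for illustration only.
TOY_VECTORS = {
    "diversity": (0.9, 0.1, 0.0),
    "inclusion": (0.8, 0.2, 0.1),
    "equity": (0.85, 0.15, 0.05),
    "injury": (0.1, 0.9, 0.0),
    "safety": (0.0, 0.95, 0.1),
    "revenue": (0.0, 0.1, 0.9),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_seeds(seeds, vectors, threshold=0.9):
    """Return non-seed words whose best similarity to any seed meets the threshold.

    Hypothetical helper: results are sorted from most to least similar.
    """
    scores = {}
    for word, vec in vectors.items():
        if word in seeds:
            continue
        best = max(
            (cosine(vec, vectors[s]) for s in seeds if s in vectors),
            default=0.0,
        )
        if best >= threshold:
            scores[word] = best
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    print(expand_seeds({"diversity"}, TOY_VECTORS))
```

In practice the candidate list produced this way would still need manual review before a term is assigned to a category; the threshold here is arbitrary.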

Human capital (HC) is increasingly important to corporate value creation. Unlike other assets, however, HC is not currently subject to well-defined measurement or disclosure rules. We use a machine learning algorithm (word2vec) trained on a confirmed set of HC disclosures to develop a comprehensive list of HC-related keywords classified into five subcategories (DEI; health and safety; labor relations and culture; compensation and benefits; and demographics and other) that capture the multidimensional nature of HC management. We share our lexicon, corporate HC disclosures, and the Python code used to develop the lexicon, and we provide detailed examples of using our data and code, including for fine-tuning a BERT model. Researchers can use our HC lexicon (or modify the code to capture another construct of interest) with their samples of corporate communications to address pertinent HC questions. We close with a discussion of future research opportunities related to HC management and disclosure.
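As an illustration of applying a categorized lexicon to a sample of corporate communications, the sketch below counts keyword hits per subcategory in a single text. It uses only the Python standard library. The five category names come from the abstract, but every keyword in `HYPOTHETICAL_LEXICON` is a placeholder chosen for this example, not an entry from the authors' lexicon, and `count_hc_terms` is a hypothetical helper name.

```python
import re
from collections import Counter

# ASSUMPTION: placeholder keywords for illustration; the real lexicon is
# distributed by the authors and is not reproduced here.
HYPOTHETICAL_LEXICON = {
    "DEI": ["diversity", "inclusion", "pay equity"],
    "health and safety": ["workplace safety", "injury rate"],
    "labor relations and culture": ["collective bargaining", "employee engagement"],
    "compensation and benefits": ["parental leave", "retirement plan"],
    "demographics and other": ["headcount", "full-time employees"],
}


def count_hc_terms(text, lexicon):
    """Count whole-phrase, case-insensitive keyword matches per category."""
    # Lowercase and collapse whitespace so multi-word phrases match across line breaks.
    normalized = re.sub(r"\s+", " ", text.lower())
    counts = Counter()
    for category, phrases in lexicon.items():
        for phrase in phrases:
            pattern = r"\b" + re.escape(phrase) + r"\b"
            counts[category] += len(re.findall(pattern, normalized))
    return dict(counts)


if __name__ == "__main__":
    sample = (
        "We value diversity and inclusion. Our headcount grew to 5,000 "
        "full-time employees, and our injury rate declined."
    )
    for category, n in count_hc_terms(sample, HYPOTHETICAL_LEXICON).items():
        print(f"{category}: {n}")
```

Raw counts like these are usually scaled by document length before comparing firms or years, since longer filings mechanically contain more matches.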
