DL CL Jan 30, 2018

A Machine Learning Approach to Quantitative Prosopography

Aayushee Gupta, Haimonti Dutta, Srikanta Bedathur, Lipika Dey

arXiv:1801.10080v12.33 citations

Originality: Synthesis-oriented

AI Analysis

This work supports historical research by automating the analysis of ordinary people mentioned in documents. It is incremental in that it builds on existing NER methods.

The paper tackles the problem of automating quantitative prosopography by developing a machine learning framework that learns a people gazetteer from noisy newspaper text using a Named Entity Recognizer and identifies influential people with a custom Influential Person Index, applied to a corpus of 14,020 articles from 1896.

Prosopography is the investigation of the common characteristics of a group of people in history through a collective study of their lives. It involves the study of biographies to solve historical problems. If such biographies are unavailable, surviving documents and secondary biographical data are used. Quantitative prosopography involves the analysis of information about "ordinary people" drawn from a wide variety of sources. In this paper, we present a machine learning framework for automatically constructing a people gazetteer, which forms the basis of quantitative prosopographical research. The gazetteer is learnt from the noisy text of newspapers using a Named Entity Recognizer (NER). Influential people are then identified from the gazetteer using a custom-designed Influential Person Index (IPI). Our corpus comprises 14,020 articles from a local newspaper, "The Sun", published in New York in 1896. Some influential people identified by our algorithm include Captain Donald Hankey (an English soldier), Dame Nellie Melba (an Australian operatic soprano), Hugh Allan (a Canadian shipping magnate) and Sir Hugh John McDonald (the first Prime Minister of Canada).
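The pipeline described in the abstract (NER output, then a people gazetteer, then a ranking of people) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the input is taken to be already tagged by some NER with a hypothetical `PERSON` label, the function names are invented, and `toy_influence_score` is a placeholder based on mention counts. It is not the paper's Influential Person Index, whose definition the abstract does not give.

```python
from collections import defaultdict


def build_gazetteer(tagged_articles):
    """Map each person name to the articles mentioning it and a mention count.

    tagged_articles: dict of article id -> list of (token, tag) pairs,
    as a generic NER might emit (hypothetical format).
    """
    gazetteer = defaultdict(lambda: {"articles": set(), "mentions": 0})
    for article_id, tokens in tagged_articles.items():
        name_parts = []
        # A sentinel non-person token flushes a name that ends the article.
        for token, tag in list(tokens) + [("", "O")]:
            if tag == "PERSON":
                name_parts.append(token)
            elif name_parts:
                name = " ".join(name_parts)
                gazetteer[name]["articles"].add(article_id)
                gazetteer[name]["mentions"] += 1
                name_parts = []
    return dict(gazetteer)


def toy_influence_score(entry):
    # ASSUMPTION: placeholder score, NOT the paper's IPI.
    # It rewards people mentioned often and across many articles.
    return entry["mentions"] * len(entry["articles"])


def rank_people(gazetteer):
    """Return names sorted by descending toy score, ties broken by name."""
    return sorted(gazetteer, key=lambda n: (-toy_influence_score(gazetteer[n]), n))


# Tiny hand-tagged example (invented sentences, names taken from the abstract).
tagged = {
    "a1": [("Nellie", "PERSON"), ("Melba", "PERSON"), ("sang", "O"),
           ("for", "O"), ("Hugh", "PERSON"), ("Allan", "PERSON")],
    "a2": [("Nellie", "PERSON"), ("Melba", "PERSON"), ("returned", "O")],
}
gazetteer = build_gazetteer(tagged)
ranking = rank_people(gazetteer)
print(ranking)
```

On noisy OCR text, a real system would also need to merge spelling variants of the same name before scoring. This sketch skips that step entirely.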

