Who Is Missing? Characterizing the Participation of Different Demographic Groups in a Korean Nationwide Daily Conversation Corpus
This work identifies demographic gaps in a conversation corpus, a step that matters for building inclusive interactive AI applications. The contribution is incremental, however, because it offers analysis rather than new methods.
The researchers analyzed a Korean nationwide daily conversation corpus to characterize participation by age and sex group, an aspect of such corpora that remains underexplored.
A conversation corpus is essential for building interactive AI applications. However, the demographic information of the participants in such corpora is largely underexplored, mainly because many corpora lack individual-level data. In this work, we analyze a Korean nationwide daily conversation corpus constructed by the National Institute of Korean Language (NIKL) to characterize the participation of different demographic groups (by age and sex) in the corpus.