IR · CL · Sep 24, 2018

Recognizing Film Entities in Podcasts

Ahmet Salih Gundogdu, Arjun Sanghvi, Keith Harrigian

arXiv:1809.08711v1 · 1.71 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the domain-specific problem of film entity recognition in podcasts. The contribution is incremental: it adapts existing NER methods to a new audio context.

The paper tackles the identification of film titles in podcast audio. It proposes a Named Entity Recognition (NER) system that is robust to transcription errors and accommodates new titles without significant computational expense. Combining fuzzy matching with a linear model that uses film-specific metadata yields a more than 20% increase in F1 score over three baseline approaches.

In this paper, we propose a Named Entity Recognition (NER) system to identify film titles in podcast audio. Taking inspiration from NER systems for noisy text in social media, we implement a two-stage approach that is robust to computer transcription errors and does not require significant computational expense to accommodate new film titles/releases. Evaluating on a diverse set of podcasts, we demonstrate more than a 20% increase in F1 score across three baseline approaches when combining fuzzy-matching with a linear model aware of film-specific metadata.
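The two-stage idea described in the abstract can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: the film catalogue, the popularity values, the context words, the feature set, and the hand-set weights are all hypothetical, and Python's standard-library `difflib.SequenceMatcher` stands in for whatever fuzzy matcher the paper actually uses. Stage 1 proposes candidate spans by fuzzy-matching transcript windows against known titles; stage 2 scores each candidate with a linear (logistic) model over simple metadata features.

```python
import math
from difflib import SequenceMatcher

# Hypothetical catalogue: title -> metadata (values invented for illustration).
FILMS = {
    "The Godfather": {"popularity": 0.9},
    "Get Out": {"popularity": 0.8},
    "Her": {"popularity": 0.6},
}

# Hypothetical hand-set weights; a real system would learn them from labelled data.
WEIGHTS = {"bias": -4.5, "similarity": 3.0, "popularity": 1.0,
           "n_words": 0.5, "context": 2.0}
CONTEXT_WORDS = {"movie", "film", "watched", "director", "sequel"}


def similarity(a, b):
    """Case-insensitive character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidates(transcript, threshold=0.8):
    """Stage 1: fuzzy-match every title against same-length word windows."""
    words = transcript.split()
    found = []
    for title in FILMS:
        n = len(title.split())
        for i in range(len(words) - n + 1):
            span = " ".join(words[i:i + n])
            score = similarity(span, title)
            if score >= threshold:
                found.append((i, span, title, score))
    return found


def features(transcript, cand):
    """Build the feature vector for one candidate span."""
    i, _span, title, score = cand
    words = transcript.lower().split()
    n = len(title.split())
    window = words[max(0, i - 5):i + n + 5]  # up to 5 words either side
    has_context = any(w.strip(".,!?") in CONTEXT_WORDS for w in window)
    return {"bias": 1.0, "similarity": score,
            "popularity": FILMS[title]["popularity"],
            "n_words": float(n), "context": 1.0 if has_context else 0.0}


def probability(feats):
    """Stage 2: logistic function applied to a linear combination of features."""
    z = sum(WEIGHTS[name] * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def recognize(transcript, cutoff=0.5):
    """Return (transcript span, matched title) pairs that pass both stages."""
    return [(c[1], c[2]) for c in candidates(transcript)
            if probability(features(transcript, c)) >= cutoff]


print(recognize("last night we watched the godfathr and it was great"))
# [('the godfathr', 'The Godfather')]
print(recognize("i told her to call me back"))
# []
```

In this toy, the misspelled "godfathr" survives stage 1 because the match is approximate, while the pronoun "her" matches the title "Her" exactly in stage 1 but is rejected in stage 2 for lack of supporting context. Adding a new release only requires a new catalogue entry, with no retraining of the matcher, which is one plausible reading of the abstract's claim about low computational expense for new titles.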
