EnvId: A Metric Learning Approach for Forensic Few-Shot Identification of Unseen Environments
This work addresses the need for practical forensic tools that associate audio recordings with unseen environments in criminal cases, though the contribution is incremental, as it builds on existing few-shot and metric learning approaches.
The paper tackles the problem of identifying recording locations from audio in forensic investigations, where candidate locations are case-specific and audio quality varies. It proposes EnvId, a few-shot classification framework that avoids case-specific retraining and provides good predictions under unseen signal degradations, out-of-distribution reverberation characteristics, and recording position mismatches.
Audio recordings may provide important evidence in criminal investigations. One such case is the forensic association of an audio recording with its recording location. For example, a voice message may be the only investigative cue to narrow down the candidate sites for a crime. Up to now, several works have provided supervised classification tools for closed-set recording environment identification under relatively clean recording conditions. However, in forensic investigations, the candidate locations are case-specific. Thus, supervised learning techniques are not applicable without retraining a classifier on a sufficient number of training samples for each case and its respective candidate set. In addition, a forensic tool has to deal with audio material from uncontrolled sources with variable properties and quality. In this work, we therefore attempt a major step towards practical forensic application scenarios. We propose a representation learning framework called EnvId, short for environment identification. EnvId avoids case-specific retraining by modeling the task as a few-shot classification problem. We demonstrate that EnvId can handle forensically challenging material. It provides good-quality predictions even under unseen signal degradations, out-of-distribution reverberation characteristics, or recording position mismatches.
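To illustrate how a few-shot formulation can avoid case-specific retraining, the following is a minimal sketch of nearest-prototype classification in an embedding space. It is not the EnvId implementation: the abstract does not specify the encoder, the distance measure, or the decision rule, so the cosine-similarity prototypes used here are an assumption borrowed from common metric learning practice. The function names, the room labels, and the two-dimensional toy embeddings are all hypothetical; in practice the embeddings would come from a trained audio encoder.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row vector to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def build_prototypes(support_embeddings, support_labels):
    """Average the normalized support embeddings of each candidate location.

    The support set holds the few reference recordings available per
    candidate location in a given case (assumed setup).
    """
    labels = np.asarray(support_labels)
    classes = sorted(set(labels.tolist()))
    normalized = l2_normalize(np.asarray(support_embeddings, dtype=float))
    prototypes = np.stack(
        [normalized[labels == c].mean(axis=0) for c in classes]
    )
    return classes, l2_normalize(prototypes)


def classify(query_embeddings, classes, prototypes):
    """Assign each query to the location with the most similar prototype."""
    queries = l2_normalize(np.asarray(query_embeddings, dtype=float))
    similarities = queries @ prototypes.T  # cosine similarity matrix
    return [classes[i] for i in similarities.argmax(axis=1)]


# Toy stand-ins for encoder outputs (hypothetical values, two per location).
support = np.array([[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]])
support_labels = ["room_a", "room_a", "room_b", "room_b"]
queries = np.array([[0.8, 0.2], [0.2, 0.7]])

classes, prototypes = build_prototypes(support, support_labels)
predictions = classify(queries, classes, prototypes)
print(predictions)
```

Because the candidate locations enter only through the support set, a new case changes the inputs to `build_prototypes` rather than the model weights, which is the property the few-shot formulation relies on.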