Audio Content based Geotagging in Multimedia
This work addresses geotagging for multimedia applications, but appears incremental, as it applies existing matrix factorization techniques to a new audio-based context.
The paper tackles the problem of geotagging multimedia recordings by inferring location from the composition of sound events in audio, using matrix factorization to extract semantic sound classes and combining them to identify the recording's location.
In this paper we propose methods to extract geographically relevant information from a multimedia recording using its audio. Our method is based primarily on the fact that an urban acoustic environment consists of a variety of sounds. Hence, location information can be inferred from the composition of sound events/classes present in the audio. More specifically, we adopt matrix factorization techniques to obtain the semantic content of a recording in terms of different sound classes. This semantic information is then combined to identify the location of the recording.
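As a rough illustration of the idea, the following Python sketch factorizes a non-negative time-frequency matrix into sound-class bases and activations, then summarizes the activations into a per-recording descriptor. The abstract does not say which factorization or which combination rule is used, so everything here is an assumption: non-negative matrix factorization with multiplicative updates stands in for "matrix factorization techniques", and the names `nmf`, `recording_descriptor`, and `nearest_location`, as well as the nearest-centroid matching, are hypothetical.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (features x frames) as W @ H.

    Uses multiplicative updates for the Frobenius-norm objective.
    Assumption: NMF is only a stand-in for the paper's factorization.
    """
    rng = np.random.default_rng(seed)
    n_features, n_frames = V.shape
    W = rng.random((n_features, rank)) + eps  # columns: sound-class bases
    H = rng.random((rank, n_frames)) + eps    # rows: per-frame activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def recording_descriptor(H):
    """Summarize activations as the relative weight of each sound class."""
    totals = H.sum(axis=1)
    return totals / (totals.sum() + 1e-12)


def nearest_location(descriptor, centroids):
    """Hypothetical combination step: pick the location whose mean
    descriptor is closest in Euclidean distance."""
    names = list(centroids)
    dists = [np.linalg.norm(descriptor - centroids[name]) for name in names]
    return names[int(np.argmin(dists))]


if __name__ == "__main__":
    # Synthetic stand-in for a magnitude spectrogram (40 bins x 100 frames).
    V = np.random.default_rng(1).random((40, 100))
    W, H = nmf(V, rank=3)
    print(recording_descriptor(H))
```

In practice the bases `W` would more plausibly be learned once from labeled sound-event data and held fixed, so that each row of `H` corresponds to a known sound class; the sketch learns both factors jointly only to stay self-contained.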