CV, Jan 5, 2019
AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection
Joseph Roth, Sourish Chaudhuri, Ondrej Klejch et al.
Active speaker detection is an important component in video analysis algorithms for applications such as speaker diarization, video re-targeting for meetings, speech enhancement, and human-robot interaction. The absence of a large, carefully labeled audio-visual dataset for this task has constrained algorithm evaluations with respect to data diversity, environments, and accuracy. This has made comparisons and improvements difficult. In this paper, we present the AVA Active Speaker detection dataset (AVA-ActiveSpeaker) that will be released publicly to facilitate algorithm development and enable comparisons. The dataset contains temporally labeled face tracks in video, where each face instance is labeled as speaking or not, and whether the speech is audible. This dataset contains about 3.65 million human-labeled frames, or about 38.5 hours of face tracks, and the corresponding audio. We also present a new audio-visual approach for active speaker detection, and analyze its performance, demonstrating both its strength and the contributions of the dataset.
AP, Jun 13, 2013
Crowds, Bluetooth, and Rock-n-Roll. Understanding Music Festival Participant Behavior
Jakob Eg Larsen, Piotr Sapiezynski, Arkadiusz Stopczynski et al.
In this paper we present a study of sensing and analyzing an offline social network of participants at a large-scale music festival (8 days, 130,000+ participants). We place 33 fixed-location Bluetooth scanners in strategic spots around the festival area to discover Bluetooth-enabled mobile phones carried by the participants, and thus collect spatio-temporal traces of their mobility and interactions. We subsequently analyze the data on two levels. On the micro level, we run a community detection algorithm to reveal a variety of groups the festival participants form. On the macro level, we employ an Infinite Relational Model (IRM) in order to recover the structure of the social network related to participants' music preferences. The obtained structure, in the form of clusters of concerts and participants, is then interpreted using meta-information about music genres, band origins, stages, and dates of performances. We show that most of the concert clusters can be described by one or more of the meta-features, effectively revealing preferences of participants (e.g. a cluster of US bands), and discuss the significance of the findings and the potential and limitations of the method used. Finally, we discuss the possibility of employing the described method and techniques for creating user-oriented applications and extending the sensing capabilities during large-scale events by introducing user involvement.
HC, Apr 1, 2013
The Smartphone Brain Scanner: A Mobile Real-time Neuroimaging System
Arkadiusz Stopczynski, Carsten Stahlhut, Jakob Eg Larsen et al.
Combining low-cost wireless EEG sensors with smartphones offers novel opportunities for mobile brain imaging in an everyday context. We present a framework for building multi-platform, portable EEG applications with real-time 3D source reconstruction. The system - Smartphone Brain Scanner - combines an off-the-shelf neuroheadset or EEG cap with a smartphone or tablet, and as such represents the first fully mobile system for real-time 3D EEG imaging. We discuss the benefits and challenges of a fully portable system, including technical limitations as well as real-time reconstruction of 3D images of brain activity. We present examples of the brain activity captured in a simple experiment involving imagined finger tapping, showing that the acquired signal in a relevant brain region is similar to that obtained with standard EEG lab equipment. Although the quality of the signal in a mobile solution using an off-the-shelf consumer neuroheadset is lower than that obtained using high-density standard EEG equipment, we propose that mobile application development may offset the disadvantages and provide completely new opportunities for neuroimaging in natural settings.