AI · Jun 2

The DeepSpeak-Agentic Dataset

arXiv:2606.03686 · 22.2 · h-index: 2
Predicted impact top 27% in AI · last 90 days · Originality: Synthesis-oriented
AI Analysis

This dataset and benchmark serve researchers studying AI-generated media detection and human-agent interaction, but the contribution is primarily a new resource rather than a novel method or breakthrough.

The authors present DeepSpeak-Agentic, a 37+ hour video dataset of human-AI conversations, and use it to benchmark forensic identification of AI agents in audio, video, and text, while also providing a scalable data-capture system.

We present DeepSpeak-Agentic, a dataset of videos comprising over 37 hours of semi-structured conversations between a human and an embodied AI agent. We use this dataset to evaluate the automatic forensic identification (audio, video, or text) of AI agents, study the nature of human-agent interactions, and provide a benchmark for future advances in the large language models and AI-generated voices and faces that power embodied AI agents. We also contribute a scalable data-capture system that creates agents, automatically pairs them with human crowd workers, records audiovisual conversations across specified scenarios, and identifies and separates the human and agent in the combined stream.

