eess.AS Electrical Engineering

Audio & Speech Processing

Speech recognition, audio signal processing

60.8 SD Apr 20
VoxSafeBench: Not Just What Is Said, but Who, How, and Where

Yuxiang Wang, Hongyu Liu, Yijiang Xu et al.

For researchers and developers of speech language models, this benchmark exposes a pervasive speech grounding gap where models recognize social norms in text but fail to apply them when cues are grounded in speech.

45.7 SD Jun 3
Audio Interaction Model

Zhifei Xie, Zihang Liu, Ze An et al.

This work addresses the need for a single model that can handle multiple streaming audio tasks (e.g., voice chatting, ASR) in real time, unifying capabilities that were previously separate.

43.1 CL Mar 23 Code
TiCo: Time-Controllable Training for Spoken Dialogue Models

Kai-Wei Chang, Wei-Chih Chen, En-Pei Hu et al.

This addresses a practical limitation of real-world spoken language systems such as voice assistants, where controlling response duration can improve interaction quality, though the improvement is incremental.

41.9 CV May 28
Benchmarking Single-Factor Physical Video-to-Audio Generation

Tingle Li, Siddharth Gururani, Kevin J. Shih et al.

For researchers in video-to-audio generation, this work highlights the need to move beyond perceptual quality toward learning physical processes from pixels, though its contribution, a new evaluation benchmark, is incremental.